# Reevaluating the Role of Generative AI in Finance
## Chapter 1: Understanding Generative AI's Potential
When ChatGPT debuted in November 2022, my initial experimentation led me to suggest to my supervisor that its primary application in quantitative finance would be data augmentation. After all, it functions mainly as an advanced text generator, and for standard natural language processing tasks, smaller, specialized models often outperform larger, general models. While my assessment held some truth, the landscape of Generative AI is rapidly evolving, revealing a plethora of innovative applications, particularly in finance, with new research emerging almost daily.
In the beginning, many viewed large language models (LLMs) as potential replacements for search engines, given their ability to produce seemingly authoritative answers. We soon recognized their limitations, such as incomplete knowledge and fixed training cut-off dates, which made clear that they cannot entirely replace traditional search engines; they are better treated as sophisticated reasoning tools. Now that we are moving past mere demonstrations and tutorials, I want to take a measured look at several common perspectives on Generative AI and its practical applications in our professional and personal lives.
Misconception: "Vector databases are the core of LLMs and a vital part of their technology framework."
As LLMs gained prominence, venture capital began pouring into a budding technology called vector databases, which function as "long-term memory" for LLMs. Most "Retrieval Augmented Generation" (RAG) tutorials and videos present vector databases as a crucial element of the LLM architecture. Retrieving the right information is clearly important for grounding LLMs, since even growing context windows remain finite.
However, conflating a technology with its capabilities can lead to misunderstandings. For instance, reword a query to a vector database only slightly, while keeping its meaning intact, and the results can change significantly, because the embedding reflects the model's internal representation of the wording rather than the meaning alone. Additionally, embedding models are essentially smaller language models with their own knowledge cutoffs, so they may struggle with newer terminology and return subpar search results. Recently coined terms such as "skimpflation" may not be represented accurately, showing how the evolving nature of language can leave models behind.
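To make the first point concrete, here is a minimal sketch, assuming the sentence-transformers package and a small off-the-shelf embedding model (all-MiniLM-L6-v2 here; any embedding model would do): two paraphrases of the same question can score quite differently against the same passage.

```python
from sentence_transformers import SentenceTransformer, util

# Any embedding model could be substituted; all-MiniLM-L6-v2 is just a small,
# widely used default.
model = SentenceTransformer("all-MiniLM-L6-v2")

queries = [
    "What was the company's quarterly revenue growth?",
    "How fast did the firm's sales grow last quarter?",
]
passage = "Revenue increased 12% quarter over quarter, driven by services."

query_embeddings = model.encode(queries, convert_to_tensor=True)
passage_embedding = model.encode(passage, convert_to_tensor=True)

# Two paraphrases of the same question can score noticeably differently
# against the same passage, which changes what a vector store returns first.
for query, embedding in zip(queries, query_embeddings):
    score = float(util.cos_sim(embedding, passage_embedding))
    print(f"{score:.3f}  {query}")
```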
Moreover, finance heavily relies on numerical data, an area where language models typically underperform. Tables are also difficult to embed meaningfully, which further limits the usefulness of vector search in financial contexts.
Information retrieval is a longstanding field, and while semantic search has its advantages, like contextual understanding and ambiguity resolution, it isn't universally applicable. As of October 2023, for example, a hybrid model combining lexical and semantic approaches leads the BEIR benchmark for diverse information retrieval. If your document set is small, computing cosine similarity in memory may suffice: it is simpler to implement and yields accurate results. Latency concerns often stem from the LLM itself, not the database, and AutoGPT, a popular agent project, has even removed vector databases from its architecture. If AutoGPT's agent doesn't need a database for long-term memory, yours might not either.
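For a modest corpus, the "database" can simply be an array in memory. Here is a rough sketch, assuming you already have query and document embeddings from whichever model you use:

```python
import numpy as np

def top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, documents: list[str], k: int = 3):
    """Return the k documents most similar to the query by cosine similarity.

    `query_vec` (shape (dim,)) and `doc_vecs` (shape (n_docs, dim)) are
    embeddings produced by whichever model you already use.
    """
    doc_norm = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    query_norm = query_vec / np.linalg.norm(query_vec)
    scores = doc_norm @ query_norm                  # cosine similarity per document
    best = np.argsort(scores)[::-1][:k]
    return [(documents[i], float(scores[i])) for i in best]
```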
## The Misconception of AI Supremacy in Finance
Many believe that since LLMs are so adept at processing vast amounts of text, they can seamlessly replace human roles in finance. However, it's crucial to select the appropriate tool for each specific challenge.
If you prompt ChatGPT to elaborate on portfolio construction theory, you'll receive a comprehensive response, which might lead you to think it could serve as your portfolio manager. While it may perform well on CFA-style questions, one must question whether it has truly internalized the answers. Research suggests that LLMs can propose diversified holdings to mitigate risk, but this overlooks a key point: LLMs do not scale their intelligence in the same manner as humans. They don't learn fundamental arithmetic before tackling calculus; their knowledge is implicit, arising from their training to predict subsequent words or numbers.
This creates a misleading impression—when ChatGPT offers financial advice, one might assume it possesses the ability to execute fundamental financial computations, such as ratio analyses and discounted cash flows. However, this knowledge is implicit and not entirely reliable.
## Differentiating Knowledge from Application
The knowledge embedded in LLMs arises from their extensive reading and word prediction tasks. For accurate operation, they must implicitly construct a representation of the world, including numerical manipulation. Due to the tokenization of numbers, general LLMs struggle with basic arithmetic, let alone complex financial analysis.
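You can see the root of the arithmetic problem by looking at how numbers are tokenized. A quick illustration, assuming the tiktoken package and the cl100k_base encoding used by recent OpenAI models:

```python
import tiktoken

# cl100k_base is the encoding used by recent OpenAI chat models.
encoding = tiktoken.get_encoding("cl100k_base")

for number in ["7", "3.14159", "1234567.89"]:
    tokens = encoding.encode(number)
    pieces = [encoding.decode([token]) for token in tokens]
    print(f"{number!r} -> {pieces}")

# Larger figures are split into several multi-digit chunks rather than seen
# as a single quantity, which is part of why arithmetic on them is unreliable.
```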
Without additional tools or API integrations, LLMs are ill-equipped for financial tasks. Investment decisions necessitate the computation of various ratios and comparables, understanding their nuances across industries and timelines, and analyzing them to formulate final decisions. Employing ChatGPT as your financial analyst may not be the most effective approach.
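The calculations themselves are trivial for ordinary code, which is exactly the point. A minimal sketch with illustrative placeholder figures:

```python
def discounted_cash_flow(cash_flows: list[float], discount_rate: float) -> float:
    """Present value of a series of annual cash flows (year 1 onwards)."""
    return sum(cf / (1 + discount_rate) ** year
               for year, cf in enumerate(cash_flows, start=1))

def current_ratio(current_assets: float, current_liabilities: float) -> float:
    return current_assets / current_liabilities

# Illustrative placeholder figures, not real company data.
pv = discounted_cash_flow([100.0, 110.0, 121.0], discount_rate=0.08)
print(f"DCF present value: {pv:.2f}")                 # same answer on every run
print(f"Current ratio: {current_ratio(500.0, 250.0):.2f}")
```

The same inputs produce the same outputs every time, something no prompt can guarantee from a generative model.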
One potential solution is to delegate numerical analysis to specialized tools or systems. Platforms designed specifically for financial calculations can excel in this area. Rather than acting as a direct substitute for analysts, ChatGPT could serve as an orchestrator, aiding in the processing of textual data. To maximize LLM utility, a robust ecosystem of computational tools must be established around them.
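Here is a rough sketch of that orchestration pattern. `call_llm` is a hypothetical stand-in for whichever model API you use, and the tools are ordinary deterministic functions; the model chooses and phrases, while the code computes:

```python
import json

def current_ratio(current_assets: float, current_liabilities: float) -> float:
    return current_assets / current_liabilities

def debt_to_equity(total_debt: float, total_equity: float) -> float:
    return total_debt / total_equity

# The deterministic tools the model is allowed to delegate to.
TOOLS = {"current_ratio": current_ratio, "debt_to_equity": debt_to_equity}

def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around whichever model API you use."""
    raise NotImplementedError

def answer(question: str) -> str:
    # Ask the model to choose a tool and its arguments, expressed as JSON.
    plan = json.loads(call_llm(
        f"Pick one tool from {list(TOOLS)} and its arguments for: {question}. "
        'Reply only with JSON of the form {"tool": "...", "args": {...}}'
    ))
    result = TOOLS[plan["tool"]](**plan["args"])      # the arithmetic happens here
    # The model only phrases the computed result; it never does the math itself.
    return call_llm(f"Answer '{question}' given that the computed value is {result}.")
```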
Given the unpredictability of LLM outputs and the high stakes of financial decision-making, we cannot assume that machines will replace human analysts. Instead, they should be viewed as enhancements to human expertise.
## The Dangers of Over-Reliance on AI
In an age where generative AI raises concerns about industry disruption, the instinctive response is often to invest heavily and "figure it out" later.
Chip Huyen's presentation on Generative AI provides a clear perspective: if obsolescence looms, then going all in may be necessary, particularly in creative fields like writing. In finance, however, a more measured approach is advisable—it's less about scrambling to avoid replacement and more about a strategic build versus buy decision. Many finance firms possess proprietary data, presenting an opportunity to develop models that outperform those of generalist companies. For example, Bloomberg has leveraged its unique data to create a finance-specific LLM.
## The Case for Specialized Models
While the allure of replacing existing NLP systems with generalized models like LLMs can be tempting, specialized models often perform better on specific tasks. Models like FinBERT, for instance, return a fixed, well-defined set of labels, providing a consistency that LLMs may lack due to their stochastic nature.
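For comparison, a minimal sketch of the specialized route, assuming the Hugging Face transformers package and the publicly available ProsusAI/finbert checkpoint:

```python
from transformers import pipeline

# ProsusAI/finbert is a publicly available FinBERT checkpoint fine-tuned for
# financial sentiment; loading it downloads the model weights.
sentiment = pipeline("text-classification", model="ProsusAI/finbert")

headlines = [
    "Company X beats earnings expectations and raises full-year guidance.",
    "Regulator opens an investigation into Company Y's accounting practices.",
]

# Output is drawn from a fixed label set (positive / negative / neutral) with
# repeatable scores, in contrast to the free-form output of a generative model.
for result in sentiment(headlines):
    print(result)
```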
LLMs indeed have a place in innovation, particularly for rapid prototyping. Instead of laboriously gathering training data and fine-tuning models, one can quickly deploy an LLM and obtain satisfactory results in a fraction of the time.
Research indicates their utility in open information extraction and summarization tasks. However, it is premature to discard established methodologies.
Current proprietary models, such as ChatGPT and Claude, excel in their designed tasks and demonstrate a broader generalization capability. Yet, for specialized needs—like sentiment analysis—models such as FinBERT continue to outperform, emphasizing the importance of domain-specific expertise.
## The Transition from Demo to Production
The internet is rife with claims of "I built a GPT Investment Banker," with companies showcasing LLMs analyzing complex documents. However, the reliability of outputs remains a significant challenge, especially without substantial architectural advancements.
Unlike traditional systems, LLMs do not guarantee consistent outputs for identical inputs, even at fixed settings. Transitioning from a demonstration to a live application involves implementing safeguards, checks, and verification systems to enhance reliability, but hallucinations remain a challenge. Thus, developing a Q&A bot or chatbot in finance—where precision is paramount—can be far more complex than anticipated.
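One concrete shape those safeguards can take is a validate-and-retry wrapper around the model call. A sketch, with `call_llm` again a hypothetical stand-in for your model API and the checks purely illustrative:

```python
import json

def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around whichever model API you use."""
    raise NotImplementedError

REQUIRED_FIELDS = {"ticker", "metric", "value"}

def extract_with_checks(prompt: str, max_retries: int = 2) -> dict:
    for _ in range(max_retries + 1):
        raw = call_llm(prompt)                     # non-deterministic output
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue                               # malformed JSON: retry
        if REQUIRED_FIELDS <= parsed.keys() and isinstance(parsed["value"], (int, float)):
            return parsed                          # passed the structural checks
    raise ValueError("Model output failed validation; escalate to a human reviewer.")
```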
As previously noted, generic LLMs may not always be the optimal solution for every component of your pipeline. Consequently, optimizing for production may mean swapping some LLM-based components back out for traditional NLP elements, which can extend the implementation timeline.
Additionally, if you've used GPT-4 for prototyping and now face production costs, the financial implications can be significant. Frequent calls to GPT-4 can accumulate costs quickly, and it may also be slower than ChatGPT, presenting challenges in latency-sensitive applications. Hence, a notable disparity may emerge between your demo and the live version due to budget constraints.
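A back-of-the-envelope estimate is worth running before committing a pipeline to a larger model. A small sketch; the per-token rates are illustrative placeholders rather than current prices:

```python
def monthly_cost(calls_per_day: int, prompt_tokens: int, completion_tokens: int,
                 price_in_per_1k: float, price_out_per_1k: float, days: int = 30) -> float:
    per_call = (prompt_tokens / 1000) * price_in_per_1k \
             + (completion_tokens / 1000) * price_out_per_1k
    return per_call * calls_per_day * days

# Hypothetical workload: 10,000 calls a day, 2,000 prompt tokens, 500 completion tokens.
# The per-1k-token rates are placeholders; substitute your provider's published pricing.
large = monthly_cost(10_000, 2_000, 500, price_in_per_1k=0.03, price_out_per_1k=0.06)
small = monthly_cost(10_000, 2_000, 500, price_in_per_1k=0.001, price_out_per_1k=0.002)
print(f"Larger model:  ${large:,.0f} per month")
print(f"Smaller model: ${small:,.0f} per month")
```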
Fortunately, with instruction fine-tuning for ChatGPT, it is possible to refine a model to perform comparably to GPT-4 without incurring excessive costs. Techniques such as grounding and prompt engineering can reduce hallucination risks, but this may compromise user experience in chatbots. Ultimately, integrating LLMs into product design while advising users to verify outputs can mitigate risks, although it may diminish potential economic benefits.
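As a sketch of what grounding can look like in practice, the template below constrains the model to answer only from retrieved context and to refuse otherwise; the exact wording is an assumption that would need iteration against real queries:

```python
# The wording below is illustrative; a real prompt needs iteration and evaluation.
GROUNDED_PROMPT = """You are a financial research assistant.
Answer the question using ONLY the context below.
If the context does not contain the answer, reply exactly: "Not in the provided documents."

Context:
{context}

Question: {question}
Answer:"""

def build_prompt(context: str, question: str) -> str:
    return GROUNDED_PROMPT.format(context=context, question=question)
```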
## Caveat Scriptor (Writer Beware)
## Conclusion
In the fast-paced world of Generative AI and LLMs, it's vital to discern genuine capabilities from marketing hype. While LLMs present exciting possibilities in finance—ranging from prototyping to text manipulation—their limitations in numerical analysis, reliability, and cost-efficiency should not be ignored. Furthermore, specialized models often outperform LLMs in specific tasks, reinforcing the idea that LLMs are not a panacea but rather a component within a more intricate ecosystem. To deploy them effectively in finance—or any sector—requires a nuanced strategy that capitalizes on their strengths while addressing their weaknesses.