PULLMAN, Wash. — As artificial intelligence continues to make headlines, one pressing question looms: Could AI chatbots like ChatGPT assist or potentially replace financial professionals? A new study by Washington State University and Clemson University researchers, analyzing more than 10,000 AI responses to financial exam questions, provides some sobering answers.
“It’s far too early to be worried about ChatGPT taking finance jobs completely,” says study author DJ Fairhurst of WSU’s Carson College of Business in a statement. “For broad concepts where there have been good explanations on the internet for a long time, ChatGPT can do a very good job at synthesizing those concepts. If it’s a specific, idiosyncratic issue, it’s really going to struggle.”
The research, published in the Financial Analysts Journal, addresses a significant industry concern. Goldman Sachs estimates that 15% to 35% of finance jobs could potentially be automated by AI, while KPMG suggests that generative AI may revolutionize how asset and wealth managers operate. However, these projections rely on a critical assumption: that AI systems possess an adequate understanding of finance.
“Passing certification exams is not enough. We really need to dig deeper to get to what these models can really do,” notes Fairhurst.
The researchers assembled a comprehensive dataset of 1,083 multiple-choice questions drawn from various financial licensing exams, including the Securities Industry Essentials (SIE) exam and Series 7, 6, 65, and 66 exams. These are the same tests that human financial professionals must pass to become licensed. Currently, about 42,000 people become registered representatives annually, with more than 600,000 working in the securities industry.
Using this question bank, the study tested four different AI models: Google’s Bard, Meta’s LLaMA, and two versions of OpenAI’s ChatGPT (versions 3.5 and 4). The researchers evaluated not just answer accuracy but also used sophisticated natural language processing techniques to compare how well the AI systems could explain their reasoning compared to expert-written explanations.
The results revealed distinct tradeoffs among the AI models. ChatGPT 4 emerged as the clear leader, with accuracy rates 18 to 28 percentage points higher than the other models. However, when the researchers fine-tuned the older, free ChatGPT 3.5 by feeding it examples of correct responses and explanations, it nearly matched ChatGPT 4’s accuracy and even surpassed it in producing answers that resembled those of human professionals.
Both models still showed significant limitations. While they performed well on questions about trading, customer accounts, and prohibited activities (73.4% accuracy), their accuracy dropped to 56.6% on questions about evaluating client financial profiles and investment objectives. The models were especially error-prone in specialized situations, such as determining clients’ insurance coverage and tax status.
The research team isn’t stopping with exam questions. They’re now exploring other ways to test ChatGPT’s capabilities, including a project that asks it to evaluate potential merger deals. Taking advantage of ChatGPT’s initial training cutoff date of September 2021, they’re testing it against known outcomes of deals made after that date. Preliminary findings suggest the AI model struggles with this more complex task.
These limitations have important implications for the finance industry, particularly regarding entry-level positions.
“The practice of bringing a bunch of people on as junior analysts, letting them compete and keeping the winners – that becomes a lot more costly,” explains Fairhurst. “So it may mean a downturn in those types of jobs, but it’s not because ChatGPT is better than the analysts, it’s because we’ve been asking junior analysts to do tasks that are more menial.”
Based on these findings, AI’s immediate role in finance appears to be collaborative rather than a substitute for human judgment. While these systems demonstrate impressive capabilities in summarizing information and handling routine analytical tasks, their error rates, particularly in complex, client-facing situations, indicate that human oversight remains essential in an industry where mistakes can have serious financial and legal consequences.
Paper Summary
Methodology
The researchers analyzed over 10,000 responses from four different AI models (Bard, LLaMA, ChatGPT 3.5, and ChatGPT 4) to 1,083 financial licensing exam questions. Each question was tested across multiple models and configurations, creating a comprehensive dataset. The team evaluated two key aspects: whether the AI picked the correct answer and how well it explained its reasoning compared to expert explanations. They used sophisticated natural language processing techniques (specifically the BERT model) to measure how closely AI explanations matched expert-written ones.
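The two-part evaluation described above, checking the chosen answer and comparing the explanation against an expert-written one, can be sketched roughly as follows. This is a simplified illustration, not the authors’ code: Python’s standard-library SequenceMatcher stands in for the BERT-based similarity the study actually used, and the sample question is invented.

```python
from difflib import SequenceMatcher

def grade_response(model_answer: str, correct_answer: str,
                   model_explanation: str, expert_explanation: str):
    """Score one AI response on the study's two dimensions:
    (1) did it pick the correct multiple-choice answer, and
    (2) how closely its explanation tracks the expert's.
    SequenceMatcher is a crude stand-in for the BERT-based
    similarity measure the researchers used."""
    is_correct = model_answer.strip().upper() == correct_answer.strip().upper()
    similarity = SequenceMatcher(
        None, model_explanation.lower(), expert_explanation.lower()
    ).ratio()
    return is_correct, similarity

# Invented example item, for illustration only.
correct, sim = grade_response(
    model_answer="B",
    correct_answer="b",
    model_explanation="A stop order becomes a market order once the stop price is reached.",
    expert_explanation="A stop order converts to a market order when the stop price is hit.",
)
print(correct, round(sim, 2))
```

Aggregating these per-question scores over the 1,083 items, across each model and configuration, yields the accuracy and explanation-similarity comparisons the study reports.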
Additionally, they mapped the questions to 51 real-world finance job tasks using data from the U.S. Department of Labor’s Occupational Information Network (O*NET) to understand practical applications. The study also explored different ways of using AI systems, including web interfaces, API access with various settings, and specially trained (fine-tuned) models.
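The fine-tuning the study describes for ChatGPT 3.5 starts from examples of correct answers paired with expert explanations. A minimal sketch of preparing such training data in OpenAI’s chat-format JSONL is below; the example record is invented, and the upload and job-creation calls are only indicated in comments.

```python
import json

# One invented exam item; the real training data came from
# licensing-exam question banks with expert-written explanations.
examples = [
    {
        "question": "SIE stands for: (A) Securities Industry Essentials "
                    "(B) Standard Investment Evaluation",
        "answer": "A",
        "explanation": "The SIE is FINRA's entry-level Securities "
                       "Industry Essentials exam.",
    },
]

# OpenAI fine-tuning expects JSONL: one chat transcript per line.
with open("finance_finetune.jsonl", "w") as f:
    for ex in examples:
        record = {
            "messages": [
                {"role": "user", "content": ex["question"]},
                {"role": "assistant",
                 "content": f"Answer: {ex['answer']}. {ex['explanation']}"},
            ]
        }
        f.write(json.dumps(record) + "\n")

# The file would then be uploaded and a fine-tuning job started, e.g.:
#   client.files.create(file=open("finance_finetune.jsonl", "rb"),
#                       purpose="fine-tune")
#   client.fine_tuning.jobs.create(training_file=..., model="gpt-3.5-turbo")
```

Feeding the model question-answer-explanation transcripts in this form is what let the tuned ChatGPT 3.5 approach ChatGPT 4’s accuracy in the study.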
Key Results
ChatGPT 4 emerged as the top performer, correctly answering 84.5% of questions, 18 to 28 percentage points better than the free models. When researchers fine-tuned ChatGPT 3.5 by training it on specific financial content, it nearly matched ChatGPT 4’s accuracy and even surpassed it in explanation quality. The AIs performed best on questions about trading and market operations (73.4% accuracy) but struggled with client-specific tasks like financial planning and tax analysis (dropping to 56.6% accuracy). Interestingly, both AI and human test-takers tended to struggle with the same challenging questions, suggesting fundamental limitations in handling complex financial concepts.
Study Limitations
The study primarily used entry-level licensing exam questions, which may not fully capture the complexity of real-world financial work. Some test questions were available online, potentially inflating AI performance by up to 13% for these questions. The research was conducted in late 2023 and early 2024, and given the rapid pace of AI development, results might change with newer versions. Additionally, exam questions don’t test important aspects of finance jobs, such as writing, communication, and creative thinking skills.
Discussion & Takeaways
The research suggests AI is currently better suited as an assistant than a replacement for financial professionals. While it shows promise in tasks like market monitoring and basic analysis, it remains less reliable for complex, client-specific work. The study reveals important tradeoffs between different AI models and implementation methods. Fine-tuning can significantly improve performance, but even the most advanced models still make errors that could be costly in real-world applications. The findings also suggest potential changes in entry-level finance jobs, particularly for junior analysts performing routine tasks.
Funding & Disclosures
The research was supported by data from Achievable and Knopman Marks, two financial exam preparation companies. Special acknowledgments were given to Justin Pincar at Achievable and Brian Marks at Knopman Marks. The study also benefited from input from seminar participants at Washington State University and Clemson University. The authors reported no conflicts of interest, and the study received peer review before publication in the Financial Analysts Journal.
Publication Details
This study was published in the Financial Analysts Journal on November 18, 2024. The article titled “How Much Does ChatGPT Know about Finance?” can be accessed using the Digital Object Identifier (DOI): 10.1080/0015198X.2024.2411941. The research was authored by Douglas (DJ) Fairhurst, an associate professor of finance at Carson College of Business, Washington State University, and Daniel Greene, the Bill Short Associate Professor of Finance at Wilbur O. and Ann Powers College of Business, Clemson University. The article earned 2.0 PL Credits and underwent peer review before publication. Correspondence regarding the study can be directed to Douglas (DJ) Fairhurst at [email protected].