
Fine-Tuning Large Language Models for Stock Return Prediction Using Newsflow

(arXiv:2407.18103)
Published Jul 25, 2024 in q-fin.CP, cs.LG, and q-fin.PM

Abstract

LLMs and their fine-tuning techniques have demonstrated superior performance in various language understanding and generation tasks. This paper explores fine-tuning LLMs for stock return forecasting with financial newsflow. In quantitative investing, return forecasting is fundamental for subsequent tasks like stock picking, portfolio optimization, etc. We formulate the model to include text representation and forecasting modules. We propose to compare the encoder-only and decoder-only LLMs, considering they generate text representations in distinct ways. The impact of these different representations on forecasting performance remains an open question. Meanwhile, we compare two simple methods of integrating LLMs' token-level representations into the forecasting module. The experiments on real news and investment universes reveal that: (1) aggregated representations from LLMs' token-level embeddings generally produce return predictions that enhance the performance of long-only and long-short portfolios; (2) in the relatively large investment universe, the decoder LLMs-based prediction model leads to stronger portfolios, whereas in the small universes, there are no consistent winners. Among the three LLMs studied (DeBERTa, Mistral, Llama), Mistral performs more robustly across different universes; (3) return predictions derived from LLMs' text representations are a strong signal for portfolio construction, outperforming conventional sentiment scores.

Figure: Different workflows for using financial news in stock picking: conventional methods vs. LLM fine-tuning.

Overview

  • The paper investigates the use of fine-tuned LLMs for predicting stock returns based on financial news data, focusing on different text representation methods and LLM architectures.

  • The study uses financial news data from 2003 to 2019 and examines how encoder-only (DeBERTa) and decoder-only LLMs (Mistral, Llama3) perform at generating return predictions for North American, European, and Emerging Markets investment universes.

  • Key findings indicate that aggregated token-level representations outperform bottleneck representations, that decoder-only models like Mistral are particularly robust across universes, and that prediction-based portfolios outperform sentiment-based ones.

Fine-Tuning LLMs for Stock Return Prediction Using Newsflow

This paper by Guo and Hauptmann explores fine-tuning LLMs for stock return forecasting from financial newsflow. The work targets quantitative investing, where accurate return predictions underpin stock selection and portfolio optimization.

Methodology and Contributions

The main contributions of this paper are:

  1. An LLM-based return prediction model that integrates text representation and forecasting modules.
  2. A comparative analysis of encoder-only (DeBERTa) and decoder-only LLMs (Mistral, Llama3) to evaluate their efficacy in generating textual representations for return forecasting.
  3. The introduction of two methods for integrating token-level representations into the forecasting module: bottleneck representations and aggregated representations (a minimal sketch of both follows this list).
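
To make the two integration methods concrete, the following is a minimal PyTorch sketch of a model combining a text-representation backbone with a forecasting head. The class name, the masked mean pooling, and the single linear head are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class ReturnForecaster(nn.Module):
    """Text-representation backbone + forecasting head (illustrative sketch)."""

    def __init__(self, backbone, hidden_size: int, mode: str = "aggregate"):
        super().__init__()
        self.backbone = backbone       # encoder- or decoder-only LLM
        self.mode = mode               # "bottleneck" or "aggregate"
        self.head = nn.Linear(hidden_size, 1)  # forecasting module

    def forward(self, input_ids, attention_mask):
        out = self.backbone(input_ids=input_ids, attention_mask=attention_mask)
        h = out.last_hidden_state      # (batch, seq_len, hidden)
        if self.mode == "bottleneck":
            # One summary token: [CLS] for encoders; for decoders, the
            # last non-padding token (shown here).
            last = attention_mask.sum(dim=1) - 1
            rep = h[torch.arange(h.size(0)), last]
        else:
            # Aggregate all token-level embeddings via masked mean pooling.
            mask = attention_mask.unsqueeze(-1).float()
            rep = (h * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)
        return self.head(rep).squeeze(-1)  # predicted forward return
```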

The study aims to determine the impact of different text representations on the performance of stock return forecasts. This is achieved by fine-tuning LLMs on financial news data and assessing their ability to predict future stock returns in various investment universes.
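
A hedged sketch of the corresponding fine-tuning step, continuing the ReturnForecaster class above. The MSE regression objective against realized forward returns, the Mistral checkpoint, and the hyperparameters are illustrative assumptions rather than the paper's stated configuration.

```python
import torch
from transformers import AutoModel

backbone = AutoModel.from_pretrained("mistralai/Mistral-7B-v0.1")
model = ReturnForecaster(backbone, hidden_size=4096, mode="aggregate")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def train_step(batch):
    # batch: tokenized newsflow per stock plus its realized forward return
    preds = model(batch["input_ids"], batch["attention_mask"])
    loss = torch.nn.functional.mse_loss(preds, batch["forward_return"])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```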

Experimental Setup

The experiments are conducted on financial news data spanning 2003 to 2019, with distinct investment universes for the North American (NA), European (EU), and Emerging Markets (EM) regions. Models are trained on data from 2003 to 2014 and tested on data from 2015 to 2019. Performance is compared via decile RMSE, decile precision, and decile return, and portfolios constructed from the predictions are benchmarked against sentiment-based portfolios.
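
One way to read the decile metrics: rank stocks by predicted return, bucket them into ten groups, and evaluate each bucket against realized returns. The NumPy sketch below assumes one plausible definition of decile precision (agreement between predicted and realized decile membership); the summary does not pin down the exact formula.

```python
import numpy as np

def decile_metrics(preds: np.ndarray, realized: np.ndarray, n: int = 10):
    """Per-decile RMSE, precision, and mean realized return (sketch)."""
    # Rank-based decile assignment: 0 = lowest predictions, n-1 = highest.
    pred_decile = np.argsort(np.argsort(preds)) * n // len(preds)
    real_decile = np.argsort(np.argsort(realized)) * n // len(realized)
    metrics = {}
    for d in range(n):
        in_d = pred_decile == d
        metrics[d] = {
            "rmse": np.sqrt(np.mean((preds[in_d] - realized[in_d]) ** 2)),
            # Assumed definition: share of decile-d predictions whose
            # realized return also falls in decile d.
            "precision": np.mean(real_decile[in_d] == d),
            "mean_return": realized[in_d].mean(),
        }
    return metrics
```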

Numerical Results and Findings

Key findings from the experiments are:

  1. Aggregated Representations: Aggregated representations built from LLMs' token-level embeddings generally outperform bottleneck representations across metrics. In the North American universe, the aggregated-representation model consistently showed higher decile returns and precision in the top (9th) decile, the decile that matters most for long-only portfolios. The same trend held for the European and Emerging Markets universes.
  2. Encoder-only vs. Decoder-only LLMs: Decoder-only LLMs, particularly Mistral, deliver robust performance across investment universes. In the larger universes, decoder-based models such as Mistral and Llama3 produced stronger portfolios, while no architecture was a consistent winner in the smaller universes. The encoder-only DeBERTa was competitive but was outperformed by Mistral in generating return predictions useful for portfolio construction.
  3. Prediction-based vs. Sentiment-based Portfolios: Return predictions derived from LLMs' text representations provided stronger portfolio signals than conventional sentiment scores, with prediction-based portfolios achieving higher annual returns and Sharpe ratios in all examined investment universes (a portfolio-construction sketch follows this list).
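
As noted in item 3 above, predictions feed directly into portfolio construction. Below is a hedged pandas sketch of a decile-based backtest; equal weighting, monthly rebalancing, and a zero risk-free rate in the Sharpe ratio are assumptions, not the paper's exact protocol.

```python
import numpy as np
import pandas as pd

def backtest_decile_portfolio(preds: pd.DataFrame, fwd_returns: pd.DataFrame,
                              long_short: bool = False,
                              periods_per_year: int = 12):
    """Equal-weight top-decile portfolio, optionally shorting the bottom
    decile; rows are rebalance dates, columns are stocks (sketch)."""
    period_returns = []
    for date in preds.index:
        p = preds.loc[date].dropna()
        r = fwd_returns.loc[date]            # realized forward returns
        top = p[p >= p.quantile(0.9)].index  # top decile by prediction
        ret = r[top].mean()
        if long_short:
            bottom = p[p <= p.quantile(0.1)].index
            ret -= r[bottom].mean()
        period_returns.append(ret)
    port = pd.Series(period_returns, index=preds.index)
    annual_return = port.mean() * periods_per_year
    sharpe = port.mean() / port.std() * np.sqrt(periods_per_year)
    return annual_return, sharpe
```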

Practical and Theoretical Implications

The findings have significant implications for the field of quantitative investing. The demonstrated superiority of aggregated representations suggests that retaining rich token-level information is crucial for accurate financial predictions. Additionally, the robust performance of decoder-only LLMs like Mistral across varying market conditions and investment universes highlights their potential for broader applications in financial forecasting.

On the theoretical side, the varying performance across architectures invites further investigation. Future research might examine the representation-collapse issue associated with different LLM pre-training objectives and evaluate newly proposed large encoder-only LLMs to understand their efficacy in financial contexts.

Conclusion

This paper provides a comprehensive analysis of leveraging fine-tuned LLMs for predicting stock returns using financial newsflow. By comparing different LLM architectures and representation methods, it demonstrates the potential of aggregated token-level representations in enhancing portfolio performance. These insights pave the way for future developments in integrating advanced NLP techniques into quantitative financial models, thus offering new avenues for research and application in AI-driven finance.
