Enhancing portfolio optimization with multi-LLM sentiment aggregation: A Black-Litterman integration approach
DOI: http://dx.doi.org/10.21511/imfi.22(3).2025.16
Article Info: Volume 22, 2025, Issue #3, pp. 213-226
This work is licensed under a Creative Commons Attribution 4.0 International License.
Type of the article: Research Article
Abstract
Sentiment analysis of financial text data plays a crucial role in investment decision-making, yet existing approaches often rely on single-model sentiment scores that may suffer from biases or hallucinations. This study aims to enhance portfolio optimization by integrating sentiment signals from multiple Large Language Models (LLMs) into the Black-Litterman framework. The proposed method aggregates sentiment scores from three finance-domain fine-tuned LLMs using a Long Short-Term Memory network, which captures non-linear relationships and temporal dependencies to produce a robust Meta-LLM sentiment score. This score is then incorporated into the Black-Litterman model as investor views to derive optimal portfolio weights. The methodology is tested on a portfolio of S&P 500 stocks. The results show that the proposed approach significantly improves portfolio performance, achieving an annualized return of 31.22%, compared to 24.57% for the market capitalization-weighted portfolio. Additionally, the model attains a Sharpe Ratio of 3.02, an Omega Ratio of 2.48, and a Jensen’s Alpha of 1.95%, outperforming both the benchmark portfolios and portfolios based on single-LLM sentiment. The findings demonstrate that aggregating sentiment from multiple LLMs enhances risk-adjusted returns while mitigating model-specific limitations. Future research could explore the integration of LLMs with different architectures to further refine sentiment-aware portfolio strategies.
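The Black-Litterman step described above can be sketched in a few lines of numpy. This is an illustrative sketch only: the covariance matrix, risk-aversion coefficient, view-tilt scale, and the `black_litterman_posterior` helper are assumptions for demonstration, not the parameters or code used in the study.

```python
import numpy as np

def black_litterman_posterior(pi, Sigma, P, Q, Omega, tau=0.05):
    """Blend equilibrium returns pi with investor views (P, Q, Omega)
    via the standard Black-Litterman posterior mean."""
    tau_Sigma_inv = np.linalg.inv(tau * Sigma)
    Omega_inv = np.linalg.inv(Omega)
    A = tau_Sigma_inv + P.T @ Omega_inv @ P
    b = tau_Sigma_inv @ pi + P.T @ Omega_inv @ Q
    return np.linalg.solve(A, b)

# Toy setup: 3 assets with an assumed covariance matrix and market weights
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
w_mkt = np.array([0.5, 0.3, 0.2])
delta = 2.5                              # assumed risk-aversion coefficient
pi = delta * Sigma @ w_mkt               # implied equilibrium excess returns

# Hypothetical Meta-LLM sentiment scores in [-1, 1], one per asset,
# expressed as absolute views that tilt the equilibrium returns
sentiment = np.array([0.6, -0.2, 0.3])
P = np.eye(3)                            # one absolute view per asset
Q = pi + 0.02 * sentiment                # sentiment-scaled view returns
Omega = np.diag(np.diag(0.05 * Sigma))   # diagonal view-uncertainty matrix

mu_bl = black_litterman_posterior(pi, Sigma, P, Q, Omega)
w_bl = np.linalg.solve(delta * Sigma, mu_bl)  # unconstrained mean-variance weights
```

With neutral views (Q equal to the equilibrium returns), the posterior collapses back to the equilibrium, so the sentiment tilt is the only source of deviation from the market portfolio.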
JEL Classification: G11
References: 30
Tables: 7
Figures: 2
- Figure 1. Transformer model architecture
- Figure 2. LSTM cell and its operations
-
- Table 1. Configured parameters for LSTM sentiment aggregation
- Table 2. Machine learning models’ predictive performance
- Table 3. LLMs’ performance metrics on sentiment analysis
- Table 4. Equity returns descriptive statistics
- Table 5. Portfolio performance comparison
- Table 6. Portfolio performance comparison: Varying τ parameter
- Table 7. Portfolio performance: Meta-LLM vs individual LLMs
-
- Araci, D. (2019). FinBERT: Financial sentiment analysis with pre-trained language models.
- Benjamin, J., & Mathew, J. (2025). Enhancing continuous integration predictions: a hybrid LSTM-GRU deep learning framework with evolved DBSO algorithm. Computing, 107(1), 9.
- Black, F., & Litterman, R. (1990). Asset allocation: Combining investor views with market equilibrium. Journal of Fixed Income, 1, 7-18.
- Bukhari, A.H., Raja, M.A.Z., Sulaiman, M., Islam, S., Shoaib, M., & Kumam, P. (2020). Fractional neuro-sequential ARFIMA-LSTM for financial market forecasting. IEEE Access, 8, 71326-71338.
- Colasanto, F., Grilli, L., Santoro, D., & Villani, G. (2022). BERT’s sentiment score for portfolio optimisation: a fine-tuned view in Black and Litterman model. Neural Computing and Applications, 34(20), 17507-17521.
- de Kok, S., Punt, L., van den Puttelaar, R., Ranta, K., Schouten, K., & Frasincar, F. (2018). Review-Aggregated aspect-based sentiment analysis with ontology features. Progress in Artificial Intelligence, 7, 295-306.
- Dmonte, A., Ko, E., & Zampieri, M. (2024, December). An Evaluation of Large Language Models in Financial Sentiment Analysis. In 2024 IEEE International Conference on Big Data (BigData) (pp. 4869-4874). IEEE.
- Dong, Z., Fan, X., & Peng, Z. (2024, August). Fnspid: A comprehensive financial news dataset in time series. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (pp. 4918-4927).
- Huang, D., Huang, K., Li, Z., Liu, Z., & Zhao, J. (2021, January). FinBERT: A pre-trained financial language representation model for financial text mining. In Proceedings of the twenty-ninth international joint conference on artificial intelligence (pp. 4513-4519).
- Kang, H., & Liu, X. Y. (2023). Deficiency of large language models in finance: An empirical examination of hallucination.
- Kirtac, K., & Germano, G. (2025). Leveraging LLM-based sentiment analysis for portfolio allocation with proximal policy optimization. In ICLR 2025 Workshop on Machine Learning Multiscale Processes.
- Kuruvilla, J. S., & Mythily, M. (2025). Financial LLM For Stock Price Analysis And Investment Recommendation. Journal of Telematics and Informatics, 13(1).
- Lefort, B., Benhamou, E., Ohana, J. J., Saltiel, D., & Guez, B. (2024). Optimizing Performance: How Compact Models Match or Exceed GPT’s Classification Capabilities through Fine-Tuning.
- Liu, B. (2012). Sentiment analysis and opinion mining. Springer Nature.
- Lou, R., Zhang, K., & Yin, W. (2024). Large language model instruction following: A survey of progresses and challenges. Computational Linguistics, 50(3), 1053-1095.
- Mao, D., Zhang, D., Zhang, A., & Zhao, Z. (2025, April). MLSDET: Multi-LLM Statistical Deep Ensemble for Chinese AI-Generated Text Detection. In ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 1-5). IEEE.
- Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2), 1-135.
- Pathuri, S. K., Anbazhagan, N., & Prakash, G. B. (2020, July). Feature based sentimental analysis for prediction of mobile reviews using hybrid bag-boost algorithm. In 2020 7th International Conference on Smart Structures and Systems (ICSSS) (pp. 1-5). IEEE.
- Ranjan, R., Gupta, S., & Singh, S. N. (2024). A comprehensive survey of bias in LLMs: Current landscape and future directions.
- Romero, M. (2024). DistilRoBERTa.
- Shah, N., Genc, Z., & Araci, D. (2024). StackEval: Benchmarking LLMs in Coding.
- Sun, C., Qiu, X., Xu, Y., & Huang, X. (2019). How to fine-tune BERT for text classification? In Chinese computational linguistics. 18th China National Conference, CCL 2019. Kunming, China (pp. 194-206). Springer International Publishing.
- Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M. A., Lacroix, T., Rozière, B., Goyal, N., Hambro, E., Azhar, F., & Rodriguez, A. (2023). LLaMA: Open and efficient foundation language models.
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
- Wang, J. H., Liu, T.W., Luo, X., & Wang, L. (2018, October). An LSTM approach to short text sentiment classification with word embeddings. In Proceedings of the 30th conference on computational linguistics and speech processing (ROCLING 2018) (pp. 214-223).
- Xie, Q., Han, W., Chen, Z., Xiang, R., Zhang, X., He, Y., Xiao, M., Li, D., Dai, Y., Feng, D., & Xu, Y. (2024). Finben: A holistic financial benchmark for large language models. Advances in Neural Information Processing Systems, 37, 95716-95743.
- Zhang, B., Yang, H., & Liu, X.Y. (2023). Instruct-FinGPT: Financial sentiment analysis by instruction tuning of general-purpose large language models.
- Zhang, H., & Shafiq, M.O. (2024). Survey of transformers and towards ensemble learning using transformers for natural language processing. Journal of Big Data, 11(1), 25.
- Zhang, W.E., Sheng, Q.Z., Alhazmi, A., & Li, C. (2020). Adversarial attacks on deep-learning models in natural language processing: A survey. ACM Transactions on Intelligent Systems and Technology (TIST), 11(3), 1-41.
- Zhao, H., Liu, Z., Wu, Z., Li, Y., Yang, T., Shu, P., Xu, S., Dai, H., Zhao, L., Mai, G., & Liu, N. (2024). Revolutionizing finance with LLMs: An overview of applications and insights.