
15(2), 2025 (In Press)

An unsupervised approach for sentiment analysis via financial texts


Authors - Affiliations:
Cong Chi Pham - Ho Chi Minh City Open University, Ho Chi Minh City, Vietnam
Bay Van Nguyen - Ho Chi Minh City Open University, Ho Chi Minh City, Vietnam
Huy Quoc Nguyen - Ho Chi Minh City Open University, Ho Chi Minh City, Vietnam
Corresponding author: Huy Quoc Nguyen - huy.nq@ou.edu.vn
Submitted: 23-08-2024
Accepted: 17-10-2024
Published: 13-01-2025

Abstract
The rapidly increasing volume of textual data has made manual labeling extremely costly and time-consuming. To address this limitation, researchers have increasingly turned to unsupervised learning techniques that enable models to classify text without relying on labeled data. Among these, deep clustering has garnered significant interest; however, most existing deep clustering methods are designed primarily for computer vision tasks. In this paper, we adapt two powerful deep clustering methods, DEKM and DeepCluster, by integrating transformer-based language models from the Natural Language Processing (NLP) domain, enabling these methods to handle textual data. With the proposed methods, we achieved accuracies of 57.71% on the test set of the Financial Phrase Bank (FPB) dataset and 65.58% on the test set of the Twitter Financial News (TFN) dataset. Although these results remain lower than those of traditional supervised deep learning methods, we demonstrate that the performance of our proposed methods improves further when they are trained with more data. This highlights the promising potential of deep clustering for natural language processing tasks, especially those where data is unlabeled or insufficiently labeled.
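The core idea behind the approach, grouping texts by proximity in an embedding space instead of classifying them with labels, can be illustrated with a minimal, self-contained k-means sketch. The toy 2-D "embeddings" and the `kmeans` helper below are illustrative assumptions, not the paper's implementation, which uses transformer embeddings and the full DEKM/DeepCluster training loops.

```python
def kmeans(points, k, iters=20):
    """Plain k-means: the clustering step that deep clustering methods
    such as DEKM and DeepCluster apply to learned representations.
    Here `points` stands in for sentence embeddings."""
    # Deterministic init: pick k points spread evenly through the list.
    centroids = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        # Recompute each centroid as the mean of its assigned points.
        for i, cl in enumerate(clusters):
            if cl:
                centroids[i] = tuple(sum(xs) / len(cl) for xs in zip(*cl))
    labels = [min(range(k),
                  key=lambda c: sum((a - b) ** 2
                                    for a, b in zip(p, centroids[c])))
              for p in points]
    return labels, centroids

# Two well-separated toy "embedding" groups (e.g. two sentiment classes).
points = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.1),
          (5.0, 5.1), (5.1, 5.0), (5.2, 5.1)]
labels, _ = kmeans(points, k=2)
```

In the paper's setting, the cluster assignments produced this way serve as pseudo-labels for sentiment classes; the deep clustering methods additionally refine the embedding network itself during training.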

Keywords
autoencoder; deep clustering; natural language processing; transformer; unsupervised sentiment analysis


References

Guo, W., Lin, K., & Ye, W. (2021, December). Deep embedded K-means clustering. In 2021 International Conference on Data Mining Workshops (ICDMW) (pp. 686-694). IEEE.


Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.


Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., ... & Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.


Csanády, B., Muzsai, L., Vedres, P., Nádasdy, Z., & Lukács, A. (2024). LlamBERT: Large-scale low-cost data annotation in NLP. arXiv preprint arXiv:2403.15938.


Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., ... & Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140), 1-67.


Xie, J., Girshick, R., & Farhadi, A. (2016, June). Unsupervised deep embedding for clustering analysis. In International Conference on Machine Learning (pp. 478-487). PMLR.


Guo, X., Liu, X., Zhu, E., & Yin, J. (2017). Deep clustering with convolutional autoencoders. In Neural Information Processing: 24th International Conference, ICONIP 2017, Guangzhou, China, November 14-18, 2017, Proceedings, Part II 24 (pp. 373-382). Springer International Publishing.


Guo, X., Gao, L., Liu, X., & Yin, J. (2017, August). Improved deep embedded clustering with local structure preservation. In IJCAI (Vol. 17, pp. 1753-1759).


Yang, B., Fu, X., Sidiropoulos, N. D., & Hong, M. (2017, July). Towards k-means-friendly spaces: Simultaneous deep learning and clustering. In International Conference on Machine Learning (pp. 3861-3870). PMLR.


Fard, M. M., Thonet, T., & Gaussier, E. (2020). Deep k-means: Jointly clustering with k-means and learning representations. Pattern Recognition Letters, 138, 185-192.


Choe, J., Noh, K., Kim, N., Ahn, S., & Jung, W. (2023). Exploring the Impact of Corpus Diversity on Financial Pretrained Language Models. arXiv preprint arXiv:2310.13312.


Caron, M., Bojanowski, P., Joulin, A., & Douze, M. (2018). Deep clustering for unsupervised learning of visual features. In Proceedings of the European conference on computer vision (ECCV) (pp. 132-149).


Araci, D. (2019). FinBERT: Financial sentiment analysis with pre-trained language models. arXiv preprint arXiv:1908.10063.


Beyer, L., Zhai, X., & Kolesnikov, A. (2022). Better plain ViT baselines for ImageNet-1k. arXiv preprint arXiv:2205.01580.


Caron, M., Misra, I., Mairal, J., Goyal, P., Bojanowski, P., & Joulin, A. (2020). Unsupervised learning of visual features by contrasting cluster assignments. Advances in Neural Information Processing Systems, 33, 9912-9924.


Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C. D., Ng, A. Y., & Potts, C. (2013, October). Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (pp. 1631-1642).


Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.


Ren, Y., Pu, J., Yang, Z., Xu, J., Li, G., Pu, X., ... & He, L. (2024). Deep clustering: A comprehensive survey. IEEE Transactions on Neural Networks and Learning Systems.


Malo, P., Sinha, A., Korhonen, P., Wallenius, J., & Takala, P. (2014). Good debt or bad debt: Detecting semantic orientations in economic texts. Journal of the Association for Information Science and Technology, 65(4), 782-796.



Creative Commons License
© The Author(s) 2025. This is an open access publication under the CC BY-NC licence.