An unsupervised approach for sentiment analysis via financial texts

Authors

  • Cong Chi Pham
    Ho Chi Minh City Open University, Ho Chi Minh City, VN
  • Bay Van Nguyen
    Ho Chi Minh City Open University, Ho Chi Minh City, VN
  • Huy Quoc Nguyen
    Ho Chi Minh City Open University, Ho Chi Minh City, VN

DOI:

10.46223/HCMCOUJS.tech.en.15.2.3684.2025

Keywords:

autoencoder; deep clustering; natural language processing; transformer; unsupervised sentiment analysis

Abstract

The rapidly growing volume of textual data has made manual labeling extremely costly and time-consuming. To address this limitation, researchers have increasingly turned to unsupervised learning techniques that allow models to classify text without relying on labeled data. Among these, deep clustering has attracted significant interest. However, most existing deep clustering methods are designed primarily for computer vision tasks. In this paper, we propose modifications to two powerful deep clustering methods, DEKM and DeepCluster, that integrate transformer models from the Natural Language Processing (NLP) domain so that these methods can handle textual data. With the proposed methods, our best results are an accuracy of 57.71% on the test set of the Financial Phrase Bank (FPB) dataset and an accuracy of 65.58% on the test set of the Twitter Financial News (TFN) dataset. Although these results are still lower than those of traditional supervised deep learning methods, we demonstrate that the performance of the proposed methods can be further improved when they are trained with more data. This highlights the promising potential of deep clustering methods for natural language processing tasks, especially tasks where the data is unlabeled or insufficiently labeled.

Received: 23-08-2024
Accepted: 17-10-2024
Published: 13-01-2025

How to Cite

Pham, C. C., Nguyen, B. V., & Nguyen, H. Q. (2025). An unsupervised approach for sentiment analysis via financial texts. HO CHI MINH CITY OPEN UNIVERSITY JOURNAL OF SCIENCE - ENGINEERING AND TECHNOLOGY, 15(2), 46–54. https://doi.org/10.46223/HCMCOUJS.tech.en.15.2.3684.2025