Vol. 14, No. 1 (2024)

Detecting spelling errors in Vietnamese administrative document using large language models


Author - Affiliation:
Huan The Phung - Thai Nguyen University of Information and Communication Technology, Thai Nguyen, Vietnam
Nghia Van Luong - Pham Van Dong University, Quang Ngai, Vietnam
Corresponding author: Huan The Phung - pthuan@ictu.edu.vn
Submitted: 22-12-2023
Accepted: 02-02-2024
Published: 05-03-2024

Abstract
As the volume of administrative documents grows, ensuring their accuracy and quality becomes increasingly important. This research applies advanced language models to detecting spelling errors in administrative documents. Specifically, it proposes a new method that uses a language model based on the Transformer architecture to automatically detect and correct common spelling errors in such documents. The method draws on the model's ability to understand context and grammar to identify words or phrases that are likely to be misspelled. It is tested on a dataset of real administrative documents, and the experimental results show that the proposed model detects spelling errors effectively, helping to improve the accuracy and quality of administrative documents. Beyond this practical contribution, the research opens up new directions for applying language models to natural language processing problems in the field of administration.

Keywords
administrative documents; detect spelling errors; language model; natural language processing

Cite this paper as:

Phung, H. T., & Luong, N. V. (2024). Detecting spelling errors in Vietnamese administrative document using large language models. Ho Chi Minh City Open University Journal of Science – Engineering and Technology, 14(1), 31-40. doi:10.46223/HCMCOUJS.tech.en.14.1.3141.2024


Creative Commons License
© The Author(s) 2024. This is an open access publication under CC BY NC licence.