Khmer printed character recognition using attention-based Seq2Seq network

Authors

  • Rina Buoy
    Techo Startup Center, Phnom Penh, Cambodia, KH
  • Nguonly Taing
    Techo Startup Center, Phnom Penh, Cambodia, KH
  • Sovisal Chenda
    Techo Startup Center, Phnom Penh, Cambodia, KH
  • Sokchea Kor
    Royal University of Phnom Penh, Phnom Penh, Cambodia, KH

DOI:

10.46223/HCMCOUJS.tech.en.12.1.2217.2022

Keywords:

Khmer; Optical Character Recognition; Deep Learning; Neural Network

Abstract

This paper presents an end-to-end deep convolutional recurrent neural network solution for the Khmer optical character recognition (OCR) task. The proposed solution uses a sequence-to-sequence (Seq2Seq) architecture with an attention mechanism. The encoder extracts visual features from an input text-line image via layers of convolutional blocks and a layer of gated recurrent units (GRU). The features are encoded into a single context vector and a sequence of hidden states, which are fed to the decoder; the decoder then emits one character at a time until a special end-of-sentence (EOS) token is produced. The attention mechanism allows the decoder network to adaptively select relevant parts of the input image while predicting each target character. The Seq2Seq Khmer OCR network is trained on a large collection of computer-generated text-line images in multiple common Khmer fonts. Complex data augmentation is applied to both the training and validation datasets. On a validation set of 6,400 augmented images, the proposed model outperforms the state-of-the-art Tesseract OCR engine for the Khmer language, achieving a character error rate (CER) of 0.7% versus 35.9%.
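The abstract does not state which scoring function the attention layer uses. As a rough illustration only, the pure-Python sketch below assumes the additive form introduced by Bahdanau, Cho, and Bengio (2014): each encoder hidden state h_j is scored against the current decoder state s via e_j = v · tanh(W s + U h_j), the scores are normalised with a softmax, and the resulting weights produce a context vector as a weighted sum of the encoder states. The function names, the parameters `W`, `U` and `v`, and the toy numbers are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Sequence[float]) -> Vector:
    """Multiply matrix m (given as a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def softmax(scores: Sequence[float]) -> Vector:
    peak = max(scores)  # shift by the max so exp() cannot overflow
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(
    dec_state: Sequence[float],
    enc_states: Sequence[Sequence[float]],
    W: Matrix,
    U: Matrix,
    v: Sequence[float],
) -> Tuple[Vector, Vector]:
    """One attention step for a single decoder time step (assumed additive form).

    Scores each encoder state h_j with e_j = v . tanh(W s + U h_j),
    normalises the scores with softmax, and returns the attention weights
    together with the context vector sum_j alpha_j * h_j.
    """
    ws = matvec(W, dec_state)  # decoder term, shared by every encoder position
    scores = []
    for h in enc_states:
        uh = matvec(U, h)
        scores.append(sum(vk * math.tanh(a + b) for vk, a, b in zip(v, ws, uh)))
    weights = softmax(scores)
    dim = len(enc_states[0])
    context = [
        sum(alpha * h[k] for alpha, h in zip(weights, enc_states))
        for k in range(dim)
    ]
    return weights, context


if __name__ == "__main__":
    # Toy example: 3 encoder positions, hidden size 2 (hypothetical values).
    enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    alphas, ctx = additive_attention([0.5, -0.5], enc, identity, identity, [1.0, 1.0])
    # With these numbers the third position receives the largest weight.
    print(alphas, ctx)
```

In a decoder of the kind the abstract describes, a step like this would run once per output character: the context vector is combined with the decoder's recurrent state to predict the next character, and decoding stops once EOS is emitted. A practical implementation would use batched tensor operations in a deep learning framework rather than Python lists.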

Received: 28-03-2022
Accepted: 18-04-2022
Published: 20-04-2022

How to Cite

Buoy, R., Taing, N., Chenda, S., & Kor, S. (2022). Khmer printed character recognition using attention-based Seq2Seq network. Ho Chi Minh City Open University Journal of Science - Engineering and Technology, 12(1), 3–16. https://doi.org/10.46223/HCMCOUJS.tech.en.12.1.2217.2022