Publication Date
Fall 2024
Degree Type
Master's Project
Degree Name
Master of Science in Computer Science (MSCS)
Department
Computer Science
First Advisor
Ching-Seh Wu
Second Advisor
William Andreopoulos
Third Advisor
Saptarshi Sengupta
Keywords
BLEU metric, Deep Learning, Natural Language Processing, Image Captioning, IndicTrans2
Abstract
Image captioning, the generative task of producing textual descriptions of images, is one of the most prominent tasks at the intersection of Natural Language Processing (NLP) and computer vision. Its applications span many real-world scenarios, such as aiding the visually impaired, editing tools, recommendation systems, and medical imaging. This research focuses on image captioning in Hindi, the official language of India, which remains underexplored relative to its need. Several challenges arise, including the lack of substantial Hindi text data for training models, the need for human annotators to verify the accuracy of Hindi text, and the insufficient training of pre-trained translation models on Indic languages. This project leverages IndicTrans2, a state-of-the-art pre-trained translation model, to create a novel dataset; this marks the first use of IndicTrans2 for Hindi image captioning, whereas previous works relied on Google Translate. Using this newly translated Hindi corpus, encoder-decoder architectures combining convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have been built and trained to generate Hindi captions that are both accurate and contextually relevant to the corresponding images. The performance of six models has been compared using the BLEU metric. Analysis of the results shows that this methodology presents a promising avenue for Hindi image captioning: compared to the existing state-of-the-art BLEU-1 score of 0.585, this project achieves a BLEU-1 score of 0.5205 with baseline encoder-decoder models trained on the newly proposed corpus, indicating promising future scope.
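To illustrate the evaluation reported above, the following is a minimal sketch of computing a BLEU-1 score for a generated Hindi caption using NLTK; the reference and candidate sentences and whitespace tokenization are hypothetical examples, not drawn from the project's corpus or code.

# Minimal sketch: BLEU-1 scoring of a generated Hindi caption with NLTK.
# The reference/candidate captions below are hypothetical, for illustration only.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Reference caption(s) and model-generated candidate, tokenized by whitespace.
reference = ["एक कुत्ता घास पर दौड़ रहा है".split()]
candidate = "कुत्ता घास पर दौड़ रहा है".split()

# BLEU-1 considers unigram precision only: weights = (1, 0, 0, 0).
smooth = SmoothingFunction().method1  # avoids zero scores on short captions
bleu1 = sentence_bleu(reference, candidate,
                      weights=(1, 0, 0, 0),
                      smoothing_function=smooth)
print(f"BLEU-1: {bleu1:.4f}")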
Recommended Citation
Dinesh, Anahita Vayalombrone, "HINDI IMAGE CAPTIONING USING INDICTRANS2 AND ENCODER-DECODER ARCHITECTURE" (2024). Master's Projects. 1423.
https://scholarworks.sjsu.edu/etd_projects/1423