Publication Date

Fall 2024

Degree Type

Master's Project

Degree Name

Master of Science in Computer Science (MSCS)

Department

Computer Science

First Advisor

Ching-Seh Wu

Second Advisor

William Andreopoulos

Third Advisor

Saptarshi Sengupta

Keywords

BLEU metric, Deep Learning, Natural Language Processing, Image Captioning, IndicTrans2

Abstract

Image captioning is one of the most prominent tasks at the intersection of Natural Language Processing (NLP) and computer vision. It is the generative task of producing textual descriptions from images, and it is used in many real-world scenarios such as aiding the visually impaired, editing applications, recommendation systems, and medical imaging. This research focuses on image captioning in Hindi, the official language of India, which has not been explored to the extent that its need warrants. Several challenges stand in the way: the lack of substantial Hindi text data for training models, the need for human annotators to verify the accuracy of Hindi text, and the insufficient training of pre-trained translation models on Indic languages. This project leverages IndicTrans2, a state-of-the-art pre-trained translation model, to create a novel dataset. It marks the first use of IndicTrans2 for Hindi image captioning, whereas previous works used Google Translate. With this newly translated Hindi corpus, encoder-decoder architectures that combine convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have been built and trained to generate Hindi captions that are both accurate and contextually relevant to the corresponding images. The performance of six models has been compared using the BLEU metric. Compared to the existing state-of-the-art BLEU-1 score of 0.585, this project has achieved a BLEU-1 score of 0.5205 using the newly proposed approach on baseline encoder-decoder models, thereby indicating promising scope for future work.
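For readers unfamiliar with the metric, BLEU-1 is the clipped unigram precision of a generated caption against one or more reference captions, multiplied by a brevity penalty. The sketch below is illustrative only: the function name `bleu1`, the whitespace tokenization, and the example captions are assumptions and are not taken from the project, whose evaluation tooling is not described here. It scores a single caption, whereas reported BLEU scores are normally aggregated over a whole test set.

```python
import math
from collections import Counter


def bleu1(candidate, references):
    """Sentence-level BLEU-1 for one tokenized candidate caption.

    candidate: list of tokens; references: non-empty list of token lists.
    Hypothetical helper for illustration, not the project's code.
    """
    if not candidate:
        return 0.0

    cand_counts = Counter(candidate)

    # For each token, the largest count seen in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for tok, n in Counter(ref).items():
            max_ref_counts[tok] = max(max_ref_counts[tok], n)

    # Clip candidate counts so repeated tokens are not over-rewarded.
    clipped = sum(min(n, max_ref_counts[tok]) for tok, n in cand_counts.items())
    precision = clipped / len(candidate)

    # Brevity penalty: compare against the reference length closest to the
    # candidate length (ties resolved toward the shorter reference).
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)

    return bp * precision


if __name__ == "__main__":
    # Assumed example captions, tokenized on whitespace.
    candidate = "एक कुत्ता घास पर दौड़ रहा है".split()
    reference = "एक कुत्ता घास में दौड़ रहा है".split()
    print(round(bleu1(candidate, [reference]), 4))
```

In this example six of the seven candidate tokens appear in the reference and the two captions have equal length, so the brevity penalty is 1 and the score is 6/7.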

Available for download on Friday, December 05, 2025
