Publication Date

Fall 2019

Degree Type

Master's Project

Degree Name

Master of Science (MS)


Computer Science

First Advisor

Chris Pollett

Second Advisor

Robert Chun

Third Advisor

Kong Li


Keywords

Image Localization, CNNs, RNNs, Neural Nets, Captioning, OCR


Image localization refers to translating the text present in images from one language to another. The aim of this project is to develop a methodology for translating the text in image captions from English to Hindi while taking the context of the images into account. A lot of work has been done in this field [22], but our aim was to explore whether accuracy can be further improved by considering the additional information conveyed by the images beyond the text itself. We explored deep learning with neural networks for this project. In particular, we used Recurrent Neural Networks (RNNs), which are well suited to sequence translation and therefore meet the needs of this project, which involves text sequences. This technique of image localization would be beneficial in many fields. For example, to make text data accessible to everyone, it should be translated into the many languages spoken by people across the world. This would support growth in rural areas and in countries where English is not spoken by giving people access to data in their local languages. It could also benefit tourists, who would then be able to understand signboards and posters in a foreign country. With accurate translation, old manuscripts could also be translated into English, enabling further research on them.