Publication Date

Spring 2023

Degree Type

Master's Project

Degree Name

Master of Science (MS)

Department

Computer Science

First Advisor

William Andreopoulos

Second Advisor

Fabio Di Troia

Third Advisor

Nada Attar

Keywords

Image Captioning, RL (Reinforcement Learning), CNN (Convolutional Neural Network), RNN (Recurrent Neural Network)

Abstract

Image captioning is a crucial technology with numerous applications, including enhancing accessibility for the visually impaired, developing automated image indexing and retrieval systems, and enriching social media experiences. However, accurately describing the content of an image in natural language remains a challenge, particularly in low-resource settings where data and computational power are limited. The most advanced image captioning architectures currently use encoder-decoder structures that incorporate a sequential recurrent prediction model. This study adopts a typical structure for image captioning, with a Convolutional Neural Network (CNN) encoder and a Recurrent Neural Network (RNN) decoder, but frames the problem as a sequential decision-making task. The captioning models are trained with reinforcement learning (RL) to improve performance. A policy network predicts the next word in a caption from the previously predicted words, and a value network assesses the entire caption and its possible variations. Both networks are trained using a reinforcement learning model that relies on visual-semantic embeddings. This method outperforms the standard encoder-decoder framework even with minimal training on a small subset of the Microsoft COCO dataset.
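
The roles of the policy network, the value network, and the embedding-based reward described in the abstract can be illustrated with a minimal sketch. It is not the project's implementation: the function names, the cosine-similarity reward, and the weighted rule for combining policy and value scores are all assumptions made for illustration, and the scores are hand-written numbers standing in for network outputs.

```python
import math


def softmax(scores):
    """Turn raw policy scores into a probability distribution over words."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def embedding_reward(image_vec, caption_vec):
    """Assumed reward: cosine similarity between an image embedding and a
    caption embedding that live in a shared visual-semantic space."""
    dot = sum(a * b for a, b in zip(image_vec, caption_vec))
    norm_i = math.sqrt(sum(a * a for a in image_vec))
    norm_c = math.sqrt(sum(b * b for b in caption_vec))
    if norm_i == 0.0 or norm_c == 0.0:
        return 0.0
    return dot / (norm_i * norm_c)


def choose_next_word(policy_scores, value_estimates, vocab, weight=0.5):
    """Pick the next word by mixing the policy's log-probability (local
    guidance) with a value estimate of the resulting caption (lookahead).
    The mixing rule and `weight` are hypothetical, not taken from the source."""
    probs = softmax(policy_scores)
    combined = [
        weight * math.log(p) + (1.0 - weight) * v
        for p, v in zip(probs, value_estimates)
    ]
    best = max(range(len(vocab)), key=lambda i: combined[i])
    return vocab[best]


# Toy stand-ins for network outputs over a three-word vocabulary.
vocab = ["dog", "cat", "car"]
policy_scores = [2.0, 1.9, 0.1]      # the policy slightly prefers "dog"
value_estimates = [0.2, 0.9, 0.1]    # the value estimate strongly prefers "cat"

word_policy_only = choose_next_word(policy_scores, value_estimates, vocab, weight=1.0)
word_combined = choose_next_word(policy_scores, value_estimates, vocab, weight=0.5)
```

In this toy setting, the policy alone selects "dog", while mixing in the value estimate selects "cat", which shows how a caption-level assessment can override a locally likely word.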

Recommended Citation

Golamaru, Venkat Teja, "Image Captioning using Reinforcement Learning" (2023). Master's Projects. 1225.
DOI: https://doi.org/10.31979/etd.6yj7-7qpf
https://scholarworks.sjsu.edu/etd_projects/1225

Included in

Artificial Intelligence and Robotics Commons
