Author

Zixiao Fan

Publication Date

Spring 2025

Degree Type

Master's Project

Degree Name

Master of Science in Computer Science (MSCS)

Department

Computer Science

First Advisor

Faranak Abri

Second Advisor

Sayma Akther

Third Advisor

Zhen Fan

Keywords

Image Captioning, Deep Learning, Neural Network, Transformer

Abstract

Image captioning, which provides a textual understanding of visual content, is fundamental support for the advancement of human-AI interaction technology. To explore applications of this technology, this project pursues two goals. The first is to directly apply the image information-retrieval abilities of captioning models, and the second is to examine the pipeline and components of image captioning models in detail. For the first goal, the project presents a working app that uses these text-retrieval capabilities to provide image storage with functions such as tagging and transcription. The app also supports search, with a clear usage flow and a model deployment that facilitates data security. For the second goal, the project explores two trained models: one based on ResNet50 and an LSTM with an adaptive attention mechanism, and the other a Q-Former mid-layer adapter with a pretrained, frozen ViT and GPT-2.

Available for download on Monday, May 25, 2026
