Publication Date

Spring 2024

Degree Type

Master's Project

Degree Name

Master of Science in Computer Science (MSCS)

Department

Computer Science

First Advisor

Sayma Akther

Second Advisor

Nada Attar

Third Advisor

William Andreopoulos

Keywords

Deep Learning, Convolutional Neural Networks, Machine Learning.

Abstract

Lip-reading, a field at the intersection of computer vision and speech processing, aims to identify the words a person speaks from their lip movements. This paper presents a streamlined lip-reading solution that employs machine learning and deep learning. First, our work uses Multi-Task Cascaded Convolutional Networks (MTCNN) to detect the face and facial landmarks, including the lip region, and then aligns the face. The aligned faces are segmented to obtain the lip images. The lip images are preprocessed using the Real-Enhanced Super Resolution Generative Adversarial Network to enhance image resolution, which helps identify subtle lip movements in video frames, a critical aspect of lip-reading. Once the lip images have been preprocessed, they are fed into a CNN-based architecture that learns the features. Feature extraction and lip movement are learned through a 3D convolutional network that uses time-distributed layers with bidirectional LSTMs. We train our model on the GRID corpus and obtain a character error rate of 2.3% on seen speakers and 5.2% on unseen speakers.
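
The abstract reports results as a character error rate but does not say how it is computed. As a minimal sketch, assuming the common definition (character-level Levenshtein distance between prediction and reference, divided by the number of reference characters), the metric could be calculated as below. The function names and the sample sentence are hypothetical illustrations and are not taken from the project.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between ref and hyp."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from ref
                curr[j - 1] + 1,     # insert a character into ref
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def character_error_rate(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Total edits divided by total reference characters (assumed definition)."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("reference text is empty")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_chars


if __name__ == "__main__":
    # Hypothetical sentence pair, for illustration only.
    reference = ["bin blue at f two now"]
    predicted = ["bin blue at f too now"]
    print(f"CER: {character_error_rate(reference, predicted):.3f}")
```

Under this definition, a 2.3% character error rate corresponds to roughly one wrong character in every 43 reference characters.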

Recommended Citation

Ambati, Srujith Rao, "Deciphering Speech through Vision: A Deep Learning Lip Reading System" (2024). Master's Projects. 1407.
DOI: https://doi.org/10.31979/etd.sfst-j66z
https://scholarworks.sjsu.edu/etd_projects/1407

Download

Available for download on Sunday, May 25, 2025

Included in

Other Computer Engineering Commons
