Author

Xiaoqian Yang

Publication Date

Fall 2023

Degree Type

Master's Project

Degree Name

Master of Science in Computer Science (MSCS)

Department

Computer Science

First Advisor

Chris Tseng

Second Advisor

Faranak Abri

Third Advisor

Hang Zhang

Keywords

Sign Language Translation, Video ResNet, r3d_18 model, r(2+1)d model, mc3 model, Kinetics-400, GRIT

Abstract

Sign languages, vital for communication among deaf and hard-of-hearing (DHH) people, face a significant linguistic-diversity challenge, with over 200 distinct sign languages worldwide. Bridging this communication gap is a priority. Traditional tools, such as interpreters and costly translation devices, have limitations. This project uses deep learning techniques to develop a model capable of recognizing sign language from short videos. Our model not only recognizes the sign in a single video clip but can also make predictions for consecutive pairs of signs. To achieve zero-shot gesture-sequence recognition, we propose a novel temporal dilation strategy that converts a static video classification model into one that accepts a video of a gesture sequence as input and produces a sequence of predictions. Our model achieves 99% accuracy on the gesture recognition dataset (GRIT) and 73.18% accuracy on the gesture-sequence recognition task. This advancement aims to break down barriers and enhance opportunities for DHH individuals, fostering greater inclusivity in education, employment, sports, and social activities. The source code and the pretrained models are publicly available.
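
To illustrate the general idea behind converting a clip-level classifier into a sequence predictor, the following is a minimal sketch of sliding-window inference with torchvision's Kinetics-400-pretrained Video ResNet (r3d_18). The class count, window length, and stride are hypothetical placeholders, and this sketch is not the project's exact temporal dilation implementation.

import torch
from torchvision.models.video import r3d_18, R3D_18_Weights

NUM_CLASSES = 20   # hypothetical number of GRIT gesture classes
CLIP_LEN = 16      # frames per window (placeholder)
STRIDE = 8         # hop between consecutive windows (placeholder)

# Load a Video ResNet pretrained on Kinetics-400 and swap its head
# for gesture classification (the new head would be fine-tuned on GRIT).
model = r3d_18(weights=R3D_18_Weights.KINETICS400_V1)
model.fc = torch.nn.Linear(model.fc.in_features, NUM_CLASSES)
model.eval()

def predict_sequence(video: torch.Tensor) -> list[int]:
    """Slide a fixed-length window over a (C, T, H, W) video and
    return one class prediction per window."""
    _, total_frames, _, _ = video.shape
    preds = []
    with torch.no_grad():
        for start in range(0, total_frames - CLIP_LEN + 1, STRIDE):
            clip = video[:, start:start + CLIP_LEN]   # (C, CLIP_LEN, H, W)
            logits = model(clip.unsqueeze(0))         # (1, NUM_CLASSES)
            preds.append(int(logits.argmax(dim=1)))
    return preds

# Example: a dummy 48-frame RGB video (frames should be normalized with
# the weights' preprocessing transforms in practice) yields one
# prediction per window, i.e. a sequence of recognized gestures.
video = torch.randn(3, 48, 112, 112)
print(predict_sequence(video))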

Available for download on Friday, December 20, 2024
