Publication Date: Fall 2023
Degree Type: Master's Project
Degree Name: Master of Science in Computer Science (MSCS)
Department: Computer Science
First Advisor: Chris Tseng
Second Advisor: Faranak Abri
Third Advisor: Hang Zhang
Keywords: Sign Language Translation, Video ResNet, r3d_18 model, r(2+1)d_18 model, mc3_18 model, Kinetics-400, GRIT
Abstract
Sign languages, vital for communication among deaf and hard-of-hearing (DHH) people, present a significant linguistic-diversity challenge: more than 200 distinct sign languages are in use worldwide. Bridging this communication gap is a priority, yet traditional tools such as human interpreters and costly translation devices have limitations. This project applies deep learning techniques to develop a model capable of recognizing sign language from short videos. Our model not only recognizes the sign in a single video clip but can also predict consecutive pairs of signs. To achieve zero-shot gesture-sequence recognition, we propose a novel temporal dilation strategy that converts a static video-classification model into one that accepts a video of a gesture sequence and emits a sequence of predictions. Our model achieves 99% accuracy on the gesture recognition dataset (GRIT) and 73.18% accuracy on the gesture-sequence recognition task. This advancement aims to break down barriers and enhance opportunities for DHH individuals, fostering greater inclusivity in education, employment, sports, and social activities. The source code and pretrained models are publicly available.
Recommended Citation
Yang, Xiaoqian, "Temporal Dilation in Video ResNet for Sign Language Translation" (2023). Master's Projects. 1326.
DOI: https://doi.org/10.31979/etd.9nk9-qh4u
https://scholarworks.sjsu.edu/etd_projects/1326