Master of Science in Computer Science (MSCS)
Sign Language Translation, Video ResNet, r3d_18 model, r(2+1)d_18 model, mc3_18 model, Kinetics-400, GRIT
Sign languages, vital for communication among deaf and hard-of-hearing (DHH) people, face a significant linguistic diversity challenge, with over 200 distinct sign languages worldwide. Bridging this communication gap is a priority. Traditional tools such as interpreters and costly translation devices have limitations. This project uses deep learning techniques to develop a model capable of recognizing sign language from short videos. Our model not only recognizes the sign in a single video clip, but can also make predictions for consecutive pairs of signs. To achieve zero-shot gesture sequence recognition, we propose a novel temporal dilation strategy that converts a static video classification model into one that accepts a gesture-sequence video as input and produces a sequence of predictions. Our model achieves 99% accuracy on the gesture recognition dataset (GRIT) and 73.18% accuracy on the gesture sequence recognition task. This advancement aims to break down barriers and enhance opportunities for DHH individuals, fostering greater inclusivity in education, employment, sports, and social activities. The source code and the pretrained models are publicly available.
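The abstract does not detail the temporal dilation strategy itself. As an illustrative sketch only (not the author's actual method), one common way to turn a fixed-length clip classifier into a sequence predictor is to slide a temporal window over the longer video, classify each window, and collapse consecutive duplicate labels; the function names, window length, and stride below are assumptions:

```python
import numpy as np

def sliding_windows(video, clip_len=16, stride=8):
    """Split a (T, H, W, C) video into overlapping fixed-length clips.
    clip_len and stride are illustrative defaults, not values from the paper."""
    num_frames = video.shape[0]
    starts = range(0, max(num_frames - clip_len, 0) + 1, stride)
    return [video[s:s + clip_len] for s in starts]

def predict_sequence(video, clip_classifier, clip_len=16, stride=8):
    """Run a single-gesture classifier over each window, then merge
    runs of identical labels into one predicted sign sequence."""
    labels = [clip_classifier(clip)
              for clip in sliding_windows(video, clip_len, stride)]
    sequence = [labels[0]]
    for label in labels[1:]:
        if label != sequence[-1]:  # keep only label transitions
            sequence.append(label)
    return sequence

# Toy usage: a 32-frame "video" whose first half is one gesture (all zeros)
# and second half another (all ones), with a trivial stand-in classifier.
video = np.concatenate([np.zeros((16, 4, 4, 3)), np.ones((16, 4, 4, 3))])
toy_classifier = lambda clip: int(clip.mean() > 0.5)
print(predict_sequence(video, toy_classifier))
```

In practice the per-clip classifier would be a Video ResNet such as torchvision's `r3d_18` fine-tuned on the gesture dataset; the stride/window choice trades off temporal resolution against redundant computation.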
Yang, Xiaoqian, "Temporal Dilation in Video ResNet for Sign Language Translation" (2023). Master's Projects. 1326.
Available for download on Friday, December 20, 2024