Master of Science (MS)
classification task, data augmentation, deep learning models, ensemble techniques, sign language recognition, skeletal keypoints, pose extraction
Sign language recognition (SLR) has long been studied within the Computer Vision domain. SLR tasks are commonly tackled with one of two approaches: appearance-based or pose-based. Models ranging from traditional methods to the current state of the art, including HOG-based features, Convolutional Neural Networks, Recurrent Neural Networks, Transformers, and Graph Convolutional Networks, have been applied to SLR. While classifying alphabet letters in sign language has achieved high accuracy, recognizing words presents its own set of difficulties, including the large vocabulary size, the subtleties in body motions and hand orientations, and regional dialects and variations. The emergence of deep learning has created opportunities for improved word-level sign recognition, but challenges such as overfitting and limited training data remain. Techniques such as data augmentation, feature engineering, hyperparameter tuning, optimization, and ensemble methods have been used to overcome these challenges and improve the accuracy and generalization ability of ASL classification models. In this project, we explore various methods to improve accuracy and performance. With this approach, we first reproduced a baseline accuracy of 43.02% on the WLASL dataset and then improved the accuracy to 55.96%. We also extended the work to a different dataset to gain a more comprehensive understanding of the approach.
Luong, Shayla, "Video Sign Language Recognition using Pose Extraction and Deep Learning Models" (2023). Master's Projects. 1251.