Publication Date
Spring 2024
Degree Type
Master's Project
Degree Name
Master of Science in Computer Science (MSCS)
Department
Computer Science
First Advisor
Sayma Akther
Second Advisor
William Andreopoulos
Third Advisor
Nada Attar
Keywords
CNN, Gesture Recognition, LSTM, MediaPipe, Pose Extraction
Abstract
This paper, Gesture Recognition Dynamics: Unveiling Video Patterns with Deep Learning, explores combining Long Short-Term Memory (LSTM) networks with Convolutional Neural Networks (CNN) to identify complex human activities. The study assesses the LSTM's capability to capture temporal dependencies and the CNN's ability to extract spatial features when detecting gestures from the UCF50 dataset. It further evaluates the combined LSTM-CNN architecture, which improves the analytical capacity to interpret and validate dynamic gesture patterns. The paper uses MediaPipe, an open-source framework created by Google for pose extraction. MediaPipe tracks the key body landmarks needed to identify different activities. Moreover, the paper shows that MediaPipe's flexible, modular structure provides a strong framework for this study: the modular design lets researchers iteratively refine their approach and build highly effective models for recognizing human activities. The paper also reports the performance of ConvLSTM2D and Conv2D neural network models combined with MediaPipe pose estimation.
Recommended Citation
Agumamidi, Nithish Reddy, "Gesture Recognition Dynamics: Unveiling Video Patterns with Deep Learning" (2024). Master's Projects. 1405.
DOI: https://doi.org/10.31979/etd.qsqg-agg2
https://scholarworks.sjsu.edu/etd_projects/1405