Publication Date

Spring 2024

Degree Type

Master's Project

Degree Name

Master of Science in Computer Science (MSCS)


Department

Computer Science

First Advisor

Sayma Akther

Second Advisor

William Andreopoulos

Third Advisor

Nada Attar


Keywords

CNN, Gesture Recognition, LSTM, Mediapipe, Pose Extraction


Abstract

This paper, Gesture Recognition Dynamics: Revealing Video Patterns with Deep Learning, explores combining Long Short-Term Memory (LSTM) networks with Convolutional Neural Networks (CNNs) to identify complex human activities. The study assesses LSTM’s ability to capture temporal dependencies and CNN’s ability to extract spatial features when detecting gestures from the UCF50 dataset. It further evaluates the architectural coupling of LSTM and CNN, which is expected to improve the capacity to interpret and validate dynamic gesture trends. The paper uses MediaPipe, an open-source framework created by Google, to extract poses. MediaPipe tracks the key body landmarks needed to identify different activities. The paper also shows that MediaPipe’s modular structure makes it a flexible framework for research: it lets researchers build their expertise iteratively and create high-performing models for recognizing human activities. Finally, the paper reports the performance of ConvLSTM2D and Conv2D neural network models with MediaPipe pose estimation.
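The abstract does not say how pose landmarks are prepared for the sequence models, so the following is only a minimal sketch of one plausible preprocessing step, not the paper's pipeline. It assumes each video frame yields the 33 body landmarks that MediaPipe Pose reports, each with x, y, z, and visibility values, and that a clip is reduced to a fixed number of evenly spaced frames before reaching a temporal model. The function names and the sequence length are hypothetical.

```python
from typing import List, Sequence, Tuple

NUM_LANDMARKS = 33       # MediaPipe Pose reports 33 body landmarks per frame
VALUES_PER_LANDMARK = 4  # x, y, z, visibility

Landmark = Tuple[float, float, float, float]


def sample_indices(total_frames: int, seq_len: int) -> List[int]:
    """Pick seq_len evenly spaced frame indices from a clip (hypothetical helper)."""
    if total_frames <= 0 or seq_len <= 0:
        raise ValueError("total_frames and seq_len must be positive")
    step = total_frames / seq_len
    # Clips shorter than seq_len repeat frames instead of failing.
    return [min(int(i * step), total_frames - 1) for i in range(seq_len)]


def flatten_landmarks(landmarks: Sequence[Landmark]) -> List[float]:
    """Turn one frame's landmarks into a flat feature vector of 33 * 4 values."""
    if len(landmarks) != NUM_LANDMARKS:
        raise ValueError(f"expected {NUM_LANDMARKS} landmarks, got {len(landmarks)}")
    return [value for landmark in landmarks for value in landmark]


def build_sequence(frames: Sequence[Sequence[Landmark]], seq_len: int) -> List[List[float]]:
    """Build a (seq_len, 132) nested list: one feature vector per sampled frame."""
    return [flatten_landmarks(frames[i]) for i in sample_indices(len(frames), seq_len)]
```

A sequence built this way has one row per time step, which is the general shape a recurrent layer consumes; a convolutional variant such as ConvLSTM2D would instead need each time step arranged as a 2D grid, a choice the abstract leaves open.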

Available for download on Sunday, May 25, 2025