Publication Date

Spring 2023

Degree Type

Thesis - Campus Access Only

Degree Name

Master of Science (MS)


Applied Data Science


Seungjoon Lee

Subject Areas

Artificial intelligence


Facial Expression Recognition (FER) has growing significance in diverse fields such as psychology, medicine, sports, and entertainment. In the medical field, FER is used to recognize signs of depression, anxiety, and autism. FER is also used in self-driving cars to detect signs of driver fatigue and distress and to prompt timely intervention, which enhances transport safety. Facial expressions combined with other modalities offer great insight into a person's emotional state and its triggers. Computer vision, machine learning, and deep learning methods have recently gained widespread attention for detecting and classifying spontaneous facial expressions. Static images and 2D video sequences have been used extensively for FER and emotion recognition. However, only a few algorithms combine 2D video sequences with multimodal data to detect and classify emotions. To this end, this research aims to develop a deep-learning model for classifying emotions using the Karolinska Directed Emotional Faces (KDEF) dataset and the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). A CNN-RNN model is built to classify facial expressions in synthetic video sequence data generated from the KDEF dataset. This model is then extended to features extracted from the RAVDESS video dataset. Furthermore, a Transformer model with a dual-head self-attention layer is created to identify the frames that carry the most useful information for classification. Finally, a late fusion architecture merges the posteriors of the static audio, static video, and Transformer models to create a multimodal classification model.
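The late fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the posteriors are merged, so the weighted average below is an assumption, as are the function names `late_fusion` and `predict`, the three-class example, and the probability values, all of which are hypothetical and not taken from the thesis.

```python
from typing import List, Optional, Sequence


def late_fusion(
    posteriors: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Merge per-model class posteriors with a weighted average.

    Assumption: the thesis does not specify its fusion rule; a weighted
    average is one common choice and is used here for illustration only.
    """
    if not posteriors:
        raise ValueError("at least one posterior vector is required")
    n_classes = len(posteriors[0])
    if any(len(p) != n_classes for p in posteriors):
        raise ValueError("all posterior vectors must have the same length")
    if weights is None:
        # Default to equal trust in every model.
        weights = [1.0] * len(posteriors)
    if len(weights) != len(posteriors):
        raise ValueError("one weight per model is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted mean of each class probability across the models.
    return [
        sum(w * p[c] for w, p in zip(weights, posteriors)) / total
        for c in range(n_classes)
    ]


def predict(fused: Sequence[float]) -> int:
    """Return the index of the class with the highest fused posterior."""
    return max(range(len(fused)), key=lambda c: fused[c])


# Hypothetical posteriors over three emotion classes from three models.
audio_posterior = [0.6, 0.3, 0.1]
video_posterior = [0.2, 0.5, 0.3]
transformer_posterior = [0.1, 0.7, 0.2]

fused = late_fusion([audio_posterior, video_posterior, transformer_posterior])
predicted_class = predict(fused)
```

Because fusion happens on the output probabilities rather than on intermediate features, each model can be trained and tuned separately, and the per-model weights give a simple way to favor the more reliable modality.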