Master of Science in Computer Science (MSCS)
Christopher J Pollett
Deep Learning, Depth-wise convolutional model, Mel Frequency Cepstral Coefficient, Sequential Model
Emotion recognition has been an integral part of many applications like video games, cognitive computing, and human computer interaction. Emotion can be recognized by many sources including speech, facial expressions, hand gestures and textual attributes. We have developed a prototype emotion recognition system using computer vision and natural language processing techniques. Our goal hybrid system uses mobile camera frames and features abstracted from speech named Mel Frequency Cepstral Coefficient (MFCC) to recognize the emotion of a person. To acknowledge the emotions based on facial expressions, we have developed a Convolutional Neural Network (CNN) model, which has an accuracy of 68%. To recognize emotions based on Speech MFCCs, we have developed a sequential model with an accuracy of 69%. Out Android application can access the front and back camera simultaneously. This allows our application to predict the emotion of the overall conversation happening between the people facing both cameras. The application is also able to record the audio conversation between those people. The two emotions predicted (Face and Speech) are merged into one single emotion using the Fusion Algorithm. Our models are converted to TensorFlow-lite models to reduce the model size and support the limited processing power of mobile. Our system classifies emotions into seven classes: neutral, surprise, happy, fear, sad, disgust, and angry
Kajale, Akshay, "Visual and Lingual Emotion Recognition using Deep Learning Techniques" (2021). Master's Projects. 988.