Publication Date

Fall 2025

Degree Type

Master's Project

Degree Name

Master of Science in Computer Science (MSCS)

Department

Computer Science

First Advisor

Faranak Abri

Second Advisor

William Andreopoulos

Third Advisor

Nithish Kumar Reddy Rajapuram

Keywords

Deception Detection, Multimodal Learning, Speech and Text Analysis, Deep Learning

Abstract

Detecting deception remains a critical challenge across multiple domains, from security screening and forensic investigations to hiring processes and fraud prevention. Traditional methods like polygraph testing suffer from invasiveness, subjectivity, and limited accuracy. This research explores how modern deep learning can address these limitations by analyzing both acoustic and linguistic cues in human speech. We developed a multimodal system that combines audio and text analysis to detect deception in the DOLOS dataset, which contains 1,675 video clips from real high-stakes scenarios. Our approach processes audio through a systematic pipeline that isolates voice, removes silence, and reduces noise before extracting features with Wav2Vec2, a state-of-the-art speech model. This preprocessing alone improved performance by 2.01 F1 points, reaching 75.5% F1 for audio-only classification. For text analysis, we used OpenAI's Whisper to automatically transcribe speech, then applied BERT combined with bidirectional LSTM networks to detect linguistic deception patterns. Despite transcription errors (12.8% word error rate), the text classifier achieved 65.0% F1, demonstrating that key linguistic markers of deception, such as pronoun usage, hedging, and negation patterns, survive the transcription process. When we combined both modalities through feature-level fusion, the system achieved 74.2% F1 and 64.7% AUC-ROC, outperforming the previous best published result by 3.01 F1 points. Importantly, our evaluation used strong baseline models rather than artificially weakened comparisons, providing an honest assessment of the benefits of multimodal fusion. The results show that while audio dominates for short utterances, text provides complementary information that meaningfully improves detection when both modalities are available. This work demonstrates that practical deception detection systems can operate on automatically transcribed speech rather than requiring manual transcriptions, opening pathways for real-world deployment in security and investigative contexts.
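
The abstract's audio branch (preprocess, then extract Wav2Vec2 features, then pool) could be sketched roughly as follows. This is an illustration, not the project's actual code: it assumes torchaudio and Hugging Face transformers, and the silence trimming and mean-pooling here are simple stand-ins for the paper's full voice-isolation and noise-reduction pipeline.

```python
import torch
import torchaudio
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

# Illustrative stand-ins for the preprocessing described in the abstract:
# downmix to mono, resample to 16 kHz, trim silence, then mean-pool
# Wav2Vec2 hidden states into a fixed-length clip embedding.

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")
encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")

def extract_audio_features(path: str) -> torch.Tensor:
    waveform, sr = torchaudio.load(path)
    waveform = waveform.mean(dim=0, keepdim=True)                   # downmix to mono
    waveform = torchaudio.functional.resample(waveform, sr, 16_000)
    vad = torchaudio.transforms.Vad(sample_rate=16_000)             # trims leading silence
    waveform = vad(waveform)
    inputs = extractor(waveform.squeeze(0).numpy(),
                       sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state                # (1, T, 768)
    return hidden.mean(dim=1).squeeze(0)                            # mean-pool -> (768,)
```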
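The text branch (Whisper transcription, BERT token embeddings, then a BiLSTM) might look like the sketch below. Model sizes, the pooling choice, and the 128-token truncation are assumptions for illustration, not the project's reported configuration.

```python
import torch
import torch.nn as nn
import whisper
from transformers import BertModel, BertTokenizer

# Illustrative text branch: transcribe with Whisper, encode with BERT,
# run a BiLSTM over the token embeddings, and pool for classification.

asr = whisper.load_model("base")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
bilstm = nn.LSTM(input_size=768, hidden_size=128,
                 batch_first=True, bidirectional=True)

def extract_text_features(audio_path: str) -> torch.Tensor:
    transcript = asr.transcribe(audio_path)["text"]                 # automatic transcription
    tokens = tokenizer(transcript, return_tensors="pt",
                       truncation=True, max_length=128)
    with torch.no_grad():
        token_embs = bert(**tokens).last_hidden_state               # (1, T, 768)
        out, _ = bilstm(token_embs)                                 # (1, T, 256)
    return out.mean(dim=1).squeeze(0)                               # mean-pool -> (256,)
```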
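Finally, feature-level fusion amounts to concatenating the two clip embeddings and classifying the result. The head below is a minimal sketch under the dimensions assumed in the two sketches above (768-d audio, 256-d text); the layer sizes and dropout rate are guesses, not the project's actual configuration.

```python
import torch
import torch.nn as nn

# Illustrative feature-level fusion head: concatenate the pooled audio
# and text embeddings, then classify truthful vs. deceptive.

class FusionClassifier(nn.Module):
    def __init__(self, audio_dim: int = 768, text_dim: int = 256, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(audio_dim + text_dim, hidden),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden, 2),                            # logits: truthful / deceptive
        )

    def forward(self, audio_feat: torch.Tensor, text_feat: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([audio_feat, text_feat], dim=-1)   # feature-level fusion
        return self.net(fused)

# Dummy usage with a batch of four clips:
clf = FusionClassifier()
logits = clf(torch.randn(4, 768), torch.randn(4, 256))
```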

Available for download on Saturday, December 19, 2026
