Publication Date

Fall 2025

Degree Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Engineering

Advisor

Kaikai Liu; Bernardo Flores; Mahima Agumbe Suresh

Abstract

The widespread use of AI-based audio deepfakes poses a severe threat to media integrity and public trust. Speech synthesis techniques, particularly voice conversion (VC) and text-to-speech (TTS), have improved dramatically in recent years, making forgeries sound highly realistic and raising concerns about possible malicious uses. Existing state-of-the-art techniques for identifying fake speech have proven effective in some cases but remain limited in applicability and robustness when faced with novel attack strategies, different acoustic conditions, or alternative linguistic domains. To address some of these limitations, this research presents a novel deepfake audio detection system based on personalized retrieval-augmented generation (RAG). First, the method compares input audio against previously stored samples to identify fake content; by using RAG, the model improves generalizability and learns each speaker's unique vocal characteristics. Second, the methodology tests and improves upon prior deepfake audio detection models across multiple databases. Additionally, this work proposes a lightweight solution for immediate identification of deceptive communication, making potential threats simple and convenient to detect and manage. Overall, this research provides a more generalized and personalized framework for identifying deepfake audio, making detection in real-world cases more robust and trustworthy.

Keywords

deepfake audio detection, retrieval-augmented generation, speaker verification, synthesis artifact detection, self-supervised learning, audio forensics
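
The retrieval step described in the abstract, comparing a query utterance against a speaker's stored reference samples, can be sketched in broad strokes. The sketch below is illustrative only and not the thesis implementation: it assumes each utterance has already been reduced to an embedding vector (e.g., by a self-supervised audio encoder), and the class name, threshold, and `k` value are hypothetical choices for demonstration.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

class ReferenceStore:
    """Per-speaker store of reference embeddings, queried at test time
    (a stand-in for the retrieval component of a RAG-style detector)."""

    def __init__(self):
        self.refs = {}  # speaker_id -> list of embeddings

    def add(self, speaker_id, emb):
        self.refs.setdefault(speaker_id, []).append(np.asarray(emb, dtype=float))

    def score(self, speaker_id, query_emb, k=3):
        """Mean cosine similarity between the query and the k most
        similar stored references for this speaker."""
        q = np.asarray(query_emb, dtype=float)
        sims = sorted((cosine_sim(q, r) for r in self.refs[speaker_id]),
                      reverse=True)
        return float(np.mean(sims[:k]))

def is_likely_genuine(store, speaker_id, query_emb, threshold=0.7):
    # Low similarity to the speaker's retrieved references suggests a
    # synthetic or converted voice (threshold here is illustrative).
    return store.score(speaker_id, query_emb) >= threshold
```

In practice, the retrieved references would come from an index over real enrollment audio, and the decision score would typically combine retrieval similarity with a learned artifact-detection model rather than a fixed threshold.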

Available for download on Saturday, August 15, 2026
