Publication Date

Fall 2025

Degree Type

Master's Project

Degree Name

Master of Science in Computer Science (MSCS)

Department

Computer Science

First Advisor

Amith Kamath Belman

Second Advisor

Fabio Di Troia

Third Advisor

Wilson Tang

Keywords

Voice Authentication, Data Poisoning, Biometrics, SVM, Machine Learning, Automated Speaker Verification, HiFi-GAN

Abstract

Voice Authentication (VA), also known as Automatic Speaker Verification (ASV), is a widely adopted authentication method, particularly in automated systems such as banking services, where it serves as a secondary layer of user authentication. Despite its popularity, VA systems are vulnerable to various attacks, including replay, impersonation, and the emerging threat of deepfake audio that mimics the voice of legitimate users. Several defense mechanisms have been proposed to mitigate these risks. One such solution, "Voice Pops", aims to distinguish an individual's unique phoneme pronunciations during the enrollment process. While promising, the effectiveness of VA+VoicePop against a broader range of attacks, particularly logical or adversarial attacks, remains insufficiently explored. We propose a novel attack, SyntheticPop, designed to target the phoneme recognition capabilities of the VA+VoicePop system. The first iteration of this attack exploits the feature extraction process of VA+VoicePop by poisoning 20% of training samples labeled "spoof", embedding a 90 Hz wave throughout the audio in order to confuse the model's notion of what constitutes legitimate audio. The attack proved successful: system accuracy dropped by 55%, highlighting the need for more robust training algorithms. However, this method faces two main obstacles: it may not pass a human listening test, and it is easy to detect, because the perturbation spans the entire clip and introduces many unnatural peaks in the audio. These limitations led to the development of SyntheticPop+FS2, which leverages the temporal stability of FS2 and introduces SyntheticPops only at phoneme locations. Our experiments show that although SyntheticPop+FS2 does not cause model collapse as severe as SyntheticPop does, it still reduces accuracy by a notable 11.86% under the same total poisoning rate (20%).
In addition, the strategic placement of SyntheticPops produces an audio signal with less recognizable patterns, and adding a small amount of Gaussian noise helps mask energy signatures that could reveal the use of FS2.
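The abstract does not give implementation details, so the following is only a minimal Python/NumPy sketch of the two poisoning transforms as described, not the author's code. Everything beyond the 90 Hz frequency and the 20% rate is an assumption: 16 kHz mono float audio in [-1, 1], a pop amplitude of 0.05, 50 ms localized bursts, a noise standard deviation of 0.002, the label string "spoof", and reading the 20% as a share of the whole training set. The function names are hypothetical, and the phoneme onset times for the localized variant are assumed to come from an external aligner (for example, FS2 durations); the sketch does not compute them.

```python
import numpy as np

# Assumed settings (not given in the abstract): 16 kHz mono float audio in [-1, 1],
# pop amplitude 0.05, 50 ms localized bursts, Gaussian noise std 0.002.
SAMPLE_RATE = 16_000
POP_FREQ_HZ = 90.0  # the 90 Hz frequency stated in the abstract


def tone(num_samples, sr=SAMPLE_RATE, freq=POP_FREQ_HZ, amp=0.05):
    """Return `num_samples` of a sine wave at `freq` Hz."""
    t = np.arange(num_samples) / sr
    return amp * np.sin(2 * np.pi * freq * t)


def synthetic_pop(audio, sr=SAMPLE_RATE, amp=0.05):
    """Broad variant (hypothetical): embed the 90 Hz wave throughout the whole clip."""
    poisoned = np.asarray(audio, dtype=np.float64) + tone(len(audio), sr, amp=amp)
    return np.clip(poisoned, -1.0, 1.0)  # keep samples in the valid range


def synthetic_pop_localized(audio, onsets_s, sr=SAMPLE_RATE, amp=0.05,
                            burst_s=0.05, noise_std=0.002, rng=None):
    """Localized variant (hypothetical): short 90 Hz bursts at the given onset
    times in seconds, then low-level Gaussian noise over the whole clip.

    `onsets_s` is an assumed input, e.g. phoneme onsets from an external aligner."""
    rng = np.random.default_rng() if rng is None else rng
    poisoned = np.array(audio, dtype=np.float64)  # copy; the input is left untouched
    burst = tone(int(burst_s * sr), sr, amp=amp)
    for onset in onsets_s:
        start = int(onset * sr)
        end = min(start + len(burst), len(poisoned))
        if start < end:  # skip onsets that fall past the end of the clip
            poisoned[start:end] += burst[: end - start]
    poisoned += rng.normal(0.0, noise_std, size=poisoned.shape)
    return np.clip(poisoned, -1.0, 1.0)


def poison_spoof_subset(clips, labels, fraction=0.2, seed=0, poison_fn=synthetic_pop):
    """Apply `poison_fn` to randomly chosen clips labeled "spoof".

    Assumption: `fraction` is measured against the whole training set (capped
    at the number of spoof clips), and poisoned clips keep their labels."""
    rng = np.random.default_rng(seed)
    spoof_idx = [i for i, y in enumerate(labels) if y == "spoof"]
    n_poison = min(int(round(fraction * len(labels))), len(spoof_idx))
    chosen = []
    if n_poison > 0:
        picks = rng.choice(spoof_idx, size=n_poison, replace=False)
        chosen = sorted(int(i) for i in picks)
    out = list(clips)
    for i in chosen:
        out[i] = poison_fn(out[i])
    return out, chosen
```

Calling `synthetic_pop_localized` per clip with that clip's onsets, instead of `synthetic_pop`, gives an FS2-style variant in this sketch: the perturbation is confined to short windows rather than spread over the whole signal, which corresponds to the placement strategy the abstract credits with making the audio less pattern-recognizable.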

Recommended Citation

Jamdar, Eshaq, "SyntheticPop: An Investigation Into Poisoning Automated Speaker Verification Systems" (2025). Master's Projects. 1618.
DOI: https://doi.org/10.31979/etd.zrgc-3rzy
https://scholarworks.sjsu.edu/etd_projects/1618

Download

Available for download on Saturday, December 19, 2026

Included in

Computer Sciences Commons

