Publication Date

Fall 2025

Degree Type

Master's Project

Degree Name

Master of Science in Computer Science (MSCS)

Department

Computer Science

First Advisor

Teng Moh

Second Advisor

Melody Moh

Third Advisor

Amith Kamath Belman

Keywords

Data Poisoning, Mislabeling, Injection, Recurrent Neural Network, Support Vector Machine, Resource Scheme Multilinear Regression

Abstract

Data poisoning occurs in various datasets; however, it is more challenging to detect poisoning in textual datasets than in image datasets. The focus of this paper is to determine how to detect poisoning in textual datasets. We focused on four poisoning attacks: mislabeling, injection, targeted, and non-targeted attacks. Recurrent Neural Networks (RNN), Support Vector Machine (SVM), and Resource Scheme Multilinear Regression (RSMLR) are used for detecting poisoning. For the RNN, a custom RNN class containing an encoder and a decoder was created. For the SVM, 10% of the dataset was used to determine whether the rest of the dataset was poisoned. For the RSMLR, the target column was processed separately from the rest of the dataset. To improve each model, a threshold equation was used to determine which poisoned samples needed to be flagged. Using the best parameter values, the models were then applied in a Federated Learning (FL) setting with multiple passes and shuffling. Based on the experimental results, using the RNN and SVM together with shuffling yields the best results for poisoning attacks. The RSMLR had the poorest performance overall but performed well when detecting poisoning in shuffled datasets. In the model shuffling experiment, the models yield average accuracies of 41% for mislabeling datasets, 92% for injection datasets, and 68% for targeted datasets. For the non-targeted attacks, both the RNN and the SVM yield accuracies of 100%.
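
The abstract mentions a threshold equation for deciding which samples to flag but does not state it. As a rough illustration only, the sketch below shows one common form such a rule can take: flag any sample whose anomaly score exceeds the mean by more than k standard deviations. The function name `flag_suspects`, the parameter `k`, the per-sample scores, and the rule itself are all assumptions made for this example and are not taken from the project.

```python
from statistics import mean, pstdev


def flag_suspects(scores, k=2.0):
    """Return indices of samples whose score exceeds mean + k * std.

    ASSUMPTION: `scores` are hypothetical per-sample anomaly scores
    (higher means more suspicious), for example a detector's loss on
    each sample. The mean-plus-k-sigma rule and the default k are
    illustrative, not the project's actual threshold equation.
    """
    mu = mean(scores)
    sigma = pstdev(scores)  # population standard deviation
    cutoff = mu + k * sigma
    return [i for i, s in enumerate(scores) if s > cutoff]


# Nine ordinary scores and one clear outlier at index 9.
scores = [0.10, 0.20, 0.15, 0.12, 0.18, 0.11, 0.16, 0.14, 0.13, 5.00]
print(flag_suspects(scores))  # prints [9]
```

A lower `k` flags more samples at the cost of more false positives, which is the kind of trade-off a tuned threshold is meant to control.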

Recommended Citation

Kotturu, Ajeet, "Detection and Mitigation for Poisoned Textual Datasets" (2025). Master's Projects. 1607.
DOI: https://doi.org/10.31979/etd.uara-bjzm
https://scholarworks.sjsu.edu/etd_projects/1607

Download

Available for download on Friday, December 18, 2026

Included in

Computer Sciences Commons
