Author

Ajeet Kotturu

Publication Date

Fall 2025

Degree Type

Master's Project

Degree Name

Master of Science in Computer Science (MSCS)

Department

Computer Science

First Advisor

Teng Moh

Second Advisor

Melody Moh

Third Advisor

Amith Kamath Belman

Keywords

Data Poisoning, Mislabeling, Injection, Recurrent Neural Network, Support Vector Machine, Resource Scheme Multilinear Regression

Abstract

Data poisoning occurs in various kinds of datasets; however, it is more challenging to detect in textual datasets than in image datasets. The focus of this paper is to determine how to detect poisoning in textual datasets. We focused on four poisoning attacks: mislabeling, injection, targeted, and non-targeted attacks. Recurrent Neural Networks (RNN), Support Vector Machines (SVM), and Resource Scheme Multilinear Regression (RSMLR) are used to detect poisoning. For the RNN, a custom RNN class containing an encoder and a decoder was created. For the SVM, 10% of the dataset was used to determine whether the rest of the dataset was poisoned. For the RSMLR, the target column was processed separately from the entire dataset. To improve each model, a threshold equation was used to determine the poisoned data that needed to be flagged. Using the best parameter values, the models were then applied in a Federated Learning (FL) setting with multiple passes and shuffling. Based on the experimental results, using the RNN and SVM together in shuffling yields the best results for poisoning attacks. The RSMLR had the poorest performance overall but performed well when detecting poisoning in shuffled datasets. In the model shuffling experiment, the models yield average accuracies of 41% for mislabeling datasets, 92% for injection datasets, and 68% for targeted datasets. For the non-targeted attacks, both the RNN and SVM yield accuracies of 100%.

Available for download on Friday, December 18, 2026
