Publication Date

Spring 2019

Degree Type

Master's Project

Degree Name

Master of Science (MS)

Department

Computer Science

First Advisor

Melody Moh

Second Advisor

Robert Chun

Third Advisor

Teng Moh

Keywords

hate speech detection, social networks

Abstract

Although current state-of-the-art hate speech detection models achieve praiseworthy results, these models have shown themselves to be vulnerable to attack. Easy-to-execute lexical manipulations, such as the removal of whitespace from a given text, create significant issues for word-based hate speech detection models. In this paper, we reproduce the results of five cutting-edge models as well as four significant evasion schemes from prior work. Only a limited number of evasion schemes that also maintain readability exist, and this works to our advantage in the recreation of the original data. Furthermore, we demonstrate that each lexical attack or evasion scheme can be overcome with our new defense mechanisms, with some reducing the effectiveness of the scheme to 1%. We also propose a new evasion scheme that outperforms those in previous work, along with a corresponding defense. Using our results as a foundation, we contend that hate speech detection models can be defended against lexically attacked data without the need for significant retraining.

Recommended Citation

Khieu, Brian Tuan, "TSAR : A System for Defending Hate Speech Detection Models Against Adversaries" (2019). Master's Projects. 740.
DOI: https://doi.org/10.31979/etd.6tsk-redu
https://scholarworks.sjsu.edu/etd_projects/740

Included in

Artificial Intelligence and Robotics Commons, Other Computer Sciences Commons

Master's Projects

TSAR : A System for Defending Hate Speech Detection Models Against Adversaries
