No "Love" Lost: Defending Hate Speech Detection Models Against Adversaries

Publication Date

1-1-2020

Document Type

Conference Proceeding

Publication Title

2020 14th International Conference on Ubiquitous Information Management and Communication (IMCOM)

DOI

10.1109/IMCOM48794.2020.9001767

Abstract

Although current state-of-the-art hate speech detection models achieve praiseworthy results, these models have shown themselves to be vulnerable to attacks. Easy-to-execute lexical evasion schemes, such as removing the whitespace from a given text, create significant issues for word-based hate speech detection models. In this paper, we reproduce the results of five cutting-edge models as well as four significant evasion schemes from prior work. These schemes must maintain readability, which enables us to recreate the original data. We present several new defenses that leverage this need for preserved meaning and readability, and these defenses perform on par with or exceed the results of adversarial retraining. Furthermore, we demonstrate that each lexical attack or evasion scheme can be overcome with our new defense mechanisms, with some reducing the effectiveness of the scheme to a mere 0.1 to 0.01 drop in F1 score. We also propose a new evasion scheme that outperforms those in previous work, along with a corresponding defense. Using our results as a foundation, we contend that hate speech detection models can be defended against lexically morphed data without the need for significant retraining. Our work suggests that by exploiting the requirement for preserved meaning, one can create a suitable defense against evasion schemes with a high reversal rate.
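
To illustrate the whitespace-removal evasion and the kind of reversal defense the abstract describes, here is a minimal Python sketch. It is not the paper's implementation; the vocabulary, function names, and greedy longest-match segmentation strategy are illustrative assumptions.

    # Sketch of a whitespace-removal evasion and a dictionary-based
    # reversal defense. The vocabulary and the greedy longest-match
    # strategy are assumptions, not the paper's actual method.

    def remove_whitespace(text: str) -> str:
        """Evasion: strip spaces so word-based models see unknown tokens."""
        return text.replace(" ", "")

    def segment(text: str, vocab: set) -> list:
        """Defense: greedily recover a plausible word segmentation.

        Takes the longest vocabulary word at each position and falls back
        to a single character when nothing matches, so it always terminates.
        """
        words, i = [], 0
        while i < len(text):
            match = next(
                (text[i:j] for j in range(len(text), i, -1) if text[i:j] in vocab),
                text[i],  # fallback: emit one character and move on
            )
            words.append(match)
            i += len(match)
        return words

    if __name__ == "__main__":
        vocab = {"no", "love", "lost", "for", "you"}
        evaded = remove_whitespace("no love lost for you")  # "nolovelostforyou"
        print(" ".join(segment(evaded, vocab)))  # "no love lost for you"

Because a successful evasion must remain readable to humans, the original token boundaries stay recoverable, which is the property a segmentation defense like this exploits.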

Funding Number

2018-2023

Keywords

adversarial attacks, deep learning, lexical attacks, machine learning, social media

Department

Computer Science
