Publication Date

Fall 2024

Degree Type

Master's Project

Degree Name

Master of Science in Computer Science (MSCS)

Department

Computer Science

First Advisor

Ching-seh Wu

Second Advisor

William Andreopoulos

Third Advisor

Robert Chun

Keywords

Machine learning, classification, natural language processing, cyberbullying detection, neural networks

Abstract

Cyberbullying is a growing problem, driven by online anonymity and by the weaker repercussions that online platforms impose on offenders. This research advocates proactive measures that detect and prevent such behavior before it reaches the victim. Using data from several social media platforms and machine learning techniques, it proposes a system for identifying and thwarting cyberbullying incidents preemptively. Existing methods have focused primarily on predicting and detecting cyberbullying incidents, and research on prevention strategies remains scarce. This project addresses that gap by combining machine learning, natural language processing (NLP), and software development techniques to prevent cyberbullying proactively. Its approach implements blocking and warning mechanisms that intervene before harmful content reaches the intended victim, fostering a safer online environment. We also conducted an extensive comparison of five feature engineering methods and nine machine learning algorithms: three ensemble methods, four statistical methods, and two deep learning algorithms, each with two variations. Because user behavior may differ across platforms, we integrate data from multiple online sources, including Twitter, Wikipedia comments, Kaggle, and YouTube, to capture these varying behaviors. Across the different algorithms, the achieved accuracy ranges from 87.2% to 95.5%. This report also discusses metrics beyond accuracy that are relevant to text classification.
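The blocking and warning mechanism described above can be pictured as a gate that sits between the sender and the recipient. The sketch below is a minimal illustration, assuming a trained classifier that returns the probability that a message is cyberbullying. The names `moderate`, `Action`, `WARN_THRESHOLD`, `BLOCK_THRESHOLD`, and the toy scorer are hypothetical and are not taken from the project.

```python
from enum import Enum
from typing import Callable

# Hypothetical thresholds; the project's actual values are not given in the abstract.
WARN_THRESHOLD = 0.5
BLOCK_THRESHOLD = 0.8


class Action(Enum):
    DELIVER = "deliver"  # message passes through unchanged
    WARN = "warn"        # sender is asked to reconsider before sending
    BLOCK = "block"      # message never reaches the recipient


def moderate(message: str, score_fn: Callable[[str], float]) -> Action:
    """Decide what to do with a message before it is delivered.

    score_fn is assumed (hypothetically) to be a trained classifier that
    returns the probability, from 0.0 to 1.0, that the message is cyberbullying.
    """
    p = score_fn(message)
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"score must be a probability, got {p}")
    if p >= BLOCK_THRESHOLD:
        return Action.BLOCK
    if p >= WARN_THRESHOLD:
        return Action.WARN
    return Action.DELIVER


if __name__ == "__main__":
    # Stand-in scorer for illustration only: counts words from a tiny toy list.
    toy_insults = {"idiot", "loser", "stupid"}

    def toy_score(text: str) -> float:
        words = text.lower().split()
        hits = sum(w.strip(".,!?") in toy_insults for w in words)
        return min(1.0, hits / 2)

    for msg in ["Nice work today!", "You are such a loser.", "Stupid idiot, nobody likes you."]:
        print(f"{moderate(msg, toy_score).value:>7}  {msg}")
```

With the toy scorer, the three sample messages map to `deliver`, `warn`, and `block` respectively. Using two thresholds rather than one lets borderline messages trigger a warning instead of an outright block, which is one plausible way to balance false positives against harm prevention; in practice the toy scorer would be replaced by one of the trained models compared in the project.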

Available for download on Saturday, December 13, 2025
