Publication Date

Spring 2023

Degree Type

Master's Project

Degree Name

Master of Science (MS)


Department

Computer Science

First Advisor

Robert Chun

Second Advisor

William Andreopoulos

Third Advisor

Manika Makam


Keywords

YouTube comment, spam classification, logistic regression, SVM, MLP, BERT


Abstract

This paper proposes an approach for classifying comments on the video-sharing website YouTube as spam or ham. Comments that are contextually irrelevant to a particular video or that have a commercial motive constitute spam. In the past few years, the spread of advertising to new arenas such as social media has created a lucrative platform for many, and social media is now widely used. But this innovation comes with its own impediments. Malicious users have taken over these platforms with the aid of automated bots that can deploy well-coordinated spam across multiple streams in a matter of seconds. This can seriously disrupt one’s social media experience and greatly tarnish a channel’s reputation.

Presently, the only approach YouTube has applied to tackle this problem is blocking comments that contain links. This measure is often futile, as spammers are known to circumvent such obstacles quickly. Standard machine learning algorithms may help to a certain extent, but the issue can be properly checked only with an approach built around better accuracy. Our aim in this paper is to propose a method for detecting these comments using machine learning algorithms such as Logistic Regression, Multilayer Perceptron, Random Forest, Support Vector Machine, an Ensemble model, and BERT, which have been shown to detect and limit spam effectively on these platforms.

Available for download on Sunday, May 26, 2024