Publication Date
Spring 2023
Degree Type
Master's Project
Degree Name
Master of Science (MS)
Department
Computer Science
First Advisor
Fabio Di Troia
Second Advisor
William Andreopoulos
Third Advisor
Robert Chun
Keywords
Indian Hate Speech, fastText, GloVe, distilBERT, MuRIL
Abstract
Social media is a convenient place to share one’s thoughts and express oneself, but the same platforms very often become a means of spreading hatred. The large volume of data shared on these platforms makes it difficult to moderate user content. In a diverse country like India, hate speech appears on social media in every regional language, which makes it even harder to detect: there is not enough data to train machine learning and deep learning models to understand these regional languages. This work is our attempt at tackling hate speech in Hindi. We experiment with word embeddings such as fastText and GloVe combined with machine learning classifiers such as logistic regression and decision trees. We also experiment with transformer-based embeddings such as distilBERT and MuRIL. The transformer-based models perform better on our task, and we achieve an F1 score of 0.73 using MuRIL embeddings.
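As a rough illustration of the classical pipeline named above, where pretrained word embeddings feed a linear classifier, the sketch below does three things. It averages word vectors into one vector per post, trains logistic regression with plain gradient descent, and scores the predictions with F1, the harmonic mean of precision P and recall R, 2PR/(P+R). This is a minimal sketch under stated assumptions, not the project's implementation. The 2-dimensional vectors in `EMBEDDINGS`, the placeholder tokens, the toy posts and labels, and mean pooling as the way to build a post vector are all hypothetical. Real fastText or GloVe vectors would be loaded from a pretrained file and have far more dimensions.

```python
import math

# HYPOTHETICAL toy word vectors standing in for pretrained fastText/GloVe
# embeddings; real vectors would be loaded from a pretrained model file.
EMBEDDINGS = {
    "w_hate1": [0.9, 0.1],
    "w_hate2": [0.8, 0.2],
    "w_neutral1": [0.1, 0.9],
    "w_neutral2": [0.2, 0.8],
}
DIM = 2


def sentence_vector(tokens):
    """Mean-pool the vectors of known tokens (an assumed pooling choice)."""
    known = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not known:
        return [0.0] * DIM  # no known words: fall back to the zero vector
    return [sum(v[i] for v in known) / len(known) for i in range(DIM)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the logistic (cross-entropy) loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(len(w)):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Label 1 (hate) when the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


def f1_score(y_true, y_pred):
    """F1 for the positive class: 2PR / (P + R)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# HYPOTHETICAL tokenized posts (1 = hate, 0 = not hate); purely illustrative.
posts = [["w_hate1", "w_hate2"], ["w_hate1"], ["w_neutral1", "w_neutral2"], ["w_neutral2"]]
labels = [1, 1, 0, 0]

X = [sentence_vector(p) for p in posts]
w, b = train_logreg(X, labels)
preds = [predict(w, b, x) for x in X]
print(preds, f1_score(labels, preds))
```

On this separable toy data the classifier recovers every label, so the script prints `[1, 1, 0, 0] 1.0`. A decision tree could replace `train_logreg` without changing the feature step. The transformer-based variants would instead take the post vector from a model such as distilBERT or MuRIL rather than from averaged static word vectors.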
Recommended Citation
Bansod, Pranjali Prakash, "Hate Speech Detection in Hindi" (2023). Master's Projects. 1265.
DOI: https://doi.org/10.31979/etd.yc74-7qas
https://scholarworks.sjsu.edu/etd_projects/1265