Publication Date
Spring 2023
Degree Type
Master's Project
Degree Name
Master of Science (MS)
Department
Computer Science
First Advisor
Katerina Potika
Second Advisor
William Andreopoulos
Third Advisor
Genya Ishigaki
Keywords
Natural language processing, graph classification, embedding techniques, Twitter
Abstract
Social media platforms are among the primary sources of information because they are easily accessible, low in cost, and spread information quickly. Online social media (OSM) have become the main source of news around the world, but the distributed nature of the web has increased the risk that fake news spreads. Fake news is misleading information that is published as real news. Therefore, identifying fake news and flagging it as such, as well as detecting the sources that generate it, is an ongoing task for researchers and OSM companies. Bots are artificial users and are useful for various tasks. Unfortunately, bots often disseminate inaccurate news and low-quality information [1]. Distinguishing bots from real users is difficult based on content alone.
The aim of this project is to explore and expand techniques for bot detection. We propose a novel approach that combines the text features of the news with the graph structure of its diffusion on the OSM. In our methodology, we first perform feature extraction on the news using Natural Language Processing techniques, such as the pre-trained models BERT [2] and spaCy [3]. Next, we create diffusion graphs of the news posted by bots and of the news posted by real users, based on how it propagates through the OSM. We model our problem as a graph classification problem and apply graph convolutional network approaches to solve it. We use the Cresci-2017 [4] Twitter dataset for our experiments, which contains real and bot users and their tweets. We augment this dataset by using web scraping to fetch additional tweets and user data to create the graphs. We analyze the graphs that we constructed from news posted by bots and by real users. An experimental comparison between the existing techniques and the proposed methodology will be performed to better understand the scope for optimization and improvement.
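To make the graph classification step concrete, the following is a minimal sketch of a single graph convolution followed by mean pooling over a tiny diffusion graph. It is not the project's implementation: the edge list, the two-dimensional node features (stand-ins for BERT or spaCy text vectors), the fixed weights, and the sigmoid readout are all hypothetical, and a real model would learn its weights and would likely use a dedicated graph learning library.

```python
import math


def normalized_adjacency(num_nodes, edges):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected graph as a dense matrix."""
    a = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        a[i][i] = 1.0  # self-loops keep each node's own features
    for u, v in edges:
        a[u][v] = 1.0
        a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [
        [a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_nodes)]
        for i in range(num_nodes)
    ]


def matmul(x, y):
    """Plain dense matrix product of two lists of lists."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def gcn_layer(a_hat, features, weights):
    """One graph convolution: ReLU(A_hat @ X @ W)."""
    h = matmul(matmul(a_hat, features), weights)
    return [[max(0.0, v) for v in row] for row in h]


def graph_embedding(node_states):
    """Mean-pool node states into one graph-level vector."""
    n = len(node_states)
    return [sum(col) / n for col in zip(*node_states)]


# Hypothetical diffusion graph: node 0 is the original tweet,
# the other nodes are retweets or replies.
edges = [(0, 1), (0, 2), (2, 3)]
# Hypothetical text features per node (assumed stand-ins for BERT/spaCy vectors).
features = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical fixed weights; a trained model would learn these.
weights = [[1.0, -1.0], [0.5, 1.0]]

a_hat = normalized_adjacency(4, edges)
hidden = gcn_layer(a_hat, features, weights)
embedding = graph_embedding(hidden)
# Hypothetical readout: squash the pooled vector into a bot-likelihood score.
score = 1.0 / (1.0 + math.exp(-sum(embedding)))
```

Each diffusion graph yields one pooled vector, so the classifier labels the whole propagation pattern (bot versus real user) rather than individual nodes.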
Recommended Citation
Kulkarni, Warada Jayant, "Twitter Bot Detection using NLP and Graph Classification" (2023). Master's Projects. 1263.
DOI: https://doi.org/10.31979/etd.ms4h-hj9x
https://scholarworks.sjsu.edu/etd_projects/1263