Publication Date

Spring 2023

Degree Type

Master's Project

Degree Name

Master of Science (MS)

Department

Computer Science

First Advisor

Katerina Potika

Second Advisor

William Andreopoulos

Third Advisor

Genya Ishigaki

Keywords

Natural language processing, graph classification, embedding techniques, Twitter

Abstract

Social media platforms are a primary source of information because they are easily accessible, low in cost, and spread information rapidly. Online social media (OSM) have become the main source of news around the world, but the distributed nature of the web has increased the risk of fake news spreading. Fake news is misleading information that is published as real news. Identifying fake news and flagging it as such, as well as detecting the sources that generate it, is therefore an ongoing task for researchers and OSM companies. Bots are artificial users that are useful for various tasks. Unfortunately, bots often disseminate inaccurate news and low-quality information [1]. Discriminating bots from real users based on content alone is difficult.

The aim of this project is to explore and expand techniques for bot detection. We propose a novel approach that combines text features of the news with the graph structure of how that news diffuses on the OSM. In our methodology, we first extract features from the news using Natural Language Processing techniques, such as the pre-trained BERT [2] and spaCy [3] models. Next, we create diffusion graphs of the news posted by bots and the news posted by real users, based on how it propagates through the OSM. We model the problem as graph classification and apply graph convolutional network approaches to solve it. We use the Cresci-2017 [4] Twitter dataset for our experiments, which contains real and bot users and their tweets. We augment this dataset by web scraping additional tweets and user data to create the graphs. We analyze the graphs constructed from news posted by bots and by real users. An experimental comparison between existing techniques and the proposed methodology will be performed to better understand the scope for optimization and improvement.
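
As a rough illustration of the graph classification step described in the abstract, the sketch below applies one graph convolution layer to a small diffusion graph and mean-pools the node outputs into a single graph-level vector. It is a minimal sketch under stated assumptions, not the project's implementation: the function names (`normalized_adjacency`, `gcn_layer`, `graph_embedding`), the toy star-shaped graph, the two-dimensional node features standing in for BERT or spaCy text embeddings, and the identity weight matrix are all hypothetical. A real pipeline would learn the weights and pass the pooled vector to a classifier.

```python
import math

def normalized_adjacency(n, edges):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected graph with n nodes."""
    a = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = 1.0
        a[v][u] = 1.0
    for i in range(n):
        a[i][i] = 1.0  # add self-loops so each node keeps its own features
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(x, y):
    """Plain dense matrix product of two lists of lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]

def gcn_layer(a_hat, features, weights):
    """One graph convolution: ReLU(A_hat @ X @ W)."""
    h = matmul(matmul(a_hat, features), weights)
    return [[max(0.0, v) for v in row] for row in h]

def mean_pool(h):
    """Average node vectors into one graph-level vector."""
    n = len(h)
    return [sum(row[j] for row in h) / n for j in range(len(h[0]))]

def graph_embedding(n, edges, features, weights):
    """Graph-level representation that a downstream classifier could consume."""
    a_hat = normalized_adjacency(n, edges)
    return mean_pool(gcn_layer(a_hat, features, weights))

# Hypothetical toy diffusion graph: node 0 posts, nodes 1-3 repost it.
edges = [(0, 1), (0, 2), (0, 3)]
# Placeholder 2-d node features (assumed stand-ins for text embeddings).
features = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.5, 0.5]]
# Identity weights for illustration only; a trained model would learn these.
weights = [[1.0, 0.0], [0.0, 1.0]]

print(graph_embedding(4, edges, features, weights))
```

Mean pooling is only one possible readout; sum or max pooling are common alternatives, and the abstract does not say which one the project uses.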

Recommended Citation

Kulkarni, Warada Jayant, "Twitter Bot Detection using NLP and Graph Classification" (2023). Master's Projects. 1263.
DOI: https://doi.org/10.31979/etd.ms4h-hj9x
https://scholarworks.sjsu.edu/etd_projects/1263

Included in

Artificial Intelligence and Robotics Commons, Other Computer Sciences Commons
