Publication Date

Spring 2023

Degree Type

Master's Project

Degree Name

Master of Science (MS)


Department

Computer Science

First Advisor

Katerina Potika

Second Advisor

William Andreopoulos

Third Advisor

Genya Ishigaki


Keywords

Natural language processing, graph classification, embedding techniques, Twitter

Abstract

Social media platforms are among the primary sources of information because they are easily accessible, low in cost, and spread information rapidly. Online social media (OSM) have become the main source of news around the world, but the distributed nature of the web has also increased the risk of fake news spreading. Fake news is misleading information that is published as real news. Identifying fake news and flagging it as such, as well as detecting the sources that generate it, is therefore an ongoing task for researchers and OSM companies. Bots are artificial users that are useful for various tasks. Unfortunately, bots often disseminate inaccurate news and low-quality information [1]. Discriminating bots from real users based on content alone is difficult.

The aim of this project is to explore and expand techniques for bot detection. We propose a novel approach that combines text features of the news with the graph structure of how that news diffuses through the OSM. In our methodology, we first extract features from the news using pre-trained Natural Language Processing models, such as BERT [2] and those provided by spaCy [3]. Next, we create diffusion graphs for news posted by bots and for news posted by real users, based on how each item propagates through the OSM. We model the problem as graph classification and apply graph convolutional network approaches to solve it. We use the Cresci-2017 [4] Twitter dataset for our experiments, which contains real and bot users and their tweets. We augment this dataset by web scraping additional tweets and user data needed to create the graphs. We analyze the graphs constructed from news posted by bots and by real users. An experimental comparison between existing techniques and the proposed methodology will be performed to better understand the scope for optimization and improvement.
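The graph classification step described above can be illustrated with a small sketch. It is not the project's implementation: it assumes an undirected diffusion graph, uses hand-written placeholder feature vectors in place of BERT or spaCy embeddings, and uses untrained weights. All function names (`normalized_adjacency`, `gcn_layer`, `mean_pool`, `classify_graph`) are hypothetical and chosen only for illustration. It shows one graph convolution, a mean-pooling readout over nodes, and a logistic output that scores the whole graph.

```python
import math


def normalized_adjacency(num_nodes, edges):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected graph (dense lists)."""
    # Start from the identity so every node has a self-loop.
    a = [[1.0 if i == j else 0.0 for j in range(num_nodes)]
         for i in range(num_nodes)]
    for u, v in edges:
        a[u][v] = 1.0
        a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_nodes)]
            for i in range(num_nodes)]


def matmul(x, y):
    """Multiply two dense matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)]
            for row in x]


def gcn_layer(a_hat, features, weights):
    """One graph convolution: ReLU(A_hat @ X @ W)."""
    h = matmul(matmul(a_hat, features), weights)
    return [[max(0.0, value) for value in row] for row in h]


def mean_pool(h):
    """Average node representations into a single graph-level vector."""
    return [sum(col) / len(h) for col in zip(*h)]


def classify_graph(edges, features, weights, readout_weights, bias=0.0):
    """Score a whole graph with a sigmoid over the pooled representation."""
    a_hat = normalized_adjacency(len(features), edges)
    graph_vector = mean_pool(gcn_layer(a_hat, features, weights))
    logit = sum(w * x for w, x in zip(readout_weights, graph_vector)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy diffusion graph: node 0 is the original post, nodes 1-3 reshare it.
toy_edges = [(0, 1), (0, 2), (0, 3)]
# Placeholder 2-dimensional node features (stand-ins for text embeddings).
toy_features = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
# Untrained illustrative weights; a real model would learn these.
toy_weights = [[1.0, 0.0], [0.0, 1.0]]
toy_readout = [0.5, -0.5]

score = classify_graph(toy_edges, toy_features, toy_weights, toy_readout)
print(f"graph-level score: {score:.3f}")
```

In practice the weights would be learned from labeled bot and real-user graphs, and a graph learning library would replace the dense list arithmetic; the sketch only makes the data flow from node features to a single graph-level label concrete.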