Publication Date

Spring 2022

Degree Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Engineering

Advisor

Jorjeta Jetcheva

Subject Areas

Computer engineering

Abstract

Fake news has always been a critical and challenging problem in the information environment. The propagation of false news is a serious concern, especially for medical information, where it can have dangerous and potentially deadly consequences. With the tsunami of online misinformation, it is crucial to fight fake medical news. In this study, we use machine learning techniques to help detect fake news related to diseases, including COVID-19, Ebola, Zika, SARS, cancer, and polio. To facilitate research in this space, we create a new medical dataset named MedHub. MedHub combines records from two publicly available COVID-19 datasets with manually curated facts and myths about the other diseases. In addition, we build several machine learning models trained on MedHub, including KNN, Naïve Bayes, SVM, logistic regression, and an MLP classifier, and present a proof-of-concept web application that uses these models to detect fake medical news. Our best-performing model, which we call Disease Myth Buster, is based on BERT and achieves an accuracy of 99%. We also perform experiments to demonstrate that 1) our models perform well at identifying misinformation related to any disease, even one not represented in the dataset, 2) they are well optimized to identify COVID-19-specific misinformation, and 3) Disease Myth Buster can be extended to general fake news classification using transfer learning. We create two new manually curated test datasets for the first two experiments. The first test dataset has 164 records related to diabetes, and the second has 13,459 records of COVID-19 myths. We open-source all our datasets and models for future research.
