Faculty Research, Scholarly, and Creative Activity

HealthLies: Dataset and Machine Learning Models for Detecting Fake Health News

Garima Chaphekar, San Jose State University
Jorjeta G. Jetcheva, San Jose State University

Publication Date

1-1-2022

Document Type

Conference Proceeding

Publication Title

Proceedings - IEEE 8th International Conference on Big Data Computing Service and Applications, BigDataService 2022

DOI

10.1109/BigDataService55688.2022.00008

First Page

1

Last Page

8

Abstract

Current datasets and models focusing on health fake news identification are few and far between and primarily based on COVID-19. In this paper, we introduce a new health news-specific dataset called HealthLies, which includes 11,001 facts and myths about diseases such as COVID-19, Cancer, Polio, Zika, HIV/AIDS, SARS, and Ebola collected from a wide range of sources. We train several machine learning models, including KNN, SVM, Logistic Regression, Naive Bayes, an MLP Classifier, and a deep learning model based on the state-of-the-art Natural Language Processing (NLP) BERT model, which we name BERT-HealthLies. We find that BERT-HealthLies typically achieves the highest accuracy across models, though other models may be preferable in some real-time applications due to their orders of magnitude faster prediction and training times. In addition, ensembling BERT-HealthLies with the other models performs up to 12% better than BERT-HealthLies alone when identifying fake news related to a new disease for which we do not yet have training data.
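The abstract does not say how BERT-HealthLies is combined with the other models, so the following is only a minimal sketch of one common ensembling scheme, hard majority voting, and should not be read as the authors' method. The function name `majority_vote`, the `tie_breaker` parameter, the model keys, and the "fact"/"myth" labels are all assumptions made for illustration, and the example predictions are hypothetical placeholders rather than results from the paper.

```python
from collections import Counter
from typing import Dict, List, Sequence


def majority_vote(predictions: Dict[str, Sequence[str]], tie_breaker: str) -> List[str]:
    """Combine per-model labels by hard majority vote (an assumed scheme).

    `predictions` maps a model name to its predicted labels, one per statement.
    When several labels share the top vote count, the label predicted by the
    `tie_breaker` model wins if it is among them; otherwise the alphabetically
    first tied label is used so the result stays deterministic.
    """
    if tie_breaker not in predictions:
        raise KeyError(f"unknown tie-breaker model: {tie_breaker}")
    lengths = {len(labels) for labels in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all models must predict the same number of statements")

    combined = []
    for i in range(lengths.pop()):
        # Count how many models voted for each label on statement i.
        votes = Counter(labels[i] for labels in predictions.values())
        top_count = max(votes.values())
        winners = sorted(label for label, count in votes.items() if count == top_count)
        preferred = predictions[tie_breaker][i]
        combined.append(preferred if preferred in winners else winners[0])
    return combined


# Hypothetical predictions for three statements; NOT results from the paper.
example = {
    "bert_healthlies": ["myth", "fact", "myth"],
    "svm": ["fact", "fact", "fact"],
    "logistic_regression": ["fact", "fact", "fact"],
    "naive_bayes": ["myth", "myth", "fact"],
}
ensemble_labels = majority_vote(example, tie_breaker="bert_healthlies")
print(ensemble_labels)
```

In this sketch the first statement is a 2-2 tie that falls back to the tie-breaker model, while the third statement shows the other models outvoting it. A weighted or probability-averaging ensemble would be an equally plausible reading of the abstract.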

Keywords

Fake Health News, Fake News, HealthLies, Machine Learning, NLP

Department

Computer Engineering

Recommended Citation

Garima Chaphekar and Jorjeta G. Jetcheva. "HealthLies: Dataset and Machine Learning Models for Detecting Fake Health News" Proceedings - IEEE 8th International Conference on Big Data Computing Service and Applications, BigDataService 2022 (2022): 1-8. https://doi.org/10.1109/BigDataService55688.2022.00008
