Publication Date
Spring 2024
Degree Type
Master's Project
Degree Name
Master of Science in Computer Science (MSCS)
Department
Computer Science
First Advisor
Sayma Akther
Second Advisor
William Andreopoulos
Third Advisor
Faranak Abri
Keywords
Healthcare Datasets, Mitigating Learning Bias, GAN
Abstract
Cardiovascular diseases (CVDs) are a major cause of death worldwide, with the WHO reporting 17.9 million deaths annually. Although treatment of these diseases has advanced, most fatalities result from delayed diagnosis. Active research is under way to collect data points and risk factors related to these diseases, which can enable early diagnosis. Using the available datasets, many researchers have employed different ML models to predict or detect the prevalence of heart disease. Many employed tree-based and regression models [3, 6]. A few also tried ensemble approaches [1, 2, 4]. These healthcare datasets are generally found to be imbalanced. This can lead to learning bias, with ML models predicting the dominant class better than the minority class. A few researchers have tried to tackle this problem using common sampling techniques such as SMOTE and random oversampling [4, 5]. However, there has not been an extensive study evaluating different techniques for addressing this imbalance in healthcare datasets. This project aims to employ advanced techniques, e.g., Generative Adversarial Network (GAN) based architectures and weighted Random Forest, to counter the imbalance found in CVD datasets and thus enable ML models to learn and predict better.
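As a rough illustration of the class imbalance problem described in the abstract, the sketch below shows two of the simpler countermeasures it mentions in spirit: random oversampling of the minority class, and inverse-frequency class weights of the kind a weighted Random Forest could consume. This is not the project's implementation. The function names, the toy data, and the weighting formula (n_samples / (n_classes * class_count)) are assumptions chosen for illustration, and the GAN-based approach is not shown.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many samples as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows belonging to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).
    Rarer classes receive larger weights (assumed weighting scheme)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


# Toy imbalanced dataset (made up): 8 negative cases, 2 positive cases.
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2

balanced_rows, balanced_labels = random_oversample(rows, labels)
weights = balanced_class_weights(labels)

print(Counter(balanced_labels))
print(weights)
```

Oversampling changes the training data itself, while class weights leave the data untouched and instead change how much each misclassified sample costs during training. Either output could then be passed to a classifier of choice.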
Recommended Citation
Kumbhalwar, Samyak Jagdish, "Mitigating Learning Bias in Healthcare Datasets" (2024). Master's Projects. 1404.
DOI: https://doi.org/10.31979/etd.98wg-h7w3
https://scholarworks.sjsu.edu/etd_projects/1404