Publication Date

Spring 2024

Degree Type

Master's Project

Degree Name

Master of Science in Computer Science (MSCS)

Department

Computer Science

First Advisor

Sayma Akther

Second Advisor

William Andreopoulos

Third Advisor

Faranak Abri

Keywords

HealthCare Datasets, Mitigating Learning Bias, GAN

Abstract

Cardiovascular diseases (CVDs) are a leading cause of death worldwide, with the World Health Organization (WHO) reporting 17.9 million deaths annually. Despite advances in treatment, most of these fatalities result from late diagnosis. Active research is under way to collect data points and risk factors related to these diseases, which can enable early diagnosis. Using the available datasets, many researchers have applied machine learning (ML) models to predict or detect the prevalence of heart disease. Many used tree-based and regression models [3, 6], and a few tried ensemble approaches [1, 2, 4]. These healthcare datasets are generally imbalanced, which can lead to learning bias: ML models predict the dominant class better than the minority class. A few researchers have tackled this problem with common sampling techniques such as SMOTE and random oversampling [4, 5]. However, no extensive study has evaluated different techniques for addressing this imbalance in healthcare datasets. This project aims to employ advanced techniques, such as Generative Adversarial Network (GAN)-based architectures and weighted Random Forest, to counter the imbalance found in CVD datasets and thus enable ML models to learn and predict better.
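
To make the imbalance problem concrete, the following is a minimal sketch of two baseline ideas named in the abstract: random oversampling of the minority class, and inverse-frequency class weights of the kind a weighted Random Forest could use. It is not the project's implementation. The function names, the toy records, and the weighting formula (n_samples / (n_classes * class_count), a common "balanced" heuristic) are assumptions chosen for illustration.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Candidate rows to duplicate for this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).
    This formula is an assumed heuristic, not taken from the project."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


# Hypothetical toy data: 8 negative records and 2 positive records,
# each with two made-up features (age, cholesterol).
rows = [[age, chol] for age, chol in zip(range(40, 50), range(180, 280, 10))]
labels = [0] * 8 + [1] * 2

balanced_rows, balanced_labels = random_oversample(rows, labels)
weights = balanced_class_weights(labels)

print(Counter(balanced_labels))  # both classes now have 8 rows
print(weights)                   # the minority class gets the larger weight
```

Oversampling changes the training data, while class weights leave the data untouched and instead change how much each misclassified record costs during training. Either way, only the training split should be rebalanced or reweighted, so that evaluation still reflects the original class distribution.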

Available for download on Sunday, May 25, 2025