Impact of Data Heterogeneity on AI/ML Model Accuracy in Assisting Pneumonia Type Prediction

Publication Date

1-1-2024

Document Type

Conference Proceeding

Publication Title

Proceedings of the 2024 IEEE International Conference on Industry 4.0, Artificial Intelligence, and Communications Technology, IAICT 2024

DOI

10.1109/IAICT62357.2024.10617531

First Page

253

Last Page

258

Abstract

Pneumonia is the fourth most common cause of mortality, resulting in more than 50,000 deaths in the U.S. alone every year. Cases of this respiratory infection have only been exacerbated by the COVID-19 pandemic, as the virus tends to attack airways and gas exchange regions. The diagnosis of COVID-19 pneumonia depends on various factors, including the severity as well as the type of the disease, which physicians attempt to determine preliminarily by analyzing chest X-ray scans. Given the enormous volume of X-ray data, an automated procedure can be used to identify defects in scanned images, which, in conjunction with other clinical diagnostics, can help verify the presence of disease. Machine learning has emerged as a powerful tool to enable high-accuracy medical diagnostics. In the current work, various neural network algorithms, including the convolutional neural network (CNN), CNN+DenseNet121, CNN+EfficientNetB7, and CNN+ResNet50, were employed to classify chest X-ray images into one of the following ten diagnoses: Negative for COVID-19 pneumonia, Mild Atypical COVID-19 pneumonia, Moderate Atypical COVID-19 pneumonia, Severe Atypical COVID-19 pneumonia, Mild Indeterminate COVID-19 pneumonia, Moderate Indeterminate COVID-19 pneumonia, Severe Indeterminate COVID-19 pneumonia, Mild Typical COVID-19 pneumonia, Moderate Typical COVID-19 pneumonia, and Severe Typical COVID-19 pneumonia. The CNN, CNN+DenseNet121, CNN+EfficientNetB7, and CNN+ResNet50 models achieved training accuracies of 47.62%, 84.08%, 64.08%, and 74.30% and validation accuracies of 42.29%, 50.25%, 53.98%, and 43.28%, respectively. The moderate classification performance across all four models suggests that data heterogeneity, particularly the presence of ten similar diagnostic scenarios, greatly limits the potential of machine learning in medical diagnostics. Nevertheless, data manipulations and advanced modeling are being studied further to overcome this barrier.
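
The reported accuracies can be put side by side with a short calculation. The following is a minimal sketch, not part of the original work: it takes the eight accuracy figures from the abstract and computes, for each model, the difference between training and validation accuracy, along with the accuracy a uniform random guess over ten classes would obtain. The names `reported`, `accuracy_gaps`, and `chance_baseline` are hypothetical, and the chance baseline assumes balanced classes, which the abstract does not state.

```python
# Accuracies in percent, as reported in the abstract: (training, validation).
reported = {
    "CNN": (47.62, 42.29),
    "CNN+DenseNet121": (84.08, 50.25),
    "CNN+EfficientNetB7": (64.08, 53.98),
    "CNN+ResNet50": (74.30, 43.28),
}

NUM_CLASSES = 10  # the ten diagnostic categories listed in the abstract

# Assumption: uniform random guessing over balanced classes.
# The actual class distribution is not given in the abstract.
chance_baseline = 100.0 / NUM_CLASSES


def accuracy_gaps(results):
    """Return training-minus-validation accuracy (percentage points) per model."""
    return {name: round(train - val, 2) for name, (train, val) in results.items()}


gaps = accuracy_gaps(reported)

# Model with the highest reported validation accuracy.
best_validation = max(reported, key=lambda name: reported[name][1])

if __name__ == "__main__":
    for name, (train, val) in reported.items():
        print(f"{name}: train={train:.2f}%  val={val:.2f}%  gap={gaps[name]:.2f} pts")
    print(f"Uniform-chance baseline (assumed balanced classes): {chance_baseline:.1f}%")
    print(f"Highest validation accuracy: {best_validation}")
```

A large gap between training and validation accuracy is one rough, commonly used indicator that a model fits its training set better than it generalizes. The abstract itself attributes the moderate performance to data heterogeneity, so the gap should be read only as a supplementary view of the same numbers.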

Keywords

Chest X-ray Images, COVID-19, Machine Learning, Neural Network, Pneumonia

Department

Mechanical Engineering