Analyzing and Addressing Data-driven Fairness Issues in Machine Learning Models used for Societal Problems

Vishnu S. Pendyala, San Jose State University
Hyungkyun Kim, San Jose State University


This work systematically analyzes and addresses fairness issues that arise in machine learning models because of class imbalance in the data, specifically data used for addressing societal problems, and offers insights into those issues. Using a specific data set, spectral analysis is first performed to present evidence of the fairness issues and to characterize them. A series of class imbalance correction techniques is then applied to the data before it is used to train various machine learning models. The resulting models are evaluated using multiple metrics, and the results are analyzed to compare the approaches and determine the relative merits of each. As the experiments described in this paper confirm, not all oversampling techniques help in correcting data-induced model biases. Based on the Kappa statistic, the F1 score, and accuracy as measured by the area under the Receiver Operating Characteristic (ROC) curve, the Majority Weighted Minority Oversampling Technique (MWMOTE) addresses the fairness issues best among the approaches evaluated and also improves the performance of the models, at least for the data set under consideration. The experiments also demonstrate that some oversampling techniques can degrade the models in terms of both performance and fairness. The results are interpreted using the evaluation metrics.
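To make the three evaluation metrics named above concrete, the following is a minimal sketch of how the Kappa statistic, the F1 score, and the area under the ROC curve can be computed for a binary classifier. It is an illustration under stated assumptions, not the code used in the experiments: the function names (`confusion_counts`, `f1_score`, `cohen_kappa`, `roc_auc`) and the toy labels are hypothetical, and only the Python standard library is used.

```python
# Illustrative sketch only: helper names and toy data are assumptions,
# not taken from the paper's experiments.

def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 if there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = tp + fp + fn + tn
    p_observed = (tp + tn) / n
    # Chance agreement from the marginal totals of predictions and true labels.
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0


def roc_auc(y_true, scores, positive=1):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy imbalanced data: 90 majority-class (0) and 10 minority-class (1) examples.
y_true = [0] * 90 + [1] * 10
# A degenerate model that always predicts the majority class.
y_majority = [0] * 100

tp, fp, fn, tn = confusion_counts(y_true, y_majority)
accuracy = (tp + tn) / len(y_true)
f1 = f1_score(tp, fp, fn)
kappa = cohen_kappa(tp, fp, fn, tn)

# Small ranking example for the AUC.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

On the toy data, the always-majority model reaches high plain accuracy while its F1 score and Kappa statistic are both zero, which suggests why chance-corrected and minority-sensitive metrics are preferred when classes are imbalanced.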