GenEthos: A Synthetic Data Generation System with Bias Detection and Mitigation

Publication Date


Document Type

Conference Proceeding

Publication Title

Proceedings of International Conference on Computing, Communication, Security and Intelligent Systems, IC3SIS 2022



Abstract

Data-driven models perform well on real-world problems, but relevant data is often difficult to obtain. In addition, more diverse data is sometimes needed to expose the limitations of trained machine learning models. A common practice is to create such data samples from previously known metadata; however, this process can unknowingly introduce bias into the dataset. Generative Adversarial Network (GAN)-based data generation models generate more data from the initial data distribution, so they may carry its bias into the generated synthetic data. In this study, the authors propose an interactive Graphical User Interface (GUI) tool for synthetic data generation. The tool is equipped with bias detection and mitigation algorithms that notify users of pre-existing bias and provide methods to mitigate it. The tool can also be used to evaluate synthetic data generated by GAN-based models against fairness metrics. The authors found that the Learning Fair Representations (LFR) bias mitigation method performed 62% and 17.5% better than Prejudice Remover and Disparate Impact Remover on the original German Credit and Adult datasets. These results are based on the bias detection metrics Statistical Parity Difference (SPD) and Disparate Impact (DI). Used with the LFR method, the proposed data generation tool reduced the SPD metric by 93% on the original German Credit data. The authors conclude that both the original and the synthetic datasets were biased, and therefore the fairness of any dataset should be checked vigilantly.
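The abstract names SPD and DI without defining them. As commonly defined, SPD is the favorable-outcome rate of the unprivileged group minus that of the privileged group (0 indicates parity), and DI is the ratio of the same two rates (1 indicates parity). The following is a minimal sketch of those common definitions, not the authors' implementation. The function names, the group labels "a" and "b", and the toy data are hypothetical and do not come from the paper.

```python
from typing import Hashable, Sequence


def favorable_rate(
    labels: Sequence[int],
    groups: Sequence[Hashable],
    group_value: Hashable,
    favorable: int = 1,
) -> float:
    """Fraction of records in one group that received the favorable label."""
    selected = [y for y, g in zip(labels, groups) if g == group_value]
    if not selected:
        raise ValueError(f"no records found for group {group_value!r}")
    return sum(1 for y in selected if y == favorable) / len(selected)


def statistical_parity_difference(labels, groups, privileged, unprivileged) -> float:
    """SPD = P(favorable | unprivileged) - P(favorable | privileged)."""
    return (favorable_rate(labels, groups, unprivileged)
            - favorable_rate(labels, groups, privileged))


def disparate_impact(labels, groups, privileged, unprivileged) -> float:
    """DI = P(favorable | unprivileged) / P(favorable | privileged)."""
    privileged_rate = favorable_rate(labels, groups, privileged)
    if privileged_rate == 0:
        raise ValueError("privileged group has no favorable outcomes; DI is undefined")
    return favorable_rate(labels, groups, unprivileged) / privileged_rate


# Hypothetical toy data: group "a" is treated as privileged, "b" as unprivileged.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

spd = statistical_parity_difference(labels, groups, privileged="a", unprivileged="b")
di = disparate_impact(labels, groups, privileged="a", unprivileged="b")
print(f"SPD = {spd:.3f}, DI = {di:.3f}")
```

In this toy example the privileged group receives the favorable label 75% of the time and the unprivileged group 25% of the time, so SPD is negative and DI is below 1, both indicating bias against the unprivileged group.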

Keywords

Ethical Bias Mitigation, Ethical Data Generation, Ethical Fairness Detection, Generative Adversarial Network, Synthetic Data Generation


Computer Engineering