Faculty Research, Scholarly, and Creative Activity

GenEthos: A Synthetic Data Generation System with Bias Detection and Mitigation

Shubham Gujar, Vishwakarma Institute of Information Technology
Tanishka Shah, Vishwakarma Institute of Information Technology
Dewen Honawale, Vishwakarma Institute of Information Technology
Vedant Bhosale, Vishwakarma Institute of Information Technology
Faizan Khan, Vishwakarma Institute of Information Technology
Devika Verma, Vishwakarma Institute of Information Technology
Rakesh Ranjan, San Jose State University

Publication Date

1-1-2022

Document Type

Conference Proceeding

Publication Title

Proceedings of International Conference on Computing, Communication, Security and Intelligent Systems, IC3SIS 2022

DOI

10.1109/IC3SIS54991.2022.9885653

Abstract

Data-driven models perform well on real-world problems, but obtaining relevant data is difficult. More diverse data is also sometimes needed to expose the limitations of trained machine learning models. Creating such samples from previously known metadata is common practice, yet this process can unknowingly introduce bias into the dataset. Data generation models based on Generative Adversarial Networks (GANs) generate additional data from the initial data distribution, so the synthetic data they produce may reflect any bias in that distribution. In this study, the authors propose an interactive synthetic data generation tool with a Graphical User Interface (GUI). The tool is equipped with bias detection and mitigation algorithms that notify users of pre-existing bias and provide methods to mitigate it. The tool can also be used to evaluate synthetic data generated by GAN-based models against fairness metrics. The authors found that the Learning Fair Representation (LFR) bias mitigation method performed 62% and 17.5% better than Prejudice Remover and Disparate Impact Remover on the original German Credit and Adult datasets. These results are based on bias detection metrics such as Statistical Parity Difference (SPD) and Disparate Impact (DI). Used with the LFR method, the proposed data generation tool can reduce the SPD metric by 93% on the original German Credit data. The authors conclude that both the original and the synthetic datasets were biased; therefore, the fairness level of any dataset should be checked vigilantly.
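The abstract names SPD and DI without defining them. The sketch below assumes the commonly used definitions: SPD is the favorable-outcome rate of the unprivileged group minus that of the privileged group (0 indicates parity), and DI is the ratio of the same two rates (1 indicates parity). The function names, group labels, and toy data are hypothetical and are not taken from the paper or from the GenEthos tool.

```python
def favorable_rate(labels, groups, group_value, favorable=1):
    """Fraction of members of `group_value` that received the favorable label."""
    selected = [y for y, g in zip(labels, groups) if g == group_value]
    if not selected:
        raise ValueError(f"no samples found for group {group_value!r}")
    return sum(1 for y in selected if y == favorable) / len(selected)


def statistical_parity_difference(labels, groups, unprivileged, privileged):
    """SPD = P(favorable | unprivileged) - P(favorable | privileged)."""
    return (favorable_rate(labels, groups, unprivileged)
            - favorable_rate(labels, groups, privileged))


def disparate_impact(labels, groups, unprivileged, privileged):
    """DI = P(favorable | unprivileged) / P(favorable | privileged)."""
    privileged_rate = favorable_rate(labels, groups, privileged)
    if privileged_rate == 0:
        raise ZeroDivisionError("privileged group has no favorable outcomes")
    return favorable_rate(labels, groups, unprivileged) / privileged_rate


# Hypothetical toy data: 1 is the favorable outcome, "a" is the
# unprivileged group and "b" is the privileged group.
labels = [1, 0, 0, 0, 1, 1, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

spd = statistical_parity_difference(labels, groups, "a", "b")
di = disparate_impact(labels, groups, "a", "b")
print(f"SPD = {spd:.3f}, DI = {di:.3f}")
```

On this toy data, group "a" receives the favorable outcome a quarter of the time and group "b" three quarters of the time, so SPD is negative and DI falls below 1, both pointing to bias against group "a". A mitigation method would aim to move SPD toward 0 and DI toward 1.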

Keywords

Ethical Bias Mitigation, Ethical Data Generation, Ethical Fairness Detection, Generative Adversarial Network, Synthetic Data Generation

Department

Computer Engineering

Recommended Citation

Shubham Gujar, Tanishka Shah, Dewen Honawale, Vedant Bhosale, Faizan Khan, Devika Verma, and Rakesh Ranjan. "GenEthos: A Synthetic Data Generation System with Bias Detection and Mitigation" Proceedings of International Conference on Computing, Communication, Security and Intelligent Systems, IC3SIS 2022 (2022). https://doi.org/10.1109/IC3SIS54991.2022.9885653
