Publication Date

Fall 2025

Degree Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Engineering

Advisor

Mahima Agumbe Suresh; Magdalini Eirinaki; Wencen Wu

Abstract

Federated Learning (FL) offers a privacy-preserving alternative to centralized model training in healthcare, but it can amplify fairness risks under non-independent and identically distributed (non-IID) data and uneven client participation. This study evaluates bias and fairness in FL on three tabular clinical datasets (UCI Heart Disease, Diabetes, and Obesity) under cross-silo simulations with 3, 5, and 10 clients. Three models are trained with Federated Averaging (FedAvg) in the Flower framework: Logistic Regression (LR) with stochastic gradient descent (SGD, log loss), a linear Support Vector Machine (SVM) with SGD (hinge loss), and Naïve Bayes (NB). A uniform preprocessing pipeline and a feature-alignment mechanism ensure consistent aggregation by padding missing one-hot categories. Fairness is evaluated on a server-held test set using Fairlearn metrics: Demographic Parity Difference (DPD), Demographic Parity Ratio (DPR), Equalized Odds Difference (EOD), and Equal Opportunity Difference (EOpD). An inferential framework based on the two-proportion z-test assesses the statistical significance of group accuracy gaps. Results show measurable disparities between male and female subgroups, though most are not statistically significant at the 0.05 level. The only consistent hypothesis rejections occur for the Naïve Bayes model on the Obesity dataset, across both federated and centralized configurations, indicating model-specific fairness degradation under heterogeneous data. Key contributions include a reproducible FL pipeline for fairness auditing, a feature-alignment mechanism for stable aggregation, and an inferential framework for fairness validation. The findings highlight the sensitivity of fairness outcomes to client partitioning and model selection, motivating future research on lightweight fairness diagnostics and on larger, better-balanced datasets for intersectional analysis.
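
The abstract describes two concrete mechanisms: padding missing one-hot categories so client weight vectors align for FedAvg, and a Fairlearn-plus-z-test fairness audit on a server-held test set. The sketches below illustrate how such steps could look in Python; they are not the thesis code, and the function names, column handling, and binary male/female grouping are illustrative assumptions.

A minimal sketch of the feature-alignment idea, assuming all clients agree on a shared global column list:

```python
import pandas as pd

def align_features(local_df: pd.DataFrame, global_columns: list[str]) -> pd.DataFrame:
    """Pad one-hot categories missing from this client's data with zero-valued
    columns and order all columns identically, so model weight vectors from
    every client have the same shape before FedAvg aggregation."""
    return local_df.reindex(columns=global_columns, fill_value=0)
```

A minimal sketch of the fairness-audit step, assuming predictions from the aggregated model and a single binary sensitive attribute:

```python
import numpy as np
from fairlearn.metrics import (
    MetricFrame,
    demographic_parity_difference,
    demographic_parity_ratio,
    equalized_odds_difference,
    true_positive_rate,
)
from sklearn.metrics import accuracy_score
from statsmodels.stats.proportion import proportions_ztest


def audit_fairness(y_true, y_pred, sensitive):
    """Compute DPD, DPR, EOD, EOpD and a two-proportion z-test on the
    accuracy gap between the two sensitive groups."""
    report = {
        "DPD": demographic_parity_difference(y_true, y_pred, sensitive_features=sensitive),
        "DPR": demographic_parity_ratio(y_true, y_pred, sensitive_features=sensitive),
        "EOD": equalized_odds_difference(y_true, y_pred, sensitive_features=sensitive),
    }
    # Equal Opportunity Difference: largest gap in true-positive rate across groups.
    tpr_frame = MetricFrame(
        metrics=true_positive_rate, y_true=y_true, y_pred=y_pred,
        sensitive_features=sensitive,
    )
    report["EOpD"] = tpr_frame.difference()

    # Per-group accuracy, then a two-proportion z-test on the gap
    # (assumes a binary sensitive attribute, e.g. male/female).
    acc_frame = MetricFrame(
        metrics=accuracy_score, y_true=y_true, y_pred=y_pred,
        sensitive_features=sensitive,
    )
    y_true, y_pred, sensitive = map(np.asarray, (y_true, y_pred, sensitive))
    correct, totals = [], []
    for group in list(acc_frame.by_group.index)[:2]:
        mask = sensitive == group
        correct.append(int((y_true[mask] == y_pred[mask]).sum()))
        totals.append(int(mask.sum()))
    z_stat, p_value = proportions_ztest(count=correct, nobs=totals)

    report["accuracy_by_group"] = acc_frame.by_group.to_dict()
    report["z_stat"], report["p_value"] = float(z_stat), float(p_value)
    return report
```

Under this sketch, a p_value above 0.05 would correspond to the abstract's "not statistically significant at the 0.05 level" outcome for a given model/dataset pair.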

Available for download on Sunday, August 15, 2027
