Publication Date
Spring 2024
Degree Type
Thesis
Degree Name
Master of Science (MS)
Department
Computer Engineering
Advisor
Magdalini Eirinaki; Katerina Potika; Stas Tiomkin
Abstract
A notorious challenge for recommender systems on online platforms is to accurately and fairly recommend items that align with users’ preferences while preserving user privacy. These systems often rely on historical data that is partially labeled and includes incomplete user information, which over-represents majority groups and disproportionately favors popular items. Substituting user data with synthetic data can address these concerns, but accurately replicating real-world datasets has been a challenging problem. Recent advancements in generative AI have demonstrated the impressive capabilities of diffusion models in generating realistic data across several domains. In this work, we introduce two variations of using diffusion models to capture the intricate patterns of real-world datasets for training accurate, fair, and privacy-preserving recommender systems: the Score-based Diffusion Recommendation Module (SDRM), which increases the accuracy of recommender systems while preserving privacy, and the Fair Diffusion Recommender Module (FairDiff), which synthesizes implicit feedback datasets to improve group fairness. Our methods outperform competing baselines, such as generative adversarial networks, variational autoencoders, and recently proposed diffusion models, in synthesizing several datasets to replace or augment the original data across various fairness and top-k metrics.
Recommended Citation
Lilienthal, Derek B., "Synthetic Data Generation for Accurate, Fair, and Private Recommender Systems" (2024). Master's Theses. 5515.
DOI: https://doi.org/10.31979/etd.ggt3-adtq
https://scholarworks.sjsu.edu/etd_theses/5515