Publication Date

4-12-2024

Document Type

Article

Publication Title

IEEE Access

Volume

12

DOI

10.1109/ACCESS.2024.3388299

First Page

58275

Last Page

58287

Abstract

While recommender systems have become an integral component of the Web experience, their heavy reliance on user data raises privacy and security concerns. Substituting user data with synthetic data can address these concerns, but accurately replicating these real-world datasets has been a notoriously challenging problem. Recent advancements in generative AI have demonstrated the impressive capabilities of diffusion models in generating realistic data across various domains. In this work, we introduce a Score-based Diffusion Recommendation Module (SDRM), which captures the intricate patterns of real-world datasets required for training highly accurate recommender systems. SDRM allows for the generation of synthetic data that can replace existing datasets to preserve user privacy, or augment existing datasets to address excessive data sparsity. Our method outperforms competing baselines, such as generative adversarial networks, variational autoencoders, and recently proposed diffusion models, in synthesizing various datasets to replace or augment the original data, with an average improvement of 4.30% in Recall@k and 4.65% in NDCG@k.

Keywords

Training, Recommender systems, Data models, Synthetic data, Data privacy, Noise reduction, Gaussian distribution, Diffusion processes, Machine learning

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.

Department

Computer Engineering
