Publication Date
Fall 2023
Degree Type
Master's Project
Degree Name
Master of Science in Computer Science (MSCS)
Department
Computer Science
First Advisor
Ching-seh Wu
Second Advisor
Chris Tseng
Third Advisor
Nada Attar
Keywords
Machine learning, topic modeling, Latent Dirichlet Allocation, recommender systems, collaborative filtering
Abstract
Emails are a fundamental part of modern communication. Much of the communication in modern society takes place over email, so each user accumulates a personal mail collection that is rich in latent signals of that user's interests. Conventional recommendation systems derive user interests from historical data on user activity and interactions, and the absence of such data makes it difficult to generate relevant recommendations. This motivated us to investigate approaches that identify user interests without historical data in order to generate personalized content recommendations. User interests can be derived from email data and used by mail platforms that integrate content delivery services, such as Gmail and Google News. These interests can compensate for missing historical data and improve the relevance of recommended content across integrated platforms and services. This project explores topic modeling techniques, including several probabilistic generative models, transformers, and clustering, to extract the interests of users in an email dataset. From the extracted interests we generate ratings, which are fed to a collaborative filtering recommendation system that produces personalized news article recommendations for each user. The results demonstrate effective topic-modeling-based recommendation using the Hierarchical Dirichlet Process (HDP), Probabilistic Latent Semantic Analysis (PLSA), Latent Dirichlet Allocation (LDA), and BERT-based transformers, with LDA standing out for its topic coherence of 61% and its high scalability. These experiments contribute to more effective personalized content delivery systems that cater to users' interests even when no explicit historical interest data is available.
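The pipeline described in the abstract goes from per-user topic weights to ratings and then to collaborative filtering. A minimal sketch of that flow follows. It is illustrative only and makes several assumptions. The topic proportions, article labels, the threshold-based rating rule, and every function name (`interest_ratings`, `cosine`, `predict`, `recommend`) are hypothetical and do not come from the project. The project's actual rating scheme and recommender are not described here. User-based collaborative filtering with cosine similarity stands in as one common form of the technique.

```python
from math import sqrt

# Hypothetical per-user topic proportions. In the project these would come from
# a topic model (e.g. LDA) fitted on each user's emails; the numbers are made up.
USER_TOPICS = {
    "alice": {"sports": 0.5, "finance": 0.4, "tech": 0.1, "health": 0.0},
    "bob":   {"sports": 0.4, "finance": 0.3, "tech": 0.3, "health": 0.0},
    "carol": {"sports": 0.0, "finance": 0.3, "tech": 0.4, "health": 0.3},
}

# Hypothetical news articles, each labelled with a single dominant topic.
ARTICLES = {"a1": "sports", "a2": "finance", "a3": "tech", "a4": "health"}


def interest_ratings(topics, articles, threshold=0.2):
    """Assumed rule: rate an article on a 1-5 scale from the user's weight on its
    topic, and leave it unrated when that weight is below `threshold`."""
    return {
        article: round(1 + 4 * topics.get(topic, 0.0), 2)
        for article, topic in articles.items()
        if topics.get(topic, 0.0) >= threshold
    }


def cosine(r1, r2):
    """Cosine similarity of two sparse rating dicts (unrated items count as 0)."""
    dot = sum(r1[i] * r2[i] for i in r1.keys() & r2.keys())
    norm1 = sqrt(sum(v * v for v in r1.values()))
    norm2 = sqrt(sum(v * v for v in r2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0


def predict(user, item, ratings):
    """User-based CF: similarity-weighted average of other users' ratings for item."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine(ratings[user], other_ratings)
        num += sim * other_ratings[item]
        den += abs(sim)
    return num / den if den else None


def recommend(user, ratings, n=3):
    """Rank the articles the user has not rated by their predicted rating."""
    scores = {}
    for item in ARTICLES:
        if item in ratings[user]:
            continue
        score = predict(user, item, ratings)
        if score is not None:
            scores[item] = score
    return sorted(scores, key=scores.get, reverse=True)[:n]


ratings = {user: interest_ratings(t, ARTICLES) for user, t in USER_TOPICS.items()}
print(recommend("alice", ratings))  # ['a3', 'a4']
```

Counting unrated articles as zero in the cosine keeps the sketch short. Mean-centred (Pearson) similarity is a common alternative when users rate on different scales.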
Recommended Citation
Ghaskadbi, Pranav, "INTEREST-BASED RECOMMENDATION SYSTEM USING GMAIL TOPIC MODELLING" (2023). Master's Projects. 1323.
DOI: https://doi.org/10.31979/etd.gwg4-pb8d
https://scholarworks.sjsu.edu/etd_projects/1323