Publication Date

Fall 2023

Degree Type

Master's Project

Degree Name

Master of Science in Computer Science (MSCS)


Department

Computer Science

First Advisor

Ching-seh Wu

Second Advisor

Chris Tseng

Third Advisor

Nada Attar


Keywords

Machine learning, topic modeling, Latent Dirichlet Allocation, recommender systems, collaborative filtering

Abstract

Email is a fundamental part of modern communication. Much of the discourse in modern society takes place over email, so each user accumulates a personal mail collection that is rich in latent interests. Conventional recommendation systems derive user interests from historical activity and interaction data, and the absence of such data makes it hard to generate relevant recommendations. This motivated us to investigate ways of identifying user interests without historical data in order to generate personalized content recommendations. Email data offers an opportunity to derive these interests, which mail platforms with integrated content delivery services, such as Gmail and Google News, can put to use. The derived interests can compensate for missing historical data and improve the relevance of recommendations across integrated platforms and services. This research project explores topic modeling techniques, including several probabilistic generative models, transformers, and clustering, to extract user interests from an email dataset. After interest extraction, we generate ratings and feed them to a collaborative filtering recommendation system, which produces personalized news article recommendations for each user based on the identified interests. The results demonstrate effective topic-modeling-based recommendation using Hierarchical Dirichlet Process, Probabilistic Latent Semantic Analysis, Latent Dirichlet Allocation, and BERT transformers; Latent Dirichlet Allocation stands out with a topic coherence of 61% and high scalability. Our experiments contribute to the development of more effective personalized content delivery systems that can cater to users' interests even when no explicit historical data on those interests is available.
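
The step from extracted interests to collaborative filtering can be illustrated with a minimal sketch. It is not the project's implementation: the abstract does not say how ratings are generated, so the `to_ratings` rule, the user names, the topic labels, and the topic proportions below are all hypothetical, and topics stand in for news articles. The sketch assumes a topic model such as Latent Dirichlet Allocation has already produced per-user topic proportions, and it uses plain user-based collaborative filtering with cosine similarity.

```python
import math

# Hypothetical per-user topic proportions, of the kind a topic model such as
# LDA might infer from each user's mailbox (illustrative values only).
user_topics = {
    "alice": {"sports": 0.70, "finance": 0.20, "travel": 0.10, "tech": 0.00},
    "bob":   {"sports": 0.60, "finance": 0.25, "travel": 0.00, "tech": 0.15},
    "carol": {"sports": 0.00, "finance": 0.10, "travel": 0.20, "tech": 0.70},
}

def to_ratings(proportions, scale=5):
    """Assumed rule: scale proportions so the strongest topic gets `scale`.

    Topics with zero weight are treated as unrated and left out.
    """
    top = max(proportions.values())
    return {
        topic: max(1, round(scale * p / top))
        for topic, p in proportions.items()
        if p > 0
    }

def cosine(a, b):
    """Cosine similarity between two sparse rating dictionaries."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict(ratings, user, topic):
    """Similarity-weighted average of other users' ratings for `topic`."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or topic not in other_ratings:
            continue
        sim = cosine(ratings[user], other_ratings)
        num += sim * other_ratings[topic]
        den += sim
    return num / den if den else None

ratings = {user: to_ratings(p) for user, p in user_topics.items()}

# Alice has no "tech" rating; estimate it from users with similar interests.
alice_tech = predict(ratings, "alice", "tech")
print(ratings["alice"], alice_tech)
```

In this toy data, Alice's interests resemble Bob's far more than Carol's, so her predicted rating for the unrated topic is pulled toward Bob's low rating rather than Carol's high one.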

Available for download on Friday, December 20, 2024