Author

Fei Pan

Publication Date

Spring 2024

Degree Type

Master's Project

Degree Name

Master of Science in Computer Science (MSCS)

Department

Computer Science

First Advisor

William Andreopoulos

Second Advisor

Nada Attar

Third Advisor

Thomas Austin

Keywords

Hashtag Recommendation, Neural Collaborative Filtering (NCF), Matrix Factorization (MF), Content-based Filtering, Hybrid, BERT, Nearest Neighbor

Abstract

This project recommends relevant hashtags to users by applying both Collaborative Filtering (CF) and content-based filtering to a Twitter dataset. The dataset was collected with the Twitter API v2. After preprocessing, 40,806 tweets posted by 278 users with 3,107 hashtags from 01/01/2022 to 04/30/2022 are used for model training and testing. The CF models generate embeddings that capture user and hashtag latent factors and then predict a probability for each unseen hashtag; the hashtags with the highest probabilities are ranked as the top-K items for the corresponding user. Four CF models are trained and evaluated: Matrix Factorization (MF), Neural Collaborative Filtering (NCF), Neural Matrix Factorization (NeuMF), and NeuMF with pre-training. The content-based approach uses BERT to derive an embedding for each hashtag from tweet texts, then retrieves the top-K nearest neighbors for each user using a KDTree and cosine similarity. Finally, two hybrid approaches are proposed. The first adds item embeddings as an input to the NeuMF with pre-training model. The second combines the NCF and content-based results and achieves the best performance, increasing mAP by 10.0% and nDCG by 7.7% compared with the NeuMF model.
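
The CF ranking step described in the abstract can be illustrated with a minimal sketch. It assumes a plain matrix-factorization style score (the dot product of a user embedding and a hashtag embedding, passed through a sigmoid); the function name `top_k_unseen`, the two-dimensional embeddings, and the hashtag names are hypothetical and are not taken from the project itself.

```python
import math

def top_k_unseen(user_vec, hashtag_vecs, seen, k):
    """Rank hashtags the user has not used by sigmoid(dot product) score."""
    scored = []
    for tag, vec in hashtag_vecs.items():
        if tag in seen:
            continue  # only unseen hashtags are candidates
        dot = sum(u * h for u, h in zip(user_vec, vec))
        prob = 1.0 / (1.0 + math.exp(-dot))  # squash the score into (0, 1)
        scored.append((tag, prob))
    # Highest predicted probability first, then keep the first k entries
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy, made-up embeddings purely for illustration (not project data)
user = [0.5, -0.2]
hashtags = {
    "#nba": [1.0, 0.0],
    "#ai": [0.2, 0.9],
    "#music": [-0.5, 0.5],
    "#news": [0.8, -0.4],
}
ranked = top_k_unseen(user, hashtags, seen={"#nba"}, k=2)
```

In a trained model the embeddings would be learned from the user-hashtag interactions rather than written by hand, and the neural variants (NCF, NeuMF) would replace the dot product with a learned scoring network.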

Available for download on Friday, May 23, 2025
