Publication Date
Spring 2024
Degree Type
Master's Project
Degree Name
Master of Science in Computer Science (MSCS)
Department
Computer Science
First Advisor
William Andreopoulos
Second Advisor
Nada Attar
Third Advisor
Thomas Austin
Keywords
Hashtag Recommendation, Neural Collaborative Filtering (NCF), Matrix Factorization (MF), Content-based Filtering, Hybrid, BERT, Nearest Neighbor
Abstract
The purpose of this project is to recommend relevant hashtags to users by applying both Collaborative Filtering (CF) and Content-based filtering to a Twitter dataset. The dataset was collected through Twitter API v2. After data preprocessing, 40,806 tweets posted by 278 users with 3,107 hashtags from 01/01/2022 to 04/30/2022 are used for model training and testing. For the CF models, we mainly focus on generating embeddings that capture user and hashtag latent factors and on predicting a probability for each unseen hashtag; the hashtags with the highest probabilities are ranked as the top-K items for the corresponding user. In this project, Matrix Factorization (MF), Neural Collaborative Filtering (NCF), Neural Matrix Factorization (NeuMF), and NeuMF with Pre-train are trained and evaluated. In addition, Content-based filtering uses BERT to derive an embedding for each hashtag from tweet texts, then retrieves the top-K nearest neighbors for each user with a KDTree and cosine similarity. Finally, two hybrid approaches are proposed. The first adds the item embedding to the NeuMF with Pre-train model as an input. The second combines the NCF and Content-based results and achieves the best performance, increasing mAP by 10.0% and nDCG by 7.7% compared with the NeuMF model.
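As a rough illustration of the content-based retrieval step described in the abstract, the sketch below ranks hashtags for one user by cosine similarity between embedding vectors. It is a minimal sketch, not the project's implementation: the function names, the toy 3-dimensional vectors, and the idea of representing a user by a single vector in the same space as the hashtags are all assumptions made for illustration. In the project the embeddings come from BERT and the neighbor search also uses a KDTree, neither of which is reproduced here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k_hashtags(user_vec, hashtag_vecs, k):
    """Return the k hashtags whose vectors are most similar to user_vec."""
    scored = [
        (tag, cosine_similarity(user_vec, vec))
        for tag, vec in hashtag_vecs.items()
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [tag for tag, _ in scored[:k]]


# Toy vectors standing in for BERT-derived hashtag embeddings (assumption).
hashtag_vecs = {
    "#nba": [0.9, 0.1, 0.0],
    "#worldcup": [0.8, 0.3, 0.1],
    "#python": [0.0, 0.2, 0.9],
    "#ai": [0.1, 0.1, 0.8],
}

# Hypothetical user profile vector in the same embedding space.
user_vec = [1.0, 0.2, 0.0]

recommended = top_k_hashtags(user_vec, hashtag_vecs, k=2)
print(recommended)
```

With these toy vectors the two sports-related hashtags rank first for the user. The brute-force scan is linear in the number of hashtags; a tree-based index such as a KDTree is one way to speed up the same lookup on larger collections.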
Recommended Citation
Pan, Fei, "Ranking-based Hashtag Recommendation with Collaborative and Content-based Filtering" (2024). Master's Projects. 1380.
DOI: https://doi.org/10.31979/etd.vusp-hf62
https://scholarworks.sjsu.edu/etd_projects/1380