Publication Date
Spring 2023
Degree Type
Master's Project
Degree Name
Master of Science (MS)
Department
Computer Science
First Advisor
Ching-Seh Wu
Second Advisor
Chris Pollett
Third Advisor
Robert Chun
Keywords
Natural Language Processing, multi-label text classification, deep learning, data augmentation, synonym replacement, random word substitution, pre-trained models, transfer learning
Abstract
Multi-label text classification is a crucial task in Natural Language Processing, in which each text instance can be assigned multiple labels simultaneously. The goal of this project is to assess how well several deep learning models perform on a real-world dataset for multi-label text classification. To address the problem of data imbalance, we employed data augmentation techniques such as Synonym Substitution and Random Word Substitution. We conducted experiments on a toxic comment classification dataset to evaluate the effectiveness of several deep learning models, including Bi-LSTM, GRU, and Bi-GRU, as well as fine-tuned pre-trained BERT models. Several metrics, including log loss, recall@k, and Hamming loss, were used to evaluate model performance. According to our experimental findings, Bi-GRU and BERT models trained with data augmentation outperformed the other models in terms of recall@k and micro-F1. We also found that models performed better when data augmentation was used. Our study shows that pre-trained BERT models are effective for multi-label text classification, with good performance across various metrics. The results provide insights into the effectiveness of different deep learning architectures and data augmentation techniques for multi-label text classification tasks. This study also highlights the importance of addressing data imbalance in multi-label text classification and the potential benefits of using pre-trained language models for this task.
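As an illustration of two of the metrics named in the abstract, the following is a minimal Python sketch of Hamming loss and recall@k for multi-label predictions. It is not taken from the project itself: the function names, the toy data, and the particular recall@k definition (true labels found among the k highest-scoring labels, divided by the number of true labels, averaged over instances that have at least one true label) are assumptions made here, and other definitions of recall@k exist.

```python
from typing import List, Sequence


def hamming_loss(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of (instance, label) slots where the prediction differs from the truth."""
    total = 0
    wrong = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            total += 1
            wrong += int(t != p)
    return wrong / total


def recall_at_k(y_true: Sequence[Sequence[int]],
                y_scores: Sequence[Sequence[float]],
                k: int) -> float:
    """Mean per-instance recall over the k highest-scoring labels.

    Assumed definition: instances with no true labels are skipped.
    """
    recalls: List[float] = []
    for true_row, score_row in zip(y_true, y_scores):
        n_true = sum(true_row)
        if n_true == 0:
            continue  # recall is undefined without any true label
        # Indices of the k labels with the highest scores.
        top_k = sorted(range(len(score_row)),
                       key=lambda j: score_row[j],
                       reverse=True)[:k]
        hits = sum(true_row[j] for j in top_k)
        recalls.append(hits / n_true)
    return sum(recalls) / len(recalls) if recalls else 0.0


# Toy example (hypothetical data): 2 comments, 3 labels each.
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0]]                # thresholded predictions
y_scores = [[0.9, 0.2, 0.4], [0.1, 0.8, 0.3]]  # raw label scores

print(hamming_loss(y_true, y_pred))
print(recall_at_k(y_true, y_scores, k=1))
print(recall_at_k(y_true, y_scores, k=2))
```

Hamming loss is computed on thresholded 0/1 predictions, whereas recall@k ranks the raw scores, so the two metrics can disagree about which model is better.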
Recommended Citation
Yelamanchili, Likhitha, "Multi-Label Text Classification with Transfer Learning" (2023). Master's Projects. 1237.
DOI: https://doi.org/10.31979/etd.8s6m-9ch7
https://scholarworks.sjsu.edu/etd_projects/1237