Publication Date

Spring 2023

Degree Type

Master's Project

Degree Name

Master of Science (MS)

Department

Computer Science

First Advisor

Ching-Seh Wu

Second Advisor

Chris Pollett

Third Advisor

Robert Chun

Keywords

Natural Language Processing, multi-label text classification, deep learning, data augmentation, synonym replacement, random word substitution, pre-trained models, transfer learning

Abstract

Multi-label text classification is a crucial task in Natural Language Processing in which each text instance can be assigned several labels simultaneously. This project assesses how well several deep learning models perform on a real-world dataset for multi-label text classification. To address data imbalance, we employed data augmentation techniques such as Synonym Substitution and Random Word Substitution. We conducted experiments on a toxic comment classification dataset to evaluate several deep learning models, including Bi-LSTM, GRU, and Bi-GRU, as well as fine-tuned pre-trained BERT models. Model performance was evaluated with several metrics, including log loss, recall@k, and Hamming loss. According to our experimental findings, Bi-GRU and BERT models trained with data augmentation outperformed the other models on recall@k and micro-F1. We also found that models performed better when data augmentation was used. Our study shows that pre-trained BERT models are effective for multi-label text classification, with good performance across various metrics. The results provide insight into the effectiveness of different deep learning architectures and data augmentation techniques for multi-label text classification, and they highlight both the importance of addressing data imbalance and the potential benefits of pre-trained language models for this task.
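
For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of Hamming loss, recall@k, and micro-F1 for multi-label predictions. It is not the project's code: the function names, the toy data, and the choice to skip samples with no positive labels in recall@k are assumptions made for illustration, and recall@k in particular has more than one definition in the literature.

```python
from typing import List, Sequence


def hamming_loss(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of individual label slots that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def recall_at_k(y_true: Sequence[Sequence[int]], scores: Sequence[Sequence[float]], k: int) -> float:
    """Mean share of each sample's true labels found among its k highest-scored labels.

    Assumption: samples with no positive labels are skipped, since their
    recall is undefined.
    """
    recalls: List[float] = []
    for truth, row in zip(y_true, scores):
        n_pos = sum(truth)
        if n_pos == 0:
            continue
        top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        hits = sum(truth[j] for j in top_k)
        recalls.append(hits / n_pos)
    return sum(recalls) / len(recalls) if recalls else 0.0


def micro_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """F1 computed from true/false positives and false negatives pooled over all labels."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy example (hypothetical data): two comments, three labels each.
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1]]
scores = [[0.9, 0.2, 0.4], [0.1, 0.8, 0.6]]

print(hamming_loss(y_true, y_pred))
print(recall_at_k(y_true, scores, k=1))
print(micro_f1(y_true, y_pred))
```

Hamming loss and micro-F1 operate on thresholded 0/1 predictions, while recall@k ranks the raw scores, so the two views can disagree about which model is better on an imbalanced label set.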

Recommended Citation

Yelamanchili, Likhitha, "Multi-Label Text Classification with Transfer Learning" (2023). Master's Projects. 1237.
DOI: https://doi.org/10.31979/etd.8s6m-9ch7
https://scholarworks.sjsu.edu/etd_projects/1237

Included in

Artificial Intelligence and Robotics Commons
