Publication Date

Spring 2024

Degree Type

Master's Project

Degree Name

Master of Science in Computer Science (MSCS)

Department

Computer Science

First Advisor

Mark Stamp

Second Advisor

Fabio Di Troia

Third Advisor

Mike Wu

Keywords

Chatbot, Machine Learning.

Abstract

There have been many recent advances in the fields of Generative Artificial Intelligence and Large Language Models, with GPT-3 and ChatGPT among the frontrunners. These large language models have become so powerful that it is now difficult to distinguish human-written text from machine-generated text. This paper proposes a machine learning solution to the problem of classifying the origin of a text (human or chatbot). In addition, the proposed solution helps us analyze the text generated by such language models and understand the patterns underlying it. We introduce two methodologies for tackling this problem: Feature Engineering and advanced Embedding. Feature Engineering involves extracting a series of features from the text and using them for classification. The Embedding approach explores contextual embeddings and transformer-based architectures to train models. Overall, the proposed approaches aim to improve both the classification and the understanding of text data in the era of advanced AI technologies.
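The Feature Engineering approach (extract features from a text, then classify on them) can be illustrated with a minimal sketch. The abstract does not name the actual features or classifier used, so everything below is hypothetical: the four stylometric features (average word length, type-token ratio, average sentence length, punctuation rate), the helper names `extract_features`, `fit_centroids` and `predict`, and the nearest-centroid classifier are illustrative stand-ins, not the project's method.

```python
import re
from statistics import mean


def extract_features(text: str) -> list[float]:
    """Compute four hypothetical stylometric features for one text."""
    words = re.findall(r"[A-Za-z']+", text)
    # Split on runs of sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words) or 1  # avoid division by zero on empty input
    return [
        mean(len(w) for w in words) if words else 0.0,  # average word length
        len({w.lower() for w in words}) / n_words,      # type-token ratio
        n_words / (len(sentences) or 1),                # words per sentence
        sum(text.count(c) for c in ",;:") / n_words,    # punctuation per word
    ]


def fit_centroids(texts, labels):
    """Nearest-centroid training: average the feature vectors of each class."""
    grouped = {}
    for text, label in zip(texts, labels):
        grouped.setdefault(label, []).append(extract_features(text))
    return {label: [mean(col) for col in zip(*vecs)]
            for label, vecs in grouped.items()}


def predict(centroids, text):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    x = extract_features(text)

    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))

    return min(centroids, key=lambda label: sq_dist(centroids[label]))


if __name__ == "__main__":
    # Toy, made-up training data for illustration only.
    model = fit_centroids(
        ["ok. yes. no. go.",
         "Furthermore, comprehensive evaluation demonstrates considerable "
         "improvements, particularly regarding reproducibility."],
        ["human", "chatbot"],
    )
    print(predict(model, "hi. no. ok."))
```

Because the features are left unscaled, words per sentence tends to dominate the distance; a fuller pipeline would standardize each feature and would likely use a stronger classifier trained on a real labeled corpus.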

Available for download on Sunday, May 25, 2025
