Publication Date
Spring 2024
Degree Type
Master's Project
Degree Name
Master of Science in Computer Science (MSCS)
Department
Computer Science
First Advisor
Mark Stamp
Second Advisor
Fabio Di Troia
Third Advisor
Mike Wu
Keywords
Chatbot, Machine Learning.
Abstract
There have been many recent advances in Generative Artificial Intelligence and Large Language Models (LLMs), with GPT-3 and ChatGPT among the frontrunners in the field. These models have become so powerful that it is now difficult to distinguish human-written text from machine-generated text. This paper proposes a Machine Learning approach to classifying the origin of a text (human or chatbot). The proposed solution also helps us analyze the text generated by such language models and understand the patterns underlying it. We introduce two methodologies for tackling this problem: Feature Engineering and advanced embeddings. Feature Engineering involves extracting a set of features from each text and using them for classification. We also explore contextual embeddings and transformer-based architectures for training models. Overall, the proposed solution supports a better understanding and classification of text data in the era of advanced AI technologies.
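To make the Feature Engineering idea concrete, the sketch below computes a few simple stylometric features from a text and labels new text with a nearest-centroid rule. This is an illustration under stated assumptions, not the project's method. The four features (average word length, type-token ratio, average sentence length, punctuation per word), the function names, and the nearest-centroid classifier are all hypothetical choices. The project's actual feature set and classifiers are not described here.

```python
import re
from statistics import mean


def extract_features(text: str) -> dict[str, float]:
    """Compute a few illustrative stylometric features (hypothetical feature set)."""
    words = re.findall(r"[A-Za-z']+", text)
    # Naive sentence split on terminal punctuation; empty fragments are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    return {
        "avg_word_len": mean(len(w) for w in words) if words else 0.0,
        "type_token_ratio": len({w.lower() for w in words}) / n_words if n_words else 0.0,
        "avg_sentence_len": n_words / len(sentences) if sentences else 0.0,
        "punct_per_word": len(re.findall(r"[,;:!?.]", text)) / n_words if n_words else 0.0,
    }


def centroid(vectors: list[dict[str, float]]) -> dict[str, float]:
    """Average a list of feature dictionaries key by key."""
    return {k: mean(v[k] for v in vectors) for k in vectors[0]}


def classify(text: str, centroids: dict[str, dict[str, float]]) -> str:
    """Return the label whose centroid is closest in squared Euclidean distance.

    Hypothetical stand-in for a real classifier; features are left unscaled.
    """
    f = extract_features(text)

    def dist(c: dict[str, float]) -> float:
        return sum((f[k] - c[k]) ** 2 for k in f)

    return min(centroids, key=lambda label: dist(centroids[label]))


if __name__ == "__main__":
    print(extract_features("Hello world. Hi there!"))
```

In practice each label's centroid would be built with `centroid(...)` from the feature vectors of labeled human and chatbot texts, and the features would be standardized first so that no single feature dominates the distance. A stronger classifier, or the contextual embeddings the abstract mentions, would replace this nearest-centroid rule.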
Recommended Citation
Godghase, Gauri Anil, "Distinguishing Chatbot from Human" (2024). Master's Projects. 1408.
DOI: https://doi.org/10.31979/etd.khbm-6rhq
https://scholarworks.sjsu.edu/etd_projects/1408