Publication Date

Spring 2025

Degree Type

Master's Project

Degree Name

Master of Science in Computer Science (MSCS)

Department

Computer Science

First Advisor

Ching Seh Mike Wu

Second Advisor

Mark Stamp

Third Advisor

Jorjeta Jetcheva

Keywords

AI Text Detection, Ensemble Classification, Mixed-Content Detection, LLM Detection Performance, Multi-Model Approach

Abstract

The widespread adoption of Large Language Models (LLMs) has revolutionized text generation and heightened concerns over misinformation and the erosion of journalistic integrity. Detecting AI-generated text is critical to addressing these challenges, yet current detection methods face limitations in adaptability, scalability, and accuracy. This research paper uses machine-learning techniques to explore the classification of human-written and AI-generated articles, including articles that mix human and AI-written content. The primary focus is on evaluating the effectiveness of clustering algorithms (K-Means and Agglomerative Clustering), autoencoders, and Part-of-Speech Tag Transition Matrix Log-Likelihood for distinguishing between AI-generated and human-written texts. Our findings reveal that while models perform well on fully AI-generated or fully human-written texts, mixed content introduces significant challenges for all classifiers. The rapid evolution of Large Language Models has produced AI-generated content that increasingly mirrors human writing patterns, linguistic nuances, and stylistic variations. This convergence between human and machine-authored text presents a fundamental challenge for detection mechanisms, particularly in hybrid content where AI contributions are seamlessly integrated with human writing. However, our ensemble approach combining multiple classifier models achieves comparable performance to Ghostbuster (96.5% vs 92.9% overall accuracy for the Gemini Improved model), demonstrating particular strength in AI text detection, where Ghostbuster shows relative weakness.

Available for download on Monday, May 25, 2026
