Publication Date
Spring 2025
Degree Type
Master's Project
Degree Name
Master of Science in Computer Science (MSCS)
Department
Computer Science
First Advisor
Ching Seh Mike Wu
Second Advisor
Mark Stamp
Third Advisor
Jorjeta Jetcheva
Keywords
AI Text Detection, Ensemble Classification, Mixed-Content Detection, LLM Detection Performance, Multi-Model Approach
Abstract
The widespread adoption of Large Language Models (LLMs) has revolutionized text generation and heightened concerns over misinformation and the erosion of journalistic integrity. Detecting AI-generated text is critical to addressing these challenges, yet current detection methods face limitations in adaptability, scalability, and accuracy. This research paper uses machine-learning techniques to explore the classification of human-written and AI-generated articles, including articles that mix human and AI-written content. The primary focus is on evaluating the effectiveness of clustering algorithms (K-Means and Agglomerative Clustering), auto-encoders, and Part-of-Speech Tag Transition Matrix Log-Likelihood for distinguishing between AI-generated and human-written texts. Our findings reveal that while models perform well on fully AI or human-written texts, mixed content introduces significant challenges for all classifiers. The rapid evolution of Large Language Models has produced AI-generated content that increasingly mirrors human writing patterns, linguistic nuances, and stylistic variations. This convergence between human and machine-authored text presents a fundamental challenge for detection mechanisms, particularly in hybrid content where AI contributions are seamlessly integrated with human writing. However, our ensemble approach combining multiple classifier models achieves performance comparable to Ghostbuster (96.5% vs. 92.9% overall accuracy for the Gemini Improved model), demonstrating particular strength in AI text detection, where Ghostbuster shows relative weakness.
Recommended Citation
Sicard-Noel, Lilou, "DETECTING AI-GENERATED NEWS ARTICLES USING UNSUPERVISED MACHINE LEARNING ALGORITHMS" (2025). Master's Projects. 1513.
DOI: https://doi.org/10.31979/etd.kjnb-57br
https://scholarworks.sjsu.edu/etd_projects/1513