The Power of Patterns in Detecting News Articles Written by AI
Abstract
As AI text generation advances, distinguishing human-written from machine-generated content has become both harder and more important. This research addresses the problem in news media, where information integrity is essential. We detect AI-generated news articles by transforming each text into two types of vector representation: semantic embeddings produced by OpenAI's embedding model, and syntactic patterns derived from Part-of-Speech (POS) tagging. These vectors are then analyzed with unsupervised machine learning models, including K-Means, Hierarchical Clustering, and Gaussian Mixture Models (GMM). Our results show that GMM achieves 99.6% accuracy, outperforming resource-intensive detection methods such as Ghostbuster while being considerably less expensive to run. Future work will test the approach on text from other large language models, such as LLaMA and Gemini, and explore more advanced feature engineering to keep detection effective as AI text generation evolves.
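The syntactic branch of the pipeline described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the tag inventory `TAGS`, the helper names (`pos_vector`, `fit_gmm`, `clustering_accuracy`), the use of relative tag frequencies as the POS vector, the naive initialization, and the toy tag sequences are all assumptions made for illustration. It assumes each article has already been POS-tagged by some tagger, fits a two-component diagonal GMM by EM, and scores the clusters against known labels up to a relabeling of the clusters.

```python
import math
from collections import Counter
from itertools import permutations

# Hypothetical tag inventory; a real tagger supplies its own tag set.
TAGS = ["NOUN", "VERB", "ADJ", "ADV", "PRON", "DET"]

def pos_vector(tags):
    """Map one article's POS-tag sequence to relative tag frequencies."""
    counts = Counter(tags)
    total = sum(counts[t] for t in TAGS) or 1
    return [counts[t] / total for t in TAGS]

def log_gauss(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def fit_gmm(X, iters=30, floor=1e-4):
    """Fit a two-component diagonal GMM by EM and return hard labels."""
    n, d = len(X), len(X[0])
    # Naive deterministic init (an assumption): first and last points as means.
    means = [list(X[0]), list(X[-1])]
    mu = [sum(x[j] for x in X) / n for j in range(d)]
    gvar = [sum((x[j] - mu[j]) ** 2 for x in X) / n + floor for j in range(d)]
    varis = [gvar[:], gvar[:]]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(iters):
        # E-step: responsibilities, normalized in log space for stability.
        resp = []
        for x in X:
            logs = [math.log(weights[c]) + log_gauss(x, means[c], varis[c])
                    for c in range(2)]
            top = max(logs)
            exps = [math.exp(v - top) for v in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: re-estimate weights, means, and floored variances.
        for c in range(2):
            nc = sum(r[c] for r in resp)
            if nc < 1e-9:
                continue  # leave an (almost) empty component unchanged
            weights[c] = nc / n
            means[c] = [sum(r[c] * x[j] for r, x in zip(resp, X)) / nc
                        for j in range(d)]
            varis[c] = [sum(r[c] * (x[j] - means[c][j]) ** 2
                            for r, x in zip(resp, X)) / nc + floor
                        for j in range(d)]
    return [max(range(2), key=lambda c: r[c]) for r in resp]

def clustering_accuracy(labels, truth):
    """Accuracy under the best mapping of cluster ids to class ids."""
    best = 0.0
    for perm in permutations(range(2)):
        hits = sum(perm[l] == t for l, t in zip(labels, truth))
        best = max(best, hits / len(truth))
    return best

# Synthetic tag sequences, invented for the demo (not real article data).
group_a = [["NOUN"] * (5 + i % 2) + ["VERB"] * 3 + ["PRON"] * 2 for i in range(4)]
group_b = [["NOUN"] * 3 + ["ADJ"] * (4 + i % 2) + ["DET"] * 3 for i in range(4)]
X = [pos_vector(tags) for tags in group_a + group_b]
truth = [0] * 4 + [1] * 4
labels = fit_gmm(X)
print(clustering_accuracy(labels, truth))
```

Because cluster ids from an unsupervised model are arbitrary, accuracy is only meaningful after matching clusters to classes, which is what `clustering_accuracy` does here. In practice a library implementation such as scikit-learn's `GaussianMixture` would likely replace the hand-written EM loop, and the semantic embeddings could be clustered the same way.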