Publication Date
1-1-2025
Document Type
Article
Publication Title
Journal of Advances in Information Technology
Volume
16
Issue
7
DOI
10.12720/jait.16.7.1030-1041
First Page
1030
Last Page
1041
Abstract
Detecting AI-generated text with advanced machine learning techniques has become a practical necessity. The ability to distinguish human-written content from machine-generated text while identifying the source generative model helps address growing concerns about authenticity and accountability in digital communication. Differentiating human-generated from AI-generated text is highly relevant to applications ranging from news media to academic integrity, and is key to ensuring transparency and trust in content-driven environments. However, existing models are often insufficient to accurately detect AI-generated text and determine the specific AI source due to the complex nature of machine-generated content. To address this, it is essential to leverage state-of-the-art machine learning models and embedding techniques that can capture the subtle linguistic and contextual patterns of AI-generated text. In this study, text classification experiments were conducted to develop models capable of distinguishing AI-generated content from human-written text and identifying the specific AI model used, offering a multilayered approach to detection. The results demonstrate that the Long Short-Term Memory (LSTM) model with Bidirectional Encoder Representations from Transformers (BERT) embeddings outperformed other embedding techniques on binary classification, achieving 97% in both accuracy and F1. Additionally, this study illustrates the superior performance of pretrained transformer-based models compared to Recurrent Neural Network (RNN)-based models for four-class source identification, with the Robustly optimized BERT approach (RoBERTa) achieving 88% in both accuracy and F1. This highlights the advantage of leveraging powerful Large Language Models (LLMs) for the complex task of source identification, offering a more robust and scalable solution compared to traditional approaches.
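The abstract's best binary detector feeds BERT token embeddings into an LSTM with a sigmoid head. The paper's exact hyperparameters and training setup are not given here, so the following is only a minimal numpy sketch of that architecture: a single-layer LSTM consuming a sequence of 768-dimensional vectors (BERT-base's hidden size) and emitting P(AI-generated). The class name, hidden width, and the randomly generated stand-in embeddings are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

EMBED_DIM = 768   # hidden size of BERT-base embeddings
HIDDEN_DIM = 64   # illustrative LSTM width (assumption, not from the paper)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyLSTMClassifier:
    """Single-layer LSTM over precomputed embeddings with a sigmoid binary head."""

    def __init__(self, embed_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = 0.1
        # One stacked weight matrix covering the input/forget/cell/output gates.
        self.W = rng.normal(0.0, scale, (4 * hidden_dim, embed_dim))
        self.U = rng.normal(0.0, scale, (4 * hidden_dim, hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.w_out = rng.normal(0.0, scale, hidden_dim)
        self.b_out = 0.0
        self.hidden_dim = hidden_dim

    def forward(self, embeddings):
        """embeddings: array of shape (seq_len, embed_dim), one vector per token."""
        H = self.hidden_dim
        h = np.zeros(H)
        c = np.zeros(H)
        for x in embeddings:
            z = self.W @ x + self.U @ h + self.b
            i = sigmoid(z[0:H])        # input gate
            f = sigmoid(z[H:2 * H])    # forget gate
            g = np.tanh(z[2 * H:3 * H])  # candidate cell state
            o = sigmoid(z[3 * H:4 * H])  # output gate
            c = f * c + i * g
            h = o * np.tanh(c)
        # Final hidden state -> P(text is AI-generated).
        return sigmoid(self.w_out @ h + self.b_out)

# Stand-in for BERT output: in the paper's pipeline these would be contextual
# embeddings produced by BERT, one per token of the input text.
rng = np.random.default_rng(1)
fake_bert_embeddings = rng.normal(size=(12, EMBED_DIM))

clf = TinyLSTMClassifier(EMBED_DIM, HIDDEN_DIM)
p_ai = clf.forward(fake_bert_embeddings)
print(float(p_ai))  # a probability in [0, 1]
```

The untrained weights here only show data flow; in practice the LSTM and output head would be trained end-to-end on labeled human/AI text, and the four-class source-identification task would instead fine-tune RoBERTa with a 4-way softmax head.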
Funding Number
2319803
Funding Sponsor
National Science Foundation
Keywords
AI-generated text, Bidirectional Encoder Representations from Transformers (BERT) model, Bidirectional Long Short-Term Memory (BiLSTM), Deep Learning (DL), Large Language Models (LLMs), Long Short-Term Memory (LSTM), Machine Learning (ML), Robustly optimized BERT approach (RoBERTa) model, word embeddings
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Department
Computer Science
Recommended Citation
Anjana Priyatham Tatavarthi, Faranak Abri, and Nada Attar. "AI-Generated Text Detection and Source Identification." Journal of Advances in Information Technology 16.7 (2025): 1030-1041. https://doi.org/10.12720/jait.16.7.1030-1041