Publication Date
1-1-2025
Document Type
Article
Publication Title
Journal of Advances in Information Technology
Volume
16
Issue
7
DOI
10.12720/jait.16.7.1030-1041
First Page
1030
Last Page
1041
Abstract
Detecting AI-generated text with advanced machine learning techniques has become a practical necessity. The ability to distinguish human-written content from machine-generated text while identifying the source generative model helps address growing concerns about authenticity and accountability in digital communication. Differentiating human-generated from AI-generated text is highly relevant to applications ranging from news media to academic integrity, and is key to ensuring transparency and trust in content-driven environments. However, existing models are often insufficient to accurately detect AI-generated text and determine the specific AI source due to the complex nature of machine-generated content. To address this, it is essential to leverage state-of-the-art machine learning models and embedding techniques that can capture the subtle linguistic and contextual patterns of AI-generated text. In this study, text classification experiments were conducted to develop models capable of distinguishing AI-generated content from human-written text and identifying the specific AI model used, offering a multilayered approach to detection. The results demonstrate that the Long Short-Term Memory (LSTM) model with Bidirectional Encoder Representations from Transformers (BERT) embeddings outperformed other embedding techniques on binary classification, achieving 97% in both accuracy and F1. Additionally, this study illustrates the superior performance of pretrained transformer-based models compared to Recurrent Neural Network (RNN)-based models for four-class source identification, with the Robustly optimized BERT approach (RoBERTa) achieving 88% in both accuracy and F1. This highlights the advantage of leveraging powerful Large Language Models (LLMs) for the complex task of source identification, offering a more robust and scalable solution compared to traditional approaches.
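The abstract's best binary detector feeds BERT token embeddings into an LSTM with a sigmoid head. The paper's exact hyperparameters and training setup are not given here, so the following is only a minimal numpy sketch of that architecture: a single-layer LSTM consuming a sequence of 768-dimensional vectors (BERT-base's hidden size) and emitting P(AI-generated). The class name, hidden width, and the randomly generated stand-in embeddings are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

EMBED_DIM = 768   # hidden size of BERT-base embeddings
HIDDEN_DIM = 64   # illustrative LSTM width (assumption, not from the paper)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyLSTMClassifier:
    """Single-layer LSTM over precomputed embeddings with a sigmoid binary head."""

    def __init__(self, embed_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = 0.1
        # One stacked weight matrix covering the input/forget/cell/output gates.
        self.W = rng.normal(0.0, scale, (4 * hidden_dim, embed_dim))
        self.U = rng.normal(0.0, scale, (4 * hidden_dim, hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.w_out = rng.normal(0.0, scale, hidden_dim)
        self.b_out = 0.0
        self.hidden_dim = hidden_dim

    def forward(self, embeddings):
        """embeddings: array of shape (seq_len, embed_dim), one vector per token."""
        H = self.hidden_dim
        h = np.zeros(H)
        c = np.zeros(H)
        for x in embeddings:
            z = self.W @ x + self.U @ h + self.b
            i = sigmoid(z[0:H])        # input gate
            f = sigmoid(z[H:2 * H])    # forget gate
            g = np.tanh(z[2 * H:3 * H])  # candidate cell state
            o = sigmoid(z[3 * H:4 * H])  # output gate
            c = f * c + i * g
            h = o * np.tanh(c)
        # Final hidden state -> P(text is AI-generated).
        return sigmoid(self.w_out @ h + self.b_out)

# Stand-in for BERT output: in the paper's pipeline these would be contextual
# embeddings produced by BERT, one per token of the input text.
rng = np.random.default_rng(1)
fake_bert_embeddings = rng.normal(size=(12, EMBED_DIM))

clf = TinyLSTMClassifier(EMBED_DIM, HIDDEN_DIM)
p_ai = clf.forward(fake_bert_embeddings)
print(float(p_ai))  # a probability in [0, 1]
```

The untrained weights here only show data flow; in practice the LSTM and output head would be trained end-to-end on labeled human/AI text, and the four-class source-identification task would instead fine-tune RoBERTa with a 4-way softmax head.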
Funding Number
2319803
Funding Sponsor
National Science Foundation
Keywords
AI-generated text, Bidirectional Encoder Representations from Transformers (BERT) model, Bidirectional Long Short-Term Memory (BiLSTM), Deep Learning (DL), Large Language Models (LLMs), Long Short-Term Memory (LSTM), Machine Learning (ML), Robustly optimized BERT approach (RoBERTa) model, word embeddings
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Department
Computer Science
Recommended Citation
Anjana Priyatham Tatavarthi, Faranak Abri, and Nada Attar. "AI-Generated Text Detection and Source Identification." Journal of Advances in Information Technology 16.7 (2025): 1030-1041. https://doi.org/10.12720/jait.16.7.1030-1041