Unraveling the Enigma of Classification of Synthetic and Genuine Information Using Machine Learning and Explainable AI
Publication Date
1-1-2025
Document Type
Conference Proceeding
Publication Title
Communications in Computer and Information Science
Volume
2434 CCIS
DOI
10.1007/978-3-031-84602-1_16
First Page
229
Last Page
242
Abstract
AI-generated text is now pervasive, and the literature reports considerable success in detecting it. This paper uses explainable AI (XAI) techniques to gain insight into the workings of the machine learning models used to classify synthetic text. To that end, it analyzes synthetic and genuine information from visualization and explainability perspectives. The text is converted into vector embeddings using the Robustly Optimized BERT Pretraining Approach (RoBERTa). Variational Autoencoders (VAEs) are used for visualization, and a Support Vector Machine (SVM) is used for classification. Local Interpretable Model-Agnostic Explanations (LIME), SHapley Additive exPlanations (SHAP), and Integrated Gradients are used to explain the classification. The experiments are conducted on two different types of datasets. Although the machine learning model achieves outstanding accuracy, similar to previous work in the literature, there is no clear explanation of why the representation learning or the classification works so well. The explainability techniques show that the model focuses on words that do not clearly indicate that the text is synthetically generated. Visualization in two dimensions shows that the vector embeddings of the two classes of text overlap significantly, with no clear separation between the learned representations.
Keywords
Artificial Intelligence, Explainable AI, Intelligent systems, Knowledge representation, Natural Language Processing
Department
Applied Data Science
Recommended Citation
Vishnu S. Pendyala. "Unraveling the Enigma of Classification of Synthetic and Genuine Information Using Machine Learning and Explainable AI" Communications in Computer and Information Science (2025): 229-242. https://doi.org/10.1007/978-3-031-84602-1_16