Publication Date
4-1-2026
Document Type
Article
Publication Title
Software Impacts
Volume
27
DOI
10.1016/j.simpa.2026.100811
Abstract
This paper presents Rosetta-XAI, a comprehensive software framework for evaluating and explaining Large Language Model (LLM) behavior in cross-language code conversion tasks. The system implements a four-stage automated pipeline: (1) code generation by LLMs accessed through the Ollama API inference service, (2) regex-based extraction of code blocks from markdown responses, (3) language-specific syntax and compilation validation with temporary artifact management, and (4) execution with timeout protections and CSV-based checkpoint recovery. The framework supports evaluation of 15 specialized code LLMs (1.3B–34B parameters), including DeepSeek Coder, Code Llama, CodeGemma, and Granite Code, across 17 Rosetta Code programming tasks, yielding 42 conversion pairs (every ordered source-target combination) among seven languages (C, C++, Go, Java, JavaScript, Python, Rust). Beyond traditional pass@1 accuracy metrics, the system incorporates explainability analysis through Shapley Value Sampling and Feature Ablation techniques implemented via Captum and PyTorch, enabling researchers to quantify token-level feature importance during translation. All pipeline components include XAI-enhanced variants supporting follow-up question analysis for interpretability studies. Built using Python with pandas for metrics aggregation and subprocess management for multi-language execution, the modular architecture separates extraction, validation, and execution concerns. Results are systematically organized into structured directories tracking accepted code, compilation failures, syntax errors, and execution outputs, with comprehensive metrics exported to CSV files for reproducible research and comparative model analysis.
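As an illustration of the extraction and execution stages described in the abstract, the following is a minimal sketch and not the framework's actual code. It assumes that model responses wrap code in triple-backtick markdown fences, that execution uses Python's standard `subprocess` timeout, and that pass@1 is computed from a single sample per task. Every name here (`extract_code_blocks`, `run_python_snippet`, `pass_at_1`, the regex, the status labels) is hypothetical and chosen for illustration only. The sketch also shows where the count of 42 conversion pairs comes from: seven languages give 7 × 6 ordered source-target pairs.

```python
import re
import subprocess
import sys
from itertools import permutations

# The seven languages named in the abstract.
LANGUAGES = ["C", "C++", "Go", "Java", "JavaScript", "Python", "Rust"]

# Every ordered (source, target) pair with source != target: 7 * 6 = 42.
CONVERSION_PAIRS = list(permutations(LANGUAGES, 2))

# Assumed response format: markdown fences made of three backticks,
# optionally followed by a language tag. The fence is built from a
# single backtick so this listing does not contain a literal fence.
FENCE = "`" * 3
FENCE_RE = re.compile(
    FENCE + r"[ \t]*([\w+#-]*)[ \t]*\r?\n(.*?)" + FENCE,
    re.DOTALL,
)


def extract_code_blocks(response: str) -> list[tuple[str, str]]:
    """Return (language_tag, code) for each fenced block in a response."""
    return [(lang.lower(), code.strip()) for lang, code in FENCE_RE.findall(response)]


def run_python_snippet(code: str, timeout_s: float = 5.0) -> dict:
    """Run a Python snippet in a child process, killing it after timeout_s."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return {"status": "timeout", "stdout": "", "returncode": None}
    status = "ok" if proc.returncode == 0 else "runtime_error"
    return {"status": status, "stdout": proc.stdout, "returncode": proc.returncode}


def pass_at_1(first_sample_passed: list[bool]) -> float:
    """Fraction of tasks whose single generated sample passed (assumed definition)."""
    if not first_sample_passed:
        return 0.0
    return sum(first_sample_passed) / len(first_sample_passed)
```

In a fuller pipeline, the language tag returned by `extract_code_blocks` would presumably select a compiler or interpreter for each target language; only the Python case is sketched here to keep the example self-contained.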
Funding Number
23-RSG-07-077
Keywords
Code translation, Explainable artificial intelligence, Feature ablation, Large language models, Model interpretability, Shapley values
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Department
Applied Data Science
Recommended Citation
Vishnu S. Pendyala and Neha Bais Thakur. "Rosetta-XAI: An Automated Evaluation and Explainability Framework for Code Translation Models" Software Impacts (2026). https://doi.org/10.1016/j.simpa.2026.100811