Publication Date
Fall 2025
Degree Type
Thesis - Campus Access Only
Degree Name
Master of Science (MS)
Department
Applied Data Science
Advisor
Mohammad Masum; Saptarshi Sengupta; Vishnu Pendyala
Abstract
Large Language Models (LLMs) are increasingly used to analyse structured data, but the most common methods, such as translating questions to SQL or prompting with the full table in natural language, still face several practical problems. SQL-based systems tend to be rigid and error-prone when schemas vary, while full-table prompts are inefficient and often fail on large or complex data. This work proposes a framework called Complexity-Aware Prompting (CAP) to make natural language–to–code generation more reliable for tabular data. The idea is to break down each user question in terms of fifteen basic Pandas operations and use those to retrieve examples that are similar not only in meaning but also in the kind of computation they require. These examples are then combined with reasoning-based prompting methods such as Chain-of-Thought and Tree-of-Thought to help the model reason through the task before generating code. On the DataBench benchmark, the framework reached 92.00% executable accuracy with Claude-3.7-Sonnet, which is ~22% better than instruction-only prompts and ~15% higher than existing retrieval-based methods. It also reduced schema-related errors by around one-third. Overall, the approach offers a practical way to make LLMs more consistent and trustworthy for everyday data analysis tasks.
Keywords: natural language to code, tabular question answering, large language models, complexity-aware prompting, complexity alignment, executable accuracy
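The retrieval idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The abstract does not list the fifteen Pandas operations, so the four-entry `OPERATION_KEYWORDS` table, the keyword-based tagger, the token-overlap stand-in for semantic similarity, the `alpha` weight, and all function names here are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical taxonomy: the thesis uses fifteen Pandas operations, which the
# abstract does not enumerate. These four are placeholders.
OPERATION_KEYWORDS = {
    "filter": {"where", "only", "above", "below"},
    "groupby": {"each", "per", "by"},
    "aggregate": {"average", "mean", "total", "sum", "count", "many"},
    "sort": {"highest", "lowest", "top", "largest", "smallest"},
}


@dataclass(frozen=True)
class Example:
    question: str
    code: str


def tokens(text):
    """Lowercased word set with trailing punctuation stripped."""
    return {w.strip("?.,").lower() for w in text.split()} - {""}


def operations(question):
    """Tag a question with the operations whose keywords it mentions."""
    words = tokens(question)
    return {op for op, kws in OPERATION_KEYWORDS.items() if words & kws}


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def score(query, example, alpha=0.5):
    """Blend meaning similarity with computation (operation) similarity."""
    # Assumption: token overlap stands in for an embedding-based similarity.
    semantic = jaccard(tokens(query), tokens(example.question))
    complexity = jaccard(operations(query), operations(example.question))
    return alpha * semantic + (1 - alpha) * complexity


def retrieve(query, pool, k=2, alpha=0.5):
    """Return the k highest-scoring examples from the pool."""
    return sorted(pool, key=lambda ex: score(query, ex, alpha), reverse=True)[:k]


def build_prompt(query, columns, examples):
    """Assemble a few-shot prompt with a Chain-of-Thought style instruction."""
    shots = "\n\n".join(f"Q: {ex.question}\nCode: {ex.code}" for ex in examples)
    return (
        f"The DataFrame `df` has columns: {', '.join(columns)}.\n\n"
        f"{shots}\n\n"
        f"Q: {query}\n"
        "Think step by step about which Pandas operations are needed, "
        "then write the code."
    )


POOL = [
    Example("What is the mean price for each category?",
            "df.groupby('category')['price'].mean()"),
    Example("Which department has the highest salary?",
            "df.sort_values('salary', ascending=False)['department'].iloc[0]"),
    Example("How many rows have age above 30?",
            "int((df['age'] > 30).sum())"),
]
QUERY = "What is the average salary per department?"

if __name__ == "__main__":
    chosen = retrieve(QUERY, POOL, k=2)
    print(build_prompt(QUERY, ["department", "salary"], chosen))
```

With `alpha=1.0` the ranking uses wording overlap only and favours the "highest salary" example, which shares the words "department" and "salary" with the query but needs a sort. With the default blend, the group-and-aggregate example ranks first, because it requires the same kind of computation as the query.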
Recommended Citation
Kanchugantla, Yashasvi, "Natural Language Question Answering on Tabular Data Using Large Language Models" (2025). Master's Theses. 5749.
DOI: https://doi.org/10.31979/etd.z9uq-nygw
https://scholarworks.sjsu.edu/etd_theses/5749