Publication Date
Spring 2025
Degree Type
Master's Project
Degree Name
Master of Science in Computer Science (MSCS)
Department
Computer Science
First Advisor
Ching-Seh Wu
Second Advisor
Fabio Di Troia
Third Advisor
Genya Ishigaki
Keywords
Large Language Models, Retrieval Augmented Generation, Knowledge Base, Differential Diagnosis, Chain-of-Thought
Abstract
Although recent trends indicate that large language models (LLMs) outperform traditional methods on complex problems that require reasoning, little progress has been made in matching the quality of diagnoses made by human doctors. Producing an accurate diagnosis with thorough reasoning remains a significant challenge, even for advanced AI models, for several reasons: current state-of-the-art models lack transparency, they do not explain the diagnostic process, they emphasize results over reasoning, and they lack foundational knowledge and explore only a limited range of diseases. To address these problems, we propose a retrieval-augmented generation (RAG) model with smart prompt engineering to develop a sound medical diagnostic agent. Rapidly evolving LLM techniques, particularly RAG, have shown promising results in processing and interpreting complex data. By combining smart prompt engineering with a RAG framework backed by strong databases, we improve diagnostic reasoning performance. DeepSeek-R1-Distill-Qwen-7B, Mixtral-8x7B, and MedAlpaca were trained as RAG models with PubMed articles and PMC patient data while varying the number of documents. We observe an average increase of 15% in accuracy scores when relevant documents are introduced into the RAG framework. Additionally, prompt engineering guides the formulation of a differential diagnosis and chain-of-thought inference. This study is the first of its kind to identify how model performance varies with the number of documents in the RAG framework.
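The idea of varying the number of documents supplied to a RAG prompt can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the project: the toy corpus stands in for PubMed and PMC data, the token-overlap scorer stands in for a real retriever, and the names `retrieve` and `build_prompt` are hypothetical. No LLM is called; the sketch only shows how the prompt changes as the document count `k` changes.

```python
import re
from collections import Counter

# Hypothetical toy corpus standing in for a knowledge base (not real PubMed/PMC data).
CORPUS = [
    "Fever, productive cough, and pleuritic chest pain suggest pneumonia.",
    "Polyuria, polydipsia, and weight loss suggest diabetes mellitus.",
    "Chest pain radiating to the left arm suggests myocardial infarction.",
    "Cough with wheezing and dyspnea can indicate asthma.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def retrieve(query, corpus, k):
    """Return the k documents that share the most tokens with the query.

    Token overlap is a stand-in for a real retriever (an assumption).
    """
    query_counts = Counter(tokenize(query))

    def score(doc):
        doc_counts = Counter(tokenize(doc))
        return sum(min(query_counts[t], doc_counts[t]) for t in query_counts)

    ranked = sorted(corpus, key=score, reverse=True)
    return ranked[:k]


def build_prompt(case, docs):
    """Assemble a prompt asking for a differential diagnosis with step-by-step reasoning."""
    context = "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(docs))
    return (
        "Context documents:\n" + (context or "(none)") + "\n\n"
        "Patient case: " + case + "\n\n"
        "List a differential diagnosis, reason step by step through each "
        "candidate, then state the most likely diagnosis."
    )


case = "Patient has fever, cough, and chest pain."
for k in (0, 1, 3):
    # k = 0 gives a no-retrieval baseline; larger k adds more context.
    prompt = build_prompt(case, retrieve(case, CORPUS, k))
    print(f"--- k={k} ---\n{prompt}\n")
```

In an experiment of this shape, each prompt would be sent to the model under test and the answers scored against reference diagnoses, so that accuracy can be compared across values of `k`.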
Recommended Citation
Syed, Qadeerullah, "DISEASE DIAGNOSIS USING RAG LLM WITH SMART PROMPT ENGINEERING" (2025). Master's Projects. 1514.
DOI: https://doi.org/10.31979/etd.azur-9jbv
https://scholarworks.sjsu.edu/etd_projects/1514