Publication Date

Spring 2025

Degree Type

Master's Project

Degree Name

Master of Science in Computer Science (MSCS)

Department

Computer Science

First Advisor

Navrati Saxena

Second Advisor

Robert Chun

Third Advisor

Adil Mohammad Ansari

Keywords

Retrieval Augmented Generation, Generative AI, Chatbot, Vector Database, Claude, GPT, Anthropic, DeepSeek, Llama

Abstract

The use of Retrieval-Augmented Generation (RAG) in chatbot platforms has transformed academic spaces by significantly improving information accessibility. RAG has become a viable approach to augmenting Large Language Models (LLMs) with real-time access to external knowledge. With the increasing availability of advanced LLMs such as GPT, DeepSeek, Claude, Gemini, and Llama, there is a growing need to compare RAG systems built on different LLMs. This study compares the responses of four RAG chatbots, each built on a popular LLM, against a purpose-built evaluation dataset. Specifically, it compares the responses and performance of closed-source models (GPT-4o and Claude) and open-source models (DeepSeek and Llama) on questions requiring inference across multiple scientific corpora with intricate content and structure. All RAG systems in this research use a Chroma vector database to store embeddings. The retrieved documents and the query are supplied together as the input prompt to the LLM, enabling contextually grounded response generation. Each chatbot is evaluated on ten complex research papers from various domains in computer science. The evaluation dataset contains 75 questions derived from these papers, ranging from simple yes/no questions to questions requiring an understanding of multiple papers. The responses of each chatbot are scored quantitatively with standard metrics, including Bilingual Evaluation Understudy (BLEU), Recall-Oriented Understudy for Gisting Evaluation (ROUGE), and BERTScore (based on Bidirectional Encoder Representations from Transformers), to evaluate response quality comprehensively.
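The retrieve-then-prompt flow described in the abstract (embed the corpus, retrieve the passages most similar to the query, then pass query plus passages to the LLM) can be sketched as below. This is a minimal stdlib-only illustration: a toy bag-of-words similarity stands in for Chroma's learned embeddings, and all names (`embed`, `retrieve`, `build_prompt`) are hypothetical, not from the study's codebase.

```python
from collections import Counter
from math import sqrt

def embed(text):
    # Toy bag-of-words "embedding"; a real pipeline would use a learned
    # embedding model and store the vectors in a database such as Chroma.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=2):
    # Rank stored passages by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]

def build_prompt(query, passages):
    # The retrieved passages and the query together form the LLM input,
    # so the model's answer is grounded in the retrieved context.
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

corpus = [
    "Chroma stores dense embeddings for similarity search.",
    "BLEU measures n-gram overlap with a reference answer.",
    "Transformers use attention over token sequences.",
]
query = "How does Chroma store embeddings?"
prompt = build_prompt(query, retrieve(query, corpus))
print(prompt)
```

In the actual systems compared here, the only piece that varies across the four chatbots is the LLM that consumes this prompt; retrieval and storage are held fixed.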

Available for download on Sunday, May 17, 2026
