Publication Date

Spring 2025

Degree Type

Master's Project

Degree Name

Master of Science in Computer Science (MSCS)

Department

Computer Science

First Advisor

Navrati Saxena

Second Advisor

Robert Chun

Third Advisor

Adil Mohammad Ansari

Keywords

Retrieval Augmented Generation, Generative AI, Chatbot, Vector Database, Claude, GPT, Anthropic, DeepSeek, Llama

Abstract

The use of Retrieval-Augmented Generation (RAG) in chatbot platforms has transformed academic spaces by significantly improving information accessibility. RAG has become a viable approach to augmenting Large Language Models (LLMs) with real-time access to external knowledge. With the increasing availability of advanced LLMs such as GPT, DeepSeek, Claude, Gemini, and Llama, there is a growing need to compare RAG systems built on different LLMs. This study compares the responses of four RAG chatbots, each built on a popular LLM, against a purpose-built evaluation dataset. Specifically, it compares the responses and performance of closed-source models (GPT-4o and Claude) and open-source models (DeepSeek and Llama) on questions requiring inference across multiple scientific corpora with intricate content and structure. All RAG systems in this research use a Chroma vector database to store embeddings. The retrieved documents and the query are supplied together as the input prompt to the LLM, enabling contextually grounded response generation. Each chatbot is evaluated on ten complex research papers from various domains in computer science. The evaluation dataset contains 75 questions derived from these papers, ranging from simple yes/no questions to questions requiring an understanding of multiple papers. The responses of each chatbot are scored quantitatively with standard metrics, including Bilingual Evaluation Understudy (BLEU), Recall-Oriented Understudy for Gisting Evaluation (ROUGE), and BERTScore (based on Bidirectional Encoder Representations from Transformers), to evaluate response quality comprehensively.
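The retrieve-then-prompt flow described in the abstract (embed the corpus, retrieve the passages most similar to the query, then pass query plus passages to the LLM) can be sketched as below. This is a minimal stdlib-only illustration: a toy bag-of-words similarity stands in for Chroma's learned embeddings, and all names (`embed`, `retrieve`, `build_prompt`) are hypothetical, not from the study's codebase.

```python
from collections import Counter
from math import sqrt

def embed(text):
    # Toy bag-of-words "embedding"; a real pipeline would use a learned
    # embedding model and store the vectors in a database such as Chroma.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=2):
    # Rank stored passages by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]

def build_prompt(query, passages):
    # The retrieved passages and the query together form the LLM input,
    # so the model's answer is grounded in the retrieved context.
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

corpus = [
    "Chroma stores dense embeddings for similarity search.",
    "BLEU measures n-gram overlap with a reference answer.",
    "Transformers use attention over token sequences.",
]
query = "How does Chroma store embeddings?"
prompt = build_prompt(query, retrieve(query, corpus))
print(prompt)
```

In the actual systems compared here, the only piece that varies across the four chatbots is the LLM that consumes this prompt; retrieval and storage are held fixed.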

Available for download on Sunday, May 17, 2026
