Publication Date
Spring 2025
Degree Type
Master's Project
Degree Name
Master of Science in Computer Science (MSCS)
Department
Computer Science
First Advisor
William Andreopoulos
Second Advisor
Genya Ishigaki
Third Advisor
Thomas Austin
Keywords
Retrieval-Augmented Generation, Knowledge Graph, MultiHop-RAG, Vector Database, Summarization, Semantic Chunking, Question Answering, Large Language Models (LLMs).
Abstract
With the vast amount of information on the internet spread across many lengthy documents, finding relevant information has become both more important and more challenging. The goal of this project is to develop advanced techniques for retrieving information from long texts that deliver accurate and relevant results while remaining fast and efficient, and that address the particular difficulties posed by large and complex documents. This paper presents a custom Retrieval-Augmented Generation (RAG) framework designed to improve contextual retrieval in long and multi-document settings. The framework employs techniques such as summarization, semantic chunking, vectorization, and knowledge graph construction to enhance query understanding and reasoning. We use the MultiHop-RAG dataset to evaluate multi-hop retrieval and question-answering scenarios in which the evidence for a query is distributed across multiple documents.
Recommended Citation
Garg, Sakshi, "Improving Contextual Retrieval for Long Documents in Q & A systems" (2025). Master's Projects. 1546.
DOI: https://doi.org/10.31979/etd.x2fw-ezpz
https://scholarworks.sjsu.edu/etd_projects/1546