Publication Date

Fall 2024

Degree Type

Master's Project

Degree Name

Master of Science in Computer Science (MSCS)

Department

Computer Science

First Advisor

Saptarshi Sengupta

Second Advisor

Gopal Nath

Third Advisor

Christopher J. Pollett

Keywords

Large Language Models, Retrieval Augmented Generation, Personalized Medicine, Survival Prediction, RNN, LSTM.

Abstract

Cancer is a major health risk to thousands of people, and there are more than two hundred different types of cancer. Over the past several years, outcomes and survival rates have improved, with machine learning, specifically Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks, contributing to that progress. However, current prognostic models do not let healthcare professionals adapt the input variables to reflect the distinct features of each cancer type, so the resulting models work but are less accurate than they could be. This study explores improving the accuracy and adaptability of cancer prognostic models by using Large Language Models (LLMs) and Retrieval Augmented Generation (RAG) techniques. These techniques allow healthcare professionals to adapt the model to each patient’s unique case, yielding survival predictions that extend beyond the typical 5-year mark and making use of attributes, such as tumor grade, that earlier models could not use. Preliminary results show that the use of LLMs and RAG improved prediction accuracy by approximately 82%, with survival prediction rates for patients beyond 5 years increasing by 15% compared to traditional models such as Logistic Regression (LR). The LLM also improves predictions for site recode and cancer grade. Using these models for cancer prognosis could lead to better, more personalized treatment decisions and, in turn, better outcomes for cancer patients.

Available for download on Monday, December 15, 2025
