Author

Brian Tran

Publication Date

Spring 2025

Degree Type

Master's Project

Degree Name

Master of Science in Computer Science (MSCS)

Department

Computer Science

First Advisor

William Andreopoulos

Second Advisor

Thomas Austin

Third Advisor

Katerina Potika

Keywords

Biomedical NLP, Relation Extraction, Named Entity Recognition, Knowledge Graph, Large Language Models, PubMed

Abstract

The rapid growth of biomedical literature makes it difficult to link related information across publications. This thesis aims to extract data from these expanding datasets in order to identify meaningful relationships between biomedical entities. Large language models (LLMs) make it possible to process this literature quickly, but creating LLMs from scratch is impractical. Instead, this thesis collects a small dataset of biomedical papers and uses it to train existing LLMs to extract entities from the text and learn the relationships between them. The experiment is divided into two stages and uses the EU-ADR and ChemProt datasets. In the first stage, named entity recognition (NER), the cleaned datasets are passed through four LLMs. In the second stage, the data are used to train relation extraction (RE) models to determine which approach gives the best results. The results are then presented as graphs and visualizations.

Available for download on Monday, May 25, 2026
