Faculty Research, Scholarly, and Creative Activity

Understanding Tradeoffs in Clinical Text Extraction: Prompting, Retrieval-Augmented Generation, and Supervised Learning on Electronic Health Records

Publication Date

3-13-2026

Document Type

Article

Publication Title

Algorithms

DOI

10.3390/a19030215

Abstract

Clinical discharge summaries contain rich patient information but remain difficult to convert into structured representations for downstream analysis. Recent advances in large language models (LLMs) have introduced new approaches for clinical text extraction, yet their relative strengths compared with supervised methods remain unclear. This study presents a controlled evaluation of three dominant strategies for structured clinical information extraction from electronic health records: prompting-based extraction using LLMs, retrieval-augmented generation for terminology canonicalization, and supervised fine-tuning of domain-specific transformer models. Using discharge summaries from the MIMIC-IV dataset, we compare zero-shot, few-shot, and verification-based prompting across closed-source and open-source LLMs, evaluate retrieval-augmented canonicalization as a post-processing mechanism, and benchmark these methods against a fine-tuned BioClinicalBERT model. Performance is assessed using a multi-level evaluation framework that combines exact matching, fuzzy lexical matching, and semantic assessment via an LLM-based judge. The results reveal clear tradeoffs across approaches: prompting achieves strong semantic correctness with minimal supervision, retrieval augmentation improves terminology consistency without expanding extraction coverage, and supervised fine-tuning yields the highest overall accuracy when labeled data are available. Across all methods, we observe a consistent (Formula presented.) gap between exact-match and semantic correctness, highlighting the limitations of string-based metrics for clinical Natural Language Processing (NLP). These findings provide practical guidance for selecting extraction strategies under varying resource constraints and emphasize the importance of evaluation methodologies that reflect clinical equivalence rather than surface-form similarity.
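
The gap between string-based and semantic correctness described in the abstract can be illustrated with a small sketch. This is not the authors' evaluation code: the normalization step, the 0.8 similarity threshold, the helper names, and the example term pairs are all assumptions made for illustration, and the fuzzy level uses Python's standard-library `difflib.SequenceMatcher` as a stand-in for whatever lexical matcher the paper uses.

```python
from difflib import SequenceMatcher


def normalize(term: str) -> str:
    """Lowercase and collapse whitespace (an assumed normalization step)."""
    return " ".join(term.lower().split())


def exact_match(pred: str, gold: str) -> bool:
    """Level 1: the normalized strings must be identical."""
    return normalize(pred) == normalize(gold)


def fuzzy_match(pred: str, gold: str, threshold: float = 0.8) -> bool:
    """Level 2: character-level similarity at or above a cutoff.

    The 0.8 threshold is a hypothetical value, not taken from the paper.
    """
    ratio = SequenceMatcher(None, normalize(pred), normalize(gold)).ratio()
    return ratio >= threshold


def score(pairs, matcher) -> float:
    """Fraction of (predicted, gold) pairs accepted by `matcher`."""
    if not pairs:
        return 0.0
    return sum(matcher(p, g) for p, g in pairs) / len(pairs)


# Hypothetical (predicted, gold) term pairs, invented for illustration.
pairs = [
    ("Type 2 diabetes mellitus", "type 2 diabetes mellitus"),  # case only
    ("hypertension", "hypertensions"),                         # near-identical
    ("heart attack", "myocardial infarction"),                 # synonyms
]

exact_score = score(pairs, exact_match)
fuzzy_score = score(pairs, fuzzy_match)
print(f"exact: {exact_score:.2f}  fuzzy: {fuzzy_score:.2f}")
```

In this toy set the third pair is clinically equivalent yet fails both string-based levels, which is the kind of case a semantic assessment step (such as the LLM-based judge named in the abstract) is meant to catch.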

Keywords

BioClinicalBERT, clinical information extraction, electronic health records, large language models, MIMIC-IV, retrieval-augmented generation, semantic evaluation

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Department

Applied Data Science

Recommended Citation

Tanya Yadav, Aditya Tekale, Jeff Chong, and Mohammad Masum. "Understanding Tradeoffs in Clinical Text Extraction: Prompting, Retrieval-Augmented Generation, and Supervised Learning on Electronic Health Records" Algorithms (2026). https://doi.org/10.3390/a19030215
