Faculty Research, Scholarly, and Creative Activity

Agentic Hallucination Risk Scoring for Medical LLMs via Uncertainty Quantification and Clinical Knowledge Injection

Publication Date

4-17-2026

Document Type

Article

Publication Title

Algorithms

Volume

Issue

DOI

10.3390/a19040315

Abstract

Large Language Models (LLMs) have seen significant adoption across numerous domains since 2020, but their tendency to hallucinate creates unacceptable dangers in high-risk environments like healthcare, where wrong outputs can directly jeopardize human safety. While present systems focus on pre-generation mitigation strategies, they cannot ensure the safety of individual outputs during inference. We present a post hoc Hallucination Risk Scoring (HRS) methodology that intercepts questionable outputs before they reach patients via an agentic pipeline. Given a medical question, a domain-specific LLM generates an initial response from which five complementary uncertainty signals are computed. These signals are separated into a decision layer that governs escalation and a guidance layer that directs clinical knowledge injection by a GPT. The framework is tested on three biomedical question-answering datasets of varying complexity: PubMedQA-Labeled, PubMedQA-Artificial, and BioASQ Task B. The results show a safety increase of up to 38% at the most sensitive threshold configuration, zero deterioration across all experimental configurations (enforced by the Revert Baseline method), and complexity-aware escalation rates that scale naturally with dataset difficulty. Tunable thresholds allow physicians to calibrate system behavior based on deployment requirements, providing a practical safety–accuracy trade-off. Statistical analysis identifies entropy as the primary uncertainty signal separating escalated from non-escalated cases across all datasets. These findings provide a deployable, interpretable, and configurable post hoc safety paradigm for reliable medical AI implementation.
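To make the escalation idea in the abstract concrete, the following is a minimal sketch of how an entropy signal could drive a threshold-based decision and a revert step. It is not the authors' implementation: the abstract does not specify the five signals, the threshold values, or the exact Revert Baseline rule. The function names, the use of answer-option probabilities as input, and the "keep the revision only if entropy drops" rule are all assumptions made for illustration.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def should_escalate(probs, threshold):
    """Assumed decision rule: escalate when entropy exceeds a tunable threshold.

    `probs` is a hypothetical distribution over answer options
    (for example yes / no / maybe); `threshold` is a hypothetical setting.
    """
    return shannon_entropy(probs) > threshold


def revert_baseline(initial, revised, initial_probs, revised_probs):
    """Assumed revert rule: accept the revised answer only if its entropy
    is strictly lower than that of the initial answer; otherwise keep the
    initial answer so the revision step cannot make things worse by this
    measure.
    """
    if shannon_entropy(revised_probs) < shannon_entropy(initial_probs):
        return revised
    return initial
```

Under these assumptions, lowering the threshold makes the system escalate more often, which is one way a tunable safety–accuracy trade-off of the kind described above could be exposed.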

Keywords

agentic systems, clinical knowledge injection, hallucination detection, large language models, medical question-answering, patient safety, post hoc safety, uncertainty quantification

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Department

Applied Data Science

Recommended Citation

Mayank Kapadia and Mohammad Masum. "Agentic Hallucination Risk Scoring for Medical LLMs via Uncertainty Quantification and Clinical Knowledge Injection" Algorithms (2026). https://doi.org/10.3390/a19040315
