Publication Date

4-17-2026

Document Type

Article

Publication Title

Algorithms

Volume

19

Issue

4

DOI

10.3390/a19040315

Abstract

Large Language Models (LLMs) have seen wide adoption across numerous domains since 2020, but their tendency to hallucinate creates unacceptable dangers in high-risk environments such as healthcare, where wrong outputs can directly jeopardize human safety. Existing systems focus on pre-generation mitigation strategies and therefore cannot ensure the safety of individual outputs during inference. We propose a post hoc Hallucination Risk Scoring (HRS) methodology that intercepts questionable outputs before they reach patients via an agentic pipeline. Given a medical question, a domain-specific LLM generates an initial response, from which five complementary uncertainty signals are computed. These signals feed two layers: a decision layer that governs escalation and a guidance layer that directs clinical knowledge injection by a GPT model. The framework is tested on three biomedical question-answering datasets of varying complexity: PubMedQA-Labeled, PubMedQA-Artificial, and BioASQ Task B. The results show a safety increase of up to 38% at the most sensitive threshold configuration, zero deterioration across all experimental configurations (enforced by the Revert Baseline method), and complexity-aware escalation rates that scale naturally with dataset difficulty. Tunable thresholds allow physicians to calibrate system behavior to deployment requirements, providing a practical safety–accuracy trade-off. Statistical analysis identifies entropy as the primary uncertainty signal separating escalated from non-escalated cases across all datasets. These findings provide a deployable, interpretable, and configurable post hoc safety paradigm for reliable medical AI deployment.
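
As an illustration of the entropy signal and the tunable escalation threshold mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes the signal is the mean Shannon entropy of the per-token probability distributions of a response and that escalation is a simple threshold comparison. The function names and the threshold value are hypothetical, and the other four signals and the guidance layer are not modeled.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def mean_entropy(distributions):
    """Average token entropy over a response (assumed form of the signal)."""
    if not distributions:
        raise ValueError("response has no token distributions")
    return sum(token_entropy(d) for d in distributions) / len(distributions)

def should_escalate(distributions, threshold):
    """Hypothetical decision rule: escalate when mean entropy exceeds threshold."""
    return mean_entropy(distributions) > threshold

# Toy distributions over a 4-token vocabulary (illustrative values only).
confident = [[0.97, 0.01, 0.01, 0.01]]
uncertain = [[0.25, 0.25, 0.25, 0.25]]

THRESHOLD = 0.5  # hypothetical; a lower value escalates more responses
print(should_escalate(confident, THRESHOLD))
print(should_escalate(uncertain, THRESHOLD))
```

Under these assumptions, lowering the threshold makes the rule more sensitive, which matches the abstract's description of a configurable safety–accuracy trade-off.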

Keywords

agentic systems, clinical knowledge injection, hallucination detection, large language models, medical question-answering, patient safety, post hoc safety, uncertainty quantification

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Department

Applied Data Science
