Reason, Review, Repeat: Hybrid Chain of Thought to Mitigate Hallucinations in Large Language Models
Publication Date
1-1-2026
Document Type
Conference Proceeding
Publication Title
Proceedings of the 2026 20th International Conference on Ubiquitous Information Management and Communication Imcom 2026
DOI
10.1109/IMCOM69009.2026.11360964
Abstract
Hallucinations, outputs that are plausible yet incorrect, are prevalent across large language models and undermine confidence in their reliability. This study investigates approaches for mitigating hallucinations in large language models and examines their effectiveness using code generation tasks as a benchmark. Using 141 coding problems, the study compares zero-shot inference, Chain-of-Thought, and a hybrid Chain-of-Thought approach that incorporates review, optimization, and testing phases. Four large language models were evaluated under the different approaches: Llama 3.3 and Gemma 2 (general-purpose models), DeepSeek R1 (internal Chain-of-Thought), and Qwen 2.5 Coder (fine-tuned model). Evaluations take into account accuracy, token utilization, and generation time. The results demonstrate that reasoning-enhanced approaches consistently improve accuracy by around 3% to 15%, with the hybrid Chain-of-Thought methodology showing the most significant gains. The findings suggest that targeted prompting strategies encouraging reasoning, review, and testing can significantly enhance the reliability of LLM-generated code, with consistent improvements observable across different models.
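As a rough illustration of the reason, review, and test phases named in the abstract, the following Python sketch chains them around a generic model callable. It is not the authors' implementation: the abstract does not give the prompts, the number of repair rounds, or the test harness, so the templates `REASON`, `REVIEW`, and `REPAIR`, the helper `run_tests`, the function `hybrid_cot`, and the `max_rounds` parameter are all hypothetical, and review and optimization are merged into a single prompt for brevity.

```python
from typing import Callable, Optional, Tuple

# Hypothetical prompt templates; the paper's actual prompts are not in the abstract.
REASON = "Think step by step, then write a Python solution.\n\nProblem:\n{problem}"
REVIEW = (
    "Review this solution for bugs and inefficiencies, then return improved code."
    "\n\nProblem:\n{problem}\n\nCode:\n{code}"
)
REPAIR = (
    "The code failed its tests with this error:\n{error}"
    "\n\nReturn corrected code.\n\nCode:\n{code}"
)


def run_tests(code: str, tests: str) -> Optional[str]:
    """Run candidate code, then its assertions; return an error message or None.

    exec() on model output is unsafe outside a sandbox; it is used here
    only to keep the sketch self-contained.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)
        exec(tests, namespace)
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def hybrid_cot(
    problem: str,
    tests: str,
    model: Callable[[str], str],
    max_rounds: int = 2,  # assumed repair budget, not taken from the paper
) -> Tuple[str, bool]:
    """Reason, review, then test and repair until the tests pass or rounds run out."""
    code = model(REASON.format(problem=problem))             # reasoning phase
    code = model(REVIEW.format(problem=problem, code=code))  # review/optimization phase
    for _ in range(max_rounds):                              # testing phase
        error = run_tests(code, tests)
        if error is None:
            return code, True
        code = model(REPAIR.format(error=error, code=code))
    return code, run_tests(code, tests) is None
```

Here `model` stands for any function that maps a prompt string to generated code. In an evaluation like the one described, a wrapper around it could also record token counts and wall-clock time per call.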
Keywords
Benchmark, Chain-of-Thought, Large Language Model, Natural Language Processing
Department
Computer Science
Recommended Citation
Jerry Liu, Melody Moh, and Teng Sheng Moh. "Reason, Review, Repeat: Hybrid Chain of Thought to Mitigate Hallucinations in Large Language Models" Proceedings of the 2026 20th International Conference on Ubiquitous Information Management and Communication Imcom 2026 (2026). https://doi.org/10.1109/IMCOM69009.2026.11360964