Code Reviews on a Budget: Memory-Efficient Fine-Tuning with QLoRA and RAG for Big Code Applications

Publication Date

1-1-2026

Document Type

Conference Proceeding

Publication Title

Lecture Notes in Computer Science

Volume

16324 LNCS

DOI

10.1007/978-3-032-14107-1_31

First Page

367

Last Page

381

Abstract

In this technological era, where Artificial Intelligence and Machine Learning are revolutionizing various domains, Large Language Models (LLMs) have emerged as powerful tools for managing and analyzing large-scale data, including software codebases. In software development, reliable code reviews are essential to ensure code security, maintain quality, and manage large code repositories. This paper surveys existing methodologies for building efficient code review automation agents and investigates the suitability of various methods for fine-tuning open-source models in the context of code review automation. Parameter-efficient fine-tuning (PEFT) methodologies, such as Low-Rank Adaptation (LoRA) and Quantized Low-Rank Adaptation (QLoRA), are explored, with an additional focus on a hybrid model that combines QLoRA with Retrieval Augmented Generation (RAG) to determine efficient ways to reduce the memory required for fine-tuning without degrading inference quality. The experiments use a general-purpose LLM, specifically Meta's Llama 3.2 3B model. Results show that the hybrid approach reduces memory utilization by nearly 17% while achieving low entropy values, and that it outperforms baseline systems in both efficiency and inference stability, highlighting the potential of this hybrid technique for real-world code review automation.
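To make the memory argument behind the LoRA/QLoRA methods named above concrete, the sketch below (illustrative only, not the paper's implementation) shows the core low-rank update in plain Python: instead of updating a full d x k weight matrix W, LoRA trains two small matrices B (d x r) and A (r x k) and applies W' = W + (alpha / r) * B @ A, shrinking the trainable parameter count from d*k to r*(d + k); QLoRA additionally keeps the frozen base weights in 4-bit precision. All names here (lora_update, matmul, the toy dimensions) are hypothetical.

```python
def matmul(X, Y):
    """Plain-Python matrix product of X (m x n) and Y (n x p)."""
    n, p = len(Y), len(Y[0])
    return [[sum(X[i][t] * Y[t][j] for t in range(n)) for j in range(p)]
            for i in range(len(X))]

def lora_update(W, A, B, alpha):
    """Return W + (alpha / r) * B @ A, where r is the LoRA rank.

    W: frozen base weights (d x k); in QLoRA these stay quantized.
    A: trainable down-projection (r x k).
    B: trainable up-projection (d x r).
    alpha: LoRA scaling hyperparameter.
    """
    r = len(A)  # rank of the adapter
    scale = alpha / r
    BA = matmul(B, A)  # d x k low-rank correction
    return [[W[i][j] + scale * BA[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# Toy dimensions: d = k = 8, rank r = 1, so the adapter trains only
# r * (d + k) = 16 values instead of the full d * k = 64.
d, k, r, alpha = 8, 8, 1, 2
W = [[0.0] * k for _ in range(d)]   # frozen base weights
B = [[1.0] for _ in range(d)]       # d x r
A = [[0.5] * k]                     # r x k
W_adapted = lora_update(W, A, B, alpha)
# Each entry becomes 0 + (alpha / r) * 1.0 * 0.5 = 1.0
```

The parameter ratio r*(d + k) / (d*k) shrinks further at realistic layer sizes (e.g. r = 16 against 3072 x 3072 projections in a 3B-parameter model), which is what makes single-GPU fine-tuning feasible.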

Keywords

Code Review Automation, Large Language Models, LoRA, QDyLoRA, QLoRA, Retrieval Augmented Generation

Department

Computer Science
