Publication Date

Spring 2025

Degree Type

Master's Project

Degree Name

Master of Science in Computer Science (MSCS)

Department

Computer Science

First Advisor

Robert Chun

Second Advisor

Thomas Austin

Third Advisor

Sankalp Dwivedi

Keywords

Data Engineering, ETL (Extract, Transform, Load), LLMs (Large Language Models), Quantization, QLoRA (Quantized Low-Rank Adapter), Supervised Fine-Tuning

Abstract

This project discusses the use of large language models (LLMs) to simplify and automate Extract, Transform, Load (ETL) development. Data engineering tasks often require technical expertise, making such time-intensive operations challenging for non-experts and even industry professionals. The project tackles these challenges by quantizing a base Llama-2-7b-Chat model using the QLoRA technique. The lowered precision of the base model allows execution on limited hardware resources. The project then applies the Supervised Fine-Tuning (SFT) trainer to specialize the model for generating scripts for ETL tasks. The fine-tuned model is evaluated through a series of micro and comprehensive end-to-end ETL tasks and is compared with human-written baseline scripts, demonstrating robust performance with respect to efficiency, accuracy, and response time. The fine-tuned model achieved 85% accuracy on transformation tasks such as schema mapping and data-cleaning operations like formatting dates and deriving columns from raw unstructured CSV files. The model is published on Hugging Face and, with 81 downloads to date, provides a reasonable solution for ETL automation. The model also lays the foundation for further enhancements in the field.
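The pipeline the abstract describes (4-bit quantization of a Llama-2-7b-Chat base model via QLoRA, followed by supervised fine-tuning on ETL task/script pairs) can be sketched with the widely used Hugging Face stack (`transformers`, `peft`, `trl`, `datasets`). This is a minimal illustration under stated assumptions: the model name, LoRA hyperparameters, and the toy training example are illustrative, not the project's actual configuration.

```python
# Hypothetical sketch of a QLoRA + SFT pipeline; hyperparameters and data
# are illustrative assumptions, not the project's actual values.
import torch
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# 4-bit NF4 quantization: lowers the base model's precision so the 7B
# model can run on limited hardware (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

base = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base, quantization_config=bnb_config, device_map="auto"
)

# Low-rank adapters: only these small matrices are trained; the
# quantized base weights stay frozen.
peft_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
)

# Toy example of an ETL task description paired with a target script.
train_dataset = Dataset.from_list([{
    "text": "### Task: Standardize the date column in orders.csv to ISO format.\n"
            "### Script:\nimport pandas as pd\n"
            "df = pd.read_csv('orders.csv')\n"
            "df['date'] = pd.to_datetime(df['date']).dt.strftime('%Y-%m-%d')\n"
            "df.to_csv('orders_clean.csv', index=False)\n"
}])

trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    peft_config=peft_config,
    args=SFTConfig(output_dir="llama2-etl-qlora", max_steps=100),
)
trainer.train()  # trains only the adapters on the ETL task/script pairs
```

In practice the training set would contain many such task/script pairs, and the resulting adapters would be merged or published alongside the base model, as with the model released on Hugging Face.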

Available for download on Monday, May 25, 2026
