Author

Pavan Myana

Publication Date

Spring 2025

Degree Type

Master's Project

Degree Name

Master of Science in Computer Science (MSCS)

Department

Computer Science

First Advisor

William Andreopoulos

Second Advisor

Navrati Saxena

Third Advisor

Rahul Sanjay Morishetti

Keywords

Instructor Embedding, Vector Database, Document Ingestion, Large Language Models (LLMs), Proxy Re-Encryption, Data Deduplication

Abstract

The exponential growth of cloud computing has transformed data storage and processing, but it has also raised concerns about the security, privacy, and storage efficiency of data. This report takes a two-part approach to these challenges. First, we build a secure and efficient application that uses data deduplication to optimize storage and Proxy Re-Encryption (PRE) to enable secure data sharing. Deduplication removes redundant data before encryption to maximize storage efficiency, while PRE allows encrypted data to be shared safely by re-encrypting the keys for specified recipients without leaking sensitive information. Second, we propose LocalGPT, a local version of GPT that runs entirely offline and lets users interact with their documents in a privacy-preserving manner, since all data stays on the user’s device. By combining Large Language Models and document ingestion with local embedding generation using the Instructor Embedding model, and storing the resulting embeddings in a Chroma vector database, users can efficiently and securely ingest and query documents in context without relying on an external server.
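
As a rough illustration of the deduplication idea described in the abstract, the following minimal sketch keeps one copy of each distinct payload by indexing it under its SHA-256 digest. The `DedupStore` class and its method names are hypothetical and are not taken from the project; a real system would encrypt each new blob at the marked point, which this sketch omits.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical payloads are kept only once."""

    def __init__(self):
        self.blobs = {}  # SHA-256 hex digest -> stored bytes
        self.files = {}  # file name -> digest of its content

    def put(self, name: str, data: bytes) -> bool:
        """Register a file; return True only if its content was not already stored."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self.blobs
        if is_new:
            # Assumption: encryption of the new blob would happen here,
            # after the duplicate check, as the abstract describes.
            self.blobs[digest] = data
        self.files[name] = digest
        return is_new

    def get(self, name: str) -> bytes:
        """Return the content registered under the given file name."""
        return self.blobs[self.files[name]]
```

Storing the same bytes under two names leaves a single entry in `blobs`, while both names still resolve to the original content.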

Available for download on Monday, May 25, 2026
