Publication Date
Spring 2025
Degree Type
Master's Project
Degree Name
Master of Science in Computer Science (MSCS)
Department
Computer Science
First Advisor
William Andreopoulos
Second Advisor
Navrati Saxena
Third Advisor
Rahul Sanjay Morishetti
Keywords
Instructor Embedding, Vector Database, Document Ingestion, Large Language Models (LLMs), Proxy Re-Encryption, Data Deduplication.
Abstract
The rapid growth of cloud computing has transformed data storage and processing, but it has also raised concerns about the security, privacy, and efficient storage of data. This report takes a two-part approach to these challenges. First, we build an application that combines data deduplication with Proxy Re-Encryption (PRE) to optimize storage and enable secure data sharing. Deduplication removes redundant data before encryption to maximize storage efficiency, while PRE allows encrypted data to be shared safely by re-encrypting the keys for specified recipients without leaking sensitive information. Second, we propose LocalGPT, a local version of GPT that runs fully offline so that users can interact with their documents in a privacy-preserving manner; all data remains on the user's device. By combining Large Language Models and document ingestion with local embedding generation using the Instructor Embedding model, and storing the resulting embeddings in a Chroma vector database, users can efficiently and securely ingest and query documents contextually without relying on an external server.
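As a rough illustration of the deduplication step described in the abstract, the following minimal Python sketch stores each distinct chunk once, keyed by its SHA-256 digest. It is not the project's implementation: the class name DedupStore, the fixed-size chunking, and the tiny chunk size are assumptions made for illustration, and the encryption and PRE steps that would follow deduplication are omitted.

```python
import hashlib


class DedupStore:
    """Hypothetical toy store: identical chunks are kept only once."""

    def __init__(self):
        self.chunks = {}  # SHA-256 hex digest -> chunk bytes
        self.files = {}   # file name -> ordered list of chunk digests

    def put(self, name, data, chunk_size=4):
        """Split data into fixed-size chunks and store only unseen ones."""
        digests = []
        for start in range(0, len(data), chunk_size):
            chunk = data[start:start + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # setdefault leaves an existing entry untouched, so a
            # duplicate chunk costs no additional storage.
            self.chunks.setdefault(digest, chunk)
            digests.append(digest)
        self.files[name] = digests

    def get(self, name):
        """Reassemble a file from its recorded chunk digests."""
        return b"".join(self.chunks[d] for d in self.files[name])


store = DedupStore()
store.put("a.txt", b"AAAABBBBAAAA")  # chunks: AAAA, BBBB, AAAA
store.put("b.txt", b"BBBBCCCC")      # chunks: BBBB, CCCC
print(len(store.chunks))             # number of distinct chunks stored
print(store.get("a.txt"))
```

In this sketch the two files contain five chunks in total but only three distinct ones, so only three are stored. A real system would encrypt the unique chunks after this step, as the abstract describes.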
Recommended Citation
Myana, Pavan, "SECURED DATA STORAGE MANAGEMENT WITH DEDUPLICATION IN CLOUD COMPUTING AND LOCAL GPT INTEGRATION" (2025). Master's Projects. 1545.
https://scholarworks.sjsu.edu/etd_projects/1545