Faculty Research, Scholarly, and Creative Activity

Whole-File Chunk-Based Deduplication Using Reinforcement Learning for Cloud Storage

Xincheng Yuan, San Jose State University
Melody Moh, San Jose State University
Teng Sheng Moh, San Jose State University

Publication Date

1-1-2022

Document Type

Conference Proceeding

Publication Title

Proceedings of the 2022 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2022

DOI

10.1109/ASONAM55673.2022.10068661

First Page

269

Last Page

276

Abstract

Deduplication is the process of removing replicated data content from storage facilities such as online databases, cloud datastores, and local file systems. It is commonly performed as part of data preprocessing to eliminate redundant data that consumes extra storage space and computing power, and it is crucial for data storage management in cloud computing. Deduplication is essential for file backup systems, since duplicated files consume additional storage space, especially when backups run on a short period such as daily. A common technique in this field splits files into chunks whose hashes can be compared using data structures or techniques such as clustering. This paper explores performing such file chunk deduplication with an innovative reinforcement learning approach to achieve a high deduplication ratio. The proposed system, named SegDup, achieves a 13% higher deduplication ratio than Extreme Binning, a state-of-the-art deduplication algorithm.
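The chunk-hashing technique mentioned in the abstract can be sketched briefly. This is a minimal illustration that assumes fixed-size 4 KiB chunks, SHA-256 chunk fingerprints, and a deduplication ratio defined as logical bytes divided by physically stored bytes. It is not SegDup, Extreme Binning, or the paper's reinforcement learning method, and the names `deduplicate` and `dedup_ratio` are hypothetical.

```python
from __future__ import annotations

import hashlib


def deduplicate(files: dict[str, bytes], chunk_size: int = 4096):
    """Split each file into fixed-size chunks and store each unique chunk once.

    Returns (store, recipes): store maps SHA-256 digest -> chunk bytes, and
    recipes maps file name -> ordered list of digests needed to rebuild it.
    """
    store: dict[str, bytes] = {}
    recipes: dict[str, list[str]] = {}
    for name, data in files.items():
        recipe = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)  # keep only the first copy of a chunk
            recipe.append(digest)
        recipes[name] = recipe
    return store, recipes


def dedup_ratio(files: dict[str, bytes], store: dict[str, bytes]) -> float:
    """Assumed definition: logical (original) bytes / physically stored bytes."""
    logical = sum(len(d) for d in files.values())
    physical = sum(len(c) for c in store.values())
    return logical / physical if physical else 1.0


# Two hypothetical daily backups that share their first 8 KiB.
files = {
    "mon.bak": b"A" * 8192 + b"B" * 4096,
    "tue.bak": b"A" * 8192 + b"C" * 4096,
}
store, recipes = deduplicate(files)
print(dedup_ratio(files, store))  # 2.0: 24576 logical bytes, 12288 stored
restored = b"".join(store[h] for h in recipes["tue.bak"])
assert restored == files["tue.bak"]
```

Fixed-size chunking keeps the sketch short; many real systems use content-defined chunking instead, so that inserting bytes near the start of a file does not shift every later chunk boundary and defeat matching.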

Keywords

Bloom Filter, Cloud Storage, Deduplication, Deep Q-Network, Reinforcement Learning

Department

Computer Science

Recommended Citation

Xincheng Yuan, Melody Moh, and Teng Sheng Moh. "Whole-File Chunk-Based Deduplication Using Reinforcement Learning for Cloud Storage" Proceedings of the 2022 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2022 (2022): 269-276. https://doi.org/10.1109/ASONAM55673.2022.10068661