Publication Date
8-1-2016
Document Type
Article
Publication Title
Knowledge and Information Systems
Volume
48
Issue
2
DOI
10.1007/s10115-015-0877-9
First Page
277
Last Page
304
Abstract
Bitmap indices are a widely used indexing scheme for large read-only repositories in data warehouses and scientific databases. This binary representation allows the use of bit-wise operations for fast query processing and is typically compressed using run-length encoding techniques. Most bitmap compression techniques are word-aligned, using a fixed encoding length (32 or 64 bits), to avoid explicit decompression at query time. These techniques were proposed to extend or enhance word-aligned hybrid (WAH) compression. This paper presents a comparative study of four bitmap compression techniques: WAH, PLWAH, CONCISE, and EWAH. The experiments aim to identify the conditions under which each method should be applied and to quantify the overhead incurred during query processing. Performance in terms of compression ratio and query time is evaluated over synthetically generated bitmap indices, and results are validated over bitmap indices generated from real data sets. Different query optimizations are explored, query time estimation formulas are defined, and the conditions under which one method should be preferred over another are formalized.
Keywords
Bitmap indices, Data warehouses, Performance comparison and estimation, Word-aligned compression
Department
Computer Engineering
Recommended Citation
Gheorghi Guzun and Guadalupe Canahuate. "Performance evaluation of word-aligned compression methods for bitmap indices." Knowledge and Information Systems 48, no. 2 (2016): 277-304. https://doi.org/10.1007/s10115-015-0877-9
Comments
This version of the article has been accepted for publication, after peer review (when applicable) and is subject to Springer Nature’s AM terms of use, but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://doi.org/10.1007/s10115-015-0877-9