Publication Date

8-1-2016

Document Type

Article

Department

Computer Engineering

Publication Title

Knowledge and Information Systems

Volume

48

Issue

2

DOI

10.1007/s10115-015-0877-9

First Page

277

Last Page

304

Abstract

Bitmap indices are a widely used scheme for large read-only repositories in data warehouses and scientific databases. This binary representation allows the use of bit-wise operations for fast query processing and is typically compressed using run-length encoding techniques. Most bitmap compression techniques are aligned using a fixed encoding length (32 or 64 bits) to avoid explicit decompression during query time. They have been proposed to extend or enhance word-aligned hybrid (WAH) compression. This paper presents a comparative study of four bitmap compression techniques: WAH, PLWAH, CONCISE, and EWAH. Experiments are targeted to identify the conditions under which each method should be applied and quantify the overhead incurred during query processing. Performance in terms of compression ratio and query time is evaluated over synthetic-generated bitmap indices, and results are validated over bitmap indices generated from real data sets. Different query optimizations are explored, query time estimation formulas are defined, and the conditions under which one method should be preferred over another are formalized.

Keywords

Bitmap indices, Data warehouses, Performance comparison and estimation, Word-aligned compression

Comments

This version of the article has been accepted for publication, after peer review (when applicable) and is subject to Springer Nature’s AM terms of use, but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://doi.org/10.1007/s10115-015-0877-9

Share

COinS