Publication Date
6-1-2016
Document Type
Article
Publication Title
The VLDB Journal
Volume
25
Issue
3
DOI
10.1007/s00778-015-0419-9
First Page
339
Last Page
354
Abstract
Bit-vectors are widely used for indexing and summarizing data because modern computers process them efficiently. Sparse bit-vectors can be further compressed to reduce their space requirement. Special compression schemes based on run-length encoders have been designed to avoid explicit decompression and minimize the decoding overhead during query execution. Moreover, highly compressed bit-vectors can exhibit a faster query time than non-compressed ones. However, for hard-to-compress bit-vectors, compression does not speed up queries and can add considerable overhead. In these cases, bit-vectors are often stored verbatim (non-compressed). On the other hand, queries are answered by executing a cascade of bit-wise operations involving indexed bit-vectors and intermediate results. Often, even when the original bit-vectors are hard to compress, the intermediate results become sparse. It could be feasible to improve query performance by compressing these bit-vectors as the query is executed. In this scenario, it would be necessary to operate on verbatim and compressed bit-vectors together. In this paper, we propose a hybrid framework where compressed and verbatim bitmaps can coexist, and we design algorithms to execute queries under this hybrid model. Our query optimizer is able to decide at run time when to compress or decompress a bit-vector. Our heuristics show that applications using higher-density bitmaps can benefit from this hybrid model, improving both their query time and memory utilization.
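The hybrid idea described in the abstract can be illustrated with a minimal Python sketch. It is not the algorithm from the paper: the run-length layout, the function names (`compress`, `decompress`, `and_verbatim_compressed`, `hybrid_and`) and the `DENSITY_THRESHOLD` value are all assumptions made for illustration. A verbatim bit-vector is modeled as a Python integer, a compressed one as a list of (bit, run length) pairs, and the result of each AND is compressed only when it turns out to be sparse.

```python
from typing import List, Tuple, Union

Runs = List[Tuple[int, int]]   # (bit value, run length), lowest position first
BitVector = Union[int, Runs]   # verbatim integer or run-length encoded list

# Hypothetical cutoff for this sketch; not a value taken from the paper.
DENSITY_THRESHOLD = 0.05


def compress(bits: int, n: int) -> Runs:
    """Run-length encode the n low-order bits of a verbatim bit-vector."""
    runs: Runs = []
    for i in range(n):
        b = (bits >> i) & 1
        if runs and runs[-1][0] == b:
            runs[-1] = (b, runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((b, 1))               # start a new run
    return runs


def decompress(runs: Runs) -> int:
    """Expand a run-length encoded bit-vector back into an integer."""
    bits, pos = 0, 0
    for b, length in runs:
        if b:
            bits |= ((1 << length) - 1) << pos
        pos += length
    return bits


def and_verbatim_compressed(v: int, runs: Runs) -> int:
    """AND a verbatim vector with a compressed one, visiting only the 1-runs."""
    out, pos = 0, 0
    for b, length in runs:
        if b:
            out |= v & (((1 << length) - 1) << pos)
        pos += length
    return out


def hybrid_and(a: BitVector, b: BitVector, n: int) -> BitVector:
    """AND two bit-vectors of length n in any mix of representations."""
    if isinstance(a, int) and isinstance(b, int):
        result = a & b
    elif isinstance(a, int):
        result = and_verbatim_compressed(a, b)
    elif isinstance(b, int):
        result = and_verbatim_compressed(b, a)
    else:
        # Simplification: a real scheme would AND the runs directly.
        result = decompress(a) & decompress(b)
    # Run-time decision: keep sparse intermediate results compressed.
    density = bin(result).count("1") / n
    return compress(result, n) if density < DENSITY_THRESHOLD else result
```

For example, with `n = 64`, `hybrid_and(0xFFFF0000FFFF0001, 0x0000FFFF0000FFFF, 64)` combines two dense inputs whose intersection has a single set bit, so under the assumed threshold the intermediate result is returned in compressed form.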
Keywords
Bit-sliced index, Bit-vector index, Bitmap index, Query optimization, Top-k preference queries
Department
Computer Engineering
Recommended Citation
Gheorghi Guzun and Guadalupe Canahuate. "Hybrid query optimization for hard-to-compress bit-vectors." The VLDB Journal 25, no. 3 (2016): 339-354. https://doi.org/10.1007/s00778-015-0419-9
Comments
This version of the article has been accepted for publication, after peer review (when applicable) and is subject to Springer Nature’s AM terms of use, but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://doi.org/10.1007/s00778-015-0419-9