Hybrid query optimization for hard-to-compress bit-vectors

Publication Date

6-1-2016

Document Type

Article

Publication Title

The VLDB Journal

DOI

10.1007/s00778-015-0419-9

First Page

339

Last Page

354

Abstract

Bit-vectors are widely used for indexing and summarizing data because modern computers process them efficiently. Sparse bit-vectors can be further compressed to reduce their space requirements. Special compression schemes based on run-length encoding have been designed to avoid explicit decompression and to minimize decoding overhead during query execution. Moreover, highly compressed bit-vectors can yield faster query times than non-compressed ones. However, for hard-to-compress bit-vectors, compression does not speed up queries and can add considerable overhead. In these cases, bit-vectors are often stored verbatim (non-compressed). At the same time, queries are answered by executing a cascade of bit-wise operations over indexed bit-vectors and intermediate results. Often, even when the original bit-vectors are hard to compress, the intermediate results become sparse. Query performance could therefore be improved by compressing these intermediate bit-vectors as the query is executed. In this scenario, verbatim and compressed bit-vectors must be operated on together. In this paper, we propose a hybrid framework in which compressed and verbatim bitmaps can coexist, and we design algorithms to execute queries under this hybrid model. Our query optimizer decides at run time when to compress or decompress a bit-vector. Using our heuristics, applications with higher-density bitmaps can benefit from this hybrid model, improving both their query time and their memory utilization.
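
The central idea above, operating on a verbatim bitmap and a run-length-compressed bitmap together and deciding at run time whether to keep an intermediate result compressed, can be illustrated with a small Python sketch. It is not the authors' implementation: the word-level run-length encoding in `compress`, the 32-bit word size, all helper names, and the density threshold in `store_result` are hypothetical assumptions. Production schemes such as WAH and EWAH use more elaborate word-aligned encodings, and the paper's optimizer relies on its own heuristics, which are not reproduced here.

```python
from itertools import groupby
from typing import List, Tuple

WORD_BITS = 32                   # hypothetical word width for this sketch
ALL_ONES = (1 << WORD_BITS) - 1  # a word with every bit set

Run = Tuple[int, int]            # (word value, number of consecutive repeats)


def compress(words: List[int]) -> List[Run]:
    """Hypothetical word-level run-length encoding: collapse repeated identical words."""
    return [(w, len(list(group))) for w, group in groupby(words)]


def decompress(runs: List[Run]) -> List[int]:
    """Expand runs back into a verbatim list of words."""
    return [w for w, n in runs for _ in range(n)]


def and_hybrid(verbatim: List[int], runs: List[Run]) -> List[int]:
    """AND a verbatim bitmap with a compressed one without decompressing it first."""
    out: List[int] = []
    pos = 0
    for w, n in runs:
        if w == 0:
            out.extend([0] * n)                        # a 0-fill zeroes the stretch; verbatim words are skipped
        elif w == ALL_ONES:
            out.extend(verbatim[pos:pos + n])          # a 1-fill copies the verbatim words unchanged
        else:
            out.extend(v & w for v in verbatim[pos:pos + n])  # other words are ANDed one by one
        pos += n
    if pos != len(verbatim):
        raise ValueError("bitmaps must cover the same number of words")
    return out


def density(words: List[int]) -> float:
    """Fraction of set bits in a verbatim bitmap."""
    if not words:
        return 0.0
    return sum(bin(w).count("1") for w in words) / (len(words) * WORD_BITS)


def store_result(words: List[int], threshold: float = 0.01):
    """Hypothetical run-time decision: compress an intermediate result once it becomes sparse."""
    if density(words) < threshold:
        return ("compressed", compress(words))
    return ("verbatim", words)


if __name__ == "__main__":
    dense = [ALL_ONES, 0x0000FFFF, ALL_ONES, 0x12345678]  # hard to compress: kept verbatim
    sparse = compress([0, 0, ALL_ONES, 0x000000FF])     # compresses well: kept as runs
    result = and_hybrid(dense, sparse)
    print(result == [0, 0, ALL_ONES, 0x78])  # True
    print(store_result(result)[0])           # verbatim (36 of 128 bits set)
```

The 0-fill branch shows why a sparse compressed operand can make a query faster than a purely verbatim one: whole stretches of the verbatim operand are never read. The density test in `store_result` stands in for the run-time compress-or-keep decision; a real optimizer would likely also weigh the cost of the operations that consume the result.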

Keywords

Bit-sliced index, Bit-vector index, Bitmap index, Query optimization, Top-k preference queries

Comments

This version of the article has been accepted for publication, after peer review (when applicable) and is subject to Springer Nature’s AM terms of use, but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://doi.org/10.1007/s00778-015-0419-9

Department

Computer Engineering

Recommended Citation

Gheorghi Guzun and Guadalupe Canahuate. "Hybrid query optimization for hard-to-compress bit-vectors" The VLDB Journal (2016): 339-354. https://doi.org/10.1007/s00778-015-0419-9
