Supporting dynamic quantization for high-dimensional data analytics

Publication Date

5-14-2017

Document Type

Conference Proceeding

Publication Title

Proceedings of the ExploreDB 2017

DOI

10.1145/3077331.3077336

Abstract

Similarity searches are at the heart of exploratory data analysis tasks. Distance metrics are typically used to characterize the similarity between data objects represented as feature vectors. However, when the dimensionality of the data increases and the number of features is large, traditional distance metrics fail to distinguish between the closest and furthest data points. Localized distance functions have been proposed as an alternative to traditional distance metrics. These functions only consider dimensions close to query to compute the distance/similarity. Furthermore, in order to enable interactive explorations of highdimensional data, indexing support for ad-hoc queries is needed. In this work we set up to investigate whether bit-sliced indices can be used for exploratory analytics such as similarity searches and data clustering for high-dimensional big-data. We also propose a novel dynamic quantization called Query dependent Equi-Depth (QED) quantization and show its effectiveness on characterizing high-dimensional similarity. When applying QED we observe improvements in kNN classification accuracy over traditional distance functions.

Funding Number

DMS-1557578

Funding Sponsor

National Science Foundation

Department

Computer Engineering

Share

COinS