Publication Date

Fall 2015

Degree Type

Master's Project


Computer Science


Self-tuning histograms are a popular type of histogram because they can handle multidimensional datasets. Their main advantage is a low computational cost: they learn the data distribution from query feedback instead of repeatedly scanning the dataset. They also stay up to date and adapt to query patterns. For these reasons, many researchers have worked on improving the accuracy of this type of histogram, which has led to the use of subspace clustering methods to initialize them. Following this approach, this study developed a self-tuning histogram implementation in order to compare two subspace clustering methods (Proclus and Mineclus) as sources of initialization values. The implementation was tested with two datasets, one 2-D and one 4-D. The Proclus algorithm performed better than Mineclus. The results also showed that bucket size is crucial to achieving higher accuracy (Khachatryan, Clustering-initialized adaptive histograms and probabilistic cost estimation for query optimization, 2012).