Master of Science (MS)
In the past few years, there has been a keen interest in mining frequent itemsets in large data repositories. Frequent itemsets correspond to the set of items that occur frequently in transactions in a database. Several novel algorithms have been developed recently to mine closed frequent itemsets - these itemsets are a subset of the frequent itemsets. These algorithms are of practical value: they can be applied to real-world applications to extract patterns of interest in data repositories. However, prior to using an algorithm in practice, it is necessary to know its performance as well implementation issues. In this project, we address such a need for the algorithm “Using Attribute Value Lattice to Find Frequent Itemsets” that was developed by Lin et. al. We clarify some aspects of the algorithm, develop an implementation of the algorithm, and present the results of a performance study. In our experiments we find that the running time of the algorithm for certain input datasets grows exponentially. To address this problem, we develop a novel procedure for binning the data. Our results show that with binned data, the running time of the algorithm grows linearly. This allows one to obtain trends for the dataset.
Raghu, Aarathi, "A Fast Algorithm For Data Mining" (2006). Master's Projects. Paper 120.