Publication Date

Fall 2011

Degree Type

Master's Project

Degree Name

Master of Science (MS)

Department

Computer Science

Abstract

Association rule mining is a widely used method for finding interesting relationships in large data sets. The challenge is to discover association rules in such data sets both quickly and accurately. To this end, this paper (1) builds a data warehouse system that simulates secondary storage and represents a database as bit patterns, and (2) implements a new geometric algorithm for finding association rules, called the Maximal Simplex Algorithm. The data warehouse consists of very long bit columns. Each column represents an item or an attribute-value pair, and each row represents a transaction or a tuple in the database. A bit value of 1 in a row indicates that the transaction contains the item or that the tuple contains the value. In the Maximal Simplex Algorithm, we interpret the set of bit columns as a set of independent vertices in a high-dimensional Euclidean space. The main idea is to find, for each vertex, its star neighborhood, that is, all simplexes that contain the vertex. An n-dimensional simplex is called an n-simplex, and an n-simplex represents an association rule of length n+1. Experimental results indicate that the Maximal Simplex method improves the performance of association rule mining. The data warehouse system also makes parallel computation possible.
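The bit-column representation and the notion of a star neighborhood can be illustrated with a minimal Python sketch. It is not the Maximal Simplex Algorithm itself, whose details the abstract does not give: it is a brute-force enumeration under stated assumptions. The function names (build_bit_columns, support, star), the use of Python integers as bit columns, and the min_support threshold are all hypothetical choices made for illustration.

```python
from typing import Dict, FrozenSet, Iterable, List, Set


def build_bit_columns(transactions: List[Set[str]]) -> Dict[str, int]:
    """Map each item to a bit column; bit r is 1 if transaction r contains the item.

    Assumption: a Python int stands in for one very long bit column.
    """
    columns: Dict[str, int] = {}
    for row, items in enumerate(transactions):
        for item in items:
            columns[item] = columns.get(item, 0) | (1 << row)
    return columns


def popcount(mask: int) -> int:
    """Count the 1 bits in a column, i.e. the number of matching rows."""
    return bin(mask).count("1")


def support(columns: Dict[str, int], itemset: Iterable[str]) -> int:
    """Number of transactions containing every item: popcount of the ANDed columns."""
    items = list(itemset)
    mask = columns[items[0]]
    for item in items[1:]:
        mask &= columns[item]
    return popcount(mask)


def star(columns: Dict[str, int], vertex: str, min_support: int) -> List[FrozenSet[str]]:
    """Brute-force star of `vertex`: every itemset (simplex) that contains the
    vertex and occurs in at least `min_support` transactions.

    Assumption: a simplex is kept when its support meets the hypothetical
    `min_support` threshold. An itemset with n+1 items is an n-simplex.
    """
    others = sorted(item for item in columns if item != vertex)
    found: List[FrozenSet[str]] = []

    def extend(current: List[str], mask: int, start: int) -> None:
        # Try adding each remaining item; stop descending once support is too low.
        for idx in range(start, len(others)):
            new_mask = mask & columns[others[idx]]
            if popcount(new_mask) >= min_support:
                simplex = current + [others[idx]]
                found.append(frozenset(simplex))
                extend(simplex, new_mask, idx + 1)

    if popcount(columns[vertex]) >= min_support:
        found.append(frozenset([vertex]))  # the 0-simplex: the vertex itself
        extend([vertex], columns[vertex], 0)
    return found


# Usage: four transactions over three items.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
columns = build_bit_columns(transactions)
star_of_a = star(columns, "a", min_support=2)
```

Because each candidate is evaluated with a bitwise AND followed by a bit count, the work for different vertices is independent, which is one plausible reading of why the bit-column layout lends itself to parallel computation.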
