MINING CONCEPT IN BIG DATA

Jingjing Yang

Abstract

Data mining has become very important to business and the IT industry. Many data mining techniques have been developed to translate vast amounts of data into understandable information. This project draws on concepts from two well-known techniques, Apriori and FP-growth, and explores a new data mining technique based on simplicial complexes, which are combinatorial forms of polyhedra used in algebraic topology. Like FP-growth, this approach is top down. Like Apriori, it rests on the Apriori principle (every subset of a frequent itemset is itself frequent), which is called the closed condition in a simplicial complex. Running on one SJSU laptop, the program mined a real-world database provided by the National Taiwan University Hospital, totaling 65,536 transactions and 1,257 columns in bit form. Our method returned results nearly 300 times faster than FP-growth. The purpose of the core engine of the concept-based semantic search engine is to mine concepts from big text data.
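The link between the Apriori principle and the closed condition can be illustrated with a minimal sketch. This is not the project's algorithm. It is a brute-force toy, assuming each column is stored as an integer bitmask with one bit per transaction. The data and the names `columns`, `support`, `frequent_itemsets`, and `is_closed_under_faces` are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical toy data: each column is an item, stored as an integer bitmask
# whose bit t is 1 when transaction t contains that item.
columns = {
    "a": 0b1111,
    "b": 0b0111,
    "c": 0b0011,
}

def support(itemset, columns):
    """Count transactions containing every item: AND the bit columns, count 1 bits."""
    items = list(itemset)
    mask = columns[items[0]]
    for item in items[1:]:
        mask &= columns[item]
    return bin(mask).count("1")

def frequent_itemsets(columns, min_support):
    """Enumerate all frequent itemsets by brute force (only viable for toy data)."""
    names = sorted(columns)
    result = set()
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            if support(combo, columns) >= min_support:
                result.add(frozenset(combo))
    return result

def is_closed_under_faces(simplices):
    """Closed condition: every nonempty proper subset (face) of a simplex is present."""
    for simplex in simplices:
        for size in range(1, len(simplex)):
            for face in combinations(simplex, size):
                if frozenset(face) not in simplices:
                    return False
    return True

frequent = frequent_itemsets(columns, min_support=2)
print(sorted(sorted(s) for s in frequent))
print(is_closed_under_faces(frequent))
```

Viewing each frequent itemset as a simplex whose vertices are items, the check confirms that the frequent itemsets of this toy table form a family closed under taking faces, which is the Apriori principle restated in simplicial terms.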