Publication Date
Fall 2016
Degree Type
Master's Project
Degree Name
Master of Science (MS)
Department
Computer Science
First Advisor
T. Y. Lin
Second Advisor
Teng Moh
Third Advisor
Kong Li
Keywords
knowledge mining, simplicial complexes
Abstract
Search engines are used by people all over the world. People prefer to search by keyword to open websites or retrieve information rather than type URLs. Collecting finite sequences of keywords that represent important concepts within a set of documents is therefore important; in other words, we need knowledge mining. We use a simplicial complex method to speed up concept mining. A previous CS 298 project studied this approach under Dr. Lin. The method is very fast: for example, to mine the concepts from a database with 1257 columns and 65k rows, FP-growth takes 876 seconds, whereas the simplicial complex method takes only 5 seconds. The collection of such concepts can be interpreted geometrically as a simplicial complex, which can be construed as the knowledge base of this set of documents. Furthermore, we use homology theory to analyze this knowledge base (deep data analysis). For example, in mining market basket data over the items {a, b, c, d}, we find the frequent item sets {abc, abd, acd, bcd} and the homology group H2 = Z (the Abelian group of integers), which implies that very few customers buy all four items {abcd} together; we may then analyze possible causes.
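The market basket example can be checked with a small computation. The sketch below is illustrative only and is not the project's implementation: the helper names (`closure`, `boundary_matrix`, `rank`, `betti`) are hypothetical, and it computes Betti numbers over the rationals, which ignores torsion. That is sufficient here, because the top-dimensional homology of a complex is always free. It builds the complex generated by the frequent item sets {abc, abd, acd, bcd} and compares it with the complex in which {abcd} is also frequent.

```python
from fractions import Fraction
from itertools import combinations


def closure(top_simplices):
    """Return all non-empty faces of the given simplices, grouped by dimension."""
    by_dim = {}
    for s in top_simplices:
        s = tuple(sorted(s))
        for k in range(1, len(s) + 1):
            for face in combinations(s, k):
                by_dim.setdefault(k - 1, set()).add(face)
    return {d: sorted(faces) for d, faces in by_dim.items()}


def boundary_matrix(cx, d):
    """Matrix of the boundary map from d-simplices to (d-1)-simplices."""
    rows, cols = cx.get(d - 1, []), cx.get(d, [])
    index = {face: i for i, face in enumerate(rows)}
    m = [[Fraction(0)] * len(cols) for _ in rows]
    for j, s in enumerate(cols):
        for i in range(len(s)):
            face = s[:i] + s[i + 1:]          # drop the i-th vertex
            m[index[face]][j] = Fraction((-1) ** i)
    return m


def rank(m):
    """Rank over the rationals by Gaussian elimination."""
    m = [row[:] for row in m]
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            if m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def betti(cx, d):
    """d-th Betti number: dim C_d - rank(boundary_d) - rank(boundary_{d+1})."""
    n_d = len(cx.get(d, []))
    rank_d = rank(boundary_matrix(cx, d)) if d > 0 and n_d else 0
    rank_up = rank(boundary_matrix(cx, d + 1)) if cx.get(d + 1) else 0
    return n_d - rank_d - rank_up


# Frequent item sets from the abstract: four triangles, no tetrahedron.
hollow = closure(["abc", "abd", "acd", "bcd"])
# Hypothetical contrast case in which {abcd} is also frequent.
solid = closure(["abcd"])

print([betti(hollow, d) for d in range(3)])  # [1, 0, 1]
print([betti(solid, d) for d in range(3)])   # [1, 0, 0]
```

The second Betti number is 1 for the hollow complex, matching H2 = Z, and drops to 0 once the 3-simplex {abcd} is filled in. The nonzero H2 is therefore exactly the signature of the missing four-item set.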
Recommended Citation
Liu, Xuanyu, "Deep Data Analysis on the Web" (2016). Master's Projects. 500.
DOI: https://doi.org/10.31979/etd.v66k-jrjz
https://scholarworks.sjsu.edu/etd_projects/500