Publication Date

Spring 2016

Degree Type

Master's Project

Degree Name

Master of Science (MS)


Department

Computer Science

First Advisor

T. Y. Lin

Second Advisor

Jon Pearce

Third Advisor

Thomas Austin


Keywords

Web Mining, Clustering, Homology


Abstract

As more and more data is mined from the Internet, data science has become an important field of computing for making that data useful. It allows people to turn raw data into structured knowledge that is easy to use, validate, and understand. Many methods for analyzing data are known, but this project focuses on a recently introduced one: analyzing text data with homology, a tool from mathematics, to understand the relationships between keyword-sets.

Using structures from algebraic topology as a starting point, keyword-sets in the text are represented by simplexes according to their content and their length. These simplexes combine into clustered simplicial complexes, which lay the groundwork for homology. By computing the homology of these simplicial complexes, we can better understand the relations between keyword-sets. Previous work on analyzing text data through homology established these relationships over real space; this project extends that to integer space so that the homology can reveal more detail about those relationships.
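
The idea can be illustrated with a minimal sketch, which is not the project's implementation. It assumes that a keyword-set of length n is treated as an (n-1)-dimensional simplex, that the complex is closed under taking subsets, and that homology is computed over the rationals (the "real space" setting) by ranking boundary matrices. The function names `build_complex`, `boundary_matrix`, `rank`, and `betti_numbers` and the sample keywords are hypothetical. Integer homology, the extension described above, would additionally require a Smith normal form computation, which is not shown here.

```python
from fractions import Fraction
from itertools import combinations


def build_complex(keyword_sets):
    """Close the keyword-sets under non-empty subsets, grouped by dimension."""
    simplices = set()
    for ks in keyword_sets:
        ks = tuple(sorted(set(ks)))
        for k in range(1, len(ks) + 1):
            simplices.update(combinations(ks, k))
    by_dim = {}
    for s in simplices:
        by_dim.setdefault(len(s) - 1, []).append(s)
    return {d: sorted(v) for d, v in by_dim.items()}


def boundary_matrix(cx, d):
    """Matrix of the boundary map from d-simplices to (d-1)-simplices."""
    rows, cols = cx.get(d - 1, []), cx.get(d, [])
    index = {s: i for i, s in enumerate(rows)}
    m = [[Fraction(0)] * len(cols) for _ in rows]
    for j, s in enumerate(cols):
        for i in range(len(s)):
            face = s[:i] + s[i + 1:]          # drop the i-th vertex
            m[index[face]][j] = Fraction((-1) ** i)
    return m


def rank(m):
    """Rank by exact Gaussian elimination over the rationals."""
    m = [row[:] for row in m]
    r, ncols = 0, (len(m[0]) if m else 0)
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            if m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def betti_numbers(keyword_sets):
    """Betti numbers b_0, b_1, ... of the complex over the rationals."""
    cx = build_complex(keyword_sets)
    if not cx:
        return []
    top = max(cx)
    ranks = {d: rank(boundary_matrix(cx, d)) for d in range(1, top + 1)}
    # b_d = (number of d-simplices) - rank(boundary_d) - rank(boundary_{d+1})
    return [len(cx[d]) - ranks.get(d, 0) - ranks.get(d + 1, 0)
            for d in range(top + 1)]


# Three keyword pairs that co-occur only pairwise form a hollow triangle.
hollow = [("data", "mining"), ("mining", "web"), ("data", "web")]
# One keyword triple fills the triangle in.
filled = [("data", "mining", "web")]
print(betti_numbers(hollow))  # [1, 1]
print(betti_numbers(filled))  # [1, 0, 0]
```

In this toy example the first Betti number separates three keywords that appear only in pairs (one loop) from three keywords that appear together as a single set (no loop), which is the kind of relationship between keyword-sets that homology exposes.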