Publication Date
Spring 2016
Degree Type
Master's Project
Degree Name
Master of Science (MS)
Department
Computer Science
First Advisor
T. Y. Lin
Second Advisor
Jon Pearce
Third Advisor
Thomas Austin
Keywords
Web Mining, Clustering, Homology
Abstract
As more and more data is mined from the Internet, data science has become an important field of computing for making that data useful. It allows people to turn that data into structured knowledge that is easy to use, validate, and understand. There are many established ways to analyze data, but this project focuses on a recently introduced method: analyzing text data with homology, a tool from mathematics, to understand the relationships between keyword-sets.
Taking structures from algebraic topology as a starting point, the project represents each keyword-set in the text as a simplex, determined by the keywords it contains and by its length. These simplexes combine into clustered simplicial complexes, which lay the groundwork for homology. Calculating the homology of these simplicial complexes gives a clearer picture of the relations between keyword-sets. Previous work on analyzing text data through homology established these relationships in real space; this project extends that work to integer space so that the homology can reveal more detail about those relationships.
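The idea above can be illustrated with a minimal sketch. It is not the project's implementation: the function names (`closure`, `boundary_matrix`, `rank`, `betti_numbers`) and the sample keyword-sets are hypothetical. Each keyword-set is treated as a simplex whose vertices are its keywords, all of its subsets are added as faces, and Betti numbers are computed from boundary-matrix ranks over the rationals (which give the same Betti numbers as real coefficients). Integer coefficients can additionally expose torsion, which would require a Smith normal form computation that this sketch does not attempt.

```python
from fractions import Fraction
from itertools import combinations


def closure(keyword_sets):
    """Return all faces of the given keyword-sets, grouped by dimension."""
    faces = {}
    for ks in keyword_sets:
        ks = tuple(sorted(set(ks)))
        for k in range(1, len(ks) + 1):
            for face in combinations(ks, k):
                faces.setdefault(k - 1, set()).add(face)
    return {d: sorted(fs) for d, fs in faces.items()}


def boundary_matrix(faces, d):
    """Matrix of the boundary map from d-simplexes to (d-1)-simplexes."""
    rows = {f: i for i, f in enumerate(faces.get(d - 1, []))}
    cols = faces.get(d, [])
    m = [[0] * len(cols) for _ in rows]
    for j, simplex in enumerate(cols):
        for i in range(len(simplex)):
            face = simplex[:i] + simplex[i + 1:]  # drop the i-th vertex
            m[rows[face]][j] = (-1) ** i          # alternating sign
    return m


def rank(matrix):
    """Rank by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    ncols = len(m[0]) if m else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def betti_numbers(keyword_sets):
    """Betti numbers b_0, b_1, ... of the complex spanned by the keyword-sets."""
    faces = closure(keyword_sets)
    top = max(faces)
    ranks = {d: rank(boundary_matrix(faces, d)) for d in range(1, top + 1)}
    # b_d = (number of d-simplexes) - rank(boundary_d) - rank(boundary_{d+1})
    return [len(faces[d]) - ranks.get(d, 0) - ranks.get(d + 1, 0)
            for d in range(top + 1)]


# Three pairwise keyword-sets form a hollow triangle (one loop).
hollow = [("web", "mining"), ("mining", "cluster"), ("cluster", "web")]
# Adding the three-keyword set fills the triangle and removes the loop.
filled = hollow + [("web", "mining", "cluster")]

print(betti_numbers(hollow))  # [1, 1]
print(betti_numbers(filled))  # [1, 0, 0]
```

In this toy example, the nonzero first Betti number of the hollow triangle signals that the three keywords co-occur only pairwise, while the filled triangle records that all three also occur together as a single keyword-set.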
Recommended Citation
Nam, Eric, "Analyzing Clustered Web Concepts with Homology" (2016). Master's Projects. 496.
DOI: https://doi.org/10.31979/etd.bg5q-z2x8
https://scholarworks.sjsu.edu/etd_projects/496