Publication Date

2006

Degree Type

Master's Project

Degree Name

Master of Science (MS)

Department

Computer Science

Abstract

Internet search has become an essential part of almost everyone’s daily life and work. To make sound personal and business decisions in a timely fashion, one must be able to reach the most relevant information efficiently. Because the amount of information on the Internet is enormous, a search engine must rank results appropriately when presenting them to users. Latent Semantic Indexing (LSI) addresses relevance ranking based on how significant a search word is in each document. This project explored several approaches to computing higher-dimensional LSI (HD-LSI). In traditional LSI, the term frequency-inverse document frequency (TFIDF) weight is calculated from how significant a single word is in a document. The goal of this project is to generalize LSI to higher dimensions, treating traditional LSI as the one-dimensional special case. One benefit is that a search engine can rank documents based on the special meaning of multi-word phrases, such as “wall street,” which is captured by a two-dimensional LSI method. Another benefit is a set of reusable Java software components that compute HD-LSI and store the indexes in a relational database, from which many types of applications can access the HD-LSI data. These components may be reused in future research to study the semantic proximity of documents in high-dimensional space. Beyond the software engineering aspect, this project contributes to computer science by studying different approaches to HD-LSI computation and, in particular, by analyzing the dimensional trends in each case.
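As a rough illustration of the single-word versus multi-word weighting the abstract describes, the sketch below computes a textbook TFIDF weight for a term made of k adjacent words. It is not the project's HD-LSI method or its Java components: the class name TfIdfSketch, the methods terms and tfidf, the toy corpus, and the choice to treat a k-word phrase as a run of k adjacent tokens are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not the project's code: textbook TFIDF over k-word terms.
public class TfIdfSketch {

    // All runs of k adjacent tokens, joined by a space (k = 1 yields single words).
    // Assumption: a "k-dimensional" term is modeled here as k adjacent tokens.
    static List<String> terms(List<String> tokens, int k) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + k <= tokens.size(); i++) {
            out.add(String.join(" ", tokens.subList(i, i + k)));
        }
        return out;
    }

    // TFIDF = (occurrences of term in doc / number of terms in doc)
    //         * ln(number of docs / number of docs containing the term).
    static double tfidf(String term, List<String> doc, List<List<String>> corpus, int k) {
        List<String> docTerms = terms(doc, k);
        if (docTerms.isEmpty()) {
            return 0.0;
        }
        long count = docTerms.stream().filter(term::equals).count();
        long df = corpus.stream().filter(d -> terms(d, k).contains(term)).count();
        if (count == 0 || df == 0) {
            return 0.0;
        }
        double tf = (double) count / docTerms.size();
        double idf = Math.log((double) corpus.size() / df);
        return tf * idf;
    }

    public static void main(String[] args) {
        // Toy corpus (invented for illustration): only the first document
        // uses "wall street" as a phrase.
        List<List<String>> corpus = Arrays.asList(
            Arrays.asList("wall", "street", "stocks", "fell"),
            Arrays.asList("the", "street", "has", "a", "brick", "wall"),
            Arrays.asList("stocks", "rose", "today"));
        for (int i = 0; i < corpus.size(); i++) {
            System.out.printf("doc %d: wall=%.3f  \"wall street\"=%.3f%n", i,
                tfidf("wall", corpus.get(i), corpus, 1),
                tfidf("wall street", corpus.get(i), corpus, 2));
        }
    }
}
```

In this toy corpus the single word "wall" receives a nonzero weight in both of the first two documents, while the two-word term "wall street" receives a nonzero weight only in the first, which is the kind of distinction a phrase-aware index is meant to capture.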

Recommended Citation

Vo, Mong-Hang, "Automatic Extraction of Keywords and Co-occurrence Keyword Sets" (2006). Master's Projects. 25.
DOI: https://doi.org/10.31979/etd.5mej-jmzn
https://scholarworks.sjsu.edu/etd_projects/25

Included in

Computer Sciences Commons
