As the world embraces the digital era, unprecedented volumes of information are generated and consumed daily, and combing through mountains of documents to locate a topic of interest has become increasingly difficult. Because human language is inherently ambiguous, conventional methods based on straight text pattern matching cannot resolve words that have multiple meanings and often misinterpret user intent. A system is needed that can identify the target topic and return high-quality, relevant links, sparing users the tedium of rummaging through piles of unrelated results in which the relevant ones may be buried. A search for the phrase “sound investment” illustrates the point. Both Google and Bing return result sets that haphazardly interleave links to music services and links to financial planning, two very different subjects, and the user is left to pick out the intended links manually. To address this problem, this project seeks to develop a new automated methodology for classifying web content by semantics, with a machine learning capability that can adapt to a rapidly changing environment. This will enable a new type of search engine that organizes results according to related topics.
Do, Bieu Binh, "SEMANTIC DISCOVERY THROUGH TEXT PROCESSING" (2012). Master's Projects. 282.