With large amounts of information being exchanged on the Internet, search engines have become the most popular tools for helping users to search and filter this information. However, keyword-based search engines sometimes obtain information, which does not meet user’ needs. Some of them are even irrelevant to what the user queries. When the users get query results, they have to read and organize them by themselves. It is not easy for users to handle information when a search engine returns several million results. This project uses a granular computing approach to find knowledge structures of a search engine. The project focuses on knowledge engineering components of a search engine. Based on the earlier work of Dr. Lin and his former student , it represents concepts in the Web by simplicial complexes. We found that to represent simplicial complexes adequately, we only need the maximal simplexes. Therefore, this project focuses on building maximal simplexes. Since it is too costly to analyze all Web pages or documents, the project uses the sampling method to get sampling documents. The project constructs simplexes of documents and uses the simplexes to find maximal simplexes. These maximal simplexes are regarded as primitive concepts that can represent Web pages or documents. The maximal simplexes can be used to build an index of a search engine in the future.
Lin, Yun-Chieh, "Knowledge Engineering in Search Engines" (2012). Master's Projects. 213.