Master of Science in Computer Science (MSCS)
Yioop, Search Engine, Search Engine Results Page (SERP), Indexing
Indexing in search engines is the process of storing information about crawled pages so that searches can be answered quickly. The efficiency of the indexing process is a crucial determinant of a search engine's success, since it greatly affects both the speed and the relevance of search results. Yioop is an open-source web search engine that employs an inverted index, in which each term is mapped to a list of the documents in which it appeared during the crawl.
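As a rough illustration of the inverted index idea described above, the following Python sketch maps each term to the sorted list of documents that contain it. It is not Yioop's implementation (Yioop is written in PHP and stores far richer posting data); the function name `build_inverted_index`, the whitespace tokenizer, and the sample documents are assumptions made for this example only.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a sorted list of ids of the documents containing it.

    `docs` is a hypothetical {doc_id: text} mapping standing in for crawled pages.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive tokenization: lowercase and split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Illustrative data only, not taken from a real crawl.
docs = {
    1: "open source search engine",
    2: "search engine results page",
    3: "inverted index",
}
index = build_inverted_index(docs)
```

Looking up a term in `index` then returns every document it occurred in without scanning the documents themselves, which is what makes query-time lookup fast.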
The primary aim of this project is to improve the indexing system used by Yioop, and thus the quality of the Search Engine Results Page (SERP) generated for user queries. To achieve this, several methods aimed at reducing processing time and strengthening Yioop’s page ranking mechanism have been employed. These modifications have been implemented in both the indexing process and the lookup process. To promote more relevant pages in the final result order, bonus factors that score certain types of documents higher are incorporated into the indexing process. The lookup system has been revised to fetch the most recently crawled version of a document in an effort to improve freshness. Furthermore, Yioop now uses disjoint queries to maximize the number of results produced for a search phrase. To cut down on response time, MaxScore calculation has been implemented, which approximates an upper bound on the contribution a search term can make to the overall output ranking.
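The MaxScore idea mentioned above can be sketched as follows, under stated assumptions. This is a simplified textbook-style version, not Yioop's code: the function `maxscore_top_k`, the in-memory `postings` dictionaries, and the sample scores are hypothetical, and per-term contributions are assumed to be non-negative. Each term's largest contribution serves as its upper bound; once the k-th best score found so far is at least the combined bound of the lowest-impact terms, documents that match only those terms can no longer enter the top k and are skipped without being scored.

```python
import heapq


def maxscore_top_k(postings, k):
    """Return (top-k list of (score, doc_id), number of documents fully scored).

    `postings` is a hypothetical {term: {doc_id: non-negative contribution}} map.
    """
    # Upper bound on what each term can contribute to any document's score.
    bounds = {t: max(p.values()) for t, p in postings.items() if p}
    terms = sorted(bounds, key=bounds.get)  # lowest-impact terms first
    # prefix[i] = combined upper bound of terms[:i]
    prefix = [0.0]
    for t in terms:
        prefix.append(prefix[-1] + bounds[t])

    heap = []            # min-heap holding the current top-k (score, doc_id)
    first_essential = 0  # terms[:first_essential] are "non-essential"
    scored = 0
    all_docs = sorted({d for t in terms for d in postings[t]})
    for doc in all_docs:
        # A document matching only non-essential terms cannot beat the
        # current k-th best score, so it is skipped without scoring.
        if not any(doc in postings[t] for t in terms[first_essential:]):
            continue
        score = sum(postings[t].get(doc, 0.0) for t in terms)
        scored += 1
        if len(heap) < k:
            heapq.heappush(heap, (score, doc))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc))
        if len(heap) == k:
            threshold = heap[0][0]
            # Grow the non-essential set while its combined bound
            # stays at or below the k-th best score.
            while (first_essential < len(terms)
                   and prefix[first_essential + 1] <= threshold):
                first_essential += 1
    return sorted(heap, key=lambda entry: (-entry[0], entry[1])), scored


# Illustrative scores only.
postings = {
    "yioop": {1: 3.0, 4: 2.5},
    "search": {1: 1.0, 2: 0.8, 3: 0.9, 4: 1.2, 5: 0.7},
    "the": {1: 0.1, 2: 0.1, 3: 0.1, 5: 0.1, 6: 0.1},
}
top, scored = maxscore_top_k(postings, k=2)
```

In this toy run, documents 5 and 6 match only low-impact terms by the time they are reached, so they are never scored. A production implementation would skip them by advancing cursors over sorted posting lists rather than by testing membership in dictionaries, as is done here for brevity.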
Each of these enhancements has been designed with efficiency in mind and evaluated to confirm that it improves the quality of Yioop’s search functionality. This project report provides a comprehensive account of the details and impact of these improvements.
Sheguri, Gargi, "Enhancing the Queueing Process for Yioop's Scheduler" (2023). Master's Projects. 1318.