Author

Gargi Sheguri

Publication Date

Fall 2023

Degree Type

Master's Project

Degree Name

Master of Science in Computer Science (MSCS)

Department

Computer Science

First Advisor

Chris Pollett

Second Advisor

Robert Chun

Third Advisor

Ben Reed

Keywords

Yioop, Search Engine, Search Engine Results Page (SERP), Indexing

Abstract

Indexing in search engines is the process of storing information about crawled pages so that searches can be answered quickly. The efficiency of the indexing process is a crucial determinant of a search engine's success, since it greatly affects both the speed and the relevance of search results. Yioop is an open-source web search engine that employs an inverted index, in which each term is mapped to a list of the documents it appeared in during crawling.
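
As a minimal illustration of the term-to-documents mapping described above, the following Python sketch builds a toy inverted index. It is not Yioop's implementation; the function name `build_inverted_index`, the whitespace tokenizer, and the sample documents are assumptions made only for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of ids of the documents containing it.

    `docs` is a hypothetical {doc_id: text} dictionary standing in for
    crawled pages; tokenization is plain lowercase whitespace splitting.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    # Sorted posting lists make intersections and unions easy to compute.
    return {term: sorted(ids) for term, ids in index.items()}


# Toy "crawl" of three documents (made-up data).
docs = {
    1: "open source search engine",
    2: "search engine results page",
    3: "inverted index for search",
}
index = build_inverted_index(docs)
```

Looking up a term is then a single dictionary access, for example `index["engine"]`, which is what makes this structure attractive for query processing.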

The primary aim of this project is to improve the indexing system used by Yioop, and thus the quality of the Search Engine Results Page (SERP) generated for user queries. To achieve this, several methods aimed at reducing processing time and strengthening Yioop’s page ranking mechanism have been employed. These modifications have been implemented in both the indexing process and the lookup process. To promote more relevant pages in the final result ordering, bonus factors that score certain types of documents higher are incorporated into the indexing process. The lookup system has been revised to fetch the most recently crawled version of a document in an effort to improve freshness. Furthermore, Yioop now uses disjoint queries to maximize the number of results produced for a search phrase. To cut down on response time, MaxScore calculation has been put into effect, which approximates an upper bound on the contribution a search term can have to the overall output ranking.
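
The idea behind the per-term upper bound can be sketched in a few lines of Python. This is a simplified illustration of MaxScore-style pruning, not the code used in Yioop: the function name `maxscore_top_k`, the in-memory `postings` layout, and the scores are all assumptions. Each term's largest stored contribution serves as its upper bound, and a document is skipped without scoring when the only terms that contain it cannot, even at their maximum, lift it above the current k-th best score.

```python
import heapq


def maxscore_top_k(postings, k):
    """Return (top-k results, number of documents fully scored).

    `postings` is a hypothetical {term: {doc_id: score_contribution}} map;
    every posting list is assumed to be non-empty.
    """
    # Order terms by their upper bound (largest contribution in the list).
    terms = sorted(postings, key=lambda t: max(postings[t].values()))
    upper = [max(postings[t].values()) for t in terms]
    # prefix[i] = sum of the upper bounds of the i lowest-impact terms.
    prefix = [0.0]
    for bound in upper:
        prefix.append(prefix[-1] + bound)

    heap = []             # min-heap of (score, doc_id) holding the current top k
    first_essential = 0   # terms[:first_essential] are "non-essential"
    scored = 0
    all_docs = sorted(set().union(*(postings[t].keys() for t in terms)))
    for doc in all_docs:
        # A document found only in non-essential lists cannot beat the
        # current threshold, so it is skipped without being scored.
        if not any(doc in postings[t] for t in terms[first_essential:]):
            continue
        scored += 1
        score = sum(postings[t].get(doc, 0.0) for t in terms)
        if len(heap) < k:
            heapq.heappush(heap, (score, doc))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc))
        if len(heap) == k:
            threshold = heap[0][0]
            # Grow the non-essential prefix while its combined upper bound
            # still cannot exceed the k-th best score seen so far.
            while (first_essential < len(terms)
                   and prefix[first_essential + 1] <= threshold):
                first_essential += 1
    ranked = sorted(heap, key=lambda item: (-item[0], item[1]))
    return ranked, scored


# Made-up contributions: a low-impact common term and two rarer terms.
postings = {
    "the": {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25, 5: 0.25},
    "yioop": {2: 2.0, 4: 1.5},
    "index": {1: 1.0, 2: 0.5},
}
top, scored = maxscore_top_k(postings, 2)
```

In this toy run, documents 3 and 5 appear only under the low-impact term and are never scored, yet the top two results match what exhaustive scoring would return. A production implementation would walk sorted posting lists with cursors rather than materializing the union of document ids.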

Each of these enhancements has been designed and evaluated to ensure that it improves the quality of Yioop’s search functionality. This project report details these improvements and their impact.
