Publication Date

Spring 2012

Degree Type

Master's Project

Degree Name

Master of Science (MS)


Computer Science


Search engine queries often contain duplicate words. For example, a user may search for "pizza pizza", the brand name of a popular Canadian pizzeria chain. An efficient search engine must return the most relevant results for such queries. Search queries also contain pairs of words that always occur together in the same order, for example "honda accord", "hopton wafers" and "hp newwave". We will hereafter refer to such pairs of words as bigrams. In a search engine based on an inverted index, a bigram can be treated as a single word, which increases both the speed and the relevance of the results returned. Terms in a user query also differ in importance depending on whether they occur in the title, the description, or the anchor text of a document. Search engines therefore need an optimal weighting of these components in order to rank relevant documents near the top of the results. The goal of my project is to improve Yioop, an open source search engine created by Dr. Chris Pollett, so that it supports searching for duplicate terms and bigrams in a query. I will also optimize Yioop by improving its document grouping and its BM25F weighting scheme. These changes should allow Yioop to return more relevant results to its users quickly and efficiently.
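Two of the ideas above, treating a bigram as a single indexed term and weighting term matches by document field, can be illustrated with a short Python sketch. It is not Yioop's implementation (Yioop is written in PHP). The bigram list `BIGRAMS`, the field weights in `FIELD_WEIGHTS`, and the helpers `tokenize`, `corpus_stats` and `bm25f` are hypothetical. The scoring follows the common simplified BM25F form: per-field term frequencies are length-normalized, multiplied by a field weight, summed, and only then saturated with `k1`. The IDF uses the non-negative variant log((N - df + 0.5) / (df + 0.5) + 1).

```python
import math
from collections import Counter

# Hypothetical bigram list; a real engine would mine these pairs from data
# (for example page titles) rather than hard-code them.
BIGRAMS = {("honda", "accord"), ("hopton", "wafers"), ("hp", "newwave")}

# Hypothetical field weights: a title match counts most, then anchor text.
FIELD_WEIGHTS = {"title": 3.0, "anchor": 2.0, "description": 1.0}


def tokenize(text, bigrams=BIGRAMS):
    """Lowercase, split on whitespace, and merge known adjacent pairs into one term."""
    words = text.lower().split()
    terms = []
    i = 0
    while i < len(words):
        if i + 1 < len(words) and (words[i], words[i + 1]) in bigrams:
            terms.append(words[i] + " " + words[i + 1])  # bigram indexed as one term
            i += 2
        else:
            terms.append(words[i])
            i += 1
    return terms


def corpus_stats(docs):
    """Return average length per field and document frequency per term."""
    totals = Counter()
    doc_freq = Counter()
    for doc in docs:
        seen = set()
        for field, text in doc.items():
            terms = tokenize(text)
            totals[field] += len(terms)
            seen.update(terms)
        doc_freq.update(seen)  # each document counts once per term
    avg_len = {field: totals[field] / len(docs) for field in totals}
    return avg_len, doc_freq


def bm25f(query_terms, doc, avg_len, doc_freq, num_docs, k1=1.2, b=0.75):
    """Score one document (a dict of field -> text) with a simplified BM25F."""
    field_terms = {field: tokenize(text) for field, text in doc.items()}
    score = 0.0
    # Note: set() collapses repeated query words, so "pizza pizza" is scored like "pizza".
    for term in set(query_terms):
        weighted_tf = 0.0
        for field, terms in field_terms.items():
            tf = terms.count(term)
            if tf == 0:
                continue
            # Length normalization relative to the average length of this field.
            norm = 1 - b + b * len(terms) / avg_len[field]
            weighted_tf += FIELD_WEIGHTS.get(field, 1.0) * tf / norm
        if weighted_tf == 0.0:
            continue
        df = doc_freq.get(term, 0)
        idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1)
        # Saturate the combined, field-weighted frequency once.
        score += idf * weighted_tf / (k1 + weighted_tf)
    return score


# Toy usage: only the first document contains the merged term "honda accord".
docs = [
    {"title": "Honda Accord review", "description": "A sedan test drive", "anchor": "honda accord"},
    {"title": "Accord signed", "description": "Honda executives attend", "anchor": "news"},
]
avg_len, doc_freq = corpus_stats(docs)
query = tokenize("Honda Accord")
scores = [bm25f(query, doc, avg_len, doc_freq, len(docs)) for doc in docs]
for doc, score in zip(docs, scores):
    print(doc["title"], round(score, 3))
```

Because the query is tokenized into the single term `honda accord`, the second document, which contains "honda" and "accord" only in unrelated positions, scores zero. The sketch deliberately does not handle duplicate query words such as "pizza pizza"; supporting those is one of the problems the project addresses.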