Publication Date

Spring 2020

Degree Type

Master's Project

Degree Name

Master of Science (MS)

Department

Computer Science

First Advisor

Chris Pollett

Second Advisor

Robert Chun

Third Advisor

Thomas Austin

Keywords

News Aggregation, Indexing, Search Engine

Abstract

Yioop is an open source search engine project hosted on the site of the same name. It offers several features beyond search, one of which is a news feed. The current news feed system aggregates articles from a curated list of news sites determined by the owner. However, in its current state, the feed list is limited in size, constrained by the hardware that the aggregator runs on. The goal of my project was to overcome this limit by improving the storage method used. The solution makes use of IndexArchiveBundles and IndexShards, both of which are abstract data structures designed to handle large indexes. An additional requirement for the news feed was the ability to traverse these data structures from the most recently added item to the least recently added. New methods were added to the preexisting WordIterator to handle this need. The result is a system with two new advantages: the capacity to store more feed items than before and the ability to move through indexes from the end back to the start. My findings also indicate that the new process is much faster, with insertions taking as little as one-tenth of the time. Additionally, whereas the old system stored at most around 37,500 items, the new system allows a potentially unlimited number of news items to be stored. The methodology detailed in this project can also be applied to any information retrieval system to construct an index and read from it.
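
The two ideas summarized above, splitting storage into bounded pieces and reading it back newest first, can be illustrated with a minimal sketch. It is written in Python for brevity and is not Yioop's implementation: the class ShardedFeedIndex, its methods, and the tiny shard capacity are hypothetical names and values chosen for illustration, and they do not reflect the actual IndexArchiveBundle, IndexShard, or WordIterator interfaces.

```python
from typing import Iterator, List


class ShardedFeedIndex:
    """Toy append-only index split into fixed-capacity shards (hypothetical)."""

    def __init__(self, shard_capacity: int = 3) -> None:
        if shard_capacity < 1:
            raise ValueError("shard_capacity must be at least 1")
        self.shard_capacity = shard_capacity
        self.shards: List[List[str]] = [[]]

    def add(self, item: str) -> None:
        # Start a new shard once the active one is full, so no single
        # shard grows without bound.
        if len(self.shards[-1]) >= self.shard_capacity:
            self.shards.append([])
        self.shards[-1].append(item)

    def newest_first(self) -> Iterator[str]:
        # Walk shards from last to first, and each shard from its end
        # back to its start, so the most recently added item comes first.
        for shard in reversed(self.shards):
            yield from reversed(shard)


index = ShardedFeedIndex(shard_capacity=2)
for title in ["a", "b", "c", "d", "e"]:
    index.add(title)
latest = list(index.newest_first())
```

In this sketch the total size is limited only by how many shards can be kept, and a reader that wants the latest items never has to scan older shards first. A real system would keep shards on disk rather than in memory.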
