Publication Date
Spring 2020
Degree Type
Master's Project
Degree Name
Master of Science (MS)
Department
Computer Science
First Advisor
Chris Pollett
Second Advisor
Robert Chun
Third Advisor
Thomas Austin
Keywords
News Aggregation, Indexing, Search Engine
Abstract
Yioop is an open source search engine project hosted on the site of the same name.It offers several features outside of searching, with one such feature being a news feed. The current news feed system aggregates articles from a curated list of news sites determined by the owner. However in its current state, the feed list is limited in size, constrained by the hardware that the aggregator is run on. The goal of my project was to overcome this limit by improving the current storage method used. The solution was derived by making use of IndexArchiveBundles and IndexShards, both of which are abstract data structures designed to handle large indexes. An additional aspect needed to accomodate for news feed was the ability to traverse said data structures in decreasing order of recently added. New methods were added to the preexisting WordIterator to handle this need. The result is a system with two new advantages, the capacity to store more feed items than before and the functionality of moving through indexes from the end back to the start. Our findings also indicate that the new process is much faster, with insertions taking one-tenth of the time at its fastest. Additionally, whereas the old system only stored around 37500 items at most, the new system allows for potentially unlimited news items to be stored. The methodology detailed in this project can also be applied to any information retrieval system to construct an index and read from it.
Recommended Citation
Chow, Timothy, "Improved User News Feed Customization for an Open Source Search Engine" (2020). Master's Projects. 908.
DOI: https://doi.org/10.31979/etd.uw47-hp9s
https://scholarworks.sjsu.edu/etd_projects/908