Publication Date
Spring 2025
Degree Type
Master's Project
Degree Name
Master of Science in Computer Science (MSCS)
Department
Computer Science
First Advisor
Chris Pollett
Second Advisor
Navrati Saxena
Third Advisor
Thomas Austin
Keywords
Analytics storage, Log-based system, Scalability, Write efficiency, Rollup aggregation.
Abstract
Modern search engines and wiki platforms generate vast quantities of user interaction data such as page views, edits, clicks, and session events. This data must be stored and aggregated efficiently to enable scalable analytics and responsive querying. Yioop, an open-source search engine framework, serves as our primary case study, processing millions of such events to power its indexing and recommendation features. This report explores a shift from the conventional database-backed storage model to a log-based model in order to improve scalability and write efficiency. A size-limited, append-only logging facility was implemented to record analytics events: the active log file rotates when the defined size limit is reached, keeping file sizes manageable. Existing aggregation routines were modified to roll up logged events into summary records for efficient querying. The report presents performance benchmarks comparing the two storage models on storage footprint, write latency, query response time, and overall throughput under varying loads. Results demonstrate that the log-based method reduces storage footprint significantly and scales well with increasing data, delivering higher write throughput in sequential ingestion tests and keeping pace with the database under concurrent load, while maintaining comparable read performance. These findings confirm the log-based approach as a good way forward for Yioop’s analytics infrastructure and identify opportunities for additional optimization of data compaction and indexing.
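The rotation and rollup mechanism described in the abstract can be illustrated with a minimal Python sketch. This is not the project's implementation (Yioop itself is written in PHP): the class name `RotatingEventLog`, the file naming scheme, the JSON-lines event format, and the 200-byte limit are all assumptions chosen for illustration.

```python
import json
import os
import tempfile
from collections import Counter


class RotatingEventLog:
    """Append-only event log that rotates the active file at a size limit.

    Hypothetical sketch: names, file layout, and event format are assumed.
    """

    def __init__(self, directory, max_bytes=1024):
        self.directory = directory
        self.max_bytes = max_bytes  # assumed size limit per log file
        self.active = os.path.join(directory, "active.log")
        self.rotated = 0  # in-memory counter; a real system would persist it
        os.makedirs(directory, exist_ok=True)

    def append(self, event):
        line = json.dumps(event) + "\n"
        size = os.path.getsize(self.active) if os.path.exists(self.active) else 0
        # Rotate before writing if this event would push the file past the limit.
        if size > 0 and size + len(line.encode("utf-8")) > self.max_bytes:
            self._rotate()
        with open(self.active, "a", encoding="utf-8") as f:
            f.write(line)

    def _rotate(self):
        # Rename the full active file; the next append creates a fresh one.
        self.rotated += 1
        target = os.path.join(self.directory, f"events.{self.rotated:06d}.log")
        os.rename(self.active, target)

    def files(self):
        names = sorted(n for n in os.listdir(self.directory) if n.endswith(".log"))
        return [os.path.join(self.directory, n) for n in names]


def rollup(log):
    """Aggregate logged events into per-(page, type) summary counts."""
    counts = Counter()
    for path in log.files():
        with open(path, encoding="utf-8") as f:
            for line in f:
                event = json.loads(line)
                counts[(event["page"], event["type"])] += 1
    return counts


def demo():
    """Write 20 events with a small limit; return file count, sizes, rollup."""
    with tempfile.TemporaryDirectory() as d:
        log = RotatingEventLog(d, max_bytes=200)
        for i in range(20):
            kind = "view" if i % 2 == 0 else "edit"
            log.append({"page": "Main", "type": kind})
        sizes = [os.path.getsize(p) for p in log.files()]
        return len(sizes), sizes, rollup(log)
```

Calling `demo()` writes twenty events under a deliberately small limit so that rotation happens several times, then rolls all files up into summary counts. Writes only ever append to one file, which is the property the abstract credits for write efficiency; the rollup step is where read-side cost is paid.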
Recommended Citation
Kakarlapudi, Sujith, "Optimizing Analytics Storage Strategies for Search Engines and Wiki Platforms" (2025). Master's Projects. 1531.
DOI: https://doi.org/10.31979/etd.pc2m-wnm8
https://scholarworks.sjsu.edu/etd_projects/1531