Publication Date

Spring 2025

Degree Type

Master's Project

Degree Name

Master of Science in Computer Science (MSCS)

Department

Computer Science

First Advisor

Chris Pollett

Second Advisor

Navrati Saxena

Third Advisor

Thomas Austin

Keywords

Analytics storage, Log-based system, Scalability, Write efficiency, Rollup aggregation.

Abstract

Modern search engines and wiki platforms generate vast quantities of user interaction data, such as page views, edits, clicks, and session events. This data must be stored and aggregated efficiently to enable scalable analytics and responsive querying. Yioop, an open-source search engine framework, serves as our primary case study; it processes millions of such events to power its indexing and recommendation features. This report explores a shift from the conventional database-based storage model to a log-based model in order to improve scalability and write efficiency. A size-limited, append-only logging facility was added to record analytics events: when the active log file reaches the defined size limit, it is rotated, which keeps file sizes manageable. Existing aggregation routines were modified to roll up logged events into summary records for efficient querying. The report presents performance benchmarks comparing the two storage models on storage footprint, write latency, query response time, and overall throughput under varying loads. Results demonstrate that the log-based method significantly reduces storage footprint and scales well with increasing data volume. It delivers higher write throughput in sequential ingestion tests and keeps pace with the database under concurrent load, while maintaining comparable read performance. These findings confirm the log-based approach as a promising direction for Yioop’s analytics infrastructure and identify opportunities for further optimization in data compaction and indexing.
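The following Python sketch is a rough illustration of the two mechanisms the abstract describes. It works in three steps:

1. It appends JSON-encoded events to an active log file.
2. It rotates that file to a numbered segment whenever the next write would exceed a byte limit.
3. It rolls the raw events up into per-page counts.

Everything in the sketch is hypothetical and is not Yioop's actual code, API, or file layout (Yioop itself is written in PHP). The hypothetical pieces are:

- the class `RotatingEventLog`
- the function `rollup`
- the `events.N.log` naming scheme
- the JSON-lines record format
- the `max_bytes` limit

```python
import json
import os
import tempfile
from collections import Counter


class RotatingEventLog:
    """Hypothetical size-limited, append-only event log (illustration only)."""

    def __init__(self, directory, max_bytes=1024):
        self.directory = directory
        self.max_bytes = max_bytes  # assumed per-file size limit in bytes
        self.active = os.path.join(directory, "events.log")
        self.rotations = 0  # number of segments rotated out by this instance
        os.makedirs(directory, exist_ok=True)

    def append(self, event):
        # One JSON object per line; the file is only ever opened in append mode.
        record = (json.dumps(event, sort_keys=True) + "\n").encode("utf-8")
        if (os.path.exists(self.active)
                and os.path.getsize(self.active) + len(record) > self.max_bytes):
            # This write would exceed the limit: retire the active file as a
            # numbered segment so the next write starts a fresh, empty file.
            self.rotations += 1
            os.rename(self.active,
                      os.path.join(self.directory, f"events.{self.rotations}.log"))
        with open(self.active, "ab") as f:
            f.write(record)

    def segments(self):
        # Oldest rotated segment first, active file last.
        paths = [os.path.join(self.directory, f"events.{i}.log")
                 for i in range(1, self.rotations + 1)]
        if os.path.exists(self.active):
            paths.append(self.active)
        return paths


def rollup(log, key="page"):
    """Roll raw events up into one summary record: a count per value of `key`."""
    counts = Counter()
    for path in log.segments():
        with open(path, encoding="utf-8") as f:
            for line in f:
                counts[json.loads(line)[key]] += 1
    return dict(counts)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        log = RotatingEventLog(d, max_bytes=64)
        for page in ["Main", "Help", "Main"]:
            log.append({"type": "view", "page": page})
        print(len(log.segments()), rollup(log))
```

Each record in the usage example is 33 bytes, so with a 64-byte limit every write after the first forces a rotation. The run prints `3 {'Main': 2, 'Help': 1}`.

Rotating before the write keeps every segment under the limit unless a single record is itself larger than the limit. The sketch's `rollup` rescans every segment on each call. A production system would more likely persist the summary records and skip segments it has already aggregated.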

Available for download on Monday, May 25, 2026
