Author

Aarsh Patel

Publication Date

Fall 2023

Degree Type

Master's Project

Degree Name

Master of Science in Computer Science (MSCS)

Department

Computer Science

First Advisor

Chris Pollett

Second Advisor

Robert Chun

Third Advisor

William Andreopoulos

Keywords

Survival Analysis, Time-Series Data, Benchmarking Suite, Time-Series Databases, NoSQL Databases

Abstract

Survival analysis data is crucial for predicting future events and making informed decisions. Storing this data in databases lets researchers and analysts access and analyze it easily, which supports more accurate predictions and better decision-making, and demand for such storage is growing. While benchmarking tools are available to aid in selecting an appropriate database, there is currently no benchmarking suite designed explicitly for survival analysis data. In this report, I present the development and analysis of such a suite. The suite covers performance metrics for both read and write operations and has been applied to several popular databases, including QuestDB, TimescaleDB, Cassandra, and MongoDB. Specialized topics in survival analysis, such as Log-Rank, Cox Proportional Hazards, and Kaplan-Meier, were given significant attention. Using the suite, I compared NoSQL databases with time-series databases for storing and retrieving survival analysis data. The findings show that NoSQL databases do not perform as well as time-series databases. Although NoSQL databases are generally useful, certain survival analysis queries are unresponsive on them. TimescaleDB performs exceptionally well across various queries, indicating its suitability for time-dependent data scenarios. The comparative analysis highlights the importance of selecting databases tailored to the specific data needs of survival analysis and indicates that specialized time-series databases have an advantage in this area.
