Master of Science in Computer Science (MSCS)
Survival Analysis, Time-Series Data, Benchmarking Suite, Time-Series Databases, NoSQL Databases
Survival analysis data is crucial for predicting future events and making informed decisions. Storing this data in databases lets researchers and analysts access and analyze it easily, which supports more accurate predictions and better decision-making, and demand for storing such data in databases is growing. While benchmarking tools are available to aid in selecting an appropriate database, there is currently no benchmarking suite designed explicitly for survival analysis data. In this report, I present the development and analysis of such a suite. It measures the performance of both read and write operations and has been applied to several popular databases: QuestDB, TimescaleDB, Cassandra, and MongoDB. The suite gives particular attention to methods specific to survival analysis, such as the log-rank test, the Cox proportional hazards model, and the Kaplan-Meier estimator. Using the suite, I compared NoSQL databases with time-series databases for storing and retrieving survival analysis data. The findings show that NoSQL databases do not perform as well as time-series databases. Although NoSQL databases are generally useful, certain survival analysis queries are unresponsive on them. TimescaleDB performs exceptionally well across various queries, indicating its suitability for time-dependent data scenarios. The comparative analysis highlights the importance of selecting a database tailored to the specific data needs of survival analysis and indicates that specialized time-series databases have an advantage in this area.
Patel, Aarsh, "Database Benchmarking Suite for Survival Analysis Data" (2023). Master's Projects. 1319.