Publication Date

Spring 5-14-2020

Degree Type

Master's Project

Degree Name

Master of Science (MS)

First Advisor

Suneuy Kim

Second Advisor

Robert Chun

Third Advisor

Thomas Austin

Abstract

Relational databases like Oracle, MySQL, and Microsoft SQL Server offer trans- action processing as an integral part of their design. These databases have been a primary choice among developers for business-critical workloads that need the highest form of consistency. On the other hand, the distributed nature of NoSQL databases makes them suitable for scenarios needing scalability, faster data access, and flexible schema design. Recent developments in the NoSQL database community show that NoSQL databases have started to incorporate transactions in their drivers to let users work on business-critical scenarios without compromising the power of distributed NoSQL features [1].

MongoDB is a leading document store that has supported single document atomicity since its first version. Sharding is the key technique to support the horizontal scalability in MongoDB. The latest version MongoDB 4.2 enables multi-document transactions to run on sharded clusters, seeking both scalability and ACID multi- documents. Transaction processing is a novel feature in MongoDB, and benchmarking the performance of MongoDB multi-document transactions in sharded clusters can encourage developers to use ACID transactions for business-critical workloads.

We have adapted pytpcc framework to conduct a series of benchmarking experi- ments aiming at finding the impact of tunable consistency, database size, and design choices on the multi-document transaction in MongoDB sharded clusters. We have used TPC’s OLTP workload under a variety of experimental settings to measure business throughput. To the best of our understanding, this is the first attempt towards benchmarking MongoDB multi-document transactions in a sharded cluster.

Share

COinS