Publication Date

Spring 6-1-2021

Degree Type

Master's Project

Degree Name

Master of Science in Computer Science (MSCS)

Department

Computer Science

First Advisor

Benjamin Reed

Second Advisor

Navrati Saxena

Third Advisor

Alexander Shraer

Keywords

Two-Server problem, Two-Site Problem, witnesses, Zookeeper Atomic Broadcast

Abstract

Many replicated data services use majority quorums to replicate data changes safely in the presence of server failures. A majority quorum-based service stays available only while a simple majority of its servers is operational. A key limitation of the majority quorum is that a service composed of just two servers cannot make progress if even one server fails, because the majority quorum size is also two. This is called the Two-Server problem. A similar problem occurs when a service’s servers are spread across only two failure domains, where the servers in a failure domain can fail together. When one of the two failure domains fails, the servers in the other may not be able to form a majority quorum, rendering the service unavailable. We call this the Two-Site problem, where each site is one failure domain. We propose to solve the Two-Server problem by using witnesses: lightweight servers that store only the metadata required to participate in a quorum. We show that the solution to the Two-Server problem also applies to the Two-Site problem. We tested this solution in the context of Zookeeper, a replicated coordination service that uses the Zookeeper Atomic Broadcast (Zab) protocol to replicate its coordination data. We designed witnesses and incorporated them into Zab. We show that our solution increases the liveness of Zookeeper in the two-server scenario and that Zab’s safety properties are not affected by these changes.
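
The quorum arithmetic behind the Two-Server and Two-Site problems can be illustrated with a minimal sketch. It is not taken from the project or from Zookeeper's code: the function names are hypothetical, a witness is modeled simply as one more voter, and placing the witness in a third location ("W") is an assumption made only for this illustration.

```python
from typing import Mapping


def majority_quorum(voters: int) -> int:
    """Smallest number of voters that forms a simple majority."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voters may fail while a majority quorum can still form."""
    return voters - majority_quorum(voters)


def survives_site_failure(voters_per_site: Mapping[str, int], failed_site: str) -> bool:
    """True if the voters outside failed_site still form a majority quorum."""
    total = sum(voters_per_site.values())
    alive = total - voters_per_site[failed_site]
    return alive >= majority_quorum(total)


# Two-Server problem: with two voters the quorum size is also two,
# so no failure can be tolerated.
two_servers = 2
# Assumption for illustration: one witness counts as a third voter.
two_servers_plus_witness = 3

# Two-Site problem: two voters in each of two failure domains.
two_sites = {"A": 2, "B": 2}
# Assumption for illustration: the witness sits outside both sites.
two_sites_plus_witness = {"A": 2, "B": 2, "W": 1}
```

Under these assumptions, `tolerated_failures(two_servers)` is 0 while `tolerated_failures(two_servers_plus_witness)` is 1, and `survives_site_failure` is false for `two_sites` but true for `two_sites_plus_witness` when site "A" fails.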
