Amazon Dynamo — a conceptual overview (Part 3)
In Part 1 and Part 2 of the series, I talked about all the cool concepts used in Amazon Dynamo. The key aspect of the system is the decentralised nature of its architecture, which is what enables it to be incrementally scalable. This architecture also brings its own challenges: choosing the nodes where the replicas will lie, propagating cluster-wide changes to all nodes, and so on.
We will first look at how data placement is done using Consistent Hashing and how this scheme drives the logical topology of the cluster. We will also see how cluster-wide knowledge is made available to all nodes in the absence of a centralised brain.
Consistent Hashing
Any get() or put() request for a key needs to be routed to the node which holds the partition containing that key. But what exactly is a partition? A partition is a set of keys. As the dataset outgrows the capacity of a single machine, it is partitioned and spread across a number of machines. In this scheme, for every request for a key, we need to be able to determine the partition and, in turn, the node which holds it.
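As a minimal, hypothetical illustration of this routing problem (the node names, partition count and hash function below are made up for the example), mapping a key to a partition and then to a node could look like the following sketch. The rest of this section introduces consistent hashing, which is how Dynamo actually does this.

```python
import hashlib

# Hypothetical, fixed cluster layout: which node holds which partition.
NUM_PARTITIONS = 8
PARTITION_TO_NODE = {p: f"node-{p % 4}" for p in range(NUM_PARTITIONS)}

def route(key: str) -> str:
    """Determine the partition for a key, then the node holding it."""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    partition = digest % NUM_PARTITIONS
    return PARTITION_TO_NODE[partition]

print(route("user:42"))  # e.g. "node-3" for this toy layout
```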
I will shamelessly plug another post of mine describing the concept of consistent hashing. As shown in that post, consistent hashing requires the nodes to be laid out logically as a ring. In fact, it would not be outrageous to think that the ring topology in Amazon Dynamo is inspired by this more than anything else.
Preference List
In the consistent hashing algorithm, once the point on the ring is determined by hashing the key, the next node in the clockwise direction on the ring is the one which holds the desired partition.
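To make this concrete, here is a minimal consistent-hashing sketch in Python (my own toy code, not Dynamo's implementation): both nodes and keys are hashed onto the same integer space, and a key is owned by the first node found moving clockwise from the key's position.

```python
import bisect
import hashlib

def ring_hash(value: str) -> int:
    """Map a string onto a point on the ring (a fixed integer space)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class Ring:
    """A bare-bones consistent-hashing ring: each node sits at a hashed
    position, and a key belongs to the first node clockwise from it."""

    def __init__(self, nodes: list[str]):
        # Sorted (position, node) pairs represent the ring.
        self._ring = sorted((ring_hash(n), n) for n in nodes)
        self._positions = [pos for pos, _ in self._ring]

    def owner(self, key: str) -> str:
        """Return the node that holds the partition containing `key`."""
        pos = ring_hash(key)
        # First node at or after the key's position, wrapping around the ring.
        idx = bisect.bisect_right(self._positions, pos) % len(self._ring)
        return self._ring[idx][1]

ring = Ring(["node-A", "node-B", "node-C", "node-D"])
print(ring.owner("user:42"))  # the node holding this key's partition
```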
In a system where a partition has one master replica and a few followers, the master replica is usually the one determined by the consistent hashing algorithm. The followers are chosen so that they lie on different physical machines, racks or even data centres.
In a master-less system like Amazon Dynamo, all N replicas are determined using the Consistent Hashing algorithm: the next N nodes that lie in the clockwise direction on the ring hold the replicas for the key range.
This list of N nodes found using the Consistent Hashing scheme is called the preference list.
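Continuing the toy Ring sketch from above (again, my own illustration rather than Dynamo's code), building the preference list amounts to walking clockwise from the key's position and collecting the next N distinct nodes:

```python
import bisect

class ReplicatedRing(Ring):
    """Extends the toy Ring above with replica placement."""

    def preference_list(self, key: str, n: int) -> list[str]:
        """Walk clockwise from the key's position and collect the next n
        distinct nodes; these hold the n replicas for the key's range."""
        start = bisect.bisect_right(self._positions, ring_hash(key))
        wanted = min(n, len({node for _, node in self._ring}))
        nodes: list[str] = []
        i = start
        while len(nodes) < wanted:
            node = self._ring[i % len(self._ring)][1]
            if node not in nodes:
                nodes.append(node)
            i += 1
        return nodes

ring = ReplicatedRing(["node-A", "node-B", "node-C", "node-D"])
print(ring.preference_list("user:42", n=3))  # the 3 replicas for this key
```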
Gossip Protocol
In the absence of a centralised brain, how is cluster-wide knowledge, such as membership changes (a node joining or leaving the cluster), made available to all nodes?
Amazon Dynamo uses a gossip protocol to spread such information across all nodes in the cluster. Each node periodically picks another peer node at random and reconciles its view of the cluster with it; repeated over time, this ensures the information eventually reaches every node.
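To get a feel for how this converges, here is a toy simulation (purely illustrative; the data structures and logical timestamps are my assumptions, not Dynamo's actual membership format). Each node keeps a small view of membership events, and in every round each node reconciles that view with one randomly chosen peer:

```python
import random

class GossipNode:
    """A toy node that learns cluster membership changes via gossip.
    `view` maps a node name to the latest membership event we know about,
    e.g. {"node-9": ("joined", 1)} where 1 is a logical timestamp."""

    def __init__(self, name: str):
        self.name = name
        self.view: dict[str, tuple[str, int]] = {}

    def merge(self, other_view: dict[str, tuple[str, int]]) -> None:
        # Keep the entry with the higher logical timestamp for each node.
        for node, (event, ts) in other_view.items():
            if node not in self.view or ts > self.view[node][1]:
                self.view[node] = (event, ts)

def gossip_round(nodes: list[GossipNode]) -> None:
    """One round: every node picks a random peer and the two reconcile views."""
    for node in nodes:
        peer = random.choice([n for n in nodes if n is not node])
        node.merge(peer.view)
        peer.merge(node.view)

# Example: only node-0 knows that node-9 joined; after a few rounds,
# that knowledge has very likely spread to every node in the cluster.
nodes = [GossipNode(f"node-{i}") for i in range(10)]
nodes[0].view["node-9"] = ("joined", 1)
for _ in range(5):
    gossip_round(nodes)
print(all("node-9" in n.view for n in nodes))
```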