A somewhat gentle introduction to Kademlia
July 29, 2021
Edited August 04, 2021
We describe a peer-to-peer system which has provable consistency and performance in a fault-prone environment. Our system routes queries and locates nodes using a novel XOR-based metric topology that simplifies the algorithm and facilitates our proof. The topology has the property that every message exchanged conveys or reinforces useful contact information. The system exploits this information to send parallel, asynchronous query messages that tolerate node failures without imposing timeout delays on users.
Even the abstract from the Kademlia paper is a lot to digest and I think we can say there's not going to be any hand-holding. Accordingly, this blog will serve as a collection of my thoughts.
An overview of the hash table
To begin, we'll go over a few definitions. Firstly, a hash function is a map such that
for . In words, reduces a universe with -many keys to -many buckets.
Next, a collision occurs whenever for two keys . A well-behaved hash function avoids collisions. Specifically, the authors of Kademlia propose SHA-1 with a keyspace ( in our model) of 160 bits.
Abstractly, a hashmap supports and . The former takes a key and stores value in bucket and the latter returns .
The upshot of this datastructure is fantastic lookup times (constant on average), but sometimes we need to store large values. Accordingly, we can store parts of the map on separate computers. But who coordinates all the computers? A centralized computer, much like the director of a play, decides which keys of the hashmap go to which computer. But say the director decides to quit the play, we've lost access to the values in our hashmap and this map is not particularly resilient.
What is a distributed hash table anyways?
Enter the distributed hash table. Much like the traditional hashmap, DHT also supports and . But it is fault-tolerant because if a computer leaves the network, we can still access the map.
Now, it would be wonderful if we had a protocol that partitions our map to many computers without a director. This is precisely what Kademlia proposes.
In my next blog posts, I'll go over the how and why of Kademlia.