# Distributed Key-Value Store Challenge
Welcome to the distributed key-value store challenge!
You’ll build a distributed key-value store from scratch using Raft consensus, the same algorithm that powers etcd and Consul. Start with a single-node system that handles persistence and crash recovery, then implement leader election, log replication, and fault tolerance.
This is the first project in lsfr’s distributed systems series. It teaches consensus-based replication; later projects will teach different patterns like leaderless replication, CRDTs, and Byzantine fault tolerance.
## Stages
1. Build a basic in-memory key-value store with GET/PUT/DELETE operations over HTTP.
2. Add persistence to your store. Data should survive clean shutdowns (SIGTERM).
3. Ensure data consistency after crashes. Data should survive unclean shutdowns (SIGKILL).
4. Form a cluster and elect a leader using the Raft consensus algorithm.
5. Replicate operations from the leader to followers with strong consistency guarantees.
6. Dynamically add and remove nodes from the cluster without downtime.
7. Handle node failures and network partitions while maintaining safety guarantees.
8. Prevent unbounded log growth through snapshots and log truncation.
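Stage 1 amounts to little more than an HTTP handler in front of a mutex-guarded map. A minimal sketch in Go, assuming a `/kv/<key>` route and a `Store` type (both illustrative names, not the challenge's required API):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// Store is a minimal thread-safe in-memory key-value store.
type Store struct {
	mu   sync.RWMutex
	data map[string]string
}

func NewStore() *Store { return &Store{data: make(map[string]string)} }

// ServeHTTP maps GET/PUT/DELETE on /kv/<key> to store operations.
func (s *Store) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	key := strings.TrimPrefix(r.URL.Path, "/kv/")
	switch r.Method {
	case http.MethodGet:
		s.mu.RLock()
		val, ok := s.data[key]
		s.mu.RUnlock()
		if !ok {
			http.NotFound(w, r)
			return
		}
		io.WriteString(w, val)
	case http.MethodPut:
		body, _ := io.ReadAll(r.Body)
		s.mu.Lock()
		s.data[key] = string(body)
		s.mu.Unlock()
		w.WriteHeader(http.StatusNoContent)
	case http.MethodDelete:
		s.mu.Lock()
		delete(s.data, key)
		s.mu.Unlock()
		w.WriteHeader(http.StatusNoContent)
	default:
		w.WriteHeader(http.StatusMethodNotAllowed)
	}
}

// run exercises the handler via an in-process test server:
// PUT a value, then read it back.
func run() string {
	srv := httptest.NewServer(NewStore())
	defer srv.Close()

	req, _ := http.NewRequest(http.MethodPut, srv.URL+"/kv/name", strings.NewReader("raft"))
	http.DefaultClient.Do(req)

	resp, _ := http.Get(srv.URL + "/kv/name")
	defer resp.Body.Close()
	val, _ := io.ReadAll(resp.Body)
	return string(val)
}

func main() {
	fmt.Println(run()) // prints "raft"
}
```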
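For stages 2 and 3, one common approach is a write-ahead log: append and fsync each operation before acknowledging the client, and replay the log on startup to rebuild the in-memory map. A sketch under that assumption (the tab-separated record format and the `appendEntry`/`replay` names are made up for illustration):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// appendEntry appends one operation to the write-ahead log and fsyncs it,
// so the operation survives an unclean shutdown (SIGKILL).
func appendEntry(f *os.File, op, key, val string) error {
	if _, err := fmt.Fprintf(f, "%s\t%s\t%s\n", op, key, val); err != nil {
		return err
	}
	return f.Sync() // flush to disk before acknowledging the client
}

// replay rebuilds the in-memory map from the log on startup.
func replay(path string) (map[string]string, error) {
	data := make(map[string]string)
	f, err := os.Open(path)
	if os.IsNotExist(err) {
		return data, nil // no log yet: empty store
	}
	if err != nil {
		return nil, err
	}
	defer f.Close()
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		parts := strings.SplitN(sc.Text(), "\t", 3)
		if len(parts) != 3 {
			continue // skip a torn final record left by a crash
		}
		switch parts[0] {
		case "PUT":
			data[parts[1]] = parts[2]
		case "DEL":
			delete(data, parts[1])
		}
	}
	return data, sc.Err()
}

// run writes a few operations, then replays the log as a restart would.
func run() string {
	path := filepath.Join(os.TempDir(), "kv-wal.log")
	os.Remove(path)
	f, _ := os.OpenFile(path, os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0o644)
	appendEntry(f, "PUT", "a", "1")
	appendEntry(f, "PUT", "b", "2")
	appendEntry(f, "DEL", "a", "")
	f.Close()

	data, _ := replay(path)
	return fmt.Sprintf("%d %s", len(data), data["b"])
}

func main() {
	fmt.Println(run()) // prints "1 2"
}
```

Fsyncing on every write is the simplest way to meet the SIGKILL requirement; batching syncs trades durability latency for throughput.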
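Stage 4's leader election hinges on randomized election timeouts: a follower that hears no heartbeat within its timeout becomes a candidate, and randomizing the timeout makes repeated split votes unlikely. A toy sketch of that timer loop, assuming the 150–300 ms range suggested in the Raft paper (the channel-based heartbeat simulation is purely illustrative):

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// electionTimeout returns a randomized timeout in [150ms, 300ms).
// Randomization staggers followers so they rarely time out together.
func electionTimeout() time.Duration {
	return 150*time.Millisecond + time.Duration(rand.Intn(150))*time.Millisecond
}

// run models a follower: reset the timer on each heartbeat,
// become a candidate when the timer fires with no heartbeat.
func run() string {
	heartbeat := make(chan struct{})
	timer := time.NewTimer(electionTimeout())

	// Simulate a leader that sends three heartbeats, then "crashes".
	go func() {
		for i := 0; i < 3; i++ {
			time.Sleep(50 * time.Millisecond)
			heartbeat <- struct{}{}
		}
	}()

	for {
		select {
		case <-heartbeat:
			// Drain the timer if it already fired, then reset it.
			if !timer.Stop() {
				<-timer.C
			}
			timer.Reset(electionTimeout())
		case <-timer.C:
			return "candidate"
		}
	}
}

func main() {
	fmt.Println(run()) // prints "candidate"
}
```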
## Getting Started
If you haven’t already, read this overview on how lsfr works and then start with stage 1 (HTTP API).
## Resources

### Books
- Designing Data-Intensive Applications by Martin Kleppmann
- Database Internals by Alex Petrov
### Papers
- The Raft Paper by Diego Ongaro & John Ousterhout
### Videos
- Distributed Systems lecture series by Martin Kleppmann
### Implementations
- little-key-value in Go by @st3v3nmw