What Is the Raft Protocol?
The Raft protocol is a consensus algorithm used to keep a replicated log consistent across a distributed cluster. It enables a group of machines to agree on the same sequence of operations, even when some nodes fail or the network is unreliable. (Raft)
Raft was introduced as an alternative to Paxos with an emphasis on understandability and clean decomposition into key parts, without giving up safety. (Raft)
Why Raft Matters
Distributed systems often need strong consistency: every node must apply the same changes in the same order. Raft provides a proven approach for doing that using:
- Leader election
- Log replication
- Safety rules that prevent divergent state machines (Raft)
How the Raft Protocol Works
Raft uses a single leader model:
- One leader accepts client requests and appends them as log entries.
- Followers replicate the leader’s log.
- Once an entry is committed, nodes apply it to their state machine in order. (Raft)
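The single-leader flow above can be sketched in a few lines of Python. This is an illustrative toy, not a real implementation: names like `Node` and `leader_append` are invented, and real Raft replicates entries via RPCs, tracks terms, and handles failures.

```python
# Toy sketch of Raft's single-leader replication flow (illustrative only).

class Node:
    def __init__(self):
        self.log = []          # replicated log entries
        self.state = {}        # state machine (a simple key-value store here)
        self.last_applied = 0  # count of entries applied so far

    def apply_committed(self, commit_index):
        # Apply committed entries in log order, as Raft requires.
        while self.last_applied < commit_index:
            key, value = self.log[self.last_applied]
            self.state[key] = value
            self.last_applied += 1

def leader_append(leader, followers, entry):
    # 1. The leader appends the client request to its own log.
    leader.log.append(entry)
    # 2. Followers replicate the leader's log (via AppendEntries in real Raft).
    for f in followers:
        f.log.append(entry)

leader = Node()
followers = [Node(), Node()]
leader_append(leader, followers, ("x", 1))
# 3. Once committed (majority replicated), every node applies it in order.
for node in [leader] + followers:
    node.apply_committed(commit_index=1)
```

The point of the sketch is the ordering: append, replicate, then apply only once committed.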
Raft is typically described as three subproblems:
- Leader election
- Log replication
- Safety (Raft)
Core Concepts You Should Know
Replicated Log
A sequence of log entries that represent state changes. Raft ensures the cluster agrees on this sequence. (Raft)
Terms
Time is split into terms. Each term begins with an election, and term numbers act like a logical clock to detect stale information. (Wikipedia)
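The logical-clock behavior of terms can be sketched as a tiny decision helper. The function name and return strings are invented for illustration, but the rules mirror how Raft nodes react to the term carried in a message.

```python
# Sketch: term numbers act as a logical clock to detect stale information.
def handle_message(current_term, message_term):
    if message_term < current_term:
        # Sender is behind: its information is stale and gets rejected.
        return current_term, "reject (stale sender)"
    if message_term > current_term:
        # A newer term has been observed: adopt it and revert to follower.
        return message_term, "step down to follower"
    return current_term, "process normally"
```

For example, a leader from an old term that was partitioned away will see its messages rejected as soon as it reconnects.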
Commit Index
The commit index is the index of the highest log entry known to be committed. A log entry is considered committed once it has been safely replicated to a quorum (majority). Committed entries are then applied to the state machine in order. (Raft)
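Assuming the leader tracks each node's highest replicated log index (often called `matchIndex`), the commit index can be sketched as the majority position of those indexes. This is a simplification: real Raft additionally requires the entry at that index to belong to the leader's current term before committing it.

```python
# Sketch: an entry is committed once replicated on a majority, so sorting
# the per-node replicated indexes and taking the majority position gives
# the highest index replicated on at least a majority of nodes.
def commit_index(match_indexes):
    # match_indexes: highest replicated log index on each node (leader included)
    n = len(match_indexes)
    ranked = sorted(match_indexes, reverse=True)
    return ranked[n // 2]  # at least a majority of nodes have >= this index

# Example: 5 nodes; three of them hold index 5 or higher.
print(commit_index([7, 5, 5, 3, 3]))  # -> 5
```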
Quorum and Fault Tolerance
Raft requires a majority of nodes to make progress (for elections and commits). In a group of N nodes, it can tolerate f = ⌊(N − 1) / 2⌋ failures. (GridGain Systems)
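The quorum arithmetic above fits in two one-line helpers (illustrative names):

```python
# Majority size and fault tolerance for an N-node Raft cluster.
def quorum(n):
    return n // 2 + 1        # majority of n nodes

def max_failures(n):
    return (n - 1) // 2      # f = floor((N - 1) / 2)

# A 5-node cluster needs 3 nodes for progress and tolerates 2 failures;
# note that 4 nodes tolerate no more failures than 3, which is why Raft
# clusters are usually sized with an odd number of nodes.
for n in (3, 4, 5, 7):
    print(n, quorum(n), max_failures(n))
```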
The Two Key Raft RPCs
Most practical explanations of Raft center on two Remote Procedure Calls (RPCs): (Raft)
- RequestVote: used during leader elections.
- AppendEntries: used for log replication and also functions as the leader heartbeat.
This split matters for implementers because it separates how Raft stays stable in normal operation (heartbeats via AppendEntries) from how it recovers during failures (elections via RequestVote).
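As a sketch, the two RPC request shapes might look like the following dataclasses. The field names loosely follow the Raft paper's descriptions, but this is illustrative, not any particular library's API.

```python
# Hypothetical message shapes for Raft's two core RPCs (illustrative).
from dataclasses import dataclass, field

@dataclass
class RequestVote:
    term: int            # candidate's term
    candidate_id: str
    last_log_index: int  # voters use these to prefer up-to-date candidates
    last_log_term: int

@dataclass
class AppendEntries:
    term: int            # leader's term
    leader_id: str
    prev_log_index: int  # index of the entry immediately before the new ones
    prev_log_term: int
    entries: list = field(default_factory=list)  # empty list = heartbeat
    leader_commit: int = 0                       # leader's commit index

# An AppendEntries with no entries doubles as the leader heartbeat.
heartbeat = AppendEntries(term=3, leader_id="n1", prev_log_index=9, prev_log_term=3)
```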
Leader Election in Raft
Nodes are typically in one of three roles:
- Follower
- Candidate
- Leader (Wikipedia)
Election flow:
- Followers expect regular heartbeats.
- If a follower times out, it becomes a candidate and starts an election.
- The candidate requests votes.
- A candidate that wins a majority becomes leader and starts sending heartbeats.
Election timeouts are usually hundreds of milliseconds to a few seconds, depending on configuration and network conditions. (Wikipedia)
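A randomized election timeout can be sketched as below. The 150 to 300 ms window is a commonly cited default range (the original Raft paper uses it in its examples); the parameter names here are invented, and production systems make the range configurable.

```python
# Sketch: each follower picks a fresh random timeout so that candidates
# rarely start elections at exactly the same moment.
import random

def election_timeout(base_ms=150, spread_ms=150, rng=random):
    # Returns a timeout uniformly drawn from [base_ms, base_ms + spread_ms].
    return base_ms + rng.uniform(0, spread_ms)

# Each call yields a different value within the window.
timeouts = [election_timeout() for _ in range(3)]
```

Randomization is what makes split votes rare: whichever follower's timer fires first usually wins the election before the others even become candidates.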
Raft Safety Properties
Raft’s safety rules ensure the cluster does not “fork” into conflicting histories. A key guarantee is:
- If a state machine has applied a log entry at a given index, no other node can apply a different command for that same index. (Wikipedia)
Raft also relies on log properties (often summarized as “log matching” and “leader completeness”) to ensure leaders cannot introduce conflicting histories. (Wikipedia)
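The follower-side consistency check behind log matching can be sketched as follows. The helper is hypothetical, and logs are represented simply as a list of entry terms; real implementations store full entries and truncate conflicting suffixes.

```python
# Sketch of the check a follower runs on AppendEntries: accept new entries
# only if its log contains the leader's prev_log_index with a matching term.
def accepts(follower_terms, prev_log_index, prev_log_term):
    # follower_terms: the term of each log entry; position 0 = log index 1
    if prev_log_index == 0:
        return True                        # appending from the very start
    if prev_log_index > len(follower_terms):
        return False                       # follower is missing entries
    return follower_terms[prev_log_index - 1] == prev_log_term

# A follower whose log is [term 1, term 1, term 2]:
accepts([1, 1, 2], 3, 2)   # matching entry at index 3 -> accept
accepts([1, 1, 2], 3, 3)   # conflicting term at index 3 -> reject
accepts([1], 2, 1)         # missing entry at index 2 -> reject
```

On rejection, the leader backs up and retries with an earlier prev_log_index until the logs agree, which is how diverged followers get repaired.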
Log Compaction and Snapshotting
Without compaction, the replicated log grows forever. Raft supports log compaction via snapshots:
- Nodes periodically snapshot committed state.
- Snapshots let nodes discard older log entries.
- Leaders can send snapshots to lagging followers so they can catch up efficiently. (Wikipedia)
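Compaction can be sketched as dropping the prefix of the log covered by a snapshot. This helper is illustrative; real systems also persist the snapshot itself along with its last included index and term.

```python
# Sketch: after snapshotting committed state through snapshot_index, a node
# can discard the log entries that the snapshot already covers.
def compact(log, first_index, snapshot_index):
    # log holds entries [first_index .. first_index + len(log) - 1]
    keep_from = snapshot_index - first_index + 1
    return log[keep_from:], snapshot_index + 1   # remaining log, new first index

# Entries 1-3 are covered by the snapshot; only entry 4 ("d") remains.
remaining, first = compact(["a", "b", "c", "d"], 1, 3)
```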
Cluster Membership Changes
Changing cluster membership is tricky because you must avoid split-brain behavior during transitions. Raft addresses this with joint consensus, a transitional configuration where the old and new configurations overlap during the change process. (Raft)
Common Use Cases for Raft
- Distributed metadata management (consistent cluster state, schemas, topology)
- Service discovery (highly available registries)
- Distributed configuration (atomic updates across the cluster)
- Distributed coordination (leader election and consistent control-plane state)
- Database coordination (consistent partition leadership and replicated logs)
How GridGain Uses Raft
In GridGain 9, Raft is a core mechanism for maintaining consistency and managing key cluster operations. (GridGain Systems)
Cluster management and metadata
GridGain documents two essential system Raft groups:
- Cluster Management Group (CMG)
- Metastorage Group (MG) (GridGain Systems)
GridGain’s Raft documentation describes these groups as foundational for coordinating cluster state and metadata. (GridGain Systems)
Partition-level Raft logs
GridGain also stores partition-specific Raft logs, which record elections and consensus activity for partitions. (GridGain Systems)
Raft vs Paxos
Both Raft and Paxos solve the consensus problem under crash faults (not Byzantine faults). Raft is widely adopted because it was designed to be easier to understand and implement, using a strong leader model and a clearer decomposition of responsibilities. (Raft)
| Feature | Raft | Paxos |
|---|---|---|
| Design goal | Understandable consensus algorithm | Powerful but notoriously hard to reason about |
| Replication model | Strong leader-based replication | More flexible proposer model |
| Structure | Leader election, replication, safety | More monolithic protocol family |
Frequently Asked Questions
What happens during a network partition?
Only the partition that contains a majority (quorum) can elect a leader and commit new log entries. The minority side cannot safely make progress, preventing conflicting histories. (GridGain Systems)
Can Raft tolerate malicious (Byzantine) nodes?
No. Standard Raft assumes non-Byzantine failures (crashes, partitions). Handling malicious nodes requires different protocols. (Wikipedia)
How does Raft avoid split votes during elections?
Raft uses randomized election timeouts, reducing the chance that multiple candidates start elections simultaneously and deadlock voting. (Wikipedia)
Does Raft add latency to writes?
Yes. To commit an entry, the leader typically needs confirmation from a majority, which adds network round-trips. In exchange, you get strong consistency and fault tolerance. (Raft)
How does Raft keep the log from growing forever?
Log compaction uses snapshots to prevent unbounded log growth and to help lagging nodes catch up efficiently. (Raft)