RAFT Consensus
GridGain uses the RAFT consensus algorithm to maintain data consistency and fault tolerance across the cluster. This topic explains what RAFT is, how it works, and how GridGain uses it.
What is RAFT?
RAFT is a consensus algorithm that enables multiple servers in a distributed system to agree on a shared state, even when some servers fail or network partitions occur.
The algorithm ensures that all nodes agree on the cluster status and on the history of data updates. Because every node applies the same updates in the same order, all nodes converge on the same state, which provides strong consistency across the cluster.
RAFT achieves consensus through leader election, log replication, and safety mechanisms that work together to maintain consistency.
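As a rough illustration of log replication, the sketch below appends an entry on a leader and its followers and treats the entry as committed once a majority of the group has stored it. The `Replica` class and `replicate` function are hypothetical names invented for this example. They are not part of any GridGain API, and the sketch leaves out terms, retries, and log consistency checks.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A hypothetical in-memory replica holding an ordered log of commands."""
    name: str
    log: list = field(default_factory=list)
    reachable: bool = True

    def append(self, entry) -> bool:
        # An unreachable replica cannot store or acknowledge the entry.
        if not self.reachable:
            return False
        self.log.append(entry)
        return True


def replicate(leader: Replica, followers: list, entry) -> bool:
    """Append an entry on the leader and followers; report whether it is committed.

    The entry counts as committed once a majority of the group
    (leader included) has stored it.
    """
    leader.log.append(entry)
    acks = 1  # the leader's own copy
    for follower in followers:
        if follower.append(entry):
            acks += 1
    group_size = 1 + len(followers)
    return acks >= group_size // 2 + 1


leader = Replica("A")
followers = [Replica("B"), Replica("C", reachable=False)]
committed = replicate(leader, followers, "SET x = 1")
print(committed)
```

In this run, two of the three replicas store the entry, so the majority rule is satisfied even though one follower is unreachable.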
Node States and Leadership
Every node in a GridGain RAFT group exists in one of four states:
- Leader: The node responsible for handling all client requests and managing log replication to followers. There can be no more than one leader per RAFT group.
- Follower: A node that replicates data from the leader and votes in elections. Followers passively receive updates from the leader.
- Candidate: A transitional state that a follower enters when it believes the leader has failed. The candidate starts an election and proposes itself as the new leader.
- Learner: A node that only replicates data from the leader and does not participate in voting.
This single-leader approach simplifies the replication protocol and eliminates conflicts during normal operation. The leader is typically the same node that holds the primary replica lease for the partition.
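The states and the election step described above can be sketched as a small state model. This is a simplified illustration, not GridGain code: `NodeState` and `run_election` are hypothetical names, and the sketch ignores terms, election timeouts, and log comparison. It assumes that the candidate votes for itself and that votes from learners are not counted.

```python
from enum import Enum


class NodeState(Enum):
    LEADER = "leader"
    FOLLOWER = "follower"
    CANDIDATE = "candidate"
    LEARNER = "learner"


def run_election(candidate, states, votes_granted):
    """Return the node states after `candidate` asks the group for votes.

    `states` maps node names to NodeState values; `votes_granted` is the
    set of nodes that granted their vote to the candidate.
    """
    # Learners replicate data but never vote, so they do not count toward quorum.
    voters = {name for name, state in states.items() if state is not NodeState.LEARNER}
    quorum = len(voters) // 2 + 1
    # The candidate votes for itself; votes from non-voters are ignored.
    votes = {candidate} | (set(votes_granted) & voters)
    new_states = dict(states)
    if len(votes) >= quorum:
        # Every other voter, including a former leader, ends up as a follower.
        for name in voters:
            new_states[name] = NodeState.FOLLOWER
        new_states[candidate] = NodeState.LEADER
    else:
        # Without a majority the node stays a candidate in this sketch.
        new_states[candidate] = NodeState.CANDIDATE
    return new_states


states = {
    "n1": NodeState.LEADER,    # assumed to have become unreachable
    "n2": NodeState.FOLLOWER,
    "n3": NodeState.FOLLOWER,
    "n4": NodeState.LEARNER,
}
after = run_election("n2", states, votes_granted={"n3"})
print(after["n2"].value)
```

With three voting nodes, the candidate's own vote plus one granted vote reaches the majority of two, so `n2` becomes the leader while the learner `n4` keeps its state.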
Quorum Requirements and Fault Tolerance
RAFT requires a majority quorum both for electing a leader and for committing updates. In a group of N voting nodes, more than half of the nodes must be available, that is, at least floor(N/2) + 1.
| Total Nodes | Quorum Required | Can Tolerate |
|---|---|---|
| 1 | 1 | 0 failures |
| 2 | 2 | 0 failures |
| 3 | 2 | 1 failure |
| 5 | 3 | 2 failures |
| 7 | 4 | 3 failures |
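The values in the table follow from the majority rule. A minimal sketch, using the hypothetical helper names `quorum` and `tolerated_failures`, reproduces them:

```python
def quorum(total_nodes: int) -> int:
    """Smallest number of nodes that forms a majority of the group."""
    return total_nodes // 2 + 1


def tolerated_failures(total_nodes: int) -> int:
    """Number of nodes that can fail while a majority remains available."""
    return total_nodes - quorum(total_nodes)


for n in (1, 2, 3, 5, 7):
    print(n, quorum(n), tolerated_failures(n))
```

The same arithmetic shows that an even-sized group tolerates no more failures than the next smaller odd-sized group: both 3 and 4 nodes tolerate a single failure.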
This quorum requirement ensures that no two disjoint subsets of a group can both hold a majority, which prevents inconsistencies. If the cluster is partitioned, only the part that contains the majority can elect a leader. The minority partition remains read-only or unavailable until network connectivity is restored.
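To illustrate the partition behavior, the sketch below checks which side of a network split, if any, still holds a majority. The function name `partition_with_majority` and the node names are made up for this example.

```python
def partition_with_majority(group_size: int, partitions: list):
    """Return the partition (a set of node names) that holds a majority, or None.

    At most one partition can qualify, because two disjoint subsets
    cannot both contain more than half of the group.
    """
    quorum = group_size // 2 + 1
    for nodes in partitions:
        if len(nodes) >= quorum:
            return nodes
    return None


# A five-node group split 3/2: only the three-node side can elect a leader.
majority_side = partition_with_majority(5, [{"A", "B", "C"}, {"D", "E"}])
print(sorted(majority_side))

# A four-node group split 2/2: neither side reaches the quorum of three.
print(partition_with_majority(4, [{"A", "B"}, {"C", "D"}]))
```

The second case is one reason why odd group sizes are a common choice: an even split leaves no side able to elect a leader.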
How GridGain Uses RAFT
GridGain relies on RAFT as the foundational mechanism for maintaining consistency and managing cluster operations.
Cluster Metadata
The metastorage group handles cluster metadata, such as table and index definitions, distribution zone configurations, and cluster logical topology.
Similarly, the Cluster Management Group manages the current cluster state and coordinates node join and leave operations.
Data Replication
When a table is created, it is immediately partitioned according to the configuration of its distribution zone, and each partition is assigned to a RAFT group. These RAFT groups are shared by all tables in the same distribution zone and are used to perform data updates and replicate data across the cluster.
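The sharing of RAFT groups between tables can be pictured with a small sketch. Everything here is an assumption made for illustration: the CRC32-based hashing, the `(zone, partition)` group identifier, and the function names are hypothetical and do not describe how GridGain actually computes partitions or names its groups.

```python
import zlib


def partition_for_key(key: str, partition_count: int) -> int:
    """Map a key to a partition index (hypothetical hashing scheme)."""
    return zlib.crc32(key.encode("utf-8")) % partition_count


def raft_group_for(zone: str, table: str, key: str, partition_count: int) -> tuple:
    """Return a hypothetical RAFT group identifier for a row.

    The table name is deliberately unused: in this sketch the group depends
    only on the distribution zone and the partition index, so tables in the
    same zone share groups.
    """
    return (zone, partition_for_key(key, partition_count))


orders_group = raft_group_for("zone1", "orders", "customer-42", 25)
customers_group = raft_group_for("zone1", "customers", "customer-42", 25)
print(orders_group == customers_group)
```

Under these assumptions, rows from different tables in the same zone whose keys fall into the same partition are handled by the same RAFT group.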