GridGain Developers Hub

Split-Brain Protection

Distributed environments face a variety of challenges in maintaining a consistent database state. One of them is network segmentation, also known as the “split-brain” problem. Temporary network problems can segment your Apache Ignite or GridGain cluster, isolating nodes (or groups of nodes) from the rest of the topology.

Each isolated group is unaware of the other groups and acts as if it were the entire cluster. Network segmentation, if not addressed, can cause inconsistent clusters and inconsistent data.

GridGain provides segmentation resolvers and policies that reduce the risks that are associated with network segmentation.

Topology Validator

The topology validator verifies that cluster topology is valid for cache operations. The topology validator is invoked every time a node joins or leaves the cluster.

You can use the topology validator method to check for any combination of node parameters in your cluster. For example, you can check if your cluster has a specific number of nodes and stop all transactions if some nodes stop. This way, you can prevent the damage that could be caused by split-brain.

Validation Process

You can use the validate(Collection<ClusterNode>) method of the TopologyValidator interface to check the status of cluster topology.

  • If the method returns true, the topology is considered valid, and operations are allowed to proceed.

  • If the method returns false, update operations are restricted, and the following exceptions occur:

    • CacheException for all update operations (put, remove, and so on) that were attempted

    • IgniteException for all transaction commit attempts

If the topology validator is not configured, any cluster topology is considered valid. If an invalid topology becomes valid (such as if a node rejoins the cluster or the metrics normalize), operations are allowed to proceed.

Configuration Example

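The following example shows a topology validator that allows cache updates only while the cluster contains exactly two nodes:

 // Requires java.util.Collection, org.apache.ignite.cluster.ClusterNode,
 // and org.apache.ignite.configuration.TopologyValidator.
 new TopologyValidator() {
     @Override
     public boolean validate(Collection<ClusterNode> nodes) {
         // Valid only while the topology contains exactly two nodes.
         return nodes.size() == 2;
     }
 }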
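An exact-size check like the one above blocks updates in every segment once any node is lost. As a sketch of a different rule that a validate implementation could apply, the helper below accepts a topology only when a strict majority of the expected nodes is visible, so at most one segment keeps accepting updates. QuorumCheck, hasQuorum, and expectedNodes are hypothetical names, not part of the GridGain API. The helper takes a plain collection so that it compiles without the library.

```java
import java.util.Collection;

/** Hypothetical helper: a majority rule that a topology validator could delegate to. */
final class QuorumCheck {
    private QuorumCheck() {}

    /**
     * @param visibleNodes  nodes currently seen in the topology
     * @param expectedNodes size of the full cluster (assumed to be known up front)
     * @return true only if strictly more than half of the expected nodes are visible
     */
    static boolean hasQuorum(Collection<?> visibleNodes, int expectedNodes) {
        if (expectedNodes <= 0)
            throw new IllegalArgumentException("expectedNodes must be positive");

        // Strict majority: an even split (for example, 2 of 4) leaves both halves invalid.
        return visibleNodes.size() * 2 > expectedNodes;
    }
}
```

Inside a validator, the call might look like `return QuorumCheck.hasQuorum(nodes, 5);` for a cluster that is expected to have five nodes.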

Segmentation Resolver

GridGain provides the SegmentationResolver interface for handling split-brain issues.

The isValidSegment method checks whether the segment that the local node belongs to is valid. The method is recommended for lightweight checks (for example, one IP address or one shared folder). For compound segment checks, use multiple resolvers.

The nodes of valid segments operate normally. The nodes of invalid segments receive the EVT_NODE_SEGMENTED event and react according to preset segmentation policies.

Valid segments work as if the nodes in the invalid segment left the topology.

GridGain supports logical segmentation and is not limited to network-related segmentation. For example, a segmentation resolver can check for the presence of a specific application or service on the network and mark the topology as segmented if the service or application is not available.
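To illustrate what a lightweight check of this kind might look like, the sketch below treats a segment as valid while a single shared folder is visible from the node. SharedFolderSegmentCheck is a hypothetical class. It mirrors the shape of a resolver's validity check but deliberately does not implement the SegmentationResolver interface, so it compiles without the GridGain libraries. A real resolver would place similar logic behind that interface.

```java
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical lightweight check: the segment is valid while one shared folder is visible. */
final class SharedFolderSegmentCheck {
    private final Path sharedFolder;

    SharedFolderSegmentCheck(Path sharedFolder) {
        this.sharedFolder = sharedFolder;
    }

    /** Returns false when the folder is missing or is not reachable as a directory. */
    boolean isValidSegment() {
        return Files.isDirectory(sharedFolder);
    }
}
```

Keeping each check to one resource matches the guidance above: a compound condition would be split into several small checks rather than folded into one.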

Segmentation Policies

You can configure the way GridGain nodes react to topology segmentation. Nodes listen to EVT_NODE_SEGMENTED events to discover network segmentations and react in one of three ways:

  • NOOP. The node does nothing. The user must implement code to handle network segmentation.

  • RESTART_JVM. The node restarts and tries to reconnect to the cluster.

  • STOP. The node uses the Ignition.stop method to stop in a normal manner.

You can use the IgniteConfiguration.setSegmentationPolicy method to set the segmentation policy. The default segmentation policy is defined by the IgniteConfiguration.DFLT_SEG_PLC constant.
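As a configuration sketch, the fragment below selects the NOOP policy and registers a local listener for EVT_NODE_SEGMENTED, which is where the user-supplied handling would go. It assumes that the Ignite or GridGain libraries are on the classpath. The class name and the log message are illustrative only, and what the handler should do is left to the application.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.events.EventType;
import org.apache.ignite.plugin.segmentation.SegmentationPolicy;

public class SegmentationPolicyExample {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // NOOP: the node takes no action by itself; the listener below decides what to do.
        cfg.setSegmentationPolicy(SegmentationPolicy.NOOP);

        // Enable the event type explicitly rather than relying on defaults.
        cfg.setIncludeEventTypes(EventType.EVT_NODE_SEGMENTED);

        Ignite ignite = Ignition.start(cfg);

        ignite.events().localListen(evt -> {
            // Placeholder reaction: a real handler might alert operators or stop the node.
            System.err.println("Local node was segmented: " + evt);
            return true; // Keep the listener registered.
        }, EventType.EVT_NODE_SEGMENTED);
    }
}
```

To use one of the other policies, pass SegmentationPolicy.STOP or SegmentationPolicy.RESTART_JVM to setSegmentationPolicy instead; no listener is needed in that case.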

Recovering from Split-Brain

When split-brain issues happen, you can use a premade database snapshot to recover your data. For example, you can continuously store data to enable Point in Time Recovery.