Avoiding Split-Brain Scenarios
Distributed environments face a variety of challenges in maintaining a consistent state. Network segmentation, also known as the “split-brain” problem, is one of them. Because of temporary network problems, a cluster can become segmented when nodes (or groups of nodes) are isolated from the rest of the topology. Each isolated group then acts as if it were the only one left and proceeds unaware of the others. Left unaddressed, network segmentation can produce inconsistent views of the cluster and inconsistent data. The GridGain® network segmentation protection feature addresses these issues.
GridGain provides pluggable segmentation resolvers and policies that help minimize the risks associated with network segmentation. GridGain ensures that users never encounter inconsistent or dirty reads or writes when segmentation occurs. Full data consistency is especially valuable in production systems that perform critical ACID transactions.
If segmentation occurs, GridGain elects a primary segment, which is allowed to proceed. Other segments are considered inconsistent and are forced to rejoin the cluster once network connectivity is restored. Each GridGain node checks its network segment independently, using pluggable segmentation resolvers that can be tailored to any environment; GridGain ships with several implementations out of the box. Configuration options let you run the segment check whenever a node leaves the topology, before the discovery SPI starts, or periodically.
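The three trigger points above can be illustrated with a small scheduling helper. This is a minimal sketch in plain Java, not GridGain's API: `PeriodicSegmentChecker` is a hypothetical name, and the segment check itself is modeled as a `BooleanSupplier`. Calling `checkNow()` stands in for an on-demand check (for example, when a node leaves or before discovery starts), while `start()` runs the same check on a fixed schedule.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper (not a GridGain API): runs a segment check on demand
// or on a fixed schedule, and invokes a callback the first time it fails.
final class PeriodicSegmentChecker implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final BooleanSupplier segmentCheck;
    private final Runnable onInvalidSegment;
    private boolean failureReported;

    PeriodicSegmentChecker(BooleanSupplier segmentCheck, Runnable onInvalidSegment) {
        this.segmentCheck = segmentCheck;
        this.onInvalidSegment = onInvalidSegment;
    }

    // On-demand check, e.g. when a node leaves or before discovery starts.
    // Synchronized so scheduled and manual checks cannot report a failure twice.
    synchronized boolean checkNow() {
        boolean valid = segmentCheck.getAsBoolean();
        if (!valid && !failureReported) {
            failureReported = true;
            onInvalidSegment.run();
        }
        return valid;
    }

    // Periodic check: runs checkNow() immediately, then every periodMillis.
    void start(long periodMillis) {
        scheduler.scheduleAtFixedRate(
                this::checkNow, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

One design caveat: if the check throws an exception, `ScheduledExecutorService` suppresses all later runs of that task, so a production version would catch exceptions inside the scheduled task.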
Network Segmentation Protection Features
Compound Segment Checks
Each segmentation resolver in GridGain checks whether the current segment is valid. Typically, a resolver runs a single lightweight check (e.g., one IP address or one shared folder). GridGain can also perform compound segment checks that combine several resolvers. If the segmentation resolvers determine that the local grid node belongs to an invalid segment, the node acts according to the user-configured segmentation policy.
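A compound check can be sketched as a resolver that wraps several others. The `SegmentResolver` interface and `CompoundSegmentResolver` class below are hypothetical names for illustration only. The rule that every wrapped resolver must pass is an assumption of this sketch; other combination rules (for example, accepting the segment if any one check passes) are equally possible.

```java
import java.util.List;

// Hypothetical resolver contract: true if the local node is in a valid segment.
interface SegmentResolver {
    boolean isValidSegment();
}

// Hypothetical compound resolver: the segment is valid only if every
// wrapped resolver agrees. Stops at the first failing check.
final class CompoundSegmentResolver implements SegmentResolver {
    private final List<SegmentResolver> resolvers;

    CompoundSegmentResolver(List<SegmentResolver> resolvers) {
        this.resolvers = List.copyOf(resolvers); // defensive, immutable copy
    }

    @Override
    public boolean isValidSegment() {
        for (SegmentResolver resolver : resolvers) {
            if (!resolver.isValidSegment()) {
                return false;
            }
        }
        return true;
    }
}
```

Ordering the cheapest checks first keeps the common failure path fast, since evaluation short-circuits.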
Custom Segmentation Resolver
The GridGain in-memory computing platform lets users implement their own segmentation resolvers. GridGain also supports logical segmentation and is not limited to network-related segmentation. For example, a custom segmentation resolver can check whether a specific application or service is present on the network and mark the topology as segmented if it is unavailable. In other words, segmentation resolution lets you treat a service outage the same way as a network outage and handle both types of problems with a unified approach.
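A logical resolver of the kind described above might probe an application's health endpoint over HTTP. In this sketch, `ServiceAvailabilityResolver`, the `SegmentResolver` interface, and the existence of a health URL are all assumptions; only the JDK's `java.net.http.HttpClient` (Java 11+) is a real API here. Any 2xx response counts as "service present", while a timeout or connection error is treated as a segmentation signal.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical resolver contract: true if the local node is in a valid segment.
interface SegmentResolver {
    boolean isValidSegment();
}

// Hypothetical custom resolver: the segment is valid only while a required
// service answers its (assumed) health endpoint with a 2xx status.
final class ServiceAvailabilityResolver implements SegmentResolver {
    private final HttpClient client;
    private final URI healthUri;
    private final Duration timeout;

    ServiceAvailabilityResolver(URI healthUri, Duration timeout) {
        this.client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .connectTimeout(timeout)
                .build();
        this.healthUri = healthUri;
        this.timeout = timeout;
    }

    @Override
    public boolean isValidSegment() {
        HttpRequest request = HttpRequest.newBuilder(healthUri)
                .timeout(timeout)
                .GET()
                .build();
        try {
            HttpResponse<Void> response =
                    client.send(request, HttpResponse.BodyHandlers.discarding());
            int status = response.statusCode();
            return status >= 200 && status < 300;
        } catch (IOException e) {
            // Unreachable or timed-out service is treated like a network outage.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        }
    }
}
```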
Built-In Segmentation Resolvers
GridGain includes several default segmentation resolver implementations. When configured, these implementations verify that a node is in the correct segment in one of two ways:
- By establishing a TCP connection to the configured host and port
- By writing to and reading from a file system or a shared directory such as NFS or Samba that acts as a shared resource for all the nodes in the cluster
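The two built-in strategies can be approximated in plain Java. The classes below are hypothetical stand-ins, not GridGain's implementations: `TcpSegmentResolver` treats a successful connect within a timeout as proof of reachability, and `SharedFolderSegmentResolver` writes a uniquely named probe file into a directory assumed to be mounted at the same path on every node, reads it back, and deletes it.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

// Hypothetical resolver contract: true if the local node is in a valid segment.
interface SegmentResolver {
    boolean isValidSegment();
}

// Valid if a TCP connection to host:port succeeds within the timeout.
final class TcpSegmentResolver implements SegmentResolver {
    private final String host;
    private final int port;
    private final int timeoutMillis;

    TcpSegmentResolver(String host, int port, int timeoutMillis) {
        this.host = host;
        this.port = port;
        this.timeoutMillis = timeoutMillis;
    }

    @Override
    public boolean isValidSegment() {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or host unresolvable
        }
    }
}

// Valid if this node can write a probe file to the shared directory and read it back.
final class SharedFolderSegmentResolver implements SegmentResolver {
    private final Path sharedDir;

    SharedFolderSegmentResolver(Path sharedDir) {
        this.sharedDir = sharedDir;
    }

    @Override
    public boolean isValidSegment() {
        String token = UUID.randomUUID().toString();
        Path probe = sharedDir.resolve("segment-check-" + token + ".tmp");
        try {
            Files.write(probe, token.getBytes(StandardCharsets.UTF_8));
            String readBack = new String(Files.readAllBytes(probe), StandardCharsets.UTF_8);
            return token.equals(readBack);
        } catch (IOException e) {
            return false; // share unmounted, read-only, or unreachable
        } finally {
            try {
                Files.deleteIfExists(probe); // leave no probe files behind
            } catch (IOException ignored) {
                // best-effort cleanup
            }
        }
    }
}
```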
Segmentation Events Notification
The GridGain in-memory computing platform provides built-in event types for segmentation. GridGain can automatically send notifications when a node becomes segmented and when it reconnects, keeping users aware of their cluster's state. As with any other events in the system, users can subscribe to these events locally and query them from remote nodes in a distributed fashion.
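The local side of segmentation notifications can be sketched as a small publish/subscribe hub. `SegmentationEventType`, `SegmentationEvent`, and `SegmentationEvents` are hypothetical names, not GridGain's event API. The sketch records published events so they can be queried later, but it does not implement querying across remote nodes.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;
import java.util.stream.Collectors;

// Hypothetical event types for the two notifications described above.
enum SegmentationEventType { NODE_SEGMENTED, NODE_RECONNECTED }

// Hypothetical, immutable event payload.
final class SegmentationEvent {
    final SegmentationEventType type;
    final String nodeId;
    final long timestampMillis;

    SegmentationEvent(SegmentationEventType type, String nodeId, long timestampMillis) {
        this.type = type;
        this.nodeId = nodeId;
        this.timestampMillis = timestampMillis;
    }
}

// Local publish/subscribe hub; copy-on-write lists make it safe to
// subscribe and publish from different threads.
final class SegmentationEvents {
    private final List<Consumer<SegmentationEvent>> listeners = new CopyOnWriteArrayList<>();
    private final List<SegmentationEvent> history = new CopyOnWriteArrayList<>();

    void subscribe(Consumer<SegmentationEvent> listener) {
        listeners.add(listener);
    }

    // Records the event, then delivers it to every subscribed listener in order.
    void publish(SegmentationEvent event) {
        history.add(event);
        for (Consumer<SegmentationEvent> listener : listeners) {
            listener.accept(event);
        }
    }

    // Returns locally recorded events of one type. A distributed query could
    // gather such results from each node; that part is not implemented here.
    List<SegmentationEvent> query(SegmentationEventType type) {
        return history.stream()
                .filter(e -> e.type == type)
                .collect(Collectors.toList());
    }
}
```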
Built-In Segmentation Policies
GridGain provides built-in segmentation policies that define what action is taken when segmentation occurs. A user can choose to:
- Restart the node
- Stop the node
- Take no action (NOOP): all event listeners are notified that segmentation has occurred, so users can handle the event with their own custom logic
When network segmentation does occur, these policies make it easier to contain its effects and recover from it.
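A dispatcher for the three policies might look like the following. `SegmentationPolicy`, `NodeControl`, and `SegmentationPolicyHandler` are hypothetical names used only for illustration. Notifying listeners under every policy, not only NOOP, is an assumption of this sketch.

```java
// Hypothetical policy enum matching the three options listed above.
enum SegmentationPolicy { RESTART, STOP, NOOP }

// Hypothetical hooks that a node runtime would supply.
interface NodeControl {
    void restartNode();
    void stopNode();
    void notifyListeners(String message);
}

// Applies the configured policy once the local node is found to be segmented.
final class SegmentationPolicyHandler {
    private final SegmentationPolicy policy;
    private final NodeControl node;

    SegmentationPolicyHandler(SegmentationPolicy policy, NodeControl node) {
        this.policy = policy;
        this.node = node;
    }

    void onSegmented() {
        // Assumption: listeners hear about segmentation under every policy.
        node.notifyListeners("node segmented, policy=" + policy);
        switch (policy) {
            case RESTART:
                node.restartNode();
                break;
            case STOP:
                node.stopNode();
                break;
            case NOOP:
                // Leave recovery to custom listener logic.
                break;
        }
    }
}
```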