The GridGain Enterprise Edition, built on Apache Ignite, includes a Data Center Replication feature that transfers data between caches in distinct topologies, which may be located in different geographic regions. Whenever a cache update occurs in one GridGain Enterprise Edition topology, the update can be transparently transferred to other topologies. When working with multiple data centers, it is important to ensure that if one data center goes down, another data center can take over its load and data. The GridGain Enterprise Edition Data Center Replication feature is designed to address exactly this challenge.
When data center replication is turned on, the GridGain Enterprise Edition in-memory database automatically and continuously replicates each data center's updates to the other data centers. Data Center Replication supports both active-active and active-passive replication modes. It is not available in the core Apache Ignite platform.
GridGain Data Center Replication Features
The GridGain Enterprise Edition Data Center Replication solution addresses the need for high availability across data centers with a variety of product features.
Batching and Asynchrony
In GridGain, all updates occurring in a data center are accumulated and sent asynchronously to other data centers in batches. Batching happens in two places: on each sender node and on the sender hub. Updates on a sender node are not forwarded to the sender hub immediately; instead, they are accumulated into a batch, which is sent to the sender hub once it grows too large or too old. The sender hub places incoming batches in a queue and sends them to the receiver hub whenever the receiver hub is ready, replicating data from one cluster to another.
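The node-side "too big or too old" rule can be sketched as a small accumulator. This is a simplified model, not GridGain code: the names UpdateBatcher, Update, maxBatchSize and maxBatchAgeMillis are hypothetical, the sender hub is modelled as a plain Consumer, and the clock is injected so the age check can be exercised deterministically.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

// Hypothetical cache update type; not a GridGain class.
record Update(String key, String value) {}

// Hypothetical node-side batcher; the sender hub is modelled as a Consumer.
final class UpdateBatcher {
    private final int maxBatchSize;       // flush once this many updates are buffered
    private final long maxBatchAgeMillis; // flush once the oldest buffered update is this old
    private final LongSupplier clock;     // current time in milliseconds (injectable for tests)
    private final Consumer<List<Update>> senderHub;

    private final List<Update> buffer = new ArrayList<>();
    private long batchStartedAt;          // time the first update of the open batch arrived

    UpdateBatcher(int maxBatchSize, long maxBatchAgeMillis,
                  LongSupplier clock, Consumer<List<Update>> senderHub) {
        this.maxBatchSize = maxBatchSize;
        this.maxBatchAgeMillis = maxBatchAgeMillis;
        this.clock = clock;
        this.senderHub = senderHub;
    }

    // Buffers an update, then flushes if the batch is now too big or too old.
    void onUpdate(Update update) {
        if (buffer.isEmpty()) {
            batchStartedAt = clock.getAsLong();
        }
        buffer.add(update);
        flushIfNeeded();
    }

    // Also meant to be called periodically, so a quiet batch still ages out.
    void flushIfNeeded() {
        if (buffer.isEmpty()) {
            return;
        }
        boolean tooBig = buffer.size() >= maxBatchSize;
        boolean tooOld = clock.getAsLong() - batchStartedAt >= maxBatchAgeMillis;
        if (tooBig || tooOld) {
            senderHub.accept(List.copyOf(buffer)); // hand an immutable snapshot to the hub
            buffer.clear();
        }
    }
}
```

Combining a size limit with an age limit keeps batches large under heavy load while still bounding replication lag when traffic is light.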
Filtering for Selective Cache Data Updates
There may be cases when you do not want to replicate all the data from one cluster to another. The GridGain Data Center Replication feature offers an option to define filters for cache updates. Users can decide what data to replicate on a per-entry basis by implementing custom filters. Updates that do not pass the filter will not be replicated to other data centers.
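A per-entry filter is essentially a predicate evaluated for each update before it leaves the data center. The sketch below is illustrative only: CacheUpdate, DrEntryFilter and the "sessions" / "local:" rules are hypothetical examples, not GridGain's actual filter interface or configuration.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical cache update as seen by a replication filter.
record CacheUpdate(String cacheName, String key, Object value) {}

// Hypothetical per-entry filter: test() returning true means "replicate this update".
@FunctionalInterface
interface DrEntryFilter extends Predicate<CacheUpdate> {}

final class FilterExample {
    // Example rule: skip a session cache and any key marked as local-only.
    static final DrEntryFilter SKIP_LOCAL = u ->
            !u.cacheName().equals("sessions") && !u.key().startsWith("local:");

    // Keeps only updates that pass the filter; the rest are never sent to other data centers.
    static List<CacheUpdate> selectForReplication(List<CacheUpdate> updates, DrEntryFilter filter) {
        return updates.stream().filter(filter).collect(Collectors.toList());
    }
}
```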
Guaranteed Replication Batch Delivery
GridGain supports several failover mechanisms that guarantee delivery of replication batches to all target receiver data centers. If a sender cache sends a replication batch to a sender hub and the sender hub goes down before acknowledging the batch, the sender cache automatically resends the batch to another available sender hub. GridGain also provides an option to save batches pending delivery in a persistent store, so they survive a sender hub restart. Likewise, if a sender hub sends a replication batch to a receiver hub and the receiver hub goes down before acknowledging the batch, the sender hub automatically resends the batch to another receiver hub in the same remote data center.
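The "resend until acknowledged" behaviour can be sketched as a sender that keeps each batch pending until some hub acknowledges it. This is a hypothetical model, not GridGain's implementation: Hub and FailoverSender are illustrative names, a thrown exception stands in for a hub failing before it acknowledges, and the pending queue lives in memory (the persistent-store option described above is not modelled).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical hub endpoint: returns true once the batch has been acknowledged.
// Throwing (or returning false) models a hub that fails before acknowledging.
@FunctionalInterface
interface Hub {
    boolean sendAndAwaitAck(List<String> batch);
}

// Hypothetical failover sender; pending batches are kept in memory only here.
final class FailoverSender {
    private final List<Hub> hubs;
    private final Deque<List<String>> pending = new ArrayDeque<>(); // unacknowledged batches

    FailoverSender(List<Hub> hubs) {
        this.hubs = List.copyOf(hubs);
    }

    // Queues a batch, then tries to deliver everything still pending.
    void submit(List<String> batch) {
        pending.addLast(batch);
        retryPending();
    }

    // Delivers pending batches oldest-first; stops at the first one no hub acknowledges.
    void retryPending() {
        while (!pending.isEmpty()) {
            if (!deliverToAnyHub(pending.peekFirst())) {
                return; // keep it queued for a later retry
            }
            pending.removeFirst();
        }
    }

    int pendingCount() {
        return pending.size();
    }

    // Tries each hub in turn, resending the same batch until one acknowledges it.
    private boolean deliverToAnyHub(List<String> batch) {
        for (Hub hub : hubs) {
            try {
                if (hub.sendAndAwaitAck(batch)) {
                    return true;
                }
            } catch (RuntimeException hubWentDown) {
                // No acknowledgement: fall through to the next hub.
            }
        }
        return false;
    }
}
```

Because a batch is removed only after an acknowledgement, a hub failure can cause the same batch to be delivered more than once, so receivers in a design like this should apply batches idempotently.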
Support for Complex Topologies (up to 31 Data Centers)
GridGain supports up to 31 data centers participating in Data Center Replication, connected in almost any topology. Data center A can replicate updates to several other data centers. Data center A can replicate to data center B, which then forwards those updates to data center C. Data centers A, B and C can all replicate updates to each other, and so on. To avoid loops in such complex topologies, you can filter some updates on sender hubs. For example, you can configure the system so that updates originating in data center A are not forwarded from data center B to data center C, while updates originating in data center B are still replicated to data center C.
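The A/B/C example above can be expressed as a routing rule on the sender hub in data center B, assuming each update carries the ID of the data center where it originated. The sketch is hypothetical: DrUpdate, ForwardingRules and the integer data center IDs are illustrative, not GridGain configuration.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical update tagged with the ID of the data center where it originated.
record DrUpdate(String key, String value, int originDc) {}

// Hypothetical per-hub forwarding rules: for each target data center, the set of
// origin data centers whose updates may be forwarded there.
final class ForwardingRules {
    private final Map<Integer, Set<Integer>> allowedOriginsByTarget;

    ForwardingRules(Map<Integer, Set<Integer>> allowedOriginsByTarget) {
        this.allowedOriginsByTarget = Map.copyOf(allowedOriginsByTarget);
    }

    boolean shouldForward(DrUpdate update, int targetDc) {
        if (update.originDc() == targetDc) {
            return false; // never send an update back to the data center it came from
        }
        return allowedOriginsByTarget
                .getOrDefault(targetDc, Set.of())
                .contains(update.originDc());
    }
}
```

With rules `Map.of(C, Set.of(B))` on the hub in data center B, updates originating in B are forwarded to C while updates originating in A are not, which matches the example in the text.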
Automatically or Manually Pause/Resume Replication
Replication for a particular sender cache can be paused either manually or automatically. An automatic pause occurs when GridGain detects that data cannot be replicated for any reason. This can happen when no sender hubs are available or when none of the sender hubs can process the replication batch (for example, because a hub's persistent store is full or corrupted). Manual pause and resume can be performed through the Data Center Replication API or the management console.
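The pause rules above can be modelled as a small per-cache state holder. This is a hypothetical sketch, not GridGain's API: CacheReplicationState, PauseReason and trySend are illustrative names, and each sender hub is reduced to a predicate that returns false when it cannot accept a batch (for example, because its store is full).

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical reasons a sender cache may be paused.
enum PauseReason { NONE, MANUAL, NO_SENDER_HUBS, ALL_HUBS_REJECTED }

// Hypothetical per-cache replication switch with manual and automatic pause.
final class CacheReplicationState {
    private PauseReason reason = PauseReason.NONE;

    boolean isPaused() { return reason != PauseReason.NONE; }
    PauseReason reason() { return reason; }

    // Manual controls, e.g. invoked from an API call or a console action.
    void pause() { reason = PauseReason.MANUAL; }
    void resume() { reason = PauseReason.NONE; }

    // Hands a batch to the first hub that accepts it; pauses automatically when
    // there are no hubs or every hub refuses the batch.
    boolean trySend(List<String> batch, List<Predicate<List<String>>> senderHubs) {
        if (isPaused()) {
            return false;
        }
        if (senderHubs.isEmpty()) {
            reason = PauseReason.NO_SENDER_HUBS;
            return false;
        }
        for (Predicate<List<String>> hub : senderHubs) {
            if (hub.test(batch)) {
                return true;
            }
        }
        reason = PauseReason.ALL_HUBS_REJECTED;
        return false;
    }
}
```

Recording the reason makes it clear whether an operator must fix the hubs before calling resume() or whether the pause was deliberate.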
Smart Conflict Resolution
In active-active scenarios, updates to the same cache can occur in different data centers; when updates to the same key arrive from different data centers, the system treats them as a conflict. When a conflict is detected, the user can choose to use the new value, keep the old one, or produce a different value by merging the old and new values. To ensure consistent conflict resolution across data centers, users can plug in their own conflict resolver implementation.
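A custom resolver stays consistent across data centers when its decision depends only on the two conflicting values, not on which data center performs the resolution. The sketch below is hypothetical: Versioned, ConflictResolver and Resolvers are illustrative names rather than GridGain's resolver interface, and it assumes each value carries a timestamp and the ID of the data center that wrote it. lastWriterWins covers the "use the new value or keep the old one" choice; setUnion shows a merge.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical versioned value: update timestamp plus the data center that wrote it.
record Versioned<V>(V value, long timestamp, int dataCenterId) {}

// Hypothetical resolver: given the existing and incoming values, return the result.
@FunctionalInterface
interface ConflictResolver<V> {
    Versioned<V> resolve(Versioned<V> oldVal, Versioned<V> newVal);
}

final class Resolvers {
    // Keep whichever value is newer; break timestamp ties by data center ID so that
    // every data center picks the same winner regardless of argument order.
    static <V> ConflictResolver<V> lastWriterWins() {
        return (oldVal, newVal) -> {
            if (newVal.timestamp() != oldVal.timestamp()) {
                return newVal.timestamp() > oldVal.timestamp() ? newVal : oldVal;
            }
            return newVal.dataCenterId() > oldVal.dataCenterId() ? newVal : oldVal;
        };
    }

    // Merge two sets by union, stamping the result with the last writer's metadata.
    static ConflictResolver<Set<String>> setUnion() {
        return (oldVal, newVal) -> {
            Set<String> merged = new HashSet<>(oldVal.value());
            merged.addAll(newVal.value());
            Versioned<Set<String>> winner =
                    Resolvers.<Set<String>>lastWriterWins().resolve(oldVal, newVal);
            return new Versioned<>(merged, winner.timestamp(), winner.dataCenterId());
        };
    }
}
```

Both resolvers are symmetric in their arguments, which is what lets data centers that see the conflicting updates in opposite orders still converge on the same value.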