Introduction

Overview

When working with multiple data centers, it is often important to make sure that if one data center goes down, another data center is fully capable of picking up its load and data. Data center replication is meant to solve exactly this problem. When data center replication is enabled, GridGain automatically makes sure that each cluster consistently synchronizes its data to one or more other data centers. GridGain supports both active-active and active-passive replication modes.

Data Center Replication replicates data, specifically the content of caches. You need to enable replication in the configuration of each cache that you want to replicate.
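For illustration, here is a minimal sketch of enabling replication for a single cache through the Java API. The cache name myCache and the sender group name group1 are placeholder assumptions, and the class names (GridGainCacheConfiguration, CacheDrSenderConfiguration) follow the GridGain 8.x API; verify them against the Javadoc for your version.

```java
import org.apache.ignite.configuration.CacheConfiguration;
// NOTE: package names follow GridGain 8.x conventions and may differ by version.
import org.gridgain.grid.cache.dr.CacheDrSenderConfiguration;
import org.gridgain.grid.configuration.GridGainCacheConfiguration;

public class ReplicatedCacheConfig {
    public static CacheConfiguration<Integer, String> createCacheConfig() {
        // Per-cache sender settings: this cache is replicated through
        // the sender group named "group1" (an assumed group name).
        CacheDrSenderConfiguration drSenderCfg = new CacheDrSenderConfiguration();
        drSenderCfg.setSenderGroup("group1");

        // Attach the DR settings to the cache via GridGain's cache plugin configuration.
        GridGainCacheConfiguration ggCacheCfg = new GridGainCacheConfiguration();
        ggCacheCfg.setDrSenderConfiguration(drSenderCfg);

        CacheConfiguration<Integer, String> cacheCfg = new CacheConfiguration<>("myCache");
        cacheCfg.setPluginConfigurations(ggCacheCfg);

        return cacheCfg;
    }
}
```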

Data Replication Modes

You can configure active-passive and active-active replication modes.

In the active-passive mode, one cluster (the master) receives user requests, while the other cluster (the replica) is used for redundancy and failover purposes.

In the active-active mode, both clusters handle user requests independently and propagate changes to each other. In this mode, care must be taken to resolve conflicts between updates made locally and updates arriving from the other cluster. Refer to the Conflict Resolution section for details.

How Data Replication Works

Let’s consider the simplest scenario, where data from one cluster (the master cluster) is replicated to another cluster (the replica) in active-passive mode. In the master cluster, a group of nodes (called a sender group) is configured to connect to the remote cluster, and all data is transferred through this connection. You can have as many sender groups as you need. For example, you can configure different groups to replicate data to different clusters, or replicate different caches through groups with different data transfer properties.
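As a sketch, a sender node could be configured along the following lines. The data center IDs, the group name, and the receiver address are placeholder assumptions; the DrSenderConfiguration and DrSenderConnectionConfiguration class names follow the GridGain 8.x API and should be verified against your version.

```java
import org.apache.ignite.configuration.IgniteConfiguration;
// NOTE: package names follow GridGain 8.x conventions and may differ by version.
import org.gridgain.grid.configuration.DrSenderConfiguration;
import org.gridgain.grid.configuration.GridGainConfiguration;
import org.gridgain.grid.dr.DrSenderConnectionConfiguration;

public class SenderNodeConfig {
    public static IgniteConfiguration createSenderConfig() {
        // Connection to the replica cluster: its data center ID and the
        // addresses of its receiver nodes (the address is a placeholder).
        DrSenderConnectionConfiguration conn = new DrSenderConnectionConfiguration();
        conn.setDataCenterId((byte) 2);
        conn.setReceiverAddresses("172.25.4.200:50001");

        // This node belongs to sender group "group1" and forwards the data
        // of all caches assigned to that group.
        DrSenderConfiguration senderCfg = new DrSenderConfiguration();
        senderCfg.setSenderGroups("group1");
        senderCfg.setConnectionConfiguration(conn);

        GridGainConfiguration ggCfg = new GridGainConfiguration();
        ggCfg.setDataCenterId((byte) 1); // ID of the master cluster
        ggCfg.setDrSenderConfiguration(senderCfg);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setPluginConfigurations(ggCfg);
        return cfg;
    }
}
```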

In the replica cluster, there is a subset of nodes (called receivers) that are configured to receive data from the master cluster.
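A receiver node, in turn, mainly needs to know which port to listen on for incoming batches. Another hedged sketch: the DrReceiverConfiguration class name and the port value follow GridGain 8.x conventions (50001 is commonly used as the inbound port) and should be checked against your version.

```java
import org.apache.ignite.configuration.IgniteConfiguration;
// NOTE: package names follow GridGain 8.x conventions and may differ by version.
import org.gridgain.grid.configuration.DrReceiverConfiguration;
import org.gridgain.grid.configuration.GridGainConfiguration;

public class ReceiverNodeConfig {
    public static IgniteConfiguration createReceiverConfig() {
        // Listen for batches arriving from the master cluster's sender nodes.
        DrReceiverConfiguration receiverCfg = new DrReceiverConfiguration();
        receiverCfg.setLocalInboundPort(50001);

        GridGainConfiguration ggCfg = new GridGainConfiguration();
        ggCfg.setDataCenterId((byte) 2); // ID of the replica cluster
        ggCfg.setDrReceiverConfiguration(receiverCfg);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setPluginConfigurations(ggCfg);
        return cfg;
    }
}
```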

Figure: Active-passive replication

Data Center Replication replicates the content of caches. It does not copy cluster or cache configuration. This means that the replica cluster has to be configured manually and can have a different topology and settings.

When a cache entry is updated in the master cluster, the update happens on the node that hosts the entry (the primary node). Changes are not sent to the replica immediately; instead, they are accumulated into batches on the primary nodes. When a batch reaches its maximum size, or when it has been waiting for too long, it is sent to one of the sender nodes and transferred to the replica.
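Both thresholds are typically tunable per cache. The fragment below is a hypothetical illustration: the setBatchSendSize and setBatchSendFrequency property names follow GridGain 8.x naming for CacheDrSenderConfiguration, and the values are arbitrary; consult the Javadoc for your version for the actual properties and defaults.

```java
// NOTE: class and property names follow GridGain 8.x conventions; verify for your version.
import org.gridgain.grid.cache.dr.CacheDrSenderConfiguration;

public class DrBatchTuning {
    public static CacheDrSenderConfiguration createTunedSenderConfig() {
        CacheDrSenderConfiguration drSenderCfg = new CacheDrSenderConfiguration();

        // Send a batch to a sender node once it accumulates 512 entries...
        drSenderCfg.setBatchSendSize(512);

        // ...or once the batch has waited 2 seconds, whichever comes first.
        drSenderCfg.setBatchSendFrequency(2_000);

        return drSenderCfg;
    }
}
```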

Supported Scenarios

GridGain supports replication between up to 31 clusters, which can be interconnected in all possible combinations.

The active-passive mode is the simplest one: data is replicated from one cluster to another.

In the active-active mode, each cluster has both roles — master and replica — and must be configured to both send and receive data.

These two modes can be scaled to up to 31 clusters. For example, you can have cluster A replicating updates to several other clusters; or cluster A replicating to cluster B, and cluster B replicating to cluster C; or clusters A, B, and C replicating updates to each other.

Known Issues

  • The IgniteCache.clear() operation does not affect remote clusters when using Data Center Replication.

  • Data replication may not work properly for caches that load data using Data Streamers that disallow overwriting existing keys. If you are going to use data replication with data streamers, set the allowOverwrite property of the data streamer to true, as shown in the sketch after this list.

  • Data Center Replication should not be used with caches that have an access-based expiry policy.

  • Cache Interceptors are not invoked in the remote cluster when it receives updates from the master cluster.

  • Deactivating the cluster can lead to an exception being thrown on the sender nodes, which causes them to hang. If this happens, you need to restart the sender nodes. To avoid this, stop the sender nodes before deactivating the cluster, if possible. We recommend using client nodes as senders to avoid this issue.
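The following sketch illustrates the data streamer setting from the list above, using the standard Apache Ignite streamer API; the cache name myCache is a placeholder.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;

public class StreamerLoad {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start();

        try (IgniteDataStreamer<Integer, String> streamer = ignite.dataStreamer("myCache")) {
            // With the default allowOverwrite = false, streamed entries bypass
            // parts of the update machinery that replication relies on.
            // Setting it to true makes streamed updates replicable.
            streamer.allowOverwrite(true);

            for (int i = 0; i < 100_000; i++)
                streamer.addData(i, Integer.toString(i));
        }

        ignite.close();
    }
}
```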