GridGain Developers Hub

GridGain 8.5.10 Release Notes

What’s New in This Release


This release introduces a number of substantial improvements to the Data Center Replication feature, as well as multiple bug fixes.

Data Center Replication Improvements

Changes In Behavior

  • The default implementation of the sender storage has changed from an in-memory store to a disk-based one. All updates in the master cluster are now stored on disk on the sender nodes before they are transmitted to the remote cluster. The directory on the file system where data updates are stored can be configured using the DrSenderFsStore.directoryPath property. The default value points to a directory inside the node’s working directory. However, we recommend that you configure the directory explicitly.

    <bean class="org.gridgain.grid.configuration.DrSenderConfiguration">
        <!-- this node is part of group1 -->
        <property name="senderGroups">
            <list>
                <value>group1</value>
            </list>
        </property>
        <!-- connection configuration -->
        <property name="connectionConfiguration">
            <bean class="org.gridgain.grid.dr.DrSenderConnectionConfiguration">
                <!-- DR storage -->
                <property name="store">
                    <bean class="org.gridgain.grid.dr.store.fs.DrSenderFsStore">
                        <property name="directoryPath" value="/path/to/store"/>
                    </bean>
                </property>
                <!-- The ID of the remote cluster -->
                <property name="dataCenterId" value="2"/>
                <!-- Addresses of the remote cluster's receiver nodes -->
                <property name="receiverAddresses">
                    <list>
                        <!-- one <value>host:port</value> entry per receiver node -->
                    </list>
                </property>
            </bean>
        </property>
    </bean>

  • When the sender store becomes full or corrupted, the replication process stops. After addressing the cause, you have to restart replication manually and perform a full state transfer. Refer to the Managing and Monitoring Replication section.

  • The GridDr.pause()/GridDr.resume() methods were deprecated and replaced with GridDr.stopReplication()/GridDr.startReplication().

    The stopReplication() method terminates the replication process: cache updates are no longer processed. The startReplication() method resumes the replication process, i.e., cache updates start being processed again. If the data in the cache changed while replication was stopped, you need to perform a full state transfer.

  • Added the ability to pause and resume the replication process on sender nodes. These operations can be performed via JMX beans.

    Unlike the GridDr.stopReplication() method, the pause operation does not terminate the process; instead, the sender node on which the operation is performed stops sending updates to the remote cluster. The updates continue to be processed and are accumulated in the sender storage on that node. To resume sending updates to the remote cluster, perform the resume operation on the same sender node.

    Refer to the Managing and Monitoring Replication section for the information on how you can pause/resume replication using JMX beans.

  • Added an adaptive throttling feature that adjusts the data transfer rate between data nodes and sender nodes, and between sender nodes and receiver nodes, depending on the workload.
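To make the difference between stopping and pausing concrete, here is a small in-process Java model. It is a hypothetical sketch that does not call any GridGain API: the class and method names are illustrative only, the sender storage is modeled as an in-memory queue, and the "full state transfer required" flag simply records that an update was missed while replication was stopped.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical in-process model of the two control modes; it does not call any GridGain API.
final class ReplicationSemanticsModel {
    private final Deque<String> senderStore = new ArrayDeque<>(); // stands in for the sender storage
    private final List<String> remoteCluster = new ArrayList<>();  // updates the replica has received

    private boolean replicationStarted = true; // toggled by stopReplication()/startReplication()
    private boolean sendingPaused;             // toggled by the Sender Hub pause/resume operations
    private boolean fullStateTransferRequired; // set when an update is missed while stopped

    void stopReplication() { replicationStarted = false; }

    void startReplication() { replicationStarted = true; }

    void pause() { sendingPaused = true; }

    void resume() {
        sendingPaused = false;
        flush(); // ship everything accumulated while paused
    }

    /** Models a cache update in the master cluster. */
    void put(String update) {
        if (!replicationStarted) {
            // Stopped: the update is not captured, so the replica diverges.
            fullStateTransferRequired = true;
            return;
        }

        senderStore.addLast(update); // captured even while sending is paused
        flush();
    }

    private void flush() {
        while (!sendingPaused && !senderStore.isEmpty())
            remoteCluster.add(senderStore.pollFirst());
    }

    int pendingInStore() { return senderStore.size(); }

    List<String> received() { return remoteCluster; }

    boolean fullStateTransferRequired() { return fullStateTransferRequired; }

    public static void main(String[] args) {
        ReplicationSemanticsModel paused = new ReplicationSemanticsModel();
        paused.pause();
        paused.put("k1=v1");
        System.out.println("Pending while paused: " + paused.pendingInStore());
        paused.resume();
        System.out.println("Received after resume: " + paused.received());

        ReplicationSemanticsModel stopped = new ReplicationSemanticsModel();
        stopped.stopReplication();
        stopped.put("k1=v1");
        stopped.startReplication();
        System.out.println("Full state transfer required: " + stopped.fullStateTransferRequired());
    }
}
```

The key difference is where updates go: while replication is stopped they are never captured, while sending is paused they wait in the sender storage, so a long pause can eventually fill that storage.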

Managing and Monitoring Replication

This release introduces a number of changes in the JMX beans to support management operations and monitoring.

The following MBean provides information about the replication process of a specific cache. The MBean can be obtained on any node that hosts the cache.

MBean’s ObjectName:

group=[cache name],name="Cache data replication"

Attributes:

  • The name of the sender group.

  • The status of the replication process for this cache.

  • The number of entries that are waiting to be sent to the sender node.

  • The number of sender nodes available for the cache.

Operations:

  • Start the replication process for this cache.

  • Stop the replication process for this cache.

  • transferTo (id): Perform a full state transfer of the content of this cache to the given remote cluster ID.
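As an illustration, the cache MBean can be located with the standard javax.management API. The release notes give only the group and name keys of the ObjectName, so this sketch matches them with a pattern that allows any domain and any additional keys; the class name, the cache name "myCache", and the example ObjectNames used with it are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper for locating the per-cache DR MBean; not part of GridGain.
final class CacheDrMBeanLookup {
    private CacheDrMBeanLookup() {}

    /** Any domain, group=<cache name>, name="Cache data replication", any other keys. */
    static ObjectName cachePattern(String cacheName) {
        try {
            return new ObjectName("*:group=" + cacheName
                + ",name=" + ObjectName.quote("Cache data replication") + ",*");
        }
        catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Not a valid ObjectName value: " + cacheName, e);
        }
    }

    /** True if the concrete ObjectName is the DR MBean of the given cache. */
    static boolean matches(String cacheName, String objectName) {
        try {
            return cachePattern(cacheName).apply(new ObjectName(objectName));
        }
        catch (MalformedObjectNameException e) {
            return false;
        }
    }

    /** Lists matching MBeans in the current JVM, i.e. when this code runs inside a node that hosts the cache. */
    static Set<ObjectName> findLocal(String cacheName) {
        return ManagementFactory.getPlatformMBeanServer().queryNames(cachePattern(cacheName), null);
    }

    public static void main(String[] args) {
        String cacheName = args.length > 0 ? args[0] : "myCache"; // hypothetical cache name
        System.out.println("DR MBeans for " + cacheName + ": " + findLocal(cacheName));
    }
}
```

Once you have the ObjectName, the operations listed above can be invoked with MBeanServerConnection.invoke. Their exact parameter types are not listed in these notes, so check them in a JMX console first.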

The following MBean can be obtained on a sender node and allows you to pause/resume the replication process on that node.

MBean’s ObjectName:

group="Data Center Replication",name="Sender Hub"

Operations:

  • pause (id): Pause replication for the given remote cluster ID. This operation stops sending updates to the remote cluster. The updates are stored in the sender storage on that node until replication is resumed.

  • Pause replication for all remote clusters.

  • resume (id): Resume replication for the given remote cluster ID.

  • Resume replication for all remote clusters.
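A JMX client could pause sending to one remote cluster on every sender node it can see, as in the following sketch. It uses only the standard javax.management API; the class name is hypothetical, and passing the remote cluster ID as a primitive byte is an assumption based on GridGain data center IDs being byte values. Check the operation's signature in a JMX console before relying on it.

```java
import java.io.IOException;
import javax.management.JMException;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper for the "Sender Hub" MBean; not part of GridGain.
final class SenderHubControl {
    private SenderHubControl() {}

    /** Any domain, group="Data Center Replication", name="Sender Hub", any other keys. */
    static ObjectName senderHubPattern() {
        try {
            return new ObjectName("*:group=" + ObjectName.quote("Data Center Replication")
                + ",name=" + ObjectName.quote("Sender Hub") + ",*");
        }
        catch (MalformedObjectNameException e) {
            throw new IllegalStateException(e); // Not expected: the pattern is a constant.
        }
    }

    /** True if the concrete ObjectName refers to a Sender Hub MBean. */
    static boolean isSenderHub(String objectName) {
        try {
            return senderHubPattern().apply(new ObjectName(objectName));
        }
        catch (MalformedObjectNameException e) {
            return false;
        }
    }

    /** Invokes pause(id) on every Sender Hub MBean visible through the connection; returns how many were paused. */
    static int pause(MBeanServerConnection conn, byte remoteDcId) throws IOException, JMException {
        int paused = 0;

        for (ObjectName name : conn.queryNames(senderHubPattern(), null)) {
            // ASSUMPTION: the remote cluster ID parameter is declared as a primitive byte.
            conn.invoke(name, "pause", new Object[] {remoteDcId}, new String[] {byte.class.getName()});
            paused++;
        }

        return paused;
    }
}
```

The resume (id) operation can be invoked the same way by changing the operation name.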


A number of event types related to data replication were added. The events are described in the following list, together with the nodes on which each event is fired.

  • A sender node connects to a node in the replica cluster. Fired on: the sender node.

  • A sender node loses connection to a node in the remote cluster. The sender tries to connect to another receiver node if more than one is configured. If no receiver is available, the sender tries to reconnect after the period defined in the DrSenderConfiguration.reconnectOnFailureTimeout property. Fired on: the sender node.

  • Replication of a specific cache is stopped for any reason. A full state transfer is required before resuming the replication process. Fired on: all nodes that host the primary partitions of the cache.

  • Replication of a specific cache is started. Fired on: all nodes that host the primary partitions of the cache.

  • A full state transfer for a specific cache is started. Fired on: all nodes that host the primary partitions of the cache.

  • A full state transfer for a specific cache fails. Fired on: all nodes that host the primary partitions of the cache.

  • A sender node cannot store entries in the sender storage because the storage is full. A manual restart and a full state transfer may be required after the issue is fixed. Fired on: the sender node.

  • Replication to a specific cluster is resumed on a sender node. Fired on: the sender node.

To listen to replication events, use the following code snippet:

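import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteEvents;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.lang.IgnitePredicate;
import org.gridgain.grid.configuration.GridGainConfiguration;
import org.gridgain.grid.events.EventType;
// DrCacheReplicationEvent and CacheDrPauseReason are GridGain DR classes; import them from your GridGain version.

IgniteConfiguration cfg = new IgniteConfiguration();

// Enable the event (verify the constant name against EventType in your GridGain version)
cfg.setIncludeEventTypes(EventType.EVT_DR_CACHE_REPLICATION_STOPPED);

// Enable data replication; the ID of this cluster is 1
cfg.setPluginConfigurations(new GridGainConfiguration().setDataCenterId((byte) 1));

Ignite ignite = Ignition.start(cfg);

IgniteEvents events = ignite.events();

// Local listener that listens to local events.
IgnitePredicate<DrCacheReplicationEvent> localListener = evt -> {
    CacheDrPauseReason reason = evt.reason();

    System.out.println("Replication stopped. Reason: " + reason);
    return true; // Continue listening.
};

// Listen to the "replication stopped" events
events.localListen(localListener, EventType.EVT_DR_CACHE_REPLICATION_STOPPED);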

Enabling Replication for Existing Caches

If you have caches that were created before you configured replication, you cannot add data replication parameters to them, because cache configuration cannot be changed dynamically. To overcome this limitation, we introduced the GG_DR_FORCE_DC_ID system property.

If you restart the cluster with this property enabled, all caches that do not have data replication parameters will acquire the default replication configuration and will be replicated via the default sender group (named "<default>").

The property must be set to the cluster ID of the current master cluster (the value of GridGainConfiguration.dataCenterId).

While this property is enabled, any cache created in the master cluster without data replication configuration also automatically obtains the default replication configuration and is replicated through the default sender group.

If you remove the GG_DR_FORCE_DC_ID property and restart the cluster, the caches without data replication configuration will stop being replicated.
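Because the property must match the master cluster's own ID, a launcher might verify it before starting a node. The following Java sketch assumes the property is passed as a JVM system property (for example, -DGG_DR_FORCE_DC_ID=1) and that the master cluster ID is 1; the class and method names are hypothetical and are not part of GridGain.

```java
// Hypothetical pre-start check; not part of GridGain.
final class ForceDcIdCheck {
    static final String PROP = "GG_DR_FORCE_DC_ID";

    private ForceDcIdCheck() {}

    /**
     * Returns true if the property is unset (the feature is off) or equals the
     * master cluster's own data center ID, as required above.
     */
    static boolean isConsistent(String propValue, byte masterDcId) {
        if (propValue == null || propValue.trim().isEmpty())
            return true; // Not set: caches without DR configuration are simply not replicated.

        try {
            return Byte.parseByte(propValue.trim()) == masterDcId;
        }
        catch (NumberFormatException e) {
            return false; // Not a valid data center ID.
        }
    }

    public static void main(String[] args) {
        byte masterDcId = 1; // Hypothetical: must equal GridGainConfiguration.dataCenterId of this cluster.
        String value = System.getProperty(PROP);

        if (!isConsistent(value, masterDcId))
            throw new IllegalStateException(PROP + "=" + value + " does not match dataCenterId " + masterDcId);

        System.out.println(PROP + " check passed (value: " + value + ")");
    }
}
```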


The GridDr.pause() and GridDr.resume() methods in the GridDr interface have been deprecated. Use GridDr.stopReplication()/GridDr.startReplication() methods instead.

Installation and Upgrade Information

Compatibility with Client Nodes Version 8.4.x

There is a known issue with running server nodes version 8.5.10 and client nodes version 8.4.x in one cluster. We strongly recommend that you upgrade your client nodes to version 8.5.10 first and then upgrade server nodes in a rolling upgrade fashion. If this is not possible, then consider the workaround below:

  • Enable persistence in the default data region in the client nodes' configuration. Perform this procedure for every client application.

  • Upgrade server nodes to version 8.5.10 in a rolling upgrade fashion.
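For a client application that builds its configuration in Java, the first step of the workaround could look like the sketch below. It uses the standard Apache Ignite configuration classes; the class and method names are hypothetical, and discovery and all other settings your application already uses are left out.

```java
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

// Hypothetical helper: configuration for an 8.4.x client node joining 8.5.10 server nodes.
final class LegacyClientConfig {
    private LegacyClientConfig() {}

    static IgniteConfiguration clientConfig() {
        DataStorageConfiguration storageCfg = new DataStorageConfiguration();

        // Workaround: enable persistence in the default data region of the client node.
        storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);

        // Discovery, plugin, and other settings are omitted; keep the ones your application already uses.
        return new IgniteConfiguration()
            .setClientMode(true)
            .setDataStorageConfiguration(storageCfg);
    }
}
```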

Upgrade Procedure

If you are not using Data Center Replication and want to upgrade without enabling Data Center Replication, refer to the Rolling Upgrades page for information about how to perform automated upgrades.

If you are upgrading from a cluster with Data Center Replication already configured, a regular rolling upgrade should be enough.

If you want to enable Data Center Replication during the upgrade, below is a modified rolling upgrade procedure designed to achieve this goal. You will need to use the GG_DR_FORCE_DC_ID system property described in the Enabling Replication for Existing Caches section. Before attempting to upgrade your master cluster, decide which nodes will be used as sender nodes and which as receiver nodes. Refer to the official documentation on data center replication.

  1. Start a replica cluster on version 8.5.10 with Data Center Replication configured.

  2. Stop one server node in the master cluster.

  3. Upgrade the node to version 8.5.10 and configure it either as a sender node or a receiver node. Set the GG_DR_FORCE_DC_ID system property to the ID of the master cluster (GridGainConfiguration.dataCenterId), i.e. GG_DR_FORCE_DC_ID=<your cluster id>.

  4. Repeat steps 2 and 3 for all remaining server nodes in the master cluster.

Known Issues


Deactivating the cluster can cause an exception on the sender nodes, which makes the sender nodes hang. If this happens, restart the sender nodes. To avoid the issue, stop the sender nodes before deactivating the cluster, if possible.


Log messages about starting the replication process can be misleading. When executing a stop command, you may see the following message: "Data center replication executes a stop task".

To monitor the success of start and stop operations, look for the following messages:

  • "Data center replication is stopped …"

  • "Data center replication is started …"
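If you check these results programmatically, match only the confirmation messages and ignore the "executes a … task" lines. A minimal Java sketch, assuming the log lines are already available as strings; because the message tails are elided above, it matches the message prefixes only, and the class name is hypothetical.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper that derives the DR state from log lines; not part of GridGain.
final class DrLogStatus {
    enum State { STARTED, STOPPED }

    private DrLogStatus() {}

    /**
     * Returns the state reported by the last confirmation message, or empty if none was found.
     * Lines such as "Data center replication executes a stop task" only show that a command
     * was issued, so they are deliberately ignored.
     */
    static Optional<State> lastConfirmedState(List<String> logLines) {
        State state = null;

        for (String line : logLines) {
            if (line.contains("Data center replication is stopped"))
                state = State.STOPPED;
            else if (line.contains("Data center replication is started"))
                state = State.STARTED;
        }

        return Optional.ofNullable(state);
    }
}
```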


Commands issued using the script can throw exceptions on client nodes that run GridGain versions earlier than 8.5.10.

Possible workarounds include:

  • Use the JMX Beans to manage replication.

  • Use the Java API to manage replication.

  • Update the clients to GridGain version 8.5.10.


Using the now deprecated methods pause() and resume() can lead to data not being replicated. Use the startReplication()/stopReplication() methods instead.


Creating an index on columns that were changed using ALTER TABLE DROP/ADD COLUMN commands can lead to B+Tree corruption.

Fixed Issues

GridGain Community Edition Changes


Team / SE Common

Replaced AtomicLong[] with AtomicLongArray in pages list to reduce heap pressure

GridGain Enterprise Edition Changes



Fixed DR overflow events for REMOVE_OLDEST mode.


Team / Transactions & PME Common

Added DR_ACK_MISSING_CACHES to allow DR to make progress when there are errors on the receiver due to missing caches.



Added metrics to show DR sender/receiver pending queues size.



Change default DrStore mode to STOP.



Added data center replication API to the control utility.






DR: Add correctly named methods for DR start/stop.



DR: Fix broken DR pause/resume semantics.



Prevent a possible deadlock in DR when a node leaves and a cache stops at the same moment.



Web Console: Added support for alerts on DR events.



Fixed data center replication for caches created via DDL.



Improve logging for the Data Center Replication component.



Adaptive DR throttling added.



Bugfix: stop replication for a cache that doesn't exist in the receiver DC.



Data Replication now uses a file-system-based implementation of the Sender Store by default. It has a default work directory, but we recommend configuring it explicitly in production environments.



Stop DR replication if a sender hub with an in-memory store goes down.



New method named 'resetState' has been added to the DrSender interface. It resets all internal state of the sender hub, including clearing the store and internal queues.



Deprecated and ignored GridGainCacheConfiguration.drReceiverEnabled flag due to invalid behavior



Added events for DR.



Added DR full state transfer operation to cache MBean.



Added ability to change full state transfer throttle per node at runtime.



DR: Delete files containing data that was successfully sent (ACK is received).

We Value Your Feedback

The GridGain documentation team is focused on constantly improving the product documentation. Your comments and suggestions are always welcome. You can reach us here:

Please visit the documentation for more information.