GridGain 8.5.10 Release Notes
What’s New in This Release
Introduction
This release introduces a number of substantial improvements to the Data Center Replication feature and multiple bugfixes.
Data Center Replication Improvements
Changes In Behavior
-
The default implementation of the sender storage has changed from an in-memory store to a disk-based one. All updates in the master cluster are now stored on disk on the sender nodes before they are transmitted to the remote cluster. The directory on the file system where data updates are stored can be configured using the DrSenderFsStore.directoryPath property. The default value points to a directory inside the node’s working directory. However, we recommend that you configure the directory explicitly.
<bean class="org.gridgain.grid.configuration.DrSenderConfiguration">
    <!-- this node is part of group1 -->
    <property name="senderGroups">
        <list>
            <value>group1</value>
        </list>
    </property>
    <!-- connection configuration -->
    <property name="connectionConfiguration">
        <bean class="org.gridgain.grid.dr.DrSenderConnectionConfiguration">
            <!-- DR storage -->
            <property name="store">
                <bean class="org.gridgain.grid.dr.store.fs.DrSenderFsStore">
                    <property name="directoryPath" value="/path/to/store"/>
                </bean>
            </property>
            <!-- The ID of the remote cluster -->
            <property name="dataCenterId" value="2"/>
            <!-- Addresses of the remote cluster's receiver nodes -->
            <property name="receiverAddresses">
                <list>
                    <value>172.25.4.200:50001</value>
                </list>
            </property>
        </bean>
    </property>
</bean>
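Because the sender store now lives on disk, it can help to verify the configured directory before a sender node starts. The following is a minimal sketch that uses only the JDK and is not part of the GridGain API: the class name, the `ensureUsable` helper, the free-space threshold, and the default path are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SenderStoreDirCheck {
    /**
     * Creates the directory if it does not exist, then verifies that it is
     * writable and has at least minFreeBytes of usable space.
     * Returns the usable space in bytes.
     */
    public static long ensureUsable(Path dir, long minFreeBytes) throws IOException {
        // No-op if the directory already exists; fails if the path is a regular file.
        Files.createDirectories(dir);

        if (!Files.isWritable(dir))
            throw new IOException("Sender store directory is not writable: " + dir);

        long usable = Files.getFileStore(dir).getUsableSpace();

        if (usable < minFreeBytes)
            throw new IOException("Only " + usable + " bytes are free in " + dir);

        return usable;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default; pass the value of directoryPath as the first argument.
        Path dir = args.length > 0
            ? Paths.get(args[0])
            : Paths.get(System.getProperty("java.io.tmpdir"), "dr-sender-store");

        // The 1 MiB threshold is an arbitrary placeholder, not a GridGain recommendation.
        long usable = ensureUsable(dir, 1024L * 1024L);

        System.out.println("Sender store directory " + dir + " is usable, free bytes: " + usable);
    }
}
```

A check like this could run in a startup script so that a missing or read-only directory is reported before replication begins.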
-
When the sender store gets full or becomes corrupted, the replication process will stop. You will have to start it manually and do a full state transfer after addressing the problem that caused the issue. Refer to the Managing and Monitoring Replication section.
-
The GridDr.pause()/GridDr.resume() methods were deprecated and replaced with GridDr.stopReplication()/GridDr.startReplication().
The stopReplication() method terminates the replication process: cache updates stop being processed. The startReplication() method resumes the replication process, i.e. cache updates start to be processed again. If the data in the cache changed while replication was stopped, you will need to do a full state transfer.
-
Added capability to pause/resume the replication process on the sender nodes. These operations can be performed via the JMX beans.
Unlike the GridDr.stopReplication() method, the pause operation does not terminate the process; instead, the sender node where the operation is called stops sending updates to the remote cluster. The updates continue to be processed and are accumulated in the sender storage on that node. To resume sending the updates to the remote cluster, perform the resume operation on the sender node.
Refer to the Managing and Monitoring Replication section for information on how to pause/resume replication using JMX beans.
-
Added adaptive throttling feature that allows for adjusting the data transfer rate between data nodes and sender nodes, and between sender nodes and receiver nodes depending on the workload.
Managing and Monitoring Replication
This release introduces a number of changes in the JMX beans to support management operations and monitoring.
The following JMX Bean provides information about the replication process of a specific cache. The bean can be obtained on any node that hosts the cache.
- MBean’s ObjectName:
-
group=[cache name],name="Cache data replication"
- Attributes:
-
Attribute | Type | Description | Scope |
---|---|---|---|
DrSenderGroup | String | The name of the sender group. | Global |
DrStatus | String | The status of the replication process for this cache. | Global |
DrQueuedKeysCount | int | The number of entries that are waiting to be sent to the sender node. | Node |
DrSenderHubsCount | int | The number of sender nodes available for the cache. | Global |
- Operations:
-
Operation | Description |
---|---|
start | Start the replication process for this cache. |
stop | Stop the replication process for this cache. |
transferTo (id) | Perform a full state transfer of the content of this cache to the given remote cluster ID. |
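The cache replication MBean can be read with the standard JMX API. The sketch below is a hedged illustration using only JDK classes: it looks up the bean with an ObjectName pattern and reads the DrStatus attribute. The JMX domain is not given in these notes, so the pattern uses a wildcard domain; the class name, the helper names, and the `myCache` cache name are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

public class CacheDrStatus {
    /**
     * Builds a pattern that matches the cache replication MBean in any JMX domain.
     * Assumption: the cache name contains no ObjectName special characters;
     * otherwise it would need quoting.
     */
    public static ObjectName pattern(String cacheName) throws Exception {
        return new ObjectName("*:group=" + cacheName + ",name=\"Cache data replication\",*");
    }

    /** Returns DrStatus of the first matching MBean, or null if no such MBean is registered. */
    public static String drStatus(MBeanServerConnection conn, String cacheName) throws Exception {
        Set<ObjectName> names = conn.queryNames(pattern(cacheName), null);

        if (names.isEmpty())
            return null;

        return String.valueOf(conn.getAttribute(names.iterator().next(), "DrStatus"));
    }

    public static void main(String[] args) throws Exception {
        // In-process MBean server; for a remote node, obtain the connection
        // through javax.management.remote.JMXConnectorFactory instead.
        MBeanServerConnection conn = ManagementFactory.getPlatformMBeanServer();

        String cacheName = args.length > 0 ? args[0] : "myCache"; // hypothetical cache name

        System.out.println("DrStatus of " + cacheName + ": " + drStatus(conn, cacheName));
    }
}
```

The start, stop, and transferTo operations could be called in the same way through `MBeanServerConnection.invoke`; the parameter type of transferTo is not stated here, so it is left out of the sketch.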
The following MBean can be obtained on a sender node and allows you to pause/resume the replication process on that node.
- MBean’s ObjectName:
-
group="Data Center Replication",name="Sender Hub"
- Operations:
-
Operation | Description |
---|---|
pause (id) | Pause replication for the given remote cluster ID. This operation stops sending updates to the remote cluster. The updates are stored in the sender storage on that node until the replication is resumed. |
pause | Pause replication for all remote clusters. |
resume (id) | Resume replication for the given remote cluster ID. |
resume | Resume replication for all remote clusters. |
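The pause and resume operations can be invoked from any JMX client. As a rough sketch under stated assumptions, the code below calls the no-argument form of an operation on every MBean that matches the Sender Hub key properties. It uses only JDK classes; the wildcard domain, the class name, and the `invokeOnSenderHubs` helper are assumptions, and the variants that take a remote cluster ID are omitted because their parameter type is not stated in these notes.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

public class SenderHubControl {
    /** Matches the Sender Hub MBean in any JMX domain (the domain is an assumption). */
    public static final String PATTERN =
        "*:group=\"Data Center Replication\",name=\"Sender Hub\",*";

    /**
     * Invokes a no-argument operation ("pause" or "resume") on every matching MBean
     * and returns the number of MBeans the operation was invoked on.
     */
    public static int invokeOnSenderHubs(MBeanServerConnection conn, String operation) throws Exception {
        int invoked = 0;

        for (ObjectName name : conn.queryNames(new ObjectName(PATTERN), null)) {
            conn.invoke(name, operation, new Object[0], new String[0]);

            invoked++;
        }

        return invoked;
    }

    public static void main(String[] args) throws Exception {
        // In-process MBean server; for a remote sender node, obtain the connection
        // through javax.management.remote.JMXConnectorFactory instead.
        MBeanServerConnection conn = ManagementFactory.getPlatformMBeanServer();

        String operation = args.length > 0 ? args[0] : "pause";

        System.out.println("Invoked '" + operation + "' on " + invokeOnSenderHubs(conn, operation) + " sender hub MBean(s)");
    }
}
```

When run in a JVM that hosts no sender node, the sketch finds no matching MBean and invokes nothing.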
Events
A number of event types related to data replication were added. The events are described in the following table.
Event Type | Event Description | Where Event Is Fired |
---|---|---|
EVT_DR_REMOTE_DC_NODE_CONNECTED | A sender node connects to the node in the replica cluster. | The sender node. |
EVT_DR_REMOTE_DC_NODE_DISCONNECTED | A sender node loses connection to the node in the remote cluster. The sender will try to connect to another receiver node if more than one are configured. If no receiver is available, the sender will try to reconnect after the period defined in the | The sender node. |
EVT_DR_CACHE_REPLICATION_STOPPED | Replication of a specific cache is stopped due to any reason. A full state transfer is required before resuming the replication process. | All nodes that host the primary partitions of the cache. |
EVT_DR_CACHE_REPLICATION_STARTED | Replication of a specific cache is started. | All nodes that host the primary partitions of the cache. |
EVT_DR_CACHE_FST_STARTED | A full state transfer for a specific cache is started. | All nodes that host the primary partitions of the cache. |
EVT_DR_CACHE_FST_FAILED | A full state transfer for a specific cache fails. | All nodes that host the primary partitions of the cache. |
EVT_DR_STORE_OVERFLOW | A sender node cannot store entries to the sender storage because the storage is full. Manual restart and a full state transfer may be required after the issue is fixed. | The sender node. |
EVT_DR_DC_REPLICATION_RESUMED | Replication to a specific cluster is resumed on a sender node. | The sender node. |
To listen to replication events, use the following code snippet:
IgniteConfiguration cfg = new IgniteConfiguration();
// Enable the event
cfg.setIncludeEventTypes(org.gridgain.grid.events.EventType.EVT_DR_CACHE_REPLICATION_STOPPED);
// Enable data replication
cfg.setPluginConfigurations(new GridGainConfiguration().setDataCenterId((byte) 1));
Ignite ignite = Ignition.start(cfg);
IgniteEvents events = ignite.events();
// Local listener that listens to local events.
IgnitePredicate<DrCacheReplicationEvent> localListener = evt -> {
    // The reason why replication of the cache was stopped.
    CacheDrPauseReason reason = evt.reason();
    System.out.println("Replication stopped. Reason: " + reason);
    return true; // Continue listening.
};
// Listen to the "replication stopped" events
events.localListen(localListener, org.gridgain.grid.events.EventType.EVT_DR_CACHE_REPLICATION_STOPPED);
Enabling Replication for Existing Caches
If you have caches that were created before you configured replication, there is no way to add data replication parameters to those caches, because cache configuration cannot be changed dynamically. To overcome this limitation, we introduced the GG_DR_FORCE_DC_ID system property.
If you restart the cluster with this property enabled, all caches that do not have data replication parameters will acquire the default replication configuration and will be replicated via the default sender group (named "<default>").
The property must be set to the cluster ID of the current master cluster (the value of GridGainConfiguration.dataCenterId).
All caches created in the master cluster without data replication configuration while this property is enabled will automatically obtain the default replication configuration and will be replicated through the default sender group.
If you remove the GG_DR_FORCE_DC_ID property and restart the cluster, the caches without data replication configuration will stop being replicated.
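Since the property must equal the master cluster's data center ID, a small startup guard can catch a mismatch early. The sketch below is an illustration only and is not part of GridGain: it assumes the property is passed as a JVM system property (for example, `-DGG_DR_FORCE_DC_ID=1`), and the class name, the `matches` helper, and the ID value 1 are hypothetical.

```java
public class ForceDcIdCheck {
    /** Name of the system property described in this section. */
    public static final String PROP = "GG_DR_FORCE_DC_ID";

    /**
     * Returns true only when the JVM system property is set and parses
     * to the expected data center ID.
     */
    public static boolean matches(byte expectedDataCenterId) {
        String raw = System.getProperty(PROP);

        if (raw == null)
            return false;

        try {
            return Byte.parseByte(raw.trim()) == expectedDataCenterId;
        }
        catch (NumberFormatException e) {
            return false; // Not a valid byte value.
        }
    }

    public static void main(String[] args) {
        // Hypothetical ID; use the value passed to GridGainConfiguration.setDataCenterId().
        byte dataCenterId = 1;

        System.out.println(PROP + " matches data center ID " + dataCenterId + ": " + matches(dataCenterId));
    }
}
```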
Removed/Deprecated
The GridDr.pause() and GridDr.resume() methods in the GridDr interface have been deprecated.
Use the GridDr.stopReplication()/GridDr.startReplication() methods instead.
Installation and Upgrade Information
Compatibility with Client Nodes Version 8.4.x
There is a known issue with running server nodes version 8.5.10 and client nodes version 8.4.x in one cluster. We strongly recommend that you upgrade your client nodes to version 8.5.10 first and then upgrade server nodes in a rolling upgrade fashion. If this is not possible, then consider the workaround below:
-
Enable persistence in the default data region in the client nodes' configuration. Perform this procedure for every client application.
-
Upgrade server nodes to version 8.5.10 in a rolling upgrade fashion.
Upgrade Procedure
If you are not using Data Center Replication and want to upgrade without enabling Data Center Replication, refer to the Rolling Upgrades page for information about how to perform automated upgrades.
If you are upgrading from a cluster with Data Center Replication already configured, a regular rolling upgrade should be enough.
If you want to enable Data Center Replication during the upgrade, below is a modified rolling upgrade procedure designed to achieve this goal.
You will need to use the GG_DR_FORCE_DC_ID system property described in the Enabling Replication for Existing Caches section.
Before attempting to upgrade your master cluster, decide which nodes will be used as sender nodes and which as receiver nodes.
Refer to the official documentation on data center replication.
-
Start a replica cluster on version 8.5.10 with Data Center Replication configured.
-
Stop one server node in the master cluster.
-
Upgrade the node to version 8.5.10 and configure it as either a sender node or a receiver node. Add the GG_DR_FORCE_DC_ID=<your cluster id> system property, set to the ID of the master cluster (GridGainConfiguration.dataCenterId).
-
Repeat steps 2 and 3 for all remaining server nodes in the master cluster.
Known Issues
GG-24425 | Deactivation of the cluster can lead to an exception being thrown on the sender nodes, which will cause the sender nodes to hang. If this happens, you need to restart the sender nodes. To avoid this, stop the sender nodes first, if possible, before deactivating the cluster. |
GG-24212 | Log messages about starting the replication process can be misleading. When executing a stop command, you may see the following message: "Data center replication executes a stop task". To monitor the success of start and stop operations, look for the following messages: |
GG-24136 | The commands issued using Possible workarounds include: |
GG-23095 | Using the now deprecated methods |
GG-24311 | Creating an index for the columns that were changed using ALTER TABLE DROP/ADD COLUMN commands can lead to B+Tree corruption. |
Fixed Issues
GridGain Community Edition Changes
GG-24065 | Team / SE Common | Replaced AtomicLong[] with AtomicLongArray in pages list to reduce heap pressure. |
GridGain Enterprise Edition Changes
GG-23742 | DR | Fixed DR overflow events for REMOVE_OLDEST mode. |
GG-23533 | Team / Transactions & PME Common | Added DR_ACK_MISSING_CACHES to allow DR to make progress when errors occur on the receiver due to missing caches. |
GG-23342 | DR | Added metrics to show DR sender/receiver pending queues size. |
GG-23300 | DR | Change default DrStore mode to STOP. |
GG-23238 | Control.(sh|bat) | Added data center replication API to the control utility. |
GG-23209 | DR | Change default DrStore mode to STOP. |
GG-23044 | DR | DR: Add correctly named methods for DR start/stop. |
GG-23041 | DR | DR: Fix broken DR pause/resume semantics. |
GG-23036 | DR | Prevent possible deadlock in DR when a node leaves and a cache stops at the same moment. |
GG-23034 | DR, WC | Web Console: Added support for alerts on DR events. |
GG-23020 | DR, SQL | Fixed data center replication for caches created via DDL. |
GG-23007 | DR | Improve logging for the Data Center Replication component. |
GG-22990 | DR | Adaptive DR throttling added. |
GG-22983 | DR | Bugfix: stop replication for a cache that doesn’t exist at the receiver DC. |
GG-22979 | DR | Data Replication now uses a file-system based implementation of Sender Store by default. It has a default work directory, but it is recommended to configure it explicitly in production environments. |
GG-22978 | DR | Stop DR replication in case a sender hub with an in-memory store went down. |
GG-22977 | DR | New method named 'resetState' has been added to the DrSender interface. It resets all internal state of the sender hub, including clearing the store and internal queues. |
GG-22885 | DR | Deprecated and ignored GridGainCacheConfiguration.drReceiverEnabled flag due to invalid behavior. |
GG-22884 | DR | Added events for DR. |
GG-22883 | DR | Added DR full state transfer operation to cache MBean. |
GG-22880 | DR | Added ability to change full state transfer throttle per node at runtime. |
GG-22879 | DR | DR: delete files with successfully sent (ACK is received) data. |
Related Information
We Value Your Feedback
The GridGain documentation team is focused on constantly improving the product documentation. Your comments and suggestions are always welcome. You can reach us here: docs@gridgain.com
Please visit the documentation for more information.