Rolling Upgrades

Overview

Rolling Upgrades is an Enterprise and Ultimate Edition feature that allows nodes with different versions of GridGain to co-exist in one cluster while you roll out new versions. This prevents any downtime when performing software upgrades.

With Rolling Upgrades enabled, a node running a newer version of GridGain can join the cluster. For example, if you have ten nodes running on ten physical servers, you can start a new node with a new version of GridGain on any one of the servers. The data is rebalanced to accommodate this new member of the cluster. Once the rebalancing of data is complete, you can shut down the old node. Repeat this process for all server nodes. In this way, the cluster is upgraded one node at a time, without any downtime.

Guidelines and Incompatible Versions

Before you start to upgrade to a newer version of GridGain, please note the following:

Upgrades are supported for minor and maintenance versions of the same major series only. The feature is not supported for upgrades between major versions due to changes that might not be backward-compatible. For instance, it is possible to upgrade from 8.5.6 to 8.7.5 without downtime, but Rolling Upgrades will not work from version 7.9.1 to 8.7.5. For major version upgrades, contact GridGain for a possible zero-downtime solution.

There is always a way to upgrade to the latest GridGain version from any version of the same major series without downtime. In most cases, it is a direct one-step upgrade. For instance, you can upgrade from 8.5.6 to 8.7.5 with Rolling Upgrades, avoiding production outages.

If there are breaking changes between minor or maintenance versions of the same series, the upgrade requires a transitioning version: you move from version A to version B via version C, still without production outages. For instance, if you need to upgrade from 8.1.3 to 8.8.1, then 8.5.3 has to be used as a transitioning version: 8.1.3 → 8.5.3 → 8.8.1.
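The "same major series" rule above can be turned into a small pre-flight check. The sketch below is illustrative only: RollingUpgradeVersionCheck and its methods are hypothetical helpers, not part of the GridGain API. It checks only the major-series rule and cannot tell whether a transitioning version is required.

```java
// Hypothetical helper, not part of GridGain: checks only the "same major series" rule.
final class RollingUpgradeVersionCheck {
    private RollingUpgradeVersionCheck() {
    }

    /** Extracts the major number from a version such as "8.7.5" or "8.8.0#20200714-sha1:35c405b6". */
    static int majorOf(String version) {
        int dot = version.indexOf('.');
        if (dot <= 0)
            throw new IllegalArgumentException("Not a dotted version: " + version);
        return Integer.parseInt(version.substring(0, dot));
    }

    /** Returns true when both versions belong to the same major series. */
    static boolean sameMajorSeries(String current, String target) {
        return majorOf(current) == majorOf(target);
    }

    public static void main(String[] args) {
        System.out.println(sameMajorSeries("8.5.6", "8.7.5")); // true
        System.out.println(sameMajorSeries("7.9.1", "8.7.5")); // false
    }
}
```

A check like this only guards against the unsupported major-version case; whether a direct or a transitioning upgrade applies still has to be confirmed for the specific versions involved.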

Enabling Rolling Upgrades

Rolling Upgrades are disabled by default for performance reasons. If you plan to upgrade GridGain without stopping the whole cluster, you need to enable the feature. Below is a configuration example for enabling rolling upgrades:

XML:

<bean class="org.apache.ignite.configuration.IgniteConfiguration">

    <property name="pluginConfigurations">
        <list>
            <bean class="org.gridgain.grid.configuration.GridGainConfiguration">
                <property name="rollingUpdatesEnabled" value="true"/>
            </bean>
        </list>
    </property>

</bean>

Java:

GridGainConfiguration pluginCfg = new GridGainConfiguration();
pluginCfg.setRollingUpdatesEnabled(true);

IgniteConfiguration cfg = new IgniteConfiguration();
cfg.setPluginConfigurations(pluginCfg);

Ignite ignite = Ignition.start(cfg);

C#/.NET:

var cfg = new GridGainPluginConfiguration()
{
    RollingUpdatesEnabled = true
};

C++:

This API is not presently available for C++. You can use XML configuration.

Upgrading to a New GridGain Version

If you have a multi-node cluster and you want to upgrade the cluster to a new GridGain version without stopping the whole cluster, use the following steps:

  1. Download the new GridGain version.

  2. If you are using persistence, configure the storagePath, walPath, and walArchivePath properties. If these properties were explicitly configured in your current version of GridGain, use the same values in the new version’s configuration. If they were not set, specify the default locations that these properties use.

  3. Stop one running node.

  4. Restart the node with the new version. At this point, the cluster will start to rebalance.

  5. Wait for rebalancing to complete. You can monitor the logs for messages like these:

    Rebalancing scheduled [order=[Cache1, Cache2, Cache3], top=AffinityTopologyVersion [topVer=59, minorTopVer=1], force=false, evt=DISCOVERY_CUSTOM_EVT, node=79794392-5518-4609-ac13]
    ...
    ...
    ...
    Completed (final) rebalancing [fromNode=node=79794392-5518-4609-ac13, cacheOrGroup=Cache2, topology=AffinityTopologyVersion [topVer=59, minorTopVer=1], time=370550 ms]

    After this there will be a partition map exchange for the new minor topology change, which is caused by Late Affinity Assignment. So, instead of checking the rebalancing for all the caches, you can look for a message about the exchange:

    Finish exchange future [startVer=AffinityTopologyVersion [topVer=42, minorTopVer=1], resVer=AffinityTopologyVersion [topVer=42, minorTopVer=1], err=null]

    Alternatively, if rebalancing is not needed, it will be skipped:

    Skipping rebalancing (no affinity changes) [top=AffinityTopologyVersion [topVer=42, minorTopVer=1], rebTopVer=AffinityTopologyVersion [topVer=42, minorTopVer=0], evt=DISCOVERY_CUSTOM_EVT, evtNode=38640608-5518-4609-ac13-0b35eea1d6cb, client=false]

  6. After rebalancing is complete, repeat steps 1-5 for each remaining node.
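The log checks in step 5 can be automated. The following is a minimal sketch, assuming the message formats shown in the samples above. RebalanceLogWatcher is a hypothetical helper, not a GridGain class. It treats an exchange that finished without error on a minor topology version greater than zero, or a "Skipping rebalancing" message, as the signal to proceed. Other cluster events can also change the minor topology version, so treat this as a heuristic and pass in only the lines logged after the node was restarted.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: a heuristic based on the log samples above, not an official API.
final class RebalanceLogWatcher {
    // Matches a finished exchange without error and captures the resulting minor topology version.
    private static final Pattern EXCHANGE_DONE = Pattern.compile(
        "Finish exchange future \\[startVer=.*resVer=AffinityTopologyVersion "
            + "\\[topVer=(\\d+), minorTopVer=(\\d+)\\], err=null\\]");

    private static final String REBALANCE_SKIPPED = "Skipping rebalancing (no affinity changes)";

    private RebalanceLogWatcher() {
    }

    /** Returns true if a single log line signals that it is safe to continue. */
    static boolean isCompletionLine(String line) {
        if (line.contains(REBALANCE_SKIPPED))
            return true;
        Matcher m = EXCHANGE_DONE.matcher(line);
        // Only a minor topology change (minorTopVer > 0) is counted as completion here.
        return m.find() && Integer.parseInt(m.group(2)) > 0;
    }

    /** Returns true if any of the lines logged since the restart signals completion. */
    static boolean isSettled(List<String> linesSinceRestart) {
        for (String line : linesSinceRestart) {
            if (isCompletionLine(line))
                return true;
        }
        return false;
    }
}
```

A caller would typically poll the node's log file, feed the newly appended lines to isSettled, and move on to the next node once it returns true.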

Performance Considerations

The duration of the rolling upgrade procedure is largely affected by the speed of the rebalancing process. By default, rebalancing is performed in one thread. However, you can increase the number of threads dedicated to rebalancing to enhance performance. The rebalancing thread pool size is controlled by the IgniteConfiguration.rebalanceThreadPoolSize property. See Configuring Rebalance Thread Pool for more information.

You can change the value of the thread pool size in a manner similar to the rolling upgrade procedure. If you have already upgraded to version 8.7.x where x ≥ 4, you can stop an individual node, set the parameter, and start the node, repeating this action for each node in the cluster.

Monitoring Rolling Upgrades

GridGain Control Center allows you to monitor the process of Rolling Upgrades as you move to a newer version of GridGain.

The Rebalance widget in Control Center displays the cluster nodes and their versions. As you perform node-by-node migration to a newer version, watch the status of the rebalancing progress and proceed to the next node when it is finished.

Performing a Rolling Restart

If you do not want to upgrade to a new version of GridGain but would like to make some changes to the server nodes instead — for example, modifying the configuration file — you can do a rolling restart.

To do a rolling restart, use the following steps:

  1. Stop one node in the cluster.

  2. Restart the node with the new configuration file. At this point, data rebalancing will start to accommodate this node into the cluster.

  3. Wait until data rebalancing is complete.

  4. Repeat steps 1-3 until all nodes are running with the new configuration.
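The steps above amount to a strictly sequential loop. The sketch below shows that shape. NodeOps and RollingRestart are hypothetical names introduced for illustration; how a node is actually stopped, started, and observed depends on your environment (scripts, an orchestrator, or log monitoring).

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical abstraction over how nodes are stopped, started, and observed in your environment. */
interface NodeOps {
    void stop(String nodeId);

    void startWithNewConfig(String nodeId);

    /** Blocks until data rebalancing triggered by the restart has finished. */
    void awaitRebalance(String nodeId);
}

final class RollingRestart {
    private RollingRestart() {
    }

    /** Restarts nodes one at a time; the next node is touched only after rebalancing has finished. */
    static void run(List<String> nodeIds, NodeOps ops) {
        for (String nodeId : nodeIds) {
            ops.stop(nodeId);
            ops.startWithNewConfig(nodeId);
            ops.awaitRebalance(nodeId);
        }
    }

    public static void main(String[] args) {
        // Demo implementation that only prints the order of operations.
        NodeOps printing = new NodeOps() {
            @Override public void stop(String nodeId) {
                System.out.println("stop " + nodeId);
            }

            @Override public void startWithNewConfig(String nodeId) {
                System.out.println("start " + nodeId);
            }

            @Override public void awaitRebalance(String nodeId) {
                System.out.println("wait for rebalance on " + nodeId);
            }
        };
        run(Arrays.asList("node-1", "node-2"), printing);
    }
}
```

Keeping the wait inside the loop is the important design point: no more than one node is out of the cluster at any time.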

Managed Rolling Upgrade

Managed Rolling Upgrade enables you to keep the upgrade process under control. The regular rolling upgrade allows you to connect nodes with a different version at any time and automatically activates new features when all nodes support them. Suppose that you have two nodes with different versions, where the new node has a new feature (for example, a changed messaging protocol) that the old node does not support. This feature becomes active as soon as the last node that does not support it leaves the cluster. As a result, the old node can no longer connect to the cluster.

Managed Rolling Upgrade allows you to control the upgrade process, making it predictable and accurate. It is based on the following constraints:

  • The cluster does not allow nodes of different versions to join until upgrade mode is enabled.

  • When upgrade mode is enabled, nodes of only one other version can join the cluster, and that version must be newer than the current one.

  • The new features will not be activated until the upgrade mode is turned off manually.
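The first two constraints can be pictured as a small state machine. The toy model below is an illustration only and is not how GridGain implements the feature. ManagedUpgradeModel and its methods are hypothetical, versions are assumed to be plain dotted numbers such as 8.8.1, and feature activation (the third constraint) is not modeled.

```java
/** Toy model of the join rules listed above; it is NOT GridGain's implementation. */
final class ManagedUpgradeModel {
    private final String initialVersion;
    private String targetVersion; // null until the first newer node joins
    private boolean upgradeMode;

    ManagedUpgradeModel(String initialVersion) {
        this.initialVersion = initialVersion;
    }

    /** Turns upgrade mode on. */
    void start() {
        upgradeMode = true;
    }

    /** Returns true if a node of the given version would be admitted under the modeled rules. */
    boolean tryJoin(String nodeVersion) {
        if (nodeVersion.equals(initialVersion))
            return true; // same version as the cluster: always allowed
        if (!upgradeMode)
            return false; // different versions are rejected until upgrade mode is enabled
        if (targetVersion == null) {
            if (compareVersions(nodeVersion, initialVersion) <= 0)
                return false; // the other version must be newer
            targetVersion = nodeVersion; // the first newer node fixes the target version
            return true;
        }
        return nodeVersion.equals(targetVersion); // only one other version is allowed
    }

    /** Compares dotted numeric versions component by component. */
    static int compareVersions(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        for (int i = 0; i < Math.max(pa.length, pb.length); i++) {
            int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (x != y)
                return Integer.compare(x, y);
        }
        return 0;
    }
}
```

In this model, a node of version 8.8.1 is rejected by an 8.8.0 cluster until start() is called. After it joins, a node of a third version such as 8.8.2 is rejected.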

Note: Managed Rolling Upgrade is available in GridGain starting with version 8.8.1.

Activating Rolling Upgrade

To activate the Rolling Upgrade mode, run the following command:

./control.sh --rolling-upgrade start --yes

Rolling upgrade mode successfully enabled.

Note: It is recommended that you use the GridGain control.sh utility to execute commands.

Checking Rolling Upgrade status

To check the Rolling Upgrade mode status, run the following command:

./control.sh --rolling-upgrade status

Rolling upgrade is enabled
Initial version: 8.8.0#20200714-sha1:35c405b6
Target version: N/A
List of alive nodes in the cluster that are not updated yet: 54a088fb
List of alive nodes in the cluster that are updated:
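
If you script around control.sh, the status output can be parsed into a small value object. This is a minimal sketch, assuming the output format shown in the sample above, which may differ between versions. RollingUpgradeStatusText is a hypothetical helper and is unrelated to the GridRollingUpgradeStatus class of the Java API.

```java
import java.util.List;

// Hypothetical helper that parses the text printed by "control.sh --rolling-upgrade status".
final class RollingUpgradeStatusText {
    final boolean enabled;
    final String initialVersion;
    final String targetVersion; // null when the output shows "N/A"

    private RollingUpgradeStatusText(boolean enabled, String initialVersion, String targetVersion) {
        this.enabled = enabled;
        this.initialVersion = initialVersion;
        this.targetVersion = targetVersion;
    }

    /** Parses the lines of the status output; unknown lines are ignored. */
    static RollingUpgradeStatusText parse(List<String> lines) {
        boolean enabled = false;
        String initial = null;
        String target = null;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.equals("Rolling upgrade is enabled"))
                enabled = true;
            else if (line.startsWith("Initial version:"))
                initial = line.substring("Initial version:".length()).trim();
            else if (line.startsWith("Target version:")) {
                String value = line.substring("Target version:".length()).trim();
                target = value.equals("N/A") ? null : value;
            }
        }
        return new RollingUpgradeStatusText(enabled, initial, target);
    }

    /** True once upgrade mode is on and a node with a newer version has joined. */
    boolean upgradeInProgress() {
        return enabled && targetVersion != null;
    }
}
```

For the sample output above, parse yields enabled = true, the initial version 8.8.0#20200714-sha1:35c405b6, and no target version, so upgradeInProgress() returns false.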

Disabling Rolling Upgrade

To disable the Rolling Upgrade mode, run the following command:

./control.sh --rolling-upgrade finish

Rolling upgrade mode successfully disabled.

Example: Using Managed Rolling Upgrade for migration

Managed Rolling Upgrade makes the migration process manageable and predictable. The example below demonstrates how to use Managed Rolling Upgrade to migrate from one version to another. To migrate from version 8.8.0 to 8.8.1, perform the following steps:

  1. Check the current status of Rolling Upgrade:

./control.sh --rolling-upgrade status

Rolling upgrade is disabled
Initial version: 8.8.0#20200714-sha1:35c405b6
Target version: N/A

Currently, the initial version corresponds to the current version of the cluster, and Rolling Upgrade is disabled. If you start a node with a different version at this point, the joining node is rejected with an exception, as shown below:

The local node's build version differs from the remote node's
[locBuildVer=8.7.23#20200728-sha1:4e015639, rmtBuildVer=8.8.0#20200714-sha1:35c405b6].
The node has been rejected due to the disabled Rolling Upgrade.
Please consider enabling it via control.sh utility
with the following command: `--rolling-upgrade start` or using
GridGain.rollingUpgrade().start() method.

  2. Activate Rolling Upgrade:

./control.sh --rolling-upgrade start --yes

Rolling upgrade mode successfully enabled.

  3. Check the Rolling Upgrade status:

./control.sh --rolling-upgrade status

Rolling upgrade is enabled
Initial version: 8.8.0#20200714-sha1:35c405b6
Target version: N/A
List of alive nodes in the cluster that are not updated yet: 54a088fb
List of alive nodes in the cluster that are updated:

  4. When a node with the new version joins the cluster, its version is saved as the target version, which you can verify by checking the status again:

./control.sh --rolling-upgrade status

Rolling upgrade is enabled
Initial version: 8.8.0#20200714-sha1:35c405b6
Target version: 8.8.1#20200728-sha1:4e015639
List of alive nodes in the cluster that are not updated yet: 54a088fb
List of alive nodes in the cluster that are updated: 01110503

Note: You will not be able to change versions until the update process is complete. If you try to disable a rolling upgrade in an inconsistent state, you will get an error, as shown in the example below:

./control.sh --rolling-upgrade finish --yes

Rolling upgrade operation failed. Rolling Upgrade mode cannot be disabled.
Rolling upgrade cannot be disabled because there is more than one node with a
different product version.

8.8.0#20200714-sha1:35c405b6=Nodes: [0:0:0:0:0:0:0:1%lo],
8.8.1#20200728-sha1:4e015639=Nodes: [0:0:0:0:0:0:0:1%lo]

  5. Once the last old node has left the cluster, you can finish the upgrade process:

./control.sh --rolling-upgrade status

Initial version: 8.8.0#20200714-sha1:35c405b6
Target version: 8.8.1#20200728-sha1:4e015639
List of alive nodes in the cluster that are not updated yet:
List of alive nodes in the cluster that are updated: b9feee2e

./control.sh --rolling-upgrade finish

Rolling upgrade mode successfully disabled.

./control.sh --rolling-upgrade status

Rolling upgrade is disabled
Initial version: 8.8.1#20200728-sha1:4e015639
Target version: N/A

Reverting Rolling Upgrade

You can revert a Rolling Upgrade while it is still in progress. Even if the target version has been identified, it is still possible to shut down all nodes with the target version (8.8.1) and disable Rolling Upgrade. In this case, the initial version remains unchanged (8.8.0).

Note: If you use Ignite persistence, the Rolling Upgrade status is persisted as well. You may shut down the whole cluster during the Rolling Upgrade and resume the process after the restart. After the upgrade is finished, the initial version is changed to the upgrade target version, and a node with an older version is rejected when it tries to join, as illustrated in the example below:

The cluster must not contain node older than the initial [
initVer=8.7.23#20200728-sha1:4e015639, nodeVer=8.8.0#20200714-sha1:35c405b6
]

Using the “force” mode

If you still want to have the old node in the cluster, you should use the “force” mode (not recommended). To use the force mode, run the following command:

./control.sh --rolling-upgrade force

Rolling upgrade mode successfully enabled.

./control.sh --rolling-upgrade status

Rolling upgrade is enabled
Forced mode is enabled.
Initial version: 8.8.1#20200728-sha1:4e015639
Target version: N/A
List of alive nodes in the cluster that are not updated yet:  b9feee2e
List of alive nodes in the cluster that are updated: none

Note: Using the force mode is not recommended. Joining a node with an old version while the cluster is in "force" mode may cause severe errors if a new node already uses a new communication protocol.

The distributed rolling upgrade is enabled by default in version 8.8.1. If a cluster running version 8.7.30 or earlier undergoes a Rolling Upgrade to version 8.7.30 or later, the cluster enables “force” mode automatically. Whether a feature is supported is then determined by the old logic, and nodes of old versions are accepted into the cluster. In this case, you must disable “force” mode manually after the upgrade.

If a cluster is started on a version equal to or greater than 8.8.1, the Rolling Upgrade mode of the started cluster is "disabled". You must change the mode to "enabled" to perform a Rolling Upgrade.

Using Java API to manage Rolling Upgrade

Since control.sh is based on the Java API, you can use Java to manage Rolling Upgrade from code. For that purpose, use the following four methods:

  • start - activates Rolling Upgrade mode or throws an exception if it is not possible.

  • finish - disables Rolling Upgrade mode or throws an exception if it is not possible.

  • force - switches Rolling Upgrade to "force" mode.

  • getStatus - returns the state of the Rolling Upgrade process.

Note: A full description is provided in the GridGainRollingUpgrade class.

The methods are declared as follows:

/**
* Enables rolling upgrade mode.
*/
public void start();

/**
* Disables rolling upgrade mode.
*/
public void finish();

/**
* Enables forced mode of rolling upgrade. This means that the strict version checking of the node should not be
* used and therefore this mode allows to coexist more than two versions of Ignite nodes in the cluster.
*/
public void force();

/**
* Returns cluster-wide status of Rolling Upgrade process.
*
* @return status of Rolling Upgrade process.
*/
public GridRollingUpgradeStatus getStatus();

The following example illustrates how to call these methods from Java code:

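// Start a node and obtain the GridGain plugin, which exposes the Rolling Upgrade API.
try (Ignite ig = Ignition.start("your_configuration.xml")) {
    GridGain gg = ig.plugin(GridGain.PLUGIN_NAME);
    GridGainRollingUpgrade ru = gg.rollingUpgrade();

    // Read the current cluster-wide Rolling Upgrade status.
    GridRollingUpgradeStatus status = ru.getStatus();

    // The calls below only illustrate the API; in practice, call the one you need.
    ru.start();  // enable Rolling Upgrade mode
    ru.finish(); // disable Rolling Upgrade mode
    ru.force();  // enable "force" mode (not recommended)
}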