
Rolling Upgrades

Rolling upgrade is a GridGain Enterprise and Ultimate Edition feature that allows nodes with different versions of GridGain to coexist in one cluster while you roll out a new version. This lets you upgrade the software without cluster downtime.

With rolling upgrades enabled, a node running a newer version of GridGain can join the cluster. For example, if you have ten nodes running on ten physical servers, you can start a new node with a new version of GridGain on any of the servers. The data is rebalanced to accommodate this new member of the cluster. Once the rebalancing is complete, you can shut down the old node. Repeat this process for all server nodes. This way, the cluster is upgraded one node at a time, without any downtime.

Guidelines and Incompatible Versions

Before you start to upgrade to a newer version of GridGain, please note the following:

  • Upgrades are supported for minor and maintenance versions of the same major series only. The feature is not supported for upgrades between major versions due to the changes that might not be backward-compatible. For instance, it is possible to upgrade from 8.5.6 to 8.7.5 without downtime, but rolling upgrades will not work from version 7.9.1 to 8.7.5. For major version upgrades, contact GridGain for a possible zero-downtime solution.

  • There is always a way to upgrade to the latest GridGain version from any version of the same major series without downtime. In most cases, it is a direct one-step upgrade. For instance, you can upgrade from 8.5.6 to 8.7.5 with rolling upgrades, avoiding production outages.

  • If there are breaking changes between minor or maintenance versions of the same series, the upgrade goes through a transitioning version: you move from version "A" to version "B" via version "C", still without production outages. For instance, if you need to upgrade from 8.1.3 to 8.8.1, then 8.5.3 has to be used as a transitioning version: 8.1.3 → 8.5.3 → 8.8.1.
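
The first guideline above can be expressed as a quick pre-flight check. The following is a minimal sketch, not part of the GridGain API: the class `VersionSeries` and its methods are hypothetical names introduced for illustration. It only tests whether two version strings belong to the same major series and whether the target is not older than the source. It cannot tell you whether a transitioning version is required, so a positive result is a necessary condition only.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of the GridGain API. */
final class VersionSeries {
    /** Parses "8.7.5" into {8, 7, 5}; a build suffix such as "#20200714-sha1:..." is dropped. */
    static int[] parse(String version) {
        String core = version.split("#", 2)[0];
        return Arrays.stream(core.split("\\.")).mapToInt(Integer::parseInt).toArray();
    }

    /** True when both versions share the same major number. */
    static boolean sameMajorSeries(String from, String to) {
        return parse(from)[0] == parse(to)[0];
    }

    /** True when 'to' is in the same major series as 'from' and is not older than 'from'. */
    static boolean isRollingUpgradeCandidate(String from, String to) {
        int[] a = parse(from);
        int[] b = parse(to);
        // Arrays.compare orders the arrays lexicographically: major, then minor, then maintenance.
        return a[0] == b[0] && Arrays.compare(a, b) <= 0;
    }
}
```

With the versions used in the guidelines, `isRollingUpgradeCandidate("8.5.6", "8.7.5")` passes the check, while `isRollingUpgradeCandidate("7.9.1", "8.7.5")` does not, because the major versions differ.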

Enabling Rolling Upgrades

Rolling upgrades are disabled by default. If you plan to upgrade GridGain without stopping the whole cluster, you need to enable the feature. Use one of the configuration examples below to enable rolling upgrades:

XML:

<bean class="org.apache.ignite.configuration.IgniteConfiguration">

    <property name="pluginConfigurations">
        <list>
            <bean class="org.gridgain.grid.configuration.GridGainConfiguration">
                <property name="rollingUpdatesEnabled" value="true"/>
            </bean>
        </list>
    </property>

</bean>

Java:

GridGainConfiguration pluginCfg = new GridGainConfiguration();
pluginCfg.setRollingUpdatesEnabled(true);

IgniteConfiguration cfg = new IgniteConfiguration();
cfg.setPluginConfigurations(pluginCfg);

Ignite ignite = Ignition.start(cfg);

C#/.NET:

var cfg = new GridGainPluginConfiguration()
{
    RollingUpdatesEnabled = true
};

C++:

This API is not presently available for C++. You can use XML configuration.

Upgrading to a New GridGain Version

If you have a multi-node cluster and you want to upgrade the cluster to a new GridGain version without stopping the whole cluster, follow the steps below:

  1. Download the new GridGain version.

  2. If you are using persistence, configure the storagePath, walPath, and walArchivePath properties. If these properties are explicitly set in the configuration of the GridGain version you are currently running, use the same values in the new version's configuration. If they are not set, specify the default locations that these properties currently resolve to.

  3. Stop one running node.

  4. Restart the node with the new version. At this point, the cluster will start to rebalance.

  5. Wait for the rebalancing to complete. You can monitor the logs for messages like the following:

    Rebalancing scheduled [order=[Cache1, Cache2, Cache3], top=AffinityTopologyVersion [topVer=59, minorTopVer=1], force=false, evt=DISCOVERY_CUSTOM_EVT, node=79794392-5518-4609-ac13]
    ...
    ...
    ...
    Completed (final) rebalancing [fromNode=node=79794392-5518-4609-ac13, cacheOrGroup=Cache2, topology=AffinityTopologyVersion [topVer=59, minorTopVer=1], time=370550 ms]

    After this, there is a partition map exchange for the new minor topology change, which is caused by Late Affinity Assignment. So, instead of checking the rebalancing for all the caches, you can look for a message about the exchange:

    Finish exchange future [startVer=AffinityTopologyVersion [topVer=42, minorTopVer=1], resVer=AffinityTopologyVersion [topVer=42, minorTopVer=1], err=null]

    Alternatively, if rebalancing is not needed, it is skipped:

    Skipping rebalancing (no affinity changes) [top=AffinityTopologyVersion [topVer=42, minorTopVer=1], rebTopVer=AffinityTopologyVersion [topVer=42, minorTopVer=0], evt=DISCOVERY_CUSTOM_EVT, evtNode=38640608-5518-4609-ac13-0b35eea1d6cb, client=false]

  6. After rebalancing is complete, repeat steps 1-5 for each remaining node.
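
The log check in step 5 can be automated. The following is a minimal sketch that scans log lines for the two messages quoted above: the exchange completion and the "skipping rebalancing" notice. The class `RebalanceLogScanner`, its method, and the regular expressions are assumptions written against the sample messages on this page, not a GridGain API. The sketch also assumes that the exchange caused by Late Affinity Assignment is the one with a non-zero minorTopVer. Adjust the patterns if the log format of your version differs.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; patterns are derived from the sample log lines only. */
final class RebalanceLogScanner {
    // Matches the result version (resVer) reported by "Finish exchange future".
    private static final Pattern EXCHANGE_FINISHED = Pattern.compile(
        "Finish exchange future \\[.*resVer=AffinityTopologyVersion \\[topVer=(\\d+), minorTopVer=(\\d+)\\]");

    // Matches the topology version reported when no rebalancing is needed.
    private static final Pattern REBALANCE_SKIPPED = Pattern.compile(
        "Skipping rebalancing \\(no affinity changes\\) \\[top=AffinityTopologyVersion \\[topVer=(\\d+),");

    /**
     * Returns true if the log shows, for the given major topology version, either a finished
     * exchange with a non-zero minor topology version or a skipped rebalancing.
     */
    static boolean settled(List<String> logLines, long topVer) {
        for (String line : logLines) {
            Matcher exchange = EXCHANGE_FINISHED.matcher(line);
            if (exchange.find()
                && Long.parseLong(exchange.group(1)) == topVer
                && Long.parseLong(exchange.group(2)) > 0)
                return true;

            Matcher skipped = REBALANCE_SKIPPED.matcher(line);
            if (skipped.find() && Long.parseLong(skipped.group(1)) == topVer)
                return true;
        }
        return false;
    }
}
```

A script driving the upgrade could read the node's log into a list of lines, call `settled(lines, topVer)` with the topology version created by the restarted node, and move on to the next node only when it returns true.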

Performance Considerations

When rolling upgrades are enabled, every network message is appended with rolling upgrade data. In most situations, the overhead is small compared to the overall throughput and has almost no performance impact. This includes workloads dealing with medium- or large-sized entries (1 KB and larger), as well as compute-intensive applications.

The impact can get more noticeable if the system handles a comparatively larger number of smaller entries and messages. Using batching techniques (for example, data streamer or putAll method) allows you to combine entries in larger requests and reduce the overhead ratio.
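
The effect of batching on the overhead ratio can be illustrated with simple arithmetic. The sketch below is only a model: the per-message overhead of 64 bytes is a hypothetical placeholder, not a measured or documented GridGain value, and the class `OverheadRatio` is introduced here for illustration. It computes the share of a message taken up by a fixed per-message overhead when one or many entries travel in the same message.

```java
/** Illustrative model only; the overhead size used in main() is a hypothetical placeholder. */
final class OverheadRatio {
    /** Share of the message occupied by a fixed per-message overhead. */
    static double ratio(int overheadBytes, int entryBytes, int entriesPerMessage) {
        long payload = (long) entryBytes * entriesPerMessage;
        return (double) overheadBytes / (overheadBytes + payload);
    }

    public static void main(String[] args) {
        int overhead = 64; // hypothetical per-message overhead in bytes
        System.out.printf("one 100-byte entry per message:  %.4f%n", ratio(overhead, 100, 1));
        System.out.printf("100 entries batched per message: %.4f%n", ratio(overhead, 100, 100));
    }
}
```

Because the overhead is paid once per message rather than once per entry, the ratio shrinks as more entries are combined into one request, which is the reason batching helps with small entries.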

Update Duration

The duration of the rolling upgrade procedure is largely affected by the speed of the rebalancing process. By default, rebalancing is performed in one thread. However, you can increase the number of threads dedicated to rebalancing to enhance performance. The rebalancing thread pool size is controlled by the IgniteConfiguration.rebalanceThreadPoolSize property. See the Configuring Rebalance Thread Pool page for more information.

You can change the value of the thread pool size in a manner similar to the rolling upgrade procedure. If you have already upgraded to version 8.7.x where x ≥ 4, you can stop an individual node, set the parameter, and start the node, repeating this action for each node in the cluster.

Monitoring Rolling Upgrades

GridGain Control Center allows you to monitor the process of rolling upgrades as you move to a newer version of GridGain.

The Rebalance widget in Control Center displays the cluster nodes and their versions. As you perform node-by-node migration to a newer version, watch the status of the rebalancing progress and proceed to the next node when it is finished.

Performing a Rolling Restart

If you do not want to upgrade to a new version of GridGain but need to make changes to the server nodes (for example, to modify the configuration file), you can do a rolling restart.

To do a rolling restart, perform the following steps:

  1. Stop one node in the cluster.

  2. Restart the node with the new configuration file. At this point, data rebalancing starts to accommodate this node into the cluster.

  3. Wait until data rebalancing is complete.

  4. Repeat steps 1-3 until all the nodes are restarted with the new configuration.

Managed Rolling Upgrade

Managed rolling upgrade enables you to keep the upgrade process under control. The regular rolling upgrade allows nodes of a different version to connect at any time and automatically activates new features as soon as all nodes support them. Suppose that you have two nodes with different versions, where the new node has a new feature (for example, a changed messaging protocol) and the old node does not support it. The feature becomes active immediately when the last node that does not support it leaves the cluster. As a result, the old node can no longer connect to the cluster.

Managed Rolling Upgrade allows you to control the upgrade process, making it predictable and accurate. It is based on the following constraints:

  • The cluster does not allow nodes of different versions to join until upgrade mode is enabled.

  • When upgrade mode is enabled, nodes of only one other version can join the cluster, and that version must be newer than the current one.

  • The new features are not activated until the upgrade mode is turned off manually.
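
The first two constraints can be summarized as a small decision rule. The following toy model is a sketch for illustration only: it is not GridGain code, the class `JoinRuleModel` and its methods are hypothetical, and it ignores everything except version checks on join (feature activation, node tracking, and "force" mode are out of scope).

```java
import java.util.Arrays;

/** Toy model of the join rules described above; not GridGain code. */
final class JoinRuleModel {
    private final int[] initialVersion;
    private int[] targetVersion; // Stays null until the first newer node joins.
    private boolean upgradeMode;

    JoinRuleModel(String initialVersion) {
        this.initialVersion = parse(initialVersion);
    }

    /** Models enabling upgrade mode. */
    void start() {
        upgradeMode = true;
    }

    /** Returns true if a node of the given version would be allowed to join in this model. */
    boolean tryJoin(String nodeVersion) {
        int[] v = parse(nodeVersion);

        if (Arrays.equals(v, initialVersion))
            return true; // Same version as the cluster: always allowed.

        if (!upgradeMode)
            return false; // Different versions are rejected until upgrade mode is enabled.

        if (targetVersion == null) {
            if (Arrays.compare(v, initialVersion) < 0)
                return false; // Older than the current version: rejected.

            targetVersion = v; // The first newer version becomes the target.
            return true;
        }

        return Arrays.equals(v, targetVersion); // Only one other version is allowed.
    }

    private static int[] parse(String version) {
        return Arrays.stream(version.split("\\.")).mapToInt(Integer::parseInt).toArray();
    }
}
```

In this model, a cluster created as `new JoinRuleModel("8.8.0")` rejects an 8.8.1 node until `start()` is called, then accepts 8.8.1 and records it as the target, and afterwards rejects any third version.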

Activating Rolling Upgrade

To activate the rolling upgrade mode, run the following command:

control.sh --rolling-upgrade start --yes
control.bat --rolling-upgrade start --yes

Checking Rolling Upgrade Status

To check the Rolling Upgrade mode status, run the following command:

control.sh --rolling-upgrade status
control.bat --rolling-upgrade status

Disabling Rolling Upgrade

To disable the rolling upgrade mode, run the following command:

control.sh --rolling-upgrade finish
control.bat --rolling-upgrade finish

Example: Using Managed Rolling Upgrade for Migration

Managed rolling upgrade makes the migration process manageable and predictable. The example below demonstrates how to use managed rolling upgrade to migrate from one version to another. To migrate from version 8.8.0 to 8.8.1, perform the following steps:

  1. Check the current status of the rolling upgrade:

    control.sh --rolling-upgrade status
    control.bat --rolling-upgrade status

    At this point, the initial version corresponds to the current version of the cluster, and the rolling upgrade is disabled. If you start a node of a different version now, the joining node throws an exception, as shown below:

    The local node's build version differs from the remote node's [locBuildVer=8.7.23#20200728-sha1:4e015639, rmtBuildVer=8.8.0#20200714-sha1:35c405b6].
    The node has been rejected due to the disabled Rolling Upgrade.
    Please consider enabling it via control.sh utility with the following command: `--rolling-upgrade start` or using GridGain.rollingUpgrade().start() method.

  2. Activate rolling upgrade:

    control.sh --rolling-upgrade start --yes
    control.bat --rolling-upgrade start --yes

  3. Check the rolling upgrade status:

    control.sh --rolling-upgrade status
    control.bat --rolling-upgrade status

  4. When a node of the new version joins the cluster, its version is saved as the target version, which you can verify by checking the status again:

    Rolling upgrade is enabled
    Initial version: 8.8.0#20200714-sha1:35c405b6
    Target version: 8.8.1#20200728-sha1:4e015639
    List of alive nodes in the cluster that are not updated yet: 54a088fb
    List of alive nodes in the cluster that are updated: 01110503

    If you try to disable rolling upgrade while nodes of both versions are still in the cluster, the operation fails:

    Rolling upgrade operation failed. Rolling Upgrade mode cannot be disabled.
    Rolling upgrade cannot be disabled because there is more than one node with a
    different product version.

    8.8.0#20200714-sha1:35c405b6=Nodes: [0:0:0:0:0:0:0:1%lo],
    8.8.1#20200728-sha1:4e015639=Nodes: [0:0:0:0:0:0:0:1%lo]

  5. Once the last node of the old version has been upgraded or has left the cluster, you can finish the upgrade process:

    control.sh --rolling-upgrade finish
    control.bat --rolling-upgrade finish
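
Before finishing the upgrade, a script can check the status output for nodes that are not updated yet. The following is a minimal sketch that parses the status text shown in the example. The class `UpgradeStatusParser` is hypothetical, and the sketch assumes that an empty list is printed as nothing after the colon, which this page does not confirm. Verify the actual output of your version before relying on it.

```java
import java.util.List;

/** Hypothetical helper for illustration; the output format is assumed from the sample on this page. */
final class UpgradeStatusParser {
    private static final String PENDING_PREFIX =
        "List of alive nodes in the cluster that are not updated yet:";

    /**
     * Returns true if the status output lists at least one node that is not updated yet.
     * Assumption: when no such nodes exist, nothing follows the colon or the line is absent.
     */
    static boolean hasPendingNodes(List<String> statusOutput) {
        for (String line : statusOutput) {
            String trimmed = line.trim();
            if (trimmed.startsWith(PENDING_PREFIX))
                return !trimmed.substring(PENDING_PREFIX.length()).trim().isEmpty();
        }
        return false;
    }
}
```

A wrapper around the status command could call `hasPendingNodes` on the captured output and run the finish command only when it returns false.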

Reverting Rolling Upgrade

You can revert a rolling upgrade while it is still in progress. Even if the target version has already been identified, you can still shut down all nodes with the target version (8.8.1) and disable Rolling Upgrade. In this case, the initial version remains unchanged (8.8.0).

A node that is older than the initial version cannot join the cluster. Such a node is rejected with a message like the following:

The cluster must not contain node older than the initial [initVer=8.7.23#20200728-sha1:4e015639, nodeVer=8.8.0#20200714-sha1:35c405b6]

Using the “Force” Mode

If you still want to have the old node in the cluster, use the “force” mode (not recommended). To use the force mode, run the following command:

control.sh --rolling-upgrade force
control.bat --rolling-upgrade force

The distributed rolling upgrade is enabled by default in version 8.8.1. If a cluster of version 8.7.x is undergoing a rolling upgrade to version 8.8.1 or later, the cluster enables "force" mode automatically. Whether a feature is supported is determined according to the old logic, and nodes of old versions are accepted into the cluster. In this case, you must disable the “force” mode manually after the upgrade.

If a cluster is started on version 8.8.1 or later, the rolling upgrade mode of the started cluster is "disabled". You must change the mode to "enabled" to perform a rolling upgrade.

Using Java API to Manage Rolling Upgrade

Since control.sh is based on the Java API, you can use Java to manage rolling upgrades programmatically. For that purpose, use the following methods:

Method Name    Description
start          Activates rolling upgrade mode or throws an exception if it is not possible.
finish         Disables rolling upgrade mode or throws an exception if it is not possible.
force          Switches rolling upgrade to the "force" mode.
getStatus      Returns the state of the rolling upgrade process.

A full description is provided in the GridGainRollingUpgrade class.

The signatures of these methods are as follows:

/**
* Enables rolling upgrade mode.
*/
public void start();

/**
* Disables rolling upgrade mode.
*/
public void finish();

/**
* Enables forced mode of rolling upgrade. This means that the strict version checking of the node should not be
* used and therefore this mode allows to coexist more than two versions of Ignite nodes in the cluster.
*/
public void force();

/**
* Returns cluster-wide status of Rolling Upgrade process.
*
* @return status of Rolling Upgrade process.
*/
public GridRollingUpgradeStatus getStatus();

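The following example illustrates how to call these methods from Java code:

try (Ignite ig = Ignition.start("your_configuration.xml")) {
    // Obtain the GridGain plugin and its rolling upgrade interface.
    GridGain gg = ig.plugin(GridGain.PLUGIN_NAME);
    GridGainRollingUpgrade ru = gg.rollingUpgrade();

    // Query the cluster-wide status of the rolling upgrade process.
    GridRollingUpgradeStatus status = ru.getStatus();

    // The calls below only list the available operations; call the one you need.
    ru.start();  // Enables rolling upgrade mode.
    ru.finish(); // Disables rolling upgrade mode.
    ru.force();  // Switches rolling upgrade to the "force" mode.
}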