Rolling Upgrade
A rolling upgrade moves the cluster from one product version to another while the data plane stays available. Nodes are restarted one at a time on the new binaries; the cluster runs in a mixed-version state for the duration of the upgrade and is promoted to the new version atomically when the operator commits.
Limitations
Cluster Configuration Changes Are Blocked
While an upgrade is in progress, the cluster rejects cluster-wide configuration changes. Local node-level configuration changes are still allowed. Cluster-wide configuration changes are accepted again after the upgrade is committed or cancelled.
Schema Changes and DDL Are Blocked
While an upgrade is in progress, the cluster rejects all schema-modifying DDL: CREATE TABLE, ALTER TABLE, DROP TABLE, index changes, distribution-zone changes, and similar statements. Read and DML operations (SELECT, INSERT, UPDATE, DELETE) continue to work normally. DDL is accepted again after the upgrade is committed or cancelled.
Rolling Upgrade and PITR Are Mutually Exclusive
A rolling upgrade and a point-in-time recovery (PITR) cannot run concurrently. Once a rolling upgrade has started, attempts to start PITR fail with an error naming the conflicting operation, and vice versa. The block is released when the upgrade is committed or cancelled.
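The three restrictions above amount to a small decision table. The sketch below is an illustrative model only, not GridGain code: the `Operation` categories and the `is_allowed` function are hypothetical names introduced here, and the model covers only the blocks caused by a rolling upgrade.

```python
from enum import Enum


class Operation(Enum):
    # Hypothetical categories, named here only for illustration.
    CLUSTER_CONFIG_CHANGE = "cluster-wide configuration change"
    NODE_CONFIG_CHANGE = "local node-level configuration change"
    DDL = "schema-modifying DDL"
    READ_OR_DML = "read or DML"
    PITR_START = "start PITR"


# Operations this section describes as rejected while an upgrade is in progress.
BLOCKED_DURING_UPGRADE = frozenset({
    Operation.CLUSTER_CONFIG_CHANGE,
    Operation.DDL,
    Operation.PITR_START,
})


def is_allowed(op: Operation, upgrade_in_progress: bool) -> bool:
    """Return True if the upgrade does not block the operation.

    Other reasons an operation might be rejected are outside this model.
    """
    if not upgrade_in_progress:
        return True
    return op not in BLOCKED_DURING_UPGRADE
```

For example, `is_allowed(Operation.DDL, True)` is `False`, while `is_allowed(Operation.READ_OR_DML, True)` and `is_allowed(Operation.NODE_CONFIG_CHANGE, True)` are both `True`.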
Rolling Upgrade Procedure
Phase 1 — Start the Upgrade
Run upgrade start
upgrade start --version <V>
What Happens When You Start
-
The target version is validated. If it is not a valid GridGain version, the command fails immediately.
-
The cluster checks that no other cluster-wide operation (such as PITR) is in progress. If one is, the command fails with an error naming that operation.
-
The cluster records the target version and marks the upgrade as in progress. From this point, cluster-wide configuration changes, schema-modifying DDL, and PITR are rejected, as described under Limitations.
-
The upgrade.rolling metrics are updated: State becomes IN_PROGRESS, InitialVersion records the cluster’s version at the time of the start, and TargetVersion records the target.
The upgrade state is now durable: it survives any node restart, including the restart of any cluster-management node.
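The order of the start checks can be sketched as a small in-memory model. This is not the product's implementation: `UpgradeMetrics`, `start_upgrade`, and the injected `is_valid_version` callable are hypothetical, and the real version-validation rule is not described in this document. The sketch also does not model durability, or what happens if an upgrade is already in progress.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class UpgradeMetrics:
    """The three upgrade.rolling values that change on start."""
    state: str = "NOT_STARTED"
    initial_version: str = ""
    target_version: str = ""


def start_upgrade(
    metrics: UpgradeMetrics,
    cluster_version: str,
    target_version: str,
    is_valid_version: Callable[[str], bool],
    conflicting_operation: Optional[str] = None,
) -> UpgradeMetrics:
    """Apply the start checks in the order the list above gives them."""
    # Check 1: reject an invalid target version immediately.
    if not is_valid_version(target_version):
        raise ValueError(f"invalid target version: {target_version}")
    # Check 2: fail if another cluster-wide operation (such as PITR) is
    # running, naming that operation in the error.
    if conflicting_operation is not None:
        raise RuntimeError(
            f"conflicting operation in progress: {conflicting_operation}"
        )
    # Record the target and mark the upgrade as in progress.
    metrics.state = "IN_PROGRESS"
    metrics.initial_version = cluster_version
    metrics.target_version = target_version
    return metrics
```

Because validation runs first, an invalid target version is reported even when another operation is also in progress.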
Phase 2 — Roll the Nodes
Repeat the platform-specific procedure below for each node in the cluster, one at a time. After each node, confirm it has rejoined at the target version before moving to the next.
ZIP Archive
-
Stop the node.
-
If the work, etc, and log directories are inside the install tree, move them outside before deleting the install tree. (If they are already outside the install tree, no action is needed.)
-
Replace the contents of the install directory with the unpacked new ZIP.
-
Copy gridgain.properties from the new install’s etc/ directory over the preserved file in your etc/ directory, so that the version metadata matches the new binaries.
-
Start the node back up using the start script from the new install tree.
-
Run upgrade state and confirm the node appears in the nextVersionNodes list.
DEB / RPM Package
-
Stop the node.
-
Run the standard package-manager upgrade command for your distribution. The package upgrade preserves your existing vars.env and gridgain-config.conf files in a backup/ subdirectory under the configuration directory and registers the service unit for the new binaries. The persistent work directory is left untouched.
-
Restart the GridGain service using systemctl (or the equivalent for your init system).
-
Run upgrade state and confirm the node appears in the nextVersionNodes list.
Docker
-
Stop the container for this node.
-
Update the image tag in your run command, Compose file, or Kubernetes manifest to the new version.
-
Start the container against the same persistent volume that holds the work, etc, and log directories.
-
Run upgrade state and confirm the node appears in the nextVersionNodes list.
After all nodes are upgraded, proceed to Phase 3.
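The "confirm it has rejoined before moving on" check in Phase 2 lends itself to a small polling helper. The sketch below assumes a hypothetical `fetch_state` callable that returns the output of upgrade state as a mapping with a `nextVersionNodes` key. How you obtain and parse that output is not specified in this document, and the retry count and delay are arbitrary defaults.

```python
import time
from typing import Any, Callable, Mapping


def wait_until_upgraded(
    node_name: str,
    fetch_state: Callable[[], Mapping[str, Any]],
    attempts: int = 30,
    delay_seconds: float = 2.0,
) -> bool:
    """Poll until node_name is listed under nextVersionNodes, or give up.

    fetch_state is a hypothetical callable supplied by the caller; it stands
    in for running `upgrade state` and parsing the result.
    """
    for attempt in range(attempts):
        state = fetch_state()
        if node_name in (state.get("nextVersionNodes") or ()):
            return True
        # Sleep between attempts, but not after the final one.
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False
```

A rolling script would call this after restarting each node and stop the roll if it returns `False`, rather than moving on to the next node.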
Phase 3 — Complete the Upgrade
Run upgrade commit
upgrade commit
What Happens on Commit
-
The cluster reads the current upgrade state and the list of active nodes.
-
The cluster validates that every active node is running the target version. If any node is still on the source version, the commit fails with an error listing the offending nodes by name and reported version.
-
The cluster validates that every node it knows about has rejoined fully. If a node has not rejoined, the commit fails with an error.
-
The cluster’s recorded version is set to the target version, the in-progress flag is cleared, and cluster-wide configuration changes are accepted again.
-
Cluster-wide features that depend on the new version are activated.
-
The upgrade.rolling metric State becomes NOT_STARTED.
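The two validation steps that can fail a commit can be modeled as a pre-flight check. This is a simplified sketch, not the cluster's logic: `commit_blockers` is a hypothetical helper, "has rejoined fully" is approximated as "appears among the active nodes", and the function collects every problem at once where the real command may stop at the first failing check.

```python
from typing import Dict, Iterable, List


def commit_blockers(
    target_version: str,
    active_nodes: Dict[str, str],
    known_nodes: Iterable[str],
) -> List[str]:
    """Return reasons a commit would fail; an empty list means none were found.

    active_nodes maps node name -> reported running version.
    known_nodes is every node the cluster knows about.
    """
    blockers = []
    # Every active node must be running the target version.
    for name, version in sorted(active_nodes.items()):
        if version != target_version:
            blockers.append(
                f"{name} is running {version}, expected {target_version}"
            )
    # Every known node must have rejoined (approximated as "is active").
    for name in sorted(set(known_nodes) - set(active_nodes)):
        blockers.append(f"{name} has not rejoined")
    return blockers
```

Running a check like this before upgrade commit gives the operator the full list of nodes to fix in one pass.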
Monitoring Upgrade Progress
The upgrade state Command
The upgrade state command returns four fields:
-
currentVersion: the cluster’s recorded version (the source version while the upgrade is in progress).
-
nextVersion: the target version, or null when no upgrade is in progress.
-
nextVersionNodes: names of nodes whose running version equals nextVersion.
-
currentVersionNodes: names of nodes whose running version equals currentVersion.
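As an illustration of how the four fields relate, the sketch below holds them in a small data class and derives a progress summary. The class and its snake_case attribute names are hypothetical (they mirror currentVersion, nextVersion, nextVersionNodes, and currentVersionNodes), and the output format of the command itself is not specified here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class UpgradeState:
    """Hypothetical container for the four fields of upgrade state."""
    current_version: str
    next_version: Optional[str] = None  # null (None) when no upgrade is running
    next_version_nodes: List[str] = field(default_factory=list)
    current_version_nodes: List[str] = field(default_factory=list)

    @property
    def in_progress(self) -> bool:
        return self.next_version is not None

    def summary(self) -> str:
        """One-line progress summary based only on the listed nodes."""
        if not self.in_progress:
            return f"no upgrade in progress (cluster at {self.current_version})"
        done = len(self.next_version_nodes)
        total = done + len(self.current_version_nodes)
        return f"{done}/{total} active nodes on {self.next_version}"
```

Note that the total counts only nodes the command lists, so a node that is down is not part of the denominator.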
Metrics
The metric source upgrade.rolling exposes the following gauges:
-
State: NOT_STARTED or IN_PROGRESS.
-
InitialVersion: cluster version at the time the upgrade started; empty when no upgrade is in progress.
-
TargetVersion: target version; empty when no upgrade is in progress.
-
UpgradedNodes: comma-separated list of node names that have reached the target version.
-
NotUpgradedNodes: comma-separated list of node names still on the source version.
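Because UpgradedNodes and NotUpgradedNodes are comma-separated strings, a monitoring script typically needs to split them before use. The helpers below are a minimal sketch with hypothetical names. They tolerate whitespace around names and empty values, since the exact formatting of the gauge values is an assumption.

```python
from typing import List


def split_node_list(gauge_value: str) -> List[str]:
    """Split a comma-separated gauge value into node names, ignoring blanks."""
    return [name.strip() for name in gauge_value.split(",") if name.strip()]


def upgrade_fraction(upgraded_nodes: str, not_upgraded_nodes: str) -> float:
    """Fraction of listed nodes that have reached the target version."""
    done = len(split_node_list(upgraded_nodes))
    pending = len(split_node_list(not_upgraded_nodes))
    total = done + pending
    # With no nodes listed at all, report 0.0 rather than dividing by zero.
    return done / total if total else 0.0
```

For example, `upgrade_fraction("a,b,c", "d")` evaluates to `0.75`.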
Recovery and Abort
Cancelling an In-Progress Upgrade
Run upgrade cancel
upgrade cancel
Preconditions for Cancel
The cancel operation validates that every active node is on the source version. If any node is on the target version (that is, it has been rolled forward to the new binaries), the cancel fails with an error listing the offending nodes. To cancel after binaries have been swapped on some nodes, the operator must first roll those nodes back to the source binaries.
When cancel succeeds, the cluster clears the in-progress flag while leaving the recorded version unchanged, accepts cluster-wide configuration changes again, and releases the block that prevents PITR from starting.
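The cancel precondition mirrors the commit check, in the opposite direction. As a rough sketch (the helper name and the input shape are hypothetical), the nodes that must be rolled back before cancel can succeed are those active nodes not reporting the source version:

```python
from typing import Dict, List


def nodes_blocking_cancel(
    source_version: str,
    active_nodes: Dict[str, str],
) -> List[str]:
    """Return names of active nodes that are not on the source version.

    active_nodes maps node name -> reported running version.
    """
    return sorted(
        name
        for name, version in active_nodes.items()
        if version != source_version
    )
```

An empty result means the precondition described above is met; otherwise the returned nodes are the ones to roll back to the source binaries first.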
Resuming After a Coordinator or Cluster-Management Node Restart
The upgrade state is durable across cluster restarts and cluster-management leader failover. You can safely restart your CLI session, and rebooting a cluster-management node during a roll does not require restarting the upgrade. When a node rejoins after a restart, it picks up the in-progress upgrade state from the cluster and applies the cluster-wide configuration block locally.
Handling Dead or Unresponsive Nodes
Reviving a Node at the Target Version
A node that was upgraded but is currently down does not appear as an active cluster member and is not counted toward upgrade progress. Bring it back up on the target binaries; it will rejoin, and the upgrade state command will then list it under nextVersionNodes.
Evicting an Unrecoverable Node
A node that the cluster knows about but that has not rejoined fully blocks upgrade commit. Remove the node from the cluster following the standard topology-management procedure, then retry upgrade commit.
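Since a down node is listed under neither version, one way to spot candidates for revival or eviction is to compare your own node inventory with the two lists from upgrade state. The sketch below assumes a hypothetical `state` mapping with `nextVersionNodes` and `currentVersionNodes` keys and an operator-maintained list of expected node names. It does not query the cluster.

```python
from typing import Any, Iterable, List, Mapping


def unaccounted_nodes(
    expected_nodes: Iterable[str],
    state: Mapping[str, Any],
) -> List[str]:
    """Return expected nodes that upgrade state lists under neither version."""
    listed = set(state.get("nextVersionNodes") or [])
    listed |= set(state.get("currentVersionNodes") or [])
    return sorted(set(expected_nodes) - listed)
```

Each returned node then needs a decision: bring it back up on the target binaries, or remove it from the cluster before retrying the commit.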
© 2026 GridGain Systems, Inc. All Rights Reserved. GridGain® is a registered trademark of GridGain Systems, Inc.
Apache, Apache Ignite, the Apache feather and the Apache Ignite logo are either registered trademarks or trademarks of The Apache Software Foundation.