Upgrades and Rollbacks
The operator can perform rolling upgrades when the image version or cluster configuration changes. It monitors pod health during the upgrade and can automatically roll back if the failure rate exceeds a configurable threshold.
Upgrade Configuration
The upgradeConfig field controls how the operator executes rolling upgrades:
spec:
  upgradeConfig:
    nodeTimeoutSeconds: 600
    clusterTimeoutSeconds: 1800
    failureThresholdPercent: 20
    rollbackTimeoutSeconds: 1800
    maxUnavailable: 1
    maxSurge: 0
| Parameter | Default | Description |
|---|---|---|
| nodeTimeoutSeconds | 600 | Maximum time in seconds to wait for a single node to become ready after being restarted. Minimum value is 60. |
| clusterTimeoutSeconds | 1800 | Maximum time in seconds for the entire cluster upgrade to complete. If the upgrade is not finished within this window, the operator triggers a rollback. Minimum value is 300. |
| failureThresholdPercent | 20 | Percentage of pods that can be unhealthy before the operator triggers an automatic rollback. Valid range is 1 to 100. |
| rollbackTimeoutSeconds | 1800 | Maximum time in seconds for a rollback operation to complete. Minimum value is 300. |
| maxUnavailable | 1 | Maximum number of pods that can be unavailable during the upgrade. |
| maxSurge | 0 | Maximum number of extra pods created during the upgrade. For StatefulSets, this is typically 0 or 1. |
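The documented bounds can be checked before a manifest is applied. The following is a minimal sketch, not part of the operator or of kubectl: validate_upgrade_config is a hypothetical helper that takes the four bounded values as arguments and reports any that fall outside the limits listed in the table.

```shell
# Hypothetical pre-flight check; the function name and argument order are assumptions.
# Usage: validate_upgrade_config NODE_TIMEOUT CLUSTER_TIMEOUT FAILURE_PCT ROLLBACK_TIMEOUT
validate_upgrade_config() {
  node_timeout=$1
  cluster_timeout=$2
  failure_pct=$3
  rollback_timeout=$4
  rc=0

  # nodeTimeoutSeconds: documented minimum is 60.
  if [ "$node_timeout" -lt 60 ]; then
    echo "nodeTimeoutSeconds must be at least 60" >&2
    rc=1
  fi
  # clusterTimeoutSeconds: documented minimum is 300.
  if [ "$cluster_timeout" -lt 300 ]; then
    echo "clusterTimeoutSeconds must be at least 300" >&2
    rc=1
  fi
  # failureThresholdPercent: documented valid range is 1 to 100.
  if [ "$failure_pct" -lt 1 ] || [ "$failure_pct" -gt 100 ]; then
    echo "failureThresholdPercent must be between 1 and 100" >&2
    rc=1
  fi
  # rollbackTimeoutSeconds: documented minimum is 300.
  if [ "$rollback_timeout" -lt 300 ]; then
    echo "rollbackTimeoutSeconds must be at least 300" >&2
    rc=1
  fi
  return "$rc"
}
```

Calling validate_upgrade_config 600 1800 20 1800 returns success for the default values. The CRD may enforce the same bounds on its own, so a check like this is only a convenience for catching mistakes earlier.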
Image Upgrades
To upgrade the GridGain version, change the image.tag field and apply the updated manifest:
spec:
  image:
    tag: "9.1.21"
  upgradeConfig:
    nodeTimeoutSeconds: 600
    clusterTimeoutSeconds: 1800
    failureThresholdPercent: 20
    maxUnavailable: 1
The operator detects the image change and begins a rolling update. It restarts pods one at a time (controlled by maxUnavailable), waits for each pod to become ready within nodeTimeoutSeconds, and verifies overall cluster health before proceeding to the next pod.
Monitor the upgrade progress:
kubectl get gg9 my-cluster -w
kubectl describe gg9 my-cluster
The cluster’s Phase transitions through Upgrading and back to Running when the upgrade completes successfully.
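For scripted upgrades, the phase transition can be polled instead of watched interactively. The sketch below assumes that the phase is exposed at .status.phase of the gg9 resource; wait_for_running is a hypothetical helper, not something the operator ships.

```shell
# Hypothetical helper: poll the cluster phase until it is Running.
# Assumption: the phase is readable at .status.phase of the gg9 resource.
# Usage: wait_for_running CLUSTER TIMEOUT_SECONDS [INTERVAL_SECONDS]
# Returns 0 on Running, 1 on Failed, 2 on timeout.
wait_for_running() {
  cluster=$1
  timeout_seconds=$2
  interval=${3:-10}
  elapsed=0
  phase=""
  while [ "$elapsed" -le "$timeout_seconds" ]; do
    phase=$(kubectl get gg9 "$cluster" -o jsonpath='{.status.phase}')
    case "$phase" in
      Running)
        echo "upgrade finished: $phase"
        return 0
        ;;
      Failed)
        echo "upgrade failed: $phase" >&2
        return 1
        ;;
    esac
    # Upgrading, RollingBack, or any other phase: keep waiting.
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "timed out in phase: $phase" >&2
  return 2
}
```

A call such as wait_for_running my-cluster 1800 would wait up to the default clusterTimeoutSeconds. Because the cluster also reports Running before the operator picks up a change, start polling only after the phase has left Running, or the function returns immediately.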
Configuration-Driven Upgrades
Changes to the following fields are treated as configuration changes and trigger a rolling upgrade when upgradeConfig is defined:
- spec.image: image version changes
- spec.gridgainConfig: node configuration changes
- spec.clusterConfig: cluster configuration changes
- spec.license: license changes
- spec.extraEnvVars: environment variable changes, including JVM arguments
The operator computes a hash of the configuration and compares it against the lastStableConfigHash stored in the resource status. When the hash changes, the operator adds a config-hash annotation to the pod template, which forces the StatefulSet controller to recreate each pod with the new configuration.
You can change multiple fields at once. All changes are applied in a single rolling upgrade rather than triggering separate upgrades for each field.
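The hash comparison can be illustrated with a small sketch. The operator's actual hashing scheme and serialization are not described here, so the example uses SHA-256 over the concatenated field values purely as a stand-in, and both function names are hypothetical. It shows why several field changes made together produce a single hash change, and therefore a single rolling upgrade.

```shell
# Illustrative only: SHA-256 and the newline-joined input are assumptions,
# not the operator's documented algorithm. Requires sha256sum (GNU coreutils).
# Each argument is the serialized value of one tracked field.
config_hash() {
  printf '%s\n' "$@" | sha256sum | cut -d' ' -f1
}

# Returns 0 when the newly computed hash differs from the last stable hash,
# which is the condition under which a rolling upgrade would start.
needs_rolling_upgrade() {
  last_stable_hash=$1
  new_hash=$2
  [ "$last_stable_hash" != "$new_hash" ]
}
```

For example, hashing the old values and then hashing again after both the image tag and an environment variable have changed yields two digests that differ once, no matter how many fields were edited in between.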
Validation Rules During Upgrades
The CRD includes validation rules that prevent modifications to image, clusterConfig, gridgainConfig, license, and extraEnvVars while the cluster is in the Upgrading or RollingBack phase, so that conflicting changes cannot be introduced while an upgrade is in progress. The Kubernetes API server rejects any attempt to modify these fields during an active upgrade or rollback.
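A script that edits these fields can check the phase first rather than rely on the rejection. This is a minimal sketch; spec_change_allowed is a hypothetical helper, and reading the phase from .status.phase is an assumption.

```shell
# Hypothetical guard: succeed only when the locked fields may be modified.
# Usage: spec_change_allowed PHASE
spec_change_allowed() {
  case "$1" in
    # Changes to the locked fields are rejected in these two phases.
    Upgrading|RollingBack) return 1 ;;
    *) return 0 ;;
  esac
}

# Example use against a live cluster (assumes the phase is at .status.phase):
#   phase=$(kubectl get gg9 my-cluster -o jsonpath='{.status.phase}')
#   spec_change_allowed "$phase" && kubectl apply -f cluster.yaml
```

The file name cluster.yaml in the comment is a placeholder. The phase can still change between the check and the apply, so the API server's validation remains the authoritative guard.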
Automatic Rollback
The operator automatically rolls back to the last known stable state when any of the following conditions occur:
- The percentage of unhealthy pods exceeds failureThresholdPercent.
- The total upgrade time exceeds clusterTimeoutSeconds.
- The cluster fails to reach an active state after the upgrade.
During a rollback, the cluster Phase changes to RollingBack. The operator reverts to the image and configuration hash stored in status.lastStableImage and status.lastStableConfigHash.
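The two numeric rollback triggers can be sketched as a simple predicate. This is an interpretation of the conditions described in this section, not the operator's code: should_roll_back and its arguments are hypothetical, and "exceeds" is read as strictly greater than.

```shell
# Hypothetical predicate for the numeric rollback triggers.
# Usage: should_roll_back UNHEALTHY_PODS TOTAL_PODS THRESHOLD_PCT ELAPSED_SECONDS CLUSTER_TIMEOUT
# Returns 0 (and prints the reason) when a rollback would be triggered.
should_roll_back() {
  unhealthy=$1
  total=$2
  threshold_pct=$3
  elapsed=$4
  cluster_timeout=$5

  # unhealthy / total > threshold_pct / 100, rearranged to avoid integer division.
  if [ $((unhealthy * 100)) -gt $((threshold_pct * total)) ]; then
    echo "failure threshold exceeded"
    return 0
  fi
  # Total upgrade time is past clusterTimeoutSeconds.
  if [ "$elapsed" -gt "$cluster_timeout" ]; then
    echo "cluster timeout exceeded"
    return 0
  fi
  return 1
}
```

With the default threshold of 20, one unhealthy pod out of five is exactly 20 percent and does not trigger a rollback under this reading, while two out of five does.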
To prevent infinite rollback loops, the operator allows at most three rollback attempts, tracked in status.rollbackCount. The counter resets when you switch to an image other than the one that caused the failure (tracked in status.lastFailedImage). If all three rollback attempts fail, the cluster enters the Failed phase and requires manual intervention.
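The counter behavior can be sketched as follows. This is one possible reading of the rule, under the assumption that a rollback is allowed while the count is below three; the function names are hypothetical.

```shell
# Documented cap on rollback attempts.
MAX_ROLLBACKS=3

# Hypothetical: print the rollback count that applies to the requested image.
# The count restarts at 0 when the target differs from the last failed image.
# Usage: effective_rollback_count COUNT TARGET_IMAGE LAST_FAILED_IMAGE
effective_rollback_count() {
  count=$1
  target_image=$2
  last_failed_image=$3
  if [ "$target_image" != "$last_failed_image" ]; then
    echo 0
  else
    echo "$count"
  fi
}

# Hypothetical: succeed while another rollback attempt is still allowed.
# Usage: rollback_permitted COUNT
rollback_permitted() {
  [ "$1" -lt "$MAX_ROLLBACKS" ]
}
```

Under this sketch, retrying the image that already failed keeps the existing count, so a third failure leaves no attempts remaining, while switching to a different image starts again from zero.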
Manual Rollback
To manually trigger a rollback at any time, set spec.rollback to true:
kubectl patch gg9 my-cluster --type=merge -p '{"spec":{"rollback":true}}'
The operator reverts the cluster to the last stable configuration. Once the rollback completes, the operator clears the rollback flag automatically.
Monitor the rollback:
kubectl get gg9 my-cluster -w
kubectl get gg9 my-cluster -o jsonpath='{.status.rollbackCount}'
Upgrade Status Fields
The operator tracks upgrade state in several status fields:
| Field | Description |
|---|---|
| phase | Current cluster phase. Values include |
| upgradeLocked | |
| upgradeStartTime | Timestamp when the current upgrade started. |
| upgradeStepStartTime | Timestamp when the current upgrade or rollback step started, used to enforce delays between topology changes. |
| upgradeTargetImage | The image locked in when the current upgrade started. The operator ignores spec changes to |
| upgradeTargetConfigHash | The configuration hash locked in when the current upgrade started. |
| lastStableImage | The image of the last successful deployment, used as the rollback target. |
| lastStableConfigHash | The configuration hash of the last successful deployment. |
| lastFailedImage | The image that caused the most recent rollback, used to reset the rollback counter when the user switches to a different image. |
| lastUpgradeSuccessful | Whether the most recent upgrade completed successfully. |
| rollbackCount | Number of consecutive rollback attempts for the current target, capped at 3. |
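Several of these fields can be read in one call with kubectl's custom-columns output. The sketch below assumes the fields sit directly under .status of the gg9 resource; show_rollback_state and the column headings are hypothetical.

```shell
# Hypothetical helper: print the phase and rollback-related status fields.
# Assumption: the fields in the table are located under .status.
# Usage: show_rollback_state CLUSTER
show_rollback_state() {
  cluster=$1
  columns='PHASE:.status.phase'
  columns="$columns,STABLE_IMAGE:.status.lastStableImage"
  columns="$columns,FAILED_IMAGE:.status.lastFailedImage"
  columns="$columns,ROLLBACKS:.status.rollbackCount"
  kubectl get gg9 "$cluster" -o custom-columns="$columns"
}
```

Running show_rollback_state my-cluster prints one row with the four values, which is usually enough to tell whether a rollback happened and which image it returned to.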