Installing on Kubernetes

You can install GridGain 9 and run a GridGain cluster on a Kubernetes cluster. This section describes all the necessary steps and provides configurations and manifests that you can copy and paste into your environment.

Prerequisites

Recommended Kubernetes Version

GridGain is tested on Kubernetes version 1.26.

Installation Steps

Create ConfigMaps

Create the GridGain configuration file and get a license. The minimum node configuration is as follows:

gridgain-config.conf

ignite: {
  network: {
    # GridGain 9 node port
    port = 3344
    nodeFinder = {
      netClusterNodes = [
        # Kubernetes service to access the GridGain 9 cluster on the Kubernetes network
        "gridgain-svc-headless:3344"
      ]
    }
  }

  storage: {
    profiles = [
      {
        engine = "aipersist"
        name = "default"
        replacementMode = "CLOCK"
        # Explicit storage size configuration
        sizeBytes = 2147483648
      }
    ]
  }
}
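
The sizeBytes value is given in bytes; 2147483648 corresponds to 2 GiB. As a small sketch, shell arithmetic can produce the value for other sizes. The gib_to_bytes helper name is hypothetical and is not part of GridGain:

```shell
# Convert a size in GiB to bytes for use in the sizeBytes setting.
# gib_to_bytes is an illustrative helper, not a GridGain tool.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

gib_to_bytes 2   # prints 2147483648, the value used in the configuration above
```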

Place your license content in the license.conf file.

Create the ConfigMap object for GridGain configuration:

kubectl create configmap gridgain-config -n <namespace> --from-file=gridgain-config.conf

Create the ConfigMap object for the GridGain license:
```
kubectl create configmap gridgain-license -n <namespace> --from-file=license.conf
```
Replace <namespace> with the name of the namespace where you want to deploy GridGain.

In Kubernetes deployments, the gridgain-config.conf file is mounted as a read-only ConfigMap, so any attempt to update it with the node config update command will fail.

To update GridGain node configuration, modify the existing ConfigMap and restart all GridGain pods.
- Modify the previously created ConfigMap object:
  kubectl edit configmap gridgain-config -n <namespace>
- Restart each GridGain pod (repeat for every pod):
  kubectl delete pod <GridGain pod name> -n <namespace>
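
The restart step can be wrapped in a loop. The sketch below assumes that all GridGain pods carry the app=gridgain label used in the StatefulSet manifest in this guide; restart_gridgain_pods is a hypothetical helper name. It deletes pods one at a time but does not wait for each replacement pod to become ready.

```shell
# Restart every GridGain pod in a namespace so the updated ConfigMap is picked up.
# Assumption: pods are labeled app=gridgain, as in the manifests in this guide.
restart_gridgain_pods() {
  namespace=$1
  # "-o name" prints entries such as pod/gridgain-cluster-0
  for pod in $(kubectl get pods -n "$namespace" -l app=gridgain -o name); do
    kubectl delete "$pod" -n "$namespace"
  done
}

# Usage: restart_gridgain_pods <namespace>
```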

Configure Environment Variables and Logging

In Kubernetes deployments, all environment variables must be defined directly in the container specification, either in the StatefulSet manifests or in the Helm chart configuration.

JVM and Memory Configuration

To configure JVM options (such as heap size, Metaspace, and Java agent), define the GRIDGAIN9_EXTRA_JVM_ARGS environment variable in your manifests.

For example, in a StatefulSet:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: gridgain-cluster
  namespace: <namespace>
spec:
  ...
  template:
    spec:
      terminationGracePeriodSeconds: 60000
      containers:
        - name: gridgain-node
          env:
            # Extra JVM options (Java agent, heap, Metaspace, direct memory) for the GridGain 9 node.
            - name: GRIDGAIN9_EXTRA_JVM_ARGS
              value: "-javaagent:/agent/jmx.jar=9404:/opt/jmx/jmx.yaml -Xms1g -Xmx3g -XX:MaxMetaspaceSize=256m -XX:MaxDirectMemorySize=4g"

When using the Helm chart, define the same variable through the extraEnvVars field:

extraEnvVars:
  - name: GRIDGAIN9_EXTRA_JVM_ARGS
    value: "-javaagent:/agent/jmx.jar=9404:/opt/jmx/jmx.yaml -Xms1g -Xmx3g -XX:MaxMetaspaceSize=256m -XX:MaxDirectMemorySize=4g"
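
The JVM limits set here interact with the container memory limit. As a rough sanity check (a sketch, not an official sizing tool), the following helpers add up the upper bounds set by -Xmx, -XX:MaxMetaspaceSize, and -XX:MaxDirectMemorySize in a GRIDGAIN9_EXTRA_JVM_ARGS value. The function names to_mib and jvm_upper_bound_mib are hypothetical.

```shell
# Convert a JVM size such as 3g, 256m, or 512k to MiB (integer, rounded down).
to_mib() {
  value=${1%[kKmMgG]}   # numeric part
  unit=${1#"$value"}    # unit suffix; empty means bytes
  case "$unit" in
    g|G) echo $(( value * 1024 )) ;;
    m|M) echo "$value" ;;
    k|K) echo $(( value / 1024 )) ;;
    *)   echo $(( value / 1024 / 1024 )) ;;
  esac
}

# Sum the limits given by -Xmx, -XX:MaxMetaspaceSize, and -XX:MaxDirectMemorySize.
# All other options (for example -javaagent or -Xms) are ignored.
jvm_upper_bound_mib() {
  total=0
  for arg in $1; do
    case "$arg" in
      -Xmx*)                     size=${arg#-Xmx} ;;
      -XX:MaxMetaspaceSize=*)    size=${arg#*=} ;;
      -XX:MaxDirectMemorySize=*) size=${arg#*=} ;;
      *) continue ;;
    esac
    total=$(( total + $(to_mib "$size") ))
  done
  echo "$total"
}

# Usage: jvm_upper_bound_mib "-Xms1g -Xmx3g -XX:MaxMetaspaceSize=256m -XX:MaxDirectMemorySize=4g"
```

For the example value above, the function prints 7424 (3072 + 256 + 4096 MiB). Compare the result with the container's resources.limits.memory; the JVM also uses memory outside these three areas, such as thread stacks and the code cache, so the limit should leave some headroom.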

To generate JVM diagnostic files inside Kubernetes/Docker, pass the relevant JVM options through GRIDGAIN9_EXTRA_JVM_ARGS:

env:
  - name: GRIDGAIN9_EXTRA_JVM_ARGS
    value: "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/gridgain/work/diagnostics -XX:ErrorFile=/opt/gridgain/work/diagnostics/hs_err_pid%p.log -Xlog:gc*:file=/opt/gridgain/work/log/gc.log:time,level,tags"

Ensure that the output directories exist and are writable inside the pod. Declare persistent volumes for diagnostic outputs and mount them into the container:

volumeMounts:
  - name: diagnostics
    mountPath: /opt/gridgain/work/diagnostics
  - name: logs
    mountPath: /opt/gridgain/work/log

volumes:
  - name: diagnostics
    persistentVolumeClaim:
      claimName: gg9-diagnostics-pvc
  - name: logs
    persistentVolumeClaim:
      claimName: gg9-logs-pvc
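
The claims referenced above (gg9-diagnostics-pvc and gg9-logs-pvc) must exist before the pods start. A minimal sketch for creating them follows; render_pvc is a hypothetical helper, and the ReadWriteOnce access mode and the claim sizes are assumptions to adapt to your storage. A ReadWriteOnce claim can only be mounted by pods on a single node, so a claim shared by several replicas may need a different access mode.

```shell
# Print a PersistentVolumeClaim manifest for the given claim name and size.
# Assumptions: ReadWriteOnce access mode and the default storage class.
render_pvc() {
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $1
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: $2
EOF
}

# Usage (the 5Gi size is a placeholder):
#   render_pvc gg9-diagnostics-pvc 5Gi | kubectl apply -n <namespace> -f -
#   render_pvc gg9-logs-pvc 5Gi | kubectl apply -n <namespace> -f -
```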

Logging Configuration

By default, the Docker image ships with a gridgain.java.util.logging.properties configuration that enables only console logging.

To modify the default logging behavior (e.g., enabling file-based logging or using log4j2), create a ConfigMap with a custom gridgain.java.util.logging.properties file and mount it into each pod.

Create the ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: gg9-gridgain9-logging
  namespace: "yournamespace"
data:
  gridgain.java.util.logging.properties: |-
    handlers=java.util.logging.FileHandler
    java.util.logging.SimpleFormatter.format = [%1$tF %1$tT] [%4$s] %5$s%6$s%n
    java.util.logging.FileHandler.pattern = /opt/gridgain/etc/gridgain9-%u-%g.log
    java.util.logging.FileHandler.formatter = org.apache.ignite.internal.lang.JavaLoggerFormatter
    java.util.logging.FileHandler.level = WARNING
    java.util.logging.FileHandler.encoding = UTF-8

Mount it in the StatefulSet:

volumeMounts:
  - name: logging
    mountPath: /opt/gridgain/etc/gridgain.java.util.logging.properties
    subPath: gridgain.java.util.logging.properties
...
volumes:
  - name: logging
    configMap:
      defaultMode: 420
      name: gg9-gridgain9-logging

If you use Helm, the same can be defined via configMaps in values.yaml:

configMaps:
  logging:
    name: gridgain.java.util.logging.properties
    path: /opt/gridgain/etc/gridgain.java.util.logging.properties
    subpath: gridgain.java.util.logging.properties
    content: |
      handlers=java.util.logging.FileHandler
      java.util.logging.SimpleFormatter.format = [%1$tF %1$tT] [%4$s] %5$s%6$s%n
      java.util.logging.FileHandler.pattern = /opt/gridgain/etc/gridgain9-%u-%g.log
      java.util.logging.FileHandler.formatter = org.apache.ignite.internal.lang.JavaLoggerFormatter
      java.util.logging.FileHandler.level = WARNING
      java.util.logging.FileHandler.encoding = UTF-8

Once both the ConfigMap and the volume mount are configured, file-based logging becomes active on startup.
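
To confirm that a pod picked up the custom logging configuration, you can inspect the mounted file and the log directory from inside the container. This is a sketch; show_logging_config is a hypothetical helper, and it assumes that cat and ls are available in the image.

```shell
# Print the mounted logging configuration and list the directory that the
# FileHandler pattern above writes to.
show_logging_config() {
  pod=$1
  namespace=$2
  kubectl exec "$pod" -n "$namespace" -- cat /opt/gridgain/etc/gridgain.java.util.logging.properties
  kubectl exec "$pod" -n "$namespace" -- ls /opt/gridgain/etc
}

# Usage: show_logging_config gridgain-cluster-0 <namespace>
```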

Default Storage Configuration

In Kubernetes deployments, the default storage profile is defined in the ConfigMap created during installation.

You can use this configuration:

storage: {
  profiles = [
    {
      engine = "aipersist"
      name = "default"
      replacementMode = "CLOCK"
      sizeBytes = 2147483648
    }
  ]
}

Or see the example on how to redefine it for the Helm chart.

Create and Deploy the Service

Depending on your requirements, define and deploy a Kubernetes service. GridGain 9 uses two types of services: one for internal cluster discovery and one for external client access.

First, choose a type of service you need and prepare the service.yaml file.

For communication inside the Kubernetes cluster, use a headless service by setting the clusterIP parameter to None. This exposes each pod's IP address, which lets GridGain clients be partition-aware: clients discover every node's address, determine which partition resides on which node, and send requests directly to the node where the data is located.

service.yaml

apiVersion: v1
kind: Service
metadata:
  # The name must match the host name used in netClusterNodes.
  name: gridgain-svc-headless
  # Place your namespace name here.
  namespace: <namespace>
spec:
  clusterIP: None
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: management
    port: 10300
    protocol: TCP
    targetPort: 10300
  - name: rest
    port: 10800
    protocol: TCP
    targetPort: 10800
  - name: cluster
    port: 3344
    protocol: TCP
    targetPort: 3344
  selector:
    # Must be equal to the label set for pods.
    app: gridgain
  # Include not-yet-ready nodes.
  publishNotReadyAddresses: true
  sessionAffinity: None
  type: ClusterIP
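
Because the service name must match the netClusterNodes entry, a quick check before deploying can catch a mismatch. The sketch below only looks for a quoted "<name>:<port>" entry anywhere in the configuration file; service_listed_in_config is a hypothetical helper name.

```shell
# Succeeds if the service name appears as a quoted "<name>:<port>" entry
# in the given GridGain configuration file.
service_listed_in_config() {
  service_name=$1
  config_file=$2
  grep -Eq "\"${service_name}:[0-9]+\"" "$config_file"
}

# Usage:
#   service_listed_in_config gridgain-svc-headless gridgain-config.conf && echo "names match"
```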

Use a LoadBalancer service to allow external clients to connect. Keep in mind that with this option you give up partition awareness.

If your environment does not support LoadBalancer, you can use type: NodePort instead. Refer to the Kubernetes documentation for details.

apiVersion: v1
kind: Service
metadata:
  name: gridgain-loadbalancer
  labels:
    app: gridgain
spec:
  type: LoadBalancer
  selector:
    app: gridgain
  ports:
    - name: rest
      protocol: TCP
      port: 10800
      targetPort: 10800
    - name: client
      port: 10300
      protocol: TCP
      targetPort: 10300

Then apply the service.yaml file to set up this service:

kubectl apply -f service.yaml

Deploy the StatefulSet

Prepare the statefulset.yaml file for StatefulSet deployment:

statefulset.yaml

apiVersion: apps/v1
kind: StatefulSet
metadata:
  # The cluster name.
  name: gridgain-cluster
  # Place your namespace name.
  namespace: <namespace>
spec:
  # The initial number of pods to be started by Kubernetes.
  replicas: 2
  # Kubernetes service to access the GridGain 9 cluster on the Kubernetes network.
  serviceName: gridgain-svc-headless
  selector:
    matchLabels:
      app: gridgain
  template:
    metadata:
      labels:
        app: gridgain
    spec:
      terminationGracePeriodSeconds: 60000
      containers:
        # Container name.
      - name: gridgain-node
        # Limits and requests for the GridGain container.
        resources:
          limits:
            cpu: "4"
            memory: 4Gi
          requests:
            cpu: "4"
            memory: 4Gi
        env:
          # Must be specified to ensure that GridGain 9 cluster replicas are visible to each other.
          - name: GRIDGAIN_NODE_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          # GridGain 9 working directory.
          - name: GRIDGAIN_WORK_DIR
            value: /gg9-work
        # GridGain Docker image and its version.
        image: gridgain/gridgain9:9.1.16
        ports:
        - containerPort: 10300
        - containerPort: 10800
        - containerPort: 3344
        volumeMounts:
        # The config will be placed at this path in the container.
        - mountPath: /opt/gridgain/etc/gridgain-config.conf
          name: config-vol
          subPath: gridgain-config.conf
        # The license will be placed at this path in the container.
        - mountPath: /opt/gridgain/etc/license.conf
          name: license-vol
          subPath: license.conf
        # GridGain 9 working directory.
        - mountPath: /gg9-work
          name: persistence
      volumes:
      - name: config-vol
        configMap:
          name: gridgain-config
      - name: license-vol
        configMap:
          name: gridgain-license
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: persistence
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi # Provide enough space for your application data.
      volumeMode: Filesystem

Apply the statefulset.yaml file to deploy the main components of GridGain 9:
```
kubectl apply -f statefulset.yaml
```

Wait for Pods to Start

Monitor the status of the pods:
```
kubectl get pods -n <namespace> -w
```
Ensure that all pods' STATUS is Running before proceeding.
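
Instead of watching the pod list manually, kubectl wait can block until the pods report Ready. A minimal sketch, assuming the app=gridgain label from the StatefulSet above; the helper name and the default 300s timeout are illustrative choices.

```shell
# Block until all pods labeled app=gridgain are Ready, or until the timeout expires.
# The second argument (timeout) is optional and defaults to 300s.
wait_for_gridgain_pods() {
  kubectl wait --for=condition=Ready pod -l app=gridgain -n "$1" --timeout="${2:-300s}"
}

# Usage: wait_for_gridgain_pods <namespace> 600s
```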

Deploy the Job

Prepare the job.yaml file for deploying the job:

job.yaml

apiVersion: batch/v1
kind: Job
metadata:
  name: cluster-init
  # Place your namespace name here.
  namespace: <namespace>
spec:
  template:
    spec:
      containers:
      # Command to init the cluster. URL and host must be the name of the service you created before. Port is 10300 as the management port.
      - args:
        - -ec
        - |
          apt update && apt-get install -y bind9-host
          GG_NODES=$(host -tsrv _cluster._tcp.gridgain-svc-headless | grep 'SRV record' | awk '{print $8}' | awk -F. '{print $1}' | paste -sd ',')
          /opt/gridgain9cli/bin/gridgain9 cluster init --name=gridgain --url=http://gridgain-svc-headless:10300 --license=/opt/gridgain/etc/license.conf
        command:
        - /bin/sh
        # Specify the Docker image with the GridGain 9 CLI and its version.
        image: gridgain/gridgain9:9.1.16
        imagePullPolicy: IfNotPresent
        name: cluster-init
        resources: {}
        volumeMounts:
        # The license must be mounted into the cluster-init job.
        - mountPath: /opt/gridgain/etc/license.conf
          name: license-vol
          subPath: license.conf
      restartPolicy: Never
      terminationGracePeriodSeconds: 120
      volumes:
      - name: license-vol
        configMap:
          name: gridgain-license

Apply the job.yaml file to complete the installation:
```
kubectl apply -f job.yaml
```
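
The job runs asynchronously, so it can be useful to wait for it and print its logs if it does not finish in time. This is a sketch that assumes the job name cluster-init from job.yaml; the helper name and the default timeout are illustrative.

```shell
# Wait for the cluster-init job to complete; on failure or timeout, print its logs.
# The second argument (timeout) is optional and defaults to 300s.
wait_for_cluster_init() {
  namespace=$1
  kubectl wait --for=condition=complete job/cluster-init -n "$namespace" --timeout="${2:-300s}" \
    || { kubectl logs job/cluster-init -n "$namespace"; return 1; }
}

# Usage: wait_for_cluster_init <namespace>
```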

Installation Verification

Check the status of all resources in your namespace:
```
kubectl get all -n <namespace>
```
Ensure that all components are running as expected, without errors, and that the initialization job is in the Completed status.
Verify that your cluster is initialized and running.
```
kubectl exec -it gridgain-cluster-0 -n <namespace> -- bash
/opt/gridgain9cli/bin/gridgain9 cluster status
```
The command output must include the name of your cluster and the number of nodes. The status must be ACTIVE.

Optional: KEDA Configuration

You can configure KEDA to automatically scale the cluster based on your needs, ensuring efficient resource utilization and performance. This implementation uses Prometheus to monitor cluster load.

To enable KEDA scaling for your cluster:

Add the necessary Helm repositories, install KEDA and Prometheus:

helm install keda kedacore/keda --namespace keda --create-namespace
helm install prometheus prometheus-community/prometheus --namespace keda -f prometheus-values.yaml
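
The install commands above rely on the kedacore and prometheus-community chart repositories being registered with Helm. A sketch for registering them first, using the public repository URLs of the two projects; add_helm_repos is a hypothetical wrapper name.

```shell
# Register the chart repositories used by the install commands, then refresh the index.
add_helm_repos() {
  helm repo add kedacore https://kedacore.github.io/charts
  helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
  helm repo update
}

# Usage: add_helm_repos
```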

Deploy the KEDA configurations:

The keda-scaled-object.yaml configuration defines the scaling rules for the GridGain cluster:

keda-scaled-object.yaml

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: gridgain-autoscale
  namespace: gridgain
spec:
  scaleTargetRef:
    kind: StatefulSet
    name: gridgain-cluster

  pollingInterval: 30
  cooldownPeriod: 120

  minReplicaCount: 2  # Set to the initial number of replicas
  maxReplicaCount: 5

  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          selectPolicy: Disabled
        scaleUp:
          stabilizationWindowSeconds: 60  # Increase if needed
          selectPolicy: Max
          policies:
            - type: Pods
              value: 1
              periodSeconds: 120

  triggers:
    # Uncomment and modify the following section to enable CPU usage based autoscaling
    # - type: prometheus
    #   name: cpu-usage
    #   metadata:
    #     serverAddress: http://prometheus-server.keda.svc.cluster.local:80
    #     query: sum(os_system_load_average{job="gridgain"})
    #     threshold: "0.8"
    #     activationThreshold: "0.6"
    - type: prometheus
      name: heap-memory-usage
      metadata:
        serverAddress: http://prometheus-server.keda.svc.cluster.local:80
        query: |
          sum(jvm_memory_committed_bytes{area="heap", job="gridgain"} / jvm_memory_max_bytes{area="heap", job="gridgain"})
        threshold: "0.7"
    - type: prometheus
      name: nonheap-memory-usage
      metadata:
        serverAddress: http://prometheus-server.keda.svc.cluster.local:80
        query: |
          sum(jvm_memory_committed_bytes{area="nonheap", job="gridgain"} / jvm_memory_max_bytes{area="nonheap", job="gridgain"})
        threshold: "0.7"

The keda-recovery-scaled-job.yaml configuration handles rebuilding CMG nodes in the GridGain cluster.

keda-recovery-scaled-job.yaml

apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: gridgain-recovery
  namespace: gridgain
spec:
  jobTargetRef:
    template:
      spec:
        # securityContext:
        #   runAsUser: 0
        #   runAsGroup: 0
        #   fsGroup: 0
        containers:
          - name: recovery
            image: gridgain/gridgain9:9.1.1
            command: ["/bin/bash", "/scripts/recovery.sh"]
            volumeMounts:
            - name: script-vol
              mountPath: /scripts
        restartPolicy: Never
        volumes:
        - name: script-vol
          configMap:
            name: gridgain-recovery-script
            defaultMode: 0777
    backoffLimit: 1
  pollingInterval: 30
  successfulJobsHistoryLimit: 2
  failedJobsHistoryLimit: 3
  maxReplicaCount: 1
  triggers:
    # - type: prometheus
    #   metadata:
    #     serverAddress: http://prometheus-server.keda.svc.cluster.local:80
    #     query: avg(os_system_load_average{job="gridgain"})
    #     threshold: "0.8"
    #     activationThreshold: "0.6"
    - type: prometheus
      name: heap-memory-usage
      metadata:
        serverAddress: http://prometheus-server.keda.svc.cluster.local:80
        query: |
          avg(jvm_memory_committed_bytes{area="heap", job="gridgain"} / jvm_memory_max_bytes{area="heap", job="gridgain"})
        threshold: "0.7"
    - type: prometheus
      name: nonheap-memory-usage
      metadata:
        serverAddress: http://prometheus-server.keda.svc.cluster.local:80
        query: |
          avg(jvm_memory_committed_bytes{area="nonheap", job="gridgain"} / jvm_memory_max_bytes{area="nonheap", job="gridgain"})
        threshold: "0.7"

You can deploy the above configurations with the following commands:

kubectl apply -n gridgain -f keda-scaled-object.yaml
kubectl apply -n gridgain -f keda-recovery-scaled-job.yaml
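
After applying the configurations, you can check that KEDA registered them. KEDA creates a HorizontalPodAutoscaler for each ScaledObject, so listing HPAs is a quick way to see whether scaling is wired up. The helper name below is hypothetical.

```shell
# List the KEDA objects and the HorizontalPodAutoscaler that KEDA manages for them.
show_keda_resources() {
  kubectl get scaledobject,scaledjob -n "$1"
  kubectl get hpa -n "$1"
}

# Usage: show_keda_resources gridgain
```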

Installation Troubleshooting

If any issues occur during the installation:

Check the logs of specific pods:
```
kubectl logs <pod-name> -n <namespace>
```
Review events in the namespace:
```
kubectl get events -n <namespace>
```
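
For a pod that fails to start or keeps restarting, the pod description and the logs of the previous container instance are often more informative than the current logs. A small sketch that collects both; inspect_pod is a hypothetical helper name.

```shell
# Show scheduling and container state details for a pod, then the logs of its
# previous container instance, if one exists.
inspect_pod() {
  pod=$1
  namespace=$2
  kubectl describe pod "$pod" -n "$namespace"
  kubectl logs "$pod" -n "$namespace" --previous \
    || echo "No logs from a previous container instance are available."
}

# Usage: inspect_pod gridgain-cluster-0 <namespace>
```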

Troubleshooting via REST

This approach is intended for short-lived debugging and testing sessions, not for long-term or external access.

Forward the REST port to a specific GridGain 9 replica:

kubectl port-forward <pod-name> 10300:10300 -n <namespace>

Example:

kubectl port-forward gridgain-cluster-0 10300:10300 -n gridgain

Open a new terminal and send REST requests to http://localhost:10300, for example:

curl 'http://localhost:10300/management/v1/cluster/state'

{"title":"Cluster is not initialized","status":409,"detail":"Cluster is not initialized. Call /management/v1/cluster/init in order to initialize cluster."}

curl 'http://localhost:10300/management/v1/node/info'
{"name":"gridgain-cluster-0","jdbcPort":10800}
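
When scripting such checks, the HTTP status code is easier to act on than the JSON body. The sketch below maps the status of the cluster state endpoint to a short message. The 409 case matches the "Cluster is not initialized" response shown above; treating 200 as initialized is an assumption, and cluster_state_summary is a hypothetical helper name.

```shell
# Map the HTTP status of /management/v1/cluster/state to a short message.
# The base URL is optional and defaults to the port-forwarded address.
cluster_state_summary() {
  url="${1:-http://localhost:10300}/management/v1/cluster/state"
  # curl reports 000 as the status code when no HTTP response was received.
  code=$(curl -s -o /dev/null -w '%{http_code}' "$url") || true
  case "$code" in
    200) echo "initialized" ;;
    409) echo "not initialized" ;;
    000) echo "unreachable" ;;
    *)   echo "unexpected HTTP status $code" ;;
  esac
}

# Usage: cluster_state_summary
```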

For more information, refer to the documentation:

Kubernetes: Use Port Forwarding to Access Applications in a Cluster
kubectl command reference: kubectl port-forward
GridGain 9 REST API docs: REST API

Limitations and Considerations

When running GridGain 9 in a Kubernetes environment, the node configuration becomes read-only and cannot be modified by using the gridgain9 node config update CLI command. This is by design, as node configuration is managed via Kubernetes resources. To change your configuration:

Manually update the corresponding ConfigMap;
Restart all cluster pods by executing kubectl delete pod for each replica.

The updated configuration will take effect after the pods are recreated by the Kubernetes controller.

Last updated on Dec 23, 2025