Installing on Kubernetes

You can install GridGain 9 and run a GridGain cluster on a Kubernetes cluster. This section describes all the necessary steps and provides the configurations and manifests that you can copy and paste into your environment.

Prerequisites

GridGain is tested on Kubernetes version 1.26.

Installation Steps

Create ConfigMaps

  1. Create the GridGain configuration file and get a license. The minimum node configuration is as follows:

    gridgain-config.conf
    ignite: {
      network: {
        # GridGain 9 node port
        port = 3344
        nodeFinder = {
          netClusterNodes = [
            # Kubernetes service to access the GridGain 9 cluster on the Kubernetes network
            "gridgain-svc-headless:3344"
          ]
        }
      }
    
      storage: {
        profiles = [
          {
            engine = "aipersist"
            name = "default"
            replacementMode = "CLOCK"
            # Explicit storage size configuration
            sizeBytes = 2147483648
          }
        ]
      }
    }
  2. Place your license content in the license.conf file.

  3. Create the ConfigMap object for GridGain configuration:

    kubectl create configmap gridgain-config -n <namespace> --from-file=gridgain-config.conf
  4. Create the ConfigMap object for the GridGain license:

    kubectl create configmap gridgain-license -n <namespace> --from-file=license.conf

    Replace <namespace> with the name of the namespace where you want to deploy GridGain.

    To update the GridGain node configuration, modify the existing ConfigMap and restart all GridGain pods:

    • Modify the previously created ConfigMap object:

      kubectl edit configmap gridgain-config -n <namespace>
    • Restart each GridGain pod, repeating the command for every pod:

      kubectl delete pod <GridGain pod name> -n <namespace>
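
The sizeBytes value in the storage profile above is a raw byte count, and 2147483648 corresponds to 2 GiB. As a minimal sketch for deriving other values, the hypothetical helper below (gib_to_bytes is not part of GridGain or kubectl) converts a whole number of GiB into bytes with shell arithmetic:

```shell
# Convert a whole number of GiB into the byte count used for sizeBytes.
# gib_to_bytes is an illustrative helper name, not a GridGain tool.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

gib_to_bytes 2   # prints 2147483648, the value used in gridgain-config.conf
```

Paste the printed number into the sizeBytes field before creating or updating the ConfigMap.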

Create and Deploy the Service

Depending on your requirements, define and deploy a Kubernetes service. GridGain 9 uses two types of services: one for internal cluster discovery and one for external client access.

  1. First, choose the type of service you need and prepare the service.yaml file.

    • For communication inside the Kubernetes cluster, use a headless service by setting the clusterIP parameter to None. This exposes each pod's IP, enabling GridGain to be partition-aware: clients discover every node's address, determine which partition resides on which node, and send requests directly to the node where the data is located.

    service.yaml
    apiVersion: v1
    kind: Service
    metadata:
      # The name must match the host name used in netClusterNodes.
      name: gridgain-svc-headless
      # Place your namespace name here.
      namespace: <namespace>
    spec:
      clusterIP: None
      internalTrafficPolicy: Cluster
      ipFamilies:
      - IPv4
      ipFamilyPolicy: SingleStack
      ports:
      - name: management
        port: 10300
        protocol: TCP
        targetPort: 10300
      - name: rest
        port: 10800
        protocol: TCP
        targetPort: 10800
      - name: cluster
        port: 3344
        protocol: TCP
        targetPort: 3344
      selector:
        # Must be equal to the label set for pods.
        app: gridgain
      # Include not-yet-ready nodes.
      publishNotReadyAddresses: true
      sessionAffinity: None
      type: ClusterIP
    • Use a LoadBalancer service to allow external clients to connect. Keep in mind that with this option you give up partition awareness.

      If your environment does not support LoadBalancer, you can use type: NodePort instead. Refer to the Kubernetes documentation for details.

      apiVersion: v1
      kind: Service
      metadata:
        name: gridgain-loadbalancer
        labels:
          app: gridgain
      spec:
        type: LoadBalancer
        selector:
          app: gridgain
        ports:
          - name: rest
            protocol: TCP
            port: 10800
            targetPort: 10800
          - name: client
            port: 10300
            protocol: TCP
            targetPort: 10300
  2. Then apply the service.yaml file to set up this service:

    kubectl apply -f service.yaml
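
After applying the manifest, it can be useful to confirm that the discovery service really is headless. The sketch below assumes kubectl is configured for your cluster; is_headless_service is a hypothetical helper name that reads spec.clusterIP and succeeds only when it is None:

```shell
# Succeed only when the given service is headless (spec.clusterIP is "None").
# Arguments: service name, namespace.
is_headless_service() {
  svc="$1"
  ns="$2"
  cluster_ip=$(kubectl get service "$svc" -n "$ns" -o jsonpath='{.spec.clusterIP}')
  [ "$cluster_ip" = "None" ]
}
```

For example, `is_headless_service gridgain-svc-headless <namespace> && echo headless` prints the message only for the headless service defined above.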

Deploy the StatefulSet

  1. Prepare the statefulset.yaml file for StatefulSet deployment:

    statefulset.yaml
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      # The cluster name.
      name: gridgain-cluster
      # Place your namespace name.
      namespace: <namespace>
    spec:
      # The initial number of pods to be started by Kubernetes.
      replicas: 2
      # Kubernetes service to access the GridGain 9 cluster on the Kubernetes network.
      serviceName: gridgain-svc-headless
      selector:
        matchLabels:
          app: gridgain
      template:
        metadata:
          labels:
            app: gridgain
        spec:
          terminationGracePeriodSeconds: 60000
          containers:
            # Custom container name.
          - name: gridgain-node
            # Limits and requests for the GridGain container.
            resources:
              limits:
                cpu: "4"
                memory: 4Gi
              requests:
                cpu: "4"
                memory: 4Gi
            env:
              # Must be specified to ensure that GridGain 9 cluster replicas are visible to each other.
              - name: GRIDGAIN_NODE_NAME
                valueFrom:
                  fieldRef:
                    fieldPath: metadata.name
              # GridGain 9 working directory.
              - name: GRIDGAIN_WORK_DIR
                value: /gg9-work
            # GridGain Docker image and its version.
            image: gridgain/gridgain9:9.1.8
            ports:
            - containerPort: 10300
            - containerPort: 10800
            - containerPort: 3344
            volumeMounts:
            # The config will be placed at this path in the container.
            - mountPath: /opt/gridgain/etc/gridgain-config.conf
              name: config-vol
              subPath: gridgain-config.conf
            # The license will be placed at this path in the container.
            - mountPath: /opt/gridgain/etc/license.conf
              name: license-vol
              subPath: license.conf
            # GridGain 9 working directory.
            - mountPath: /gg9-work
              name: persistence
          volumes:
          - name: config-vol
            configMap:
              name: gridgain-config
          - name: license-vol
            configMap:
              name: gridgain-license
      volumeClaimTemplates:
      - apiVersion: v1
        kind: PersistentVolumeClaim
        metadata:
          name: persistence
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi # Provide enough space for your application data.
          volumeMode: Filesystem
  2. Apply the statefulset.yaml file to deploy the main components of GridGain 9:

    kubectl apply -f statefulset.yaml
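
Because the StatefulSet sets serviceName to the headless service, Kubernetes gives each pod a stable DNS name of the form pod.service.namespace.svc.cluster-domain. The sketch below lists those names; it assumes the default cluster domain cluster.local, and pod_dns_names is a hypothetical helper rather than a GridGain or kubectl command:

```shell
# Print the stable DNS name of each StatefulSet pod behind a headless service.
# Arguments: StatefulSet name, service name, namespace, replica count.
# Assumes the default cluster domain "cluster.local".
pod_dns_names() {
  sts="$1"
  svc="$2"
  ns="$3"
  replicas="$4"
  i=0
  while [ "$i" -lt "$replicas" ]; do
    echo "${sts}-${i}.${svc}.${ns}.svc.cluster.local"
    i=$((i + 1))
  done
}
```

Calling `pod_dns_names gridgain-cluster gridgain-svc-headless <namespace> 2` lists the two addresses that in-cluster clients can resolve for the manifest above.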

Wait for Pods to Start

  1. Monitor the status of the pods:

    kubectl get pods -n <namespace> -w
  2. Ensure that all pods' STATUS is Running before proceeding.
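
If you prefer a scripted check over watching the output, the status column can be tested mechanically. The following is a minimal sketch, assuming the default column layout of kubectl get pods (NAME, READY, STATUS, RESTARTS, AGE); all_pods_running is a hypothetical helper name:

```shell
# Read `kubectl get pods --no-headers` output on stdin.
# Exit 0 only if at least one pod is listed and every STATUS (column 3) is Running.
all_pods_running() {
  awk '$3 != "Running" { bad = 1 } END { exit (NR == 0 || bad) }'
}
```

A typical call would be `kubectl get pods -n <namespace> --no-headers | all_pods_running && echo "all pods are Running"`.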

Deploy the Job

  1. Prepare the job.yaml file for deploying the job:

    job.yaml
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: cluster-init
      # Place your namespace name here.
      namespace: <namespace>
    spec:
      template:
        spec:
          containers:
          # Command that initializes the cluster. The URL host must be the name of the headless service you created earlier; 10300 is the management port.
          - args:
            - -ec
            - |
              apt-get update && apt-get install -y bind9-host
              # Resolve node names from the service's SRV records. GG_NODES is computed here but is not passed to the init command below.
              GG_NODES=$(host -tsrv _cluster._tcp.gridgain-svc-headless | grep 'SRV record' | awk '{print $8}' | awk -F. '{print $1}' | paste -sd ',')
              /opt/gridgain9cli/bin/gridgain9 cluster init --name=gridgain --url=http://gridgain-svc-headless:10300 --license=/opt/gridgain/etc/license.conf
            command:
            - /bin/sh
            # Specify the Docker image with the GridGain 9 CLI and its version.
            image: gridgain/gridgain9:9.1.8
            imagePullPolicy: IfNotPresent
            name: cluster-init
            resources: {}
            volumeMounts:
            # The license must be mounted into the cluster-init job.
            - mountPath: /opt/gridgain/etc/license.conf
              name: license-vol
              subPath: license.conf
          restartPolicy: Never
          terminationGracePeriodSeconds: 120
          volumes:
          - name: license-vol
            configMap:
              name: gridgain-license
  2. Apply the job.yaml file to complete installation.

    kubectl apply -f job.yaml
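
The job script builds a comma-separated list of node names from the SRV records of the headless service. To make that pipeline easier to follow, the sketch below isolates the parsing part; parse_srv_targets is a hypothetical helper name, and it assumes the usual host output in which the SRV target is the eighth whitespace-separated field:

```shell
# Read `host -tsrv ...` output on stdin and print the short host names
# of the SRV targets as one comma-separated line.
parse_srv_targets() {
  grep 'SRV record' |        # keep only lines that describe SRV records
    awk '{print $8}' |       # field 8 is the target host name
    awk -F. '{print $1}' |   # keep the part before the first dot (the pod name)
    paste -sd ',' -          # join all names with commas
}
```

Inside a pod that has the host utility installed, `host -tsrv _cluster._tcp.gridgain-svc-headless | parse_srv_targets` produces the same value the job stores in GG_NODES.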

Installation Verification

  1. Check the status of all resources in your namespace:

    kubectl get all -n <namespace>
  2. Ensure that all components are running as expected, without errors, and that the initialization job is in the Completed status.

  3. Verify that your cluster is initialized and running.

    kubectl exec -it gridgain-cluster-0 -n <namespace> -- bash
    /opt/gridgain9cli/bin/gridgain9 cluster status

    The command output must include the name of your cluster and the number of nodes. The status must be ACTIVE.
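
The completion of the initialization job can also be checked from a script. This is a minimal sketch that assumes the job name cluster-init from the manifest above; init_job_succeeded is a hypothetical helper that reads the job's status.succeeded counter:

```shell
# Succeed only when the cluster-init job reports exactly one succeeded pod.
# Argument: namespace.
init_job_succeeded() {
  ns="$1"
  succeeded=$(kubectl get job cluster-init -n "$ns" -o jsonpath='{.status.succeeded}')
  [ "$succeeded" = "1" ]
}
```

For example, `init_job_succeeded <namespace> || kubectl logs job/cluster-init -n <namespace>` prints the job logs when initialization has not completed.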

Optional: KEDA Configuration

You can configure KEDA to automatically scale the cluster based on load, which helps balance resource utilization and performance. This implementation uses Prometheus to monitor cluster load.

To enable KEDA scaling for your cluster:

  • Add the necessary Helm repositories, install KEDA and Prometheus:

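    helm repo add kedacore https://kedacore.github.io/charts
    helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
    helm install keda kedacore/keda --namespace keda --create-namespace
    helm install prometheus prometheus-community/prometheus --namespace keda -f prometheus-values.yaml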
  • Deploy the KEDA configurations:

    • The keda-scaled-object.yaml configuration defines the scaling rules for the GridGain cluster:

      keda-scaled-object.yaml
      apiVersion: keda.sh/v1alpha1
      kind: ScaledObject
      metadata:
        name: gridgain-autoscale
        namespace: gridgain
      spec:
        scaleTargetRef:
          kind: StatefulSet
          name: gridgain-cluster
      
        pollingInterval: 30
        cooldownPeriod: 120
      
        minReplicaCount: 2  # Set to the initial number of replicas
        maxReplicaCount: 5
      
        advanced:
          horizontalPodAutoscalerConfig:
            behavior:
              scaleDown:
                selectPolicy: Disabled
              scaleUp:
                stabilizationWindowSeconds: 60  # Increase if needed
                selectPolicy: Max
                policies:
                  - type: Pods
                    value: 1
                    periodSeconds: 120
      
        triggers:
          # Uncomment and modify the following section to enable CPU usage based autoscaling
          # - type: prometheus
          #   name: cpu-usage
          #   metadata:
          #     serverAddress: http://prometheus-server.keda.svc.cluster.local:80
          #     query: sum(os_system_load_average{job="gridgain"})
          #     threshold: "0.8"
          #     activationThreshold: "0.6"
          - type: prometheus
            name: heap-memory-usage
            metadata:
              serverAddress: http://prometheus-server.keda.svc.cluster.local:80
              query: |
                sum(jvm_memory_committed_bytes{area="heap", job="gridgain"} / jvm_memory_max_bytes{area="heap", job="gridgain"})
              threshold: "0.7"
          - type: prometheus
            name: nonheap-memory-usage
            metadata:
              serverAddress: http://prometheus-server.keda.svc.cluster.local:80
              query: |
                sum(jvm_memory_committed_bytes{area="nonheap", job="gridgain"} / jvm_memory_max_bytes{area="nonheap", job="gridgain"})
              threshold: "0.7"
    • The keda-recovery-scaled-job.yaml configuration handles rebuilding CMG nodes in the GridGain cluster.

      keda-recovery-scaled-job.yaml
      apiVersion: keda.sh/v1alpha1
      kind: ScaledJob
      metadata:
        name: gridgain-recovery
        namespace: gridgain
      spec:
        jobTargetRef:
          template:
            spec:
              # securityContext:
              #   runAsUser: 0
              #   runAsGroup: 0
              #   fsGroup: 0
              containers:
                - name: recovery
                  # Use the same image version as the StatefulSet.
                  image: gridgain/gridgain9:9.1.8
                  command: ["/bin/bash", "/scripts/recovery.sh"]
                  volumeMounts:
                  - name: script-vol
                    mountPath: /scripts
              restartPolicy: Never
              volumes:
              - name: script-vol
                configMap:
                  name: gridgain-recovery-script
                  defaultMode: 0777
          backoffLimit: 1
        pollingInterval: 30
        successfulJobsHistoryLimit: 2
        failedJobsHistoryLimit: 3
        maxReplicaCount: 1
        triggers:
          # - type: prometheus
          #   metadata:
          #     serverAddress: http://prometheus-server.keda.svc.cluster.local:80
          #     query: avg(os_system_load_average{job="gridgain"})
          #     threshold: "0.8"
          #     activationThreshold: "0.6"
          - type: prometheus
            name: heap-memory-usage
            metadata:
              serverAddress: http://prometheus-server.keda.svc.cluster.local:80
              query: |
                avg(jvm_memory_committed_bytes{area="heap", job="gridgain"} / jvm_memory_max_bytes{area="heap", job="gridgain"})
              threshold: "0.7"
          - type: prometheus
            name: nonheap-memory-usage
            metadata:
              serverAddress: http://prometheus-server.keda.svc.cluster.local:80
              query: |
                avg(jvm_memory_committed_bytes{area="nonheap", job="gridgain"} / jvm_memory_max_bytes{area="nonheap", job="gridgain"})
              threshold: "0.7"
  • You can deploy the above configurations with the following commands:

    kubectl apply -n gridgain -f keda-scaled-object.yaml
    kubectl apply -n gridgain -f keda-recovery-scaled-job.yaml
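
To get a feel for how the thresholds in the ScaledObject translate into replica counts, the sketch below reproduces the basic arithmetic. It assumes KEDA's default AverageValue metric type, where the autoscaler divides the total metric value by the threshold and rounds up, and it ignores the autoscaler's tolerance band and the scale-up policy limits. desired_replicas is a hypothetical helper, not a KEDA command:

```shell
# Estimate the replica count requested for a summed metric value.
# Arguments: metric value, threshold, minReplicaCount, maxReplicaCount.
desired_replicas() {
  awk -v value="$1" -v threshold="$2" -v min="$3" -v max="$4" 'BEGIN {
    ratio = value / threshold
    n = int(ratio)
    if (n < ratio) n++      # round up, like ceil()
    if (n < min) n = min    # clamp to minReplicaCount
    if (n > max) n = max    # clamp to maxReplicaCount
    print n
  }'
}
```

With the values from the manifest, `desired_replicas 1.6 0.7 2 5` models two nodes whose heap ratios sum to 1.6 and yields 3 replicas. Because scaleDown is disabled in the manifest, the cluster would not shrink again when the metric drops.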

Installation Troubleshooting

If any issues occur during the installation:

  • Check the logs of specific pods:

    kubectl logs <pod-name> -n <namespace>
  • Review events in the namespace:

    kubectl get events -n <namespace>
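
When a namespace produces many events, narrowing the list to warnings and sorting it by time usually makes problems easier to spot. A minimal sketch, assuming kubectl is configured for your cluster (warning_events is a hypothetical wrapper name):

```shell
# List only Warning events in a namespace, oldest first.
# Argument: namespace.
warning_events() {
  kubectl get events -n "$1" --field-selector type=Warning --sort-by=.lastTimestamp
}
```

Run it as `warning_events <namespace>` and then inspect the affected pods with kubectl logs or kubectl describe pod.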

Limitations and Considerations

When running GridGain 9 in a Kubernetes environment, the node configuration becomes read-only and cannot be modified by using the gridgain9 node config update CLI command. This is by design, as node configuration is managed via Kubernetes resources. To change your configuration:

  1. Manually update the corresponding ConfigMap.

  2. Restart all cluster pods by executing kubectl delete pod for each replica.

The updated configuration will take effect after the pods are recreated by the Kubernetes controller.
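
The restart step can be scripted instead of deleting each pod by hand. The sketch below assumes the pods carry the app=gridgain label from the StatefulSet manifest; restart_gridgain_pods is a hypothetical helper name, and it deletes the pods one after another without waiting for each replacement to become ready:

```shell
# Delete every GridGain pod in a namespace so the StatefulSet recreates it
# with the updated ConfigMap. Argument: namespace.
restart_gridgain_pods() {
  ns="$1"
  # `-o name` prints entries such as pod/gridgain-cluster-0
  for pod in $(kubectl get pods -n "$ns" -l app=gridgain -o name); do
    kubectl delete -n "$ns" "$pod"
  done
}
```

Run `restart_gridgain_pods <namespace>` after editing the ConfigMap. If you want to limit how many nodes are down at once, consider checking that each recreated pod is Running before the next deletion.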