Getting Started with GridGain 9 Persistent Storage

Introduction

About this Guide

This guide will walk you through the basics of setting up and using GridGain 9’s RocksDB-based persistent storage with the Chinook database in a Docker-based environment.

Prerequisites

  • Up-to-date versions of Docker and Docker Compose;

  • Terminal or command line access;

  • Basic SQL knowledge;

  • At least 8GB of free RAM for the GridGain cluster.
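
Before continuing, you may want to confirm that the required tools are on your PATH. The following is a minimal sketch rather than an official check: have_cmd and check_prereqs are hypothetical helper names introduced here, and the script only tests that a command is present, not its version or the amount of free RAM.

```shell
#!/bin/sh

# Returns 0 if the named command is on PATH, non-zero otherwise.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Prints "found: <tool>" or "missing: <tool>" for each argument.
# Returns 1 if at least one tool is missing, 0 otherwise.
check_prereqs() {
    missing=0
    for tool in "$@"; do
        if have_cmd "$tool"; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
#   check_prereqs docker
```

To confirm that the Compose plugin itself is available, running docker compose version should print a version string.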

Understanding Persistence in GridGain

Persistence Architecture

GridGain persistence is designed to provide quick and responsive persistent storage. When using persistent storage:

  • GridGain stores all data on disk;

  • It loads as much data as possible into RAM for processing;

  • Data is split into multiple partitions, with each partition stored in a separate file on disk;

  • In addition to data partitions, GridGain stores indexes and metadata on disk.

This architecture combines the performance benefits of in-memory computing with the durability of disk-based storage.
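
To make the idea of partitioning concrete, the sketch below maps a key to one of a fixed number of partitions by hashing it. This is an illustration only: it uses the POSIX cksum CRC as a stand-in hash, the partition_for helper and the partition count of 25 are assumptions made for this example, and GridGain's actual hashing and partition assignment are not shown here.

```shell
#!/bin/sh

# Maps a key ($1) to a partition number in the range [0, $2).
# Illustrative only: cksum is a stand-in, not GridGain's own hash function.
partition_for() {
    key=$1
    partitions=$2
    # cksum prints "<crc> <byte count>"; keep only the CRC.
    crc=$(printf '%s' "$key" | cksum | cut -d ' ' -f 1)
    echo $((crc % partitions))
}

# Usage: the same key always lands in the same partition.
#   partition_for 42 25
```

Because the mapping depends only on the key and the partition count, every node can work out where a row belongs without consulting a central directory.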

Storage Engine Types

Persistent Storage Options

  • AIPersist Engine - Default persistent storage engine with checkpointing;

  • RocksDB Engine - LSM-tree based persistent storage optimized for write-heavy workloads.

Volatile Storage Options

  • AIMem Engine - In-memory storage with no persistence.

Storage Profiles

In GridGain 9, persistence is configured by using storage profiles. A storage profile defines how data is stored, cached, and managed by the storage engine.

Each storage profile has specific properties depending on the engine type, but all profiles must specify the following properties:

  • name - A unique identifier for the profile;

  • engine - The storage engine to use.

Distribution Zones

Distribution zones control how data is distributed across the cluster and which storage profiles to use. They allow you to:

  • Control the number of data replicas;

  • Specify which nodes can store data;

  • Define how data is partitioned;

  • Assign storage profiles to determine persistence type.

Setting Up a Persistent Cluster

Docker Environment Configuration

We will use Docker Compose to create a multi-node GridGain cluster with persistent storage.

Creating the Docker Compose File

Create a docker-compose.yml file in your working directory:

name: gridgain9

x-gridgain-def: &gridgain-def
  image: gridgain/gridgain9:latest
  environment:
    JVM_MAX_MEM: "4g"
    JVM_MIN_MEM: "4g"
  configs:
    - source: node_config
      target: /opt/gridgain/etc/ignite-config.conf

services:
  node1:
    <<: *gridgain-def
    command: --node-name node1
    ports:
      - "10300:10300"
      - "10800:10800"
    volumes:
      - ./data/node1:/opt/gridgain/work

  node2:
    <<: *gridgain-def
    command: --node-name node2
    ports:
      - "10301:10300"
      - "10801:10800"
    volumes:
      - ./data/node2:/opt/gridgain/work

  node3:
    <<: *gridgain-def
    command: --node-name node3
    ports:
      - "10302:10300"
      - "10802:10800"
    volumes:
      - ./data/node3:/opt/gridgain/work

configs:
  node_config:
    content: |
      ignite {
        network {
          port: 3344
          nodeFinder.netClusterNodes = ["node1:3344", "node2:3344", "node3:3344"]
        }
        "storage": {
          "profiles": [
            {
              name: "rocksDbProfile"
              engine: "rocksdb"
            }
          ]
        }
      }

The node_config configuration in the Docker Compose file:

  • Adds a storage profile named rocksDbProfile that uses the RocksDB engine;

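  • Does not set a storage size, so the profile uses the default of 256MB (268435456 bytes);

  • Stores persistent data in each node's work directory, which the volumes entries map to the data directory next to docker-compose.yml (./data/node1, ./data/node2, and ./data/node3).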
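
The byte figure quoted for the default size is simply 256 multiplied by 1024 twice. If you later need a different size expressed in bytes, a small conversion such as the one below can help; mib_to_bytes is a hypothetical helper name, and the sketch does not show which profile property takes the value.

```shell
#!/bin/sh

# Converts a size in mebibytes ($1) to bytes using shell arithmetic.
mib_to_bytes() {
    echo $(($1 * 1024 * 1024))
}

# Usage:
#   mib_to_bytes 256    # prints 268435456
```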

Starting the Cluster

Run the following command to start your cluster:

docker compose up -d

Verifying Cluster Deployment

Check that all nodes are running:

docker compose ps

You should see output similar to:

NAME                IMAGE                       COMMAND                  SERVICE   CREATED          STATUS         PORTS
gridgain9-node1-1   gridgain/gridgain9:latest   "docker-entrypoint.s…"   node1     44 seconds ago   Up 6 seconds   0.0.0.0:10300->10300/tcp, 3344/tcp, 0.0.0.0:10800->10800/tcp
gridgain9-node2-1   gridgain/gridgain9:latest   "docker-entrypoint.s…"   node2     44 seconds ago   Up 6 seconds   3344/tcp, 0.0.0.0:10301->10300/tcp, 0.0.0.0:10801->10800/tcp
gridgain9-node3-1   gridgain/gridgain9:latest   "docker-entrypoint.s…"   node3     44 seconds ago   Up 6 seconds   3344/tcp, 0.0.0.0:10302->10300/tcp, 0.0.0.0:10802->10800/tcp
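
If you prefer a scripted check over reading the table, the sketch below compares the list of running services against the three node names used in this guide. It assumes your Docker Compose version supports the --services and --status options of docker compose ps; missing_services is a hypothetical helper introduced for this example.

```shell
#!/bin/sh

# Reads running service names on stdin (one per line) and prints every
# expected service (given as arguments) that is absent from that list.
missing_services() {
    running=$(cat)
    for svc in "$@"; do
        printf '%s\n' "$running" | grep -Fqx -- "$svc" || echo "$svc"
    done
}

# Usage (run from the directory that contains docker-compose.yml):
#   docker compose ps --services --status running | missing_services node1 node2 node3
```

Empty output means all three nodes are running; any name printed is a service that is not currently up.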

Verify the Docker network:

docker network ls

Configuring Persistent Storage

Connecting to the Cluster

Connect to the GridGain CLI:

docker run --rm -it --network=host -v /opt/etc/license.json:/opt/gridgain/etc/license.json -e LANG=C.UTF-8 -e LC_ALL=C.UTF-8 gridgain/gridgain9:latest cli

When the CLI tool offers to connect to the default node, confirm the connection. If you get disconnected, you can reconnect by typing the following command:

connect http://localhost:10300

Initializing the Cluster

Before using the cluster, initialize it:

cluster init --name=GridGain --license=/opt/gridgain/etc/license.json

You should see the message "Cluster was initialized successfully".

Examining Storage Profiles

Verify the configured storage profiles:

node config show ignite.storage

You should see output showing the rocksDbProfile configuration along with the default profiles.

Creating Distribution Zones for Persistence

Enter the interactive SQL CLI:

sql

Create a distribution zone that uses our RocksDB storage profile:

CREATE ZONE ChinookRocksDB WITH replicas=2, storage_profiles='rocksDbProfile';

Building the Chinook Database with Persistence

About the Chinook Database

The Chinook database represents a digital media store with tables for artists, albums, tracks, and more. It’s commonly used as a sample database for demonstrating database features.

Creating Persistent Database Tables

Create the necessary tables for the Chinook database using our RocksDB persistent zone:

-- Create Artist table
CREATE TABLE Artist (
    ArtistId INT NOT NULL,
    Name VARCHAR(120),
    PRIMARY KEY (ArtistId)
) ZONE ChinookRocksDB;

-- Create Album table
CREATE TABLE Album (
    AlbumId INT NOT NULL,
    Title VARCHAR(160) NOT NULL,
    ArtistId INT NOT NULL,
    PRIMARY KEY (AlbumId, ArtistId)
) COLOCATE BY (ArtistId) ZONE ChinookRocksDB;

-- Create Genre table
CREATE TABLE Genre (
    GenreId INT NOT NULL,
    Name VARCHAR(120),
    PRIMARY KEY (GenreId)
) ZONE ChinookRocksDB;

-- Create MediaType table
CREATE TABLE MediaType (
    MediaTypeId INT NOT NULL,
    Name VARCHAR(120),
    PRIMARY KEY (MediaTypeId)
) ZONE ChinookRocksDB;

-- Create Track table
CREATE TABLE Track (
    TrackId INT NOT NULL,
    Name VARCHAR(200) NOT NULL,
    AlbumId INT,
    MediaTypeId INT NOT NULL,
    GenreId INT,
    Composer VARCHAR(220),
    Milliseconds INT NOT NULL,
    Bytes INT,
    UnitPrice NUMERIC(10,2) NOT NULL,
    PRIMARY KEY (TrackId, AlbumId)
) COLOCATE BY (AlbumId) ZONE ChinookRocksDB;

Loading Sample Data

Insert sample data into the tables:

-- Insert data into MediaType table
INSERT INTO MediaType (MediaTypeId, Name) VALUES
(1, 'MPEG audio file'),
(2, 'Protected AAC audio file');

-- Insert data into Artist table
INSERT INTO Artist (ArtistId, Name) VALUES
(1, 'AC/DC'),
(2, 'Accept'),
(3, 'Aerosmith'),
(4, 'Alanis Morissette'),
(5, 'Alice In Chains');

-- Insert data into Album table
INSERT INTO Album (AlbumId, Title, ArtistId) VALUES
(1, 'For Those About To Rock We Salute You', 1),
(2, 'Balls to the Wall', 2),
(3, 'Restless and Wild', 2),
(4, 'Let There Be Rock', 1),
(5, 'Big Ones', 3);

-- Insert data into Genre table
INSERT INTO Genre (GenreId, Name) VALUES
(1, 'Rock'),
(2, 'Jazz'),
(3, 'Metal'),
(4, 'Alternative & Punk'),
(5, 'Rock And Roll');

-- Insert data into Track table
INSERT INTO Track (TrackId, Name, AlbumId, MediaTypeId, GenreId, Composer, Milliseconds, Bytes, UnitPrice) VALUES
(1, 'For Those About To Rock (We Salute You)', 1, 1, 1, 'Angus Young, Malcolm Young, Brian Johnson', 343719, 11170334, 0.99),
(2, 'Balls to the Wall', 2, 2, 1, 'U. Dirkschneider, W. Hoffmann, H. Frank, P. Baltes, S. Kaufmann, G. Hoffmann', 342562, 5510424, 0.99),
(3, 'Fast As a Shark', 3, 2, 1, 'F. Baltes, S. Kaufman, U. Dirkscneider & W. Hoffman', 230619, 3990994, 0.99),
(4, 'Restless and Wild', 3, 2, 1, 'F. Baltes, R.A. Smith-Diesel, S. Kaufman, U. Dirkscneider & W. Hoffman', 252051, 4331779, 0.99),
(5, 'Princess of the Dawn', 3, 2, 1, 'Deaffy & R.A. Smith-Diesel', 375418, 6290521, 0.99);

Querying the Database

Test that your data was inserted correctly:

SELECT a.Name AS Artist, al.Title AS Album, t.Name AS Track
FROM Track t
JOIN Album al ON t.AlbumId = al.AlbumId
JOIN Artist a ON al.ArtistId = a.ArtistId
WHERE t.AlbumId = 1;

Testing Persistence Capabilities

Verifying Data Before Restart

Perform additional queries to ensure your data is properly stored:

-- Count tracks by genre
SELECT g.Name AS Genre, COUNT(t.TrackId) AS TrackCount
FROM Track t
JOIN Genre g ON t.GenreId = g.GenreId
GROUP BY g.Name;

-- Check all albums by artist
SELECT a.Name AS Artist, COUNT(al.AlbumId) AS AlbumCount
FROM Album al
JOIN Artist a ON al.ArtistId = a.ArtistId
GROUP BY a.Name;
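
Besides querying the tables, you can check from the host that each node has written files under its bind-mounted work directory. This is a rough sketch, assuming you run it from the directory that contains docker-compose.yml and that your user can read the data directories; count_files and report_dirs are hypothetical helper names, and the file count says nothing about which tables the files belong to.

```shell
#!/bin/sh

# Prints the number of regular files found under a directory tree.
count_files() {
    find "$1" -type f 2>/dev/null | wc -l | tr -d ' '
}

# Prints "<dir>: <n> files" for each directory given as an argument.
report_dirs() {
    for dir in "$@"; do
        echo "$dir: $(count_files "$dir") files"
    done
}

# Usage:
#   report_dirs data/node1 data/node2 data/node3
```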

Restarting the Cluster

To restart the cluster, you need to first exit the CLI tool.

  • Exit the SQL CLI with the exit; command;

  • Then exit the main CLI with the exit command.

Restart the Docker containers:

docker compose down
docker compose up -d
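
Nodes can take a few seconds to accept connections after the containers start. As a rough sketch, the hypothetical retry_until helper below reruns a check until it succeeds or the attempts run out. The usage comment assumes nc (netcat) is installed and only tests that port 10300 on the host accepts TCP connections, not that the node has fully rejoined the cluster.

```shell
#!/bin/sh

# Runs a command repeatedly until it succeeds or the attempts run out.
# $1 = maximum attempts, $2 = seconds to sleep between attempts,
# remaining arguments = the command to run.
retry_until() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# Usage (assumes nc is available): wait up to about 60 seconds for node1.
#   retry_until 30 2 nc -z localhost 10300 && echo "node1 port is open"
```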

Verifying Data Persistence After Restart

Reconnect to the CLI:

docker run -it --rm --net host gridgain/gridgain9 cli

The cluster is already initialized, so you can go directly to the SQL CLI:

sql

Run the same query to verify the data persisted through the restart:

SELECT a.Name AS Artist, al.Title AS Album, t.Name AS Track
FROM Track t
JOIN Album al ON t.AlbumId = al.AlbumId
JOIN Artist a ON al.ArtistId = a.ArtistId
WHERE t.AlbumId = 1;

Wrap Up

Summary

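In this guide, you configured a RocksDB storage profile, created a distribution zone and tables that use it, and confirmed that the data survived a cluster restart. GridGain 9 with RocksDB persistent storage keeps data durable on disk while still loading as much of it as possible into RAM for processing. Because RocksDB is an LSM-tree engine optimized for write-heavy workloads, it is a good fit for write-intensive use cases.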