GridGain Developers Hub

Data Center Replication from GridGain 8

In certain environments, the downtime associated with migrating data between GridGain versions is unacceptable. For these scenarios, GridGain provides a one-way data migration tool based on GridGain 8 Data Center Replication. Data is copied to your GridGain 9 cluster as if it were a GridGain 8 cluster running in a different data center.

After replication is complete, a brief pause is required to stop writes, flush the replication backlog, and switch traffic to GridGain 9. As such, this is a "near-zero downtime" migration, not a seamless one. A brief maintenance window is required during the final cutover to ensure data consistency.

Prerequisites

Data Center Replication is only available in GridGain 8 Enterprise and Ultimate editions, so you need a corresponding license to start the replication process.

Before starting replication:

  • Create an empty table that the data will be written to. By default, cache fields are matched to columns with the same name; you can also manually configure which cache fields correspond to which table columns.

  • Create a nullable drVersion column of the VARBINARY data type (as drversion VARBINARY) for each target GridGain 9 table. This column stores bookkeeping information about the data center replication process and can be safely deleted once migration is complete. You cannot start replication without this column.

  • Recreate indexes from GridGain 8 in the target GridGain 9 tables. Indexes are not transferred automatically during DCR migration and must be created manually to maintain query performance. See Recreating Indexes for detailed instructions.

Row replication algorithm

Inserts, updates, and removes from the source GridGain 8 cluster are versioned. When a row or a tombstone record in the target GridGain 9 cluster has version V1 and the source GridGain 8 cluster sends an insert, update, or remove with version V2 that is less than V1, the DR connector ignores the action. Otherwise, the DR connector applies the insert, update, or remove as follows:

GridGain 8 action     GridGain 9 state                            DR connector action

insert/update entry   key does not exist                          insert row
insert/update entry   key exists, drVersion column is not NULL    update row, overwrites all columns
insert/update entry   key exists, drVersion column is NULL        no action
remove entry          key exists, drVersion column is not NULL    create tombstone record in the tombstones table
remove entry          key exists, drVersion column is NULL        no action
remove entry          key does not exist                          create tombstone record in the tombstones table
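The decision logic above can be sketched as a small function. This is an illustrative sketch only, not the actual connector code: the names are hypothetical, and comparing raw drVersion bytes lexicographically is a simplification of the real version comparison.

```python
# Illustrative sketch of the DR connector's apply logic from the table above.
# Hypothetical names; real version comparison is more involved than raw bytes.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Row:
    # NULL drVersion marks a row written locally in GridGain 9, not replicated.
    dr_version: Optional[bytes]

def apply_dr_event(action: str, existing: Optional[Row], v_incoming: bytes) -> str:
    """Return the connector's action for an incoming insert/update/remove."""
    # Stale events are ignored: the incoming version is lower than the stored one.
    if (existing is not None and existing.dr_version is not None
            and v_incoming < existing.dr_version):
        return "no action"
    if action in ("insert", "update"):
        if existing is None:
            return "insert row"
        if existing.dr_version is not None:
            return "update row, overwrites all columns"
        return "no action"                    # locally written row
    if action == "remove":
        if existing is not None and existing.dr_version is None:
            return "no action"                # locally written row
        return "create tombstone record"      # key absent, or replicated row exists
    raise ValueError(f"unknown action: {action}")
```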

Type mapping

GridGain 8 and GridGain 9 differ in the data types they support. Make sure data is mapped to columns of equivalent types. The table below provides a reference for commonly used types:

GridGain 8 Type        GridGain 9 Type                  Notes

java.lang.Boolean      BOOLEAN
java.lang.Byte         TINYINT
java.lang.Short        SMALLINT
java.lang.Integer      INT
java.lang.Long         BIGINT
java.lang.Float        REAL
java.lang.Double       DOUBLE
java.math.BigDecimal   DECIMAL(p,s)                     Specify precision and scale
java.lang.String       VARCHAR                          Specify length if needed
byte[]                 VARBINARY                        Specify length if needed
java.util.Date         TIMESTAMP WITH LOCAL TIME ZONE   Consider timezone settings
java.sql.Time          TIME                             Consider timezone settings
java.sql.Date          DATE                             Consider timezone settings
java.sql.Timestamp     TIMESTAMP                        Consider timezone settings
java.util.UUID         UUID
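For scripting schema preparation, the reference table above can be captured as a simple lookup. A minimal sketch; the dictionary restates the table, and parameterized types (DECIMAL, VARCHAR, VARBINARY) still need concrete precision, scale, or length:

```python
# Reference mapping from GridGain 8 Java field types to GridGain 9 SQL types,
# as listed in the table above. Parameterized types need concrete values.
GG8_TO_GG9_TYPE = {
    "java.lang.Boolean":    "BOOLEAN",
    "java.lang.Byte":       "TINYINT",
    "java.lang.Short":      "SMALLINT",
    "java.lang.Integer":    "INT",
    "java.lang.Long":       "BIGINT",
    "java.lang.Float":      "REAL",
    "java.lang.Double":     "DOUBLE",
    "java.math.BigDecimal": "DECIMAL(p,s)",
    "java.lang.String":     "VARCHAR",
    "byte[]":               "VARBINARY",
    "java.util.Date":       "TIMESTAMP WITH LOCAL TIME ZONE",
    "java.sql.Time":        "TIME",
    "java.sql.Date":        "DATE",
    "java.sql.Timestamp":   "TIMESTAMP",
    "java.util.UUID":       "UUID",
}

def gg9_column_type(java_type: str) -> str:
    """Look up the GridGain 9 SQL type for a GridGain 8 field type."""
    try:
        return GG8_TO_GG9_TYPE[java_type]
    except KeyError:
        raise ValueError(f"No direct mapping for {java_type}; "
                         "a transformer may be required")
```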

Prepare GridGain 9 tables and indexes

Here’s an example of migrating a GridGain 8 cache schema to GridGain 9:

  • Original GridGain 8 cache:

    // GridGain 8 cache with composite key
    public class PersonKey {
        private Long id;
        private String region;
    }
    
    public class Person {
        private String name;
        private Integer age;
        private BigDecimal salary;
    }
  • Creating an equivalent GridGain 9 table:

    CREATE TABLE IF NOT EXISTS PUBLIC.PERSON (
        ID BIGINT NOT NULL,
        REGION VARCHAR NOT NULL,
        NAME VARCHAR,
        AGE INT,
        SALARY DECIMAL(10,2),
        drVersion VARBINARY,  -- Required for replication
        PRIMARY KEY (ID, REGION)
    );

Recreating Indexes

Since indexes are not automatically transferred during DCR migration, you need to identify existing indexes in GridGain 8 and recreate them in GridGain 9.

To identify indexes in GridGain 8, query the system views:

-- List all indexes for a specific table
SELECT * FROM SYS.INDEXES
WHERE TABLE_NAME = 'PERSON';

Alternatively, you can use the control script to examine the cache configuration, including its query entities and indexes:

./bin/control.sh --cache list PERSON --config

After identifying the indexes, create them in GridGain 9 before starting replication:

-- Example: recreating indexes for the PERSON table
CREATE INDEX idx_person_age ON PERSON(AGE);
CREATE INDEX idx_person_name ON PERSON(NAME);
CREATE INDEX idx_person_salary_region ON PERSON(SALARY, REGION);

The indexes should be created after table creation but before starting the replication process for optimal performance during data transfer.

Configuring Replication

Configure the GridGain 8 DR sender and the GridGain 9 DR connector

Configuring replication to a GridGain 9 cluster on the GridGain 8 side is the same as configuring replication to another GridGain 8 cluster. On the GridGain 9 side, you need to start a connector tool that converts data from GridGain 8 and transfers it to GridGain 9.

Make sure to note the ID of the data center the source cluster is running in. This ID will be used in connector configuration to filter incoming connections.

The drReceiverConfiguration.datacenterId parameter in the GridGain 9 DR connector must match the datacenterId parameter of a data center that sends data. The datacenterId parameter in DrSenderConnectionConfiguration is ignored.

The example below shows where in the GridGain 8 configuration you can find the data center ID.

<bean class="org.gridgain.grid.configuration.GridGainConfiguration">
    <!--
     Data center ID that identifies the source GridGain 8 cluster.
     This parameter must match the drReceiverConfiguration.datacenterId parameter.
     -->
    <property name="dataCenterId" value="<source-datacenterId>"/>
    <property name="drSenderConfiguration">
        <bean class="org.gridgain.grid.configuration.DrSenderConfiguration">
            <property name="connectionConfiguration">
                <bean class="org.gridgain.grid.dr.DrSenderConnectionConfiguration">
                    <!--
                    datacenterId that identifies the DR connector that writes data to the target GridGain 9 cluster.
                    -->
                    <property name="dataCenterId" value="<dr-connector-datacenterId>"/>
                    <property name="receiverAddresses">
                        <list>
                            <value>[dr-connector-host]:[dr-connector-port]</value>
                        </list>
                    </property>
                    ...
                </bean>
            </property>
            ...
        </bean>
    </property>
    ...
</bean>

For additional information on data center replication in GridGain 8, see Data Center Replication in GridGain 8.

Tombstones

The tombstone time-to-live on the receiving side, configured using the tombstoneTtl property, must be equal to or greater than the value on the sending side. By default, GridGain 8 and GridGain 9 use the same value (30 minutes).

If a sender in the source GridGain 8 cluster has a larger tombstoneTtl than the DR connector, the following warning is written to the DR connector logs:

Sending DataCenter has higher tombstone TTL than Receiving one (remoteTombstoneTTL > localTombstoneTTL) [remoteAddress=<sender-address>, dataCenterId=<dataCenterId>, remoteTombstoneTTL=<sender-TombstoneTTL>, localTombstoneTTL=<dr-connector-TombstoneTTL>]
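The TTL rule can be stated as a one-line check (an illustrative sketch; the connector performs this comparison itself and only logs the warning above):

```python
# Sketch of the tombstoneTtl compatibility rule: the receiving side's TTL
# must be the same as or greater than the sending side's TTL.
DEFAULT_TOMBSTONE_TTL_MS = 30 * 60 * 1000  # 30-minute default on both sides

def tombstone_ttl_compatible(sender_ttl_ms: int, receiver_ttl_ms: int) -> bool:
    """True when the DR connector's TTL does not undercut the sender's."""
    return receiver_ttl_ms >= sender_ttl_ms
```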

Also ensure that the time zone settings of the source and target clusters are in sync.

For additional information on configuring and running the DR connector tool, see DR Connector.

Starting Replication

Start DR connector

Start the connector tool on GridGain 9 side:

./start.sh --config-path <path-to-dr-connector-config-file>

For additional information on configuring DR connector, see DR Connector.

Verify That DR Connector is Running

Check the DR connector logs to ensure it is listening on the configured port. Look for the "Listening port" line in the log, signifying that the DR connector is running and ready to receive data on the specified port. See Logging Configuration.

Once the DR connector connects to the GridGain 8 cluster, you can find the connection information in a GridGain 8 sender node’s logs:

A sender hub has successfully connected to a remote receiver

Check Gridgain 8 sender node

Retrieve the ID of a sender node in the source GridGain 8 cluster:

./bin/control.sh --dr topology

The returned information looks similar to the following:

--------------------------------------------------------------------------------
Data Center ID: 1
Topology: 1 server(s), 0 client(s)
Data nodes: <...>
--------------------------------------------------------------------------------
Sender hubs: 1
  nodeId=<sender-node-id>, Address=[<addresses>], Mode=Server
  ...
--------------------------------------------------------------------------------
Receiver hubs: <...>
--------------------------------------------------------------------------------
Other nodes: <...>
--------------------------------------------------------------------------------
Command [DR] finished with code: 0
Control utility has completed execution at: 2026-02-01T01:00:00.279
Execution time: 183 ms

Verify that this sender node is configured to send data to the target GridGain 9 cluster:

./bin/control.sh --dr node <sender-node-id> --config

Make sure that the DataCenterId and Addresses match the DR connector’s ID and address.

--------------------------------------------------------------------------------
Data Center ID: <source-datacenterId>
Node addresses: [<addresses>]
Mode=Server, Baseline node
--------------------------------------------------------------------------------
Node is configured to send data to:
  DataCenterId=<dr-connector-datacenterId>, Addresses=[<dr-connector-host>:<dr-connector-port>] <-- displays dr-connector's address and datacenterId
Common configuration:
  StreamerThreadPoolSize=10
  ThreadPoolSize=4
  IncrementalDrThreadPoolSize=4
Sender configuration:
  SenderGroups=[<sender-group>] <-- includes names of sender groups used by replicated caches.
  MaxErrors=10
  MaxFailedConnectAttempts=5
  MaxQueueSize=100
  SocketSendBufferSize=0
  SocketReceiveBufferSize=0
  HealthCheckFrequency=2000
  ReadTimeout=5000
  ReconnectOnFailureTimeout=5000
  SystemRequestTimeout=5000
  StoreSizeBytes=0
  UseIgniteSslContextFactory=true
Command [DR] finished with code: 0
Control utility has completed execution at: 2026-02-01T01:00:00.279
Execution time: 151 ms

For additional information on how to configure senders, see Data Center Replication, Configure Connection Between Clusters.

Verify that cache replication is enabled

Verify that each source cache has replication enabled by running this command for every source cache:

./bin/control.sh --dr cache <cache-name> --config

GridGain will return information about the caches being replicated:

--------------------------------------------------------------------------------
Data Center ID: <source-datacenterId>
--------------------------------------------------------------------------------
1 matching cache(s): [<cache-name>]
Sender configuration for cache "<cache-name>":
  PreferLocalSender=true
  BatchSendSize=0
  BatchSendFrequency=0
  EntryFilter=null
  LoadBalancingMode=DR_RANDOM
  StateTransferThreadsCount=2
  MaxBatches=32
  SenderGroup=<sender-group> <-- refers to the sender group specified in the sender node configuration.
  StateTransferThrottleBytes=10485760
  MaxBackupQueueSize=0
Command [DR] finished with code: 0
Control utility has completed execution at: 2026-02-01T01:00:00.279
Execution time: 185 ms

For additional information on how to configure replication for a cache, see Data Center Replication, Configure Caches.

Initiate full state transfer from GridGain 8

Start full state transfer from the source GridGain 8 cluster:

./bin/control.sh --dr full-state-transfer start

Alternatively, start full state transfer for each cache individually:

./bin/control.sh --dr cache <cache-name> --action full-state-transfer

This transfers data from the source GridGain 8 cluster to the target GridGain 9 cluster.

To list active full state transfers, use the list command:

./bin/control.sh --dr full-state-transfer list

The command will provide the list of caches being transferred, as well as execution time for each:

--------------------------------------------------------------------------------
Data Center ID: <source-datacenterId>
--------------------------------------------------------------------------------
Full state transfer states:
CacheDrStateTransfer [id=<cache-state-transfer-id>, cacheName=<cache-name>, nodeId=<sender-node-id>, startTime=1770878397721, partitionsTransferred=1023, syncMode=true]
null
Command [DR] finished with code: 0
Control utility has completed execution at: 2026-02-01T01:00:00.279
Execution time: 4848 ms

Wait until the full state transfer completes; when that happens, the list command returns an empty result:

--------------------------------------------------------------------------------
Data Center ID: <source-datacenterId>
--------------------------------------------------------------------------------
Full state transfer states:
null
Command [DR] finished with code: 0
Control utility has completed execution at: 2026-02-01T01:00:00.279
Execution time: 3686 ms

Cancelling full state transfer

To cancel a running full state transfer, use the cancel command:

# <cache-state-transfer-id> obtained from output of `./bin/control.sh --dr full-state-transfer list` command
./bin/control.sh --dr full-state-transfer cancel <cache-state-transfer-id>

When the transfer is cancelled, GridGain provides a summary:

Full state transfer canceled, task id=<CacheDrStateTransfer.id>
Command [DR] finished with code: 0
Control utility has completed execution at: 2026-02-01T01:00:00.279
Execution time: 5487 ms

To pause replication from the source GridGain 8 cluster, execute the pause command:

# datacenterId specified in DrSenderConnectionConfiguration for the GridGain 9 DR connector
./bin/control.sh --dr pause <dr-connector-datacenterId>

To resume replication from the source GridGain 8 cluster, execute the resume command:

# datacenterId specified in DrSenderConnectionConfiguration for the GridGain 9 DR connector
./bin/control.sh --dr resume <dr-connector-datacenterId>

Monitoring Replication

After full state transfer is started, monitor replication progress to determine when the GridGain 9 cluster is caught up and ready for cutover.

Checking DR Status on GridGain 8

Use the control script to view the current replication status:

./bin/control.sh --dr node <sender-node-id> --metrics

GridGain will return the status of the node, as well as detailed information about the replication:

Data Center ID: <source-datacenterId>
Node addresses: [<addresses>]
Mode=Server, Baseline node
--------------------------------------------------------------------------------
Node is configured to send data to:
  DataCenterId=<dr-connector-datacenterId>, Addresses=[<dr-connector-host>:<dr-connector-port>]
Sender metrics:
  StoreSize=0
  BatchesSent=1
  BatchesReceived=1
  BatchesAcked=1
  AverageBatchAckTime=18.0
  BytesSent=182
  BytesReceived=182
  BytesAcked=236
  EntriesSent=1
  EntriesReceived=1
  EntriesAcked=1
  StoreIsOverflow=false
Command [DR] finished with code: 0
Control utility has completed execution at: 2026-02-01T01:00:00.279
Execution time: 146 ms

Replication JMX Metrics

Check the replication backlog for a cache via JMX MBean attributes on a sender node of the source GridGain 8 cluster:

org.apache.ignite.<cache-name>."Data Center Replication"

Attribute name                   Description

DrBatchWaitingAcknowledgeCount   The number of batches that have been successfully processed by the DR connector.

DrQueuedKeysCount                The number of entries waiting to be sent. When this value is 0, all updates to the cache have been replicated to the target table.

Examine the JMX MBean attributes of the DR connector for the cache:

org.apache.ignite.metrics."Data Center Replication".receiver.<dr-connector-datacenterId>.<cache-name>

Attribute name       Description

BatchesReceived      The number of batches that have been received.

BatchesAcked         The number of batches that have been processed successfully.

EntriesReceived      The number of entries that have been received.

EntriesAcked         The number of entries that have been processed successfully.

MessageQueueLength   The number of batches currently being processed. When this value is 0, all updates to this cache have been replicated to the target table.

Check the JMX MBean attributes of the DR connector for all caches to track overall progress:

org.apache.ignite.metrics."Data Center Replication".receiver

Attribute name            Description

TotalBatchesAcked         The total number of batches that have been processed successfully for all caches combined.

TotalEntriesReceived      The total number of entries that have been received for all caches combined.

TotalEntriesAcked         The total number of entries that have been processed successfully for all caches combined.

TotalMessageQueueLength   The total number of batches currently being processed for all caches combined. When this value is 0, all updates from all caches have been replicated to the target cluster.

Monitoring DR Connector logs

Errors or warnings in the connector log typically indicate schema mismatches or connectivity issues; see Troubleshooting for common problems and solutions. By default, the DR connector writes logs to <dr-connector-directory>/logs/dr-receiver-0.log. See Logging Configuration.

Confirming Incremental Sync Is Caught Up

Once the replication backlog on the GridGain 8 side reaches zero:

  1. The full state transfer is complete.

  2. The connector is now applying incremental updates in near-real time.

  3. Any new writes to GridGain 8 will be replicated with minimal delay.

At this point the cluster is ready for cutover. If you continue to write to GridGain 8 during this phase, the backlog will fluctuate but should remain small.

Cutover

The cutover procedure requires a brief maintenance window to ensure no new data is written during the switch.

Pause Writes on GridGain 8

Stop all applications that write to the GridGain 8 cluster. This ensures no new data is generated while you drain the replication backlog.

Wait for the Backlog to Drain

Monitor the replication backlog for each source cache until it reaches zero:

  • GridGain 8 sender node JMX MBean attribute org.apache.ignite.<cache-name>."Data Center Replication".DrQueuedKeysCount

  • DR connector JMX MBean attribute org.apache.ignite.metrics."Data Center Replication".receiver.TotalMessageQueueLength

Wait until the backlog size (DrQueuedKeysCount) for all source caches being replicated, the processing queue length for each cache (MessageQueueLength), and the processing queue length (TotalMessageQueueLength) for the DR connector are 0. This confirms that all data written before the pause has been transferred to GridGain 9.
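The drain condition can be summarized as a single predicate over the collected metric values (an illustrative sketch; how you read the JMX attributes depends on your monitoring tooling, and the function name is hypothetical):

```python
# Sketch of the cutover drain check described above. The metric values are
# assumed to have already been collected from JMX by your own tooling.
def backlog_drained(queued_keys_per_cache: dict[str, int],
                    queue_len_per_cache: dict[str, int],
                    total_message_queue_length: int) -> bool:
    """True once every replicated cache's backlog and queue are empty."""
    return (all(v == 0 for v in queued_keys_per_cache.values())      # DrQueuedKeysCount
            and all(v == 0 for v in queue_len_per_cache.values())    # MessageQueueLength
            and total_message_queue_length == 0)                     # TotalMessageQueueLength
```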

Verify Row Counts

Compare row counts between GridGain 8 and GridGain 9 to confirm data consistency:

# Source GridGain 8 cluster
./bin/control.sh --cache list person
--------------------------------------------------------------------------------
[cacheName=person, ..., cacheSize=<cache-size>, ... ]
Command [CACHE] finished with code: 0
Control utility has completed execution at: 2026-02-01T01:00:00.279
Execution time: 173 ms

Check the number of records in the target table by executing the following SQL query:

# GridGain 9 CLI
./bin/gridgain9 sql --jdbc-url=jdbc:ignite:thin://<gridgain9-host>:<gridgain9-client-port> "SELECT COUNT(*) FROM <target-schema>.<target-table>"

Repeat for each replicated table; the counts must match. Compare results against known values or against the GridGain 8 cluster, which should still be reachable with writes paused at this point.
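A sketch of this validation step, assuming the counts have already been collected with the commands above (function and input names are hypothetical):

```python
# Sketch of the row-count validation: compare GridGain 8 cacheSize values
# with GridGain 9 SELECT COUNT(*) results, table by table.
def count_mismatches(gg8_cache_sizes: dict[str, int],
                     gg9_row_counts: dict[str, int]) -> list[str]:
    """Return the names of caches/tables whose counts disagree."""
    return sorted(name for name, size in gg8_cache_sizes.items()
                  if gg9_row_counts.get(name) != size)
```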

Repoint Client Applications to GridGain 9

Update your application configuration to connect to the GridGain 9 cluster and restart your applications. The specific steps depend on your deployment, but typically involve changing the connection string or endpoint addresses.

Confirm GridGain 9 Is Serving Traffic

Verify that client applications are successfully connected to GridGain 9 and processing requests. Check application logs and run test queries to confirm normal operation.

Post-Cutover

Stop Replication

Stop the replication process to the target GridGain 9 cluster on the GridGain 8 source cluster.

./bin/control.sh --dr pause <dr-connector-datacenterId>
Warning: this command will pause data center replication for all caches.
Press 'y' to continue . . . y

--------------------------------------------------------------------------------
Data Center ID: <source-datacenterId>
--------------------------------------------------------------------------------
Command [DR] finished with code: 0
Control utility has completed execution at: 2026-02-01T01:00:00.279
Execution time: 1646 ms

Then stop the DR connector (see Stopping the Connector).

./stop.sh

Clean up replication artifacts.

Drop the drVersion Column

The drVersion bookkeeping column is no longer needed after migration is complete. Remove it from each replicated table:

ALTER TABLE <target-schema>.<target-table> DROP COLUMN drVersion;

Drop the tombstones table

Each time the DR connector starts, it automatically creates a table named _DR_TOMBSTONES in the PUBLIC schema to store cache entries removed from the source GridGain 8 cluster. When replication is complete, this table is no longer needed and can be dropped:

DROP TABLE PUBLIC._DR_TOMBSTONES;

Rollback

If validation fails before client applications have been repointed to GridGain 9, you can roll back to GridGain 8.

Rollback Procedure

  1. Repoint client applications back to the GridGain 8 cluster.

  2. Stop the DR connector.

  3. Remove data from the target tables in the GridGain 9 cluster.

  4. Drop the tombstones table in the GridGain 9 cluster.

  5. Investigate and fix the issue that caused validation to fail (for example, schema mismatches or missing data).

  6. Restart the migration from the beginning — a new full state transfer is required.

Troubleshooting

Problem: Table Does Not Exist (Explicit Mapping)

org.apache.ignite.lang.TableNotFoundException: IGN-TBL-2 The table does not exist [name=<target-schema>.<target-table>] TraceId:49932d2f

The cache mapping for the source cache refers to a target table that does not exist.

Actions:

  • Verify the cache mapping for the source cache and update the tableName property to refer to the correct table.

  • If the cache mapping is correct, create the target table under the specified qualified name (see the type mapping table).

Problem: Table Does Not Exist (Default Mapping)

org.apache.ignite.lang.TableNotFoundException: IGN-TBL-2 The table does not exist [name=PUBLIC.<cache-name>] TraceId:49932d2f

When no cache mapping is configured for a source cache, the DR connector creates a default mapping that expects a target table with the same name as the cache in the PUBLIC schema.

Action:

  • Verify that a cache mapping for the source cache exists. If no mapping exists, create one and restart the replication.

Problem: Missing Bookkeeping Column

org.apache.ignite.lang.MarshallerException: IGN-MARSHALLING-1 Failed to serialize tuple for table <target-schema>.<target-table>: Value tuple doesn't match schema: schemaVersion=1, extraColumns=[DRVERSION] TraceId:67bcc4fb

The target table is missing the required drVersion bookkeeping column.

Action:

Add the bookkeeping column to the target table:

ALTER TABLE <target-schema>.<target-table> ADD COLUMN drVersion VARBINARY;

Then restart the replication.

Problem: Value Too Long

# string value in the source cache is too long
org.apache.ignite.lang.MarshallerException: IGN-MARSHALLING-1 Failed to serialize row for table <target-schema>.<target-table>. Value too long [column='<column>', type=STRING(<length>)] TraceId:b5c4924b

# string value in the source cache is too long
org.apache.ignite.lang.MarshallerException: IGN-MARSHALLING-1 Failed to serialize row for table <target-schema>.<target-table>. Value too long [column='<column>', type=BYTE_ARRAY(<length>)] TraceId:b5c4924b

A source cache entry field contains a value that exceeds the column length in the target table.

Action:

Increase the column length in the target table:

ALTER TABLE <target-schema>.<target-table> ALTER COLUMN <column> SET DATA TYPE <expected-type>(<expected-length>);

Then restart the replication.

Problem: Type Mismatch

org.apache.ignite.lang.MarshallerException: IGN-MARSHALLING-1 Value type does not match [column='<column>', expected=<type1>, actual=<type2>] TraceId:3a838ed3

There is a type mismatch between the source cache field and the target table column.

Actions:

If the source cache contains entries with the same type for the problem field, this is a configuration issue — the target table column must use the correct type mapping (see the type mapping table).

If the source cache contains entries with different types in the problem field, configure a RowTransformer to handle entries for this cache.

In both cases, drop the target table and re-create it with the correct column types. Then restart the replication.

Problem: Column Does Not Allow NULLs

org.apache.ignite.lang.MarshallerException: IGN-MARSHALLING-1 Failed to serialize row for table <target-schema>.<target-table>. Column '<column>' does not allow NULLs

A source cache entry field contains null values, but the target table column has a NOT NULL constraint.

Actions:

  • If the column is not part of the primary key, drop and re-create the target table without the NOT NULL constraint on the affected column. Then restart the replication.

  • If the column is part of the primary key, configure an EntryTransformer to replace null values with placeholder values. Update the DR connector’s cache mapping to use the EntryTransformer, restart the DR connector, and then restart the replication.

Problem: Unsupported User Type

Caused by: java.lang.IllegalStateException: Unable to process entries: table=<target-schema>.<target-table>, cache=<cache-name>
        at org.gridgain.internal.dr.DrReceiver.newEntryProcessingException(DrReceiver.java:216)
        at org.gridgain.internal.dr.DrReceiver.processEntries(DrReceiver.java:141)
        ... 27 more
Caused by: java.lang.IllegalStateException: User type is not supported: typeId=368385759
        at org.gridgain.internal.dr.binary.BinaryUtils.unmarshal(BinaryUtils.java:355)
        at org.gridgain.internal.dr.binary.BinaryObject.fieldByOrder(BinaryObject.java:115)

A source cache entry field contains a user-defined type that GridGain 9 does not support.

Action:

Configure an EntryTransformer to convert the user type to one or more supported columns. Adjust the target table schema if needed. Then restart the replication.

Problem: Network Connectivity Issue Between the DR Connector and the GridGain 8 / GridGain 9 Cluster

The DR connector loses network connectivity with the source GridGain 8 cluster or the target GridGain 9 cluster.

Action:

No action is needed. The network connection is reestablished automatically.