Change Data Capture

GridGain provides a Change Data Capture (CDC) implementation that can be used to integrate with external databases and propagate changes between them.

Preparing CDC

Before starting replication, you need to configure a CDC source and a CDC sink. The replication you create later connects them.

Prepare Data Schemas

For the replication to be successful, you need to make sure table schemas in source and sink are the same. If you do not yet have tables in the sink, GridGain will automatically create tables with matching schemas when replication starts.
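
If the sink tables already exist, it can help to compare the two schemas before starting replication. The following is a minimal Python sketch, assuming you have already obtained each table's columns as a mapping of column name to type name and that the type names have been normalized to a common vocabulary. The function `schema_differences` and the sample columns are hypothetical and are not part of GridGain.

```python
def schema_differences(source_columns, sink_columns):
    """Compare two {column_name: type_name} mappings.

    Returns a list of (column, source_type, sink_type) tuples for every
    column that is missing on one side or has a different type.
    """
    differences = []
    for column in sorted(set(source_columns) | set(sink_columns)):
        source_type = source_columns.get(column)
        sink_type = sink_columns.get(column)
        if source_type != sink_type:
            differences.append((column, source_type, sink_type))
    return differences


# Hypothetical column lists for PUBLIC.MY_TABLE1 on both sides.
source = {"ID": "INT", "NAME": "VARCHAR"}
sink = {"ID": "INT", "NAME": "VARCHAR", "AGE": "INT"}

for column, source_type, sink_type in schema_differences(source, sink):
    print(f"{column}: source={source_type}, sink={sink_type}")
```

An empty result means the two mappings are identical under these assumptions.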

Configuring CDC Source

cdc source create --name <source_name> --type <source_type> --tables <table_name1>[,<table_name2>,...] [--parameters <property_name1>=<property_value1>] [--parameters <property_name2>=<property_value2>]...

Command arguments:

Property	Default	Description
name		The name of the CDC source. This name will be used to reference it.
type		Data source type. Currently, only `gridgain` is supported.
tables		Comma-separated list of tables that will be replicated.
create-table-if-not-exists	true	Optional. If `true`, missing tables will be automatically created in the sink prior to starting replication. Otherwise, replication with missing tables will fail and return a list of missing tables.
parameters		Optional additional parameters. Currently, the following parameters are supported: `page-size` - the size of each page sent via replication; `poll-interval` - the interval at which GridGain checks for updates in source tables.

Below is an example of a CDC data source:

cdc source create --name gridgain_source --type gridgain --tables PUBLIC.MY_TABLE1 --parameters "poll-interval-ms"=1000 --parameters "page-size"=1024
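
When source definitions are generated by a script, the command line can be assembled programmatically. The following is a minimal Python sketch, assuming the flag layout shown in the syntax above. The helper `build_source_create` is hypothetical and is not part of GridGain; it only builds the command text and does not run it.

```python
def build_source_create(name, tables, parameters=None, source_type="gridgain"):
    """Return a `cdc source create` command line as a single string.

    Hypothetical helper: it formats text following the documented syntax
    and does not contact a cluster.
    """
    if not tables:
        raise ValueError("at least one table is required")
    parts = [
        "cdc source create",
        f"--name {name}",
        f"--type {source_type}",
        f"--tables {','.join(tables)}",
    ]
    # Each parameter is passed through its own --parameters flag.
    for key, value in (parameters or {}).items():
        parts.append(f'--parameters "{key}"={value}')
    return " ".join(parts)


command = build_source_create(
    "gridgain_source",
    ["PUBLIC.MY_TABLE1"],
    {"poll-interval-ms": 1000, "page-size": 1024},
)
print(command)
```

With these inputs, the helper reproduces the example command shown above.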

Updating CDC Source

You can update the CDC source when the relevant replication is not running by using the cdc source update command. The command uses the same arguments as the cdc source create command. For example:

cdc source update --name gridgain_source --type gridgain --parameters "poll-interval-ms"=1000 --tables=PUBLIC.MY_TABLE1
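
Because a source can only be updated while the relevant replication is not running, a script that changes a source typically has to stop the replication first and start it again afterwards. The following Python sketch lists those commands in order, using the `cdc replication stop` and `cdc replication start` commands described later on this page. The helper `update_plan` is hypothetical, and the sketch makes no assumption about how a restarted replication resumes.

```python
def update_plan(replication_name, update_command):
    """Return the commands to run, in order, to update a source or sink
    that is used by the given replication."""
    return [
        f"cdc replication stop --name {replication_name}",
        update_command,
        f"cdc replication start --name {replication_name}",
    ]


plan = update_plan(
    "gg_to_iceberg",
    "cdc source update --name gridgain_source --type gridgain "
    '--parameters "poll-interval-ms"=1000 --tables=PUBLIC.MY_TABLE1',
)
for step in plan:
    print(step)
```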

Removing CDC Source

When you no longer need the CDC source, you can delete it with the cdc source delete command:

cdc source delete --name gridgain_source

Configuring CDC Sink

cdc sink create --name <sink_name> --type <sink_type> [--parameters <property_name1>=<property_value1>] [--parameters <property_name2>=<property_value2>]...

Command parameters:

Property	Default	Description
name		The name of the CDC sink. This name will be used to reference it.
type		Data sink type. Currently, only `iceberg` is supported.
parameters		Optional additional parameters. The specific list of parameters depends on the type of the data sink. For the Iceberg data sink, Iceberg configuration properties are supported.

Below is an example of a CDC data sink:

cdc sink create --name iceberg_sink --type iceberg --parameters "catalog.type"="iceberg" --parameters "catalog.warehouse"="s3://my-bucket/my/key/prefix" --parameters "catalog-impl"="org.apache.iceberg.aws.glue.GlueCatalog" --parameters "io-impl"="org.apache.iceberg.aws.s3.S3FileIO" --parameters "s3.access-key-id"="my-secret-key-id" --parameters "s3.secret-access-key"="my-secret-access-key"
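
The sink command carries storage credentials, so you may prefer to assemble it from values held outside your scripts. The following Python sketch builds the command from a dictionary of properties and reads the two S3 credentials from environment variables. The helper `build_sink_create` and the variable names `ICEBERG_S3_ACCESS_KEY_ID` and `ICEBERG_S3_SECRET_ACCESS_KEY` are hypothetical; the placeholder values from the example above are used when the variables are not set.

```python
import os


def build_sink_create(name, parameters, sink_type="iceberg"):
    """Return a `cdc sink create` command line as a single string.

    Hypothetical helper: it only formats text and does not run anything.
    """
    parts = [f"cdc sink create --name {name} --type {sink_type}"]
    # Each property is passed through its own --parameters flag.
    for key, value in parameters.items():
        parts.append(f'--parameters "{key}"="{value}"')
    return " ".join(parts)


# Hypothetical environment variable names; placeholders are used if unset.
access_key_id = os.environ.get("ICEBERG_S3_ACCESS_KEY_ID", "my-secret-key-id")
secret_access_key = os.environ.get(
    "ICEBERG_S3_SECRET_ACCESS_KEY", "my-secret-access-key"
)

sink_command = build_sink_create(
    "iceberg_sink",
    {
        "catalog.type": "iceberg",
        "catalog.warehouse": "s3://my-bucket/my/key/prefix",
        "catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        "io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        "s3.access-key-id": access_key_id,
        "s3.secret-access-key": secret_access_key,
    },
)
```

The resulting string contains the credentials, so avoid printing or logging it.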

Updating CDC Sink

You can update the CDC sink when the relevant replication is not running by using the cdc sink update command. The command uses the same arguments as the cdc sink create command. For example:

cdc sink update --name iceberg_sink --type iceberg --parameters "catalog.type"="iceberg"

Removing CDC Sink

When you no longer need the CDC sink, you can delete it with the cdc sink delete command:

cdc sink delete --name iceberg_sink

Running CDC

Creating Replication

To run CDC, you need to create a replication that connects the previously configured CDC source with the CDC sink. A newly created replication does not start automatically, but you can review the tables it involves by using the cdc replication status command.

Use the following command to create a replication:

cdc replication create --name <replication_name> --source <gridgain_source> --sink <target_sink> --mode (ALL|NEW_DATA) [--execution-nodes <node_id1>[,<node_id2>,...]]

Command parameters:

Property	Default	Description
name		The name of the replication. This name will be used to reference it.
source		The CDC source to use in the replication.
sink		The CDC sink to use in the replication.
mode		Determines whether existing data is transferred. Possible values: `ALL` - all existing data is replicated first, then updates are transferred. `NEW_DATA` - only new data is transferred; data that already exists in the source is not replicated.
execution-nodes		Optional. Comma-separated list of nodes that can run the replication. If not specified, the node the command is executed on is used.
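
To catch an invalid mode before the command reaches the cluster, the command line can be built with a small amount of validation. The following is a minimal Python sketch, assuming the flag layout shown in the syntax above. The helper `build_replication_create` and the node names `node1` and `node2` are hypothetical.

```python
VALID_MODES = ("ALL", "NEW_DATA")


def build_replication_create(name, source, sink, mode, execution_nodes=None):
    """Return a `cdc replication create` command line as a single string.

    Hypothetical helper: it validates the mode locally and formats text;
    it does not contact a cluster.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    parts = [
        "cdc replication create",
        f"--name {name}",
        f"--source {source}",
        f"--sink {sink}",
        f"--mode {mode}",
    ]
    # The node list is optional and is passed as one comma-separated value.
    if execution_nodes:
        parts.append(f"--execution-nodes {','.join(execution_nodes)}")
    return " ".join(parts)


print(build_replication_create(
    "gg_to_iceberg", "gridgain_source", "iceberg_sink", "ALL"
))
print(build_replication_create(
    "gg_to_iceberg", "gridgain_source", "iceberg_sink", "NEW_DATA",
    execution_nodes=["node1", "node2"],
))
```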

Starting Replication

To start the replication process, use the cdc replication start command. This will initiate the transfer process.

The example below shows how you can start a replication created in the previous step:

cdc replication start --name <replication_name>

Monitoring Replication

You can get the list of existing replications with the cdc replication list command:

cdc replication list

You can also get detailed information about a replication by using the cdc replication status command. The command will return the replication status and progress for each table involved in the replication.

For example, the following command returns the status of the gg_to_iceberg replication:

cdc replication status --name gg_to_iceberg

Replication Failover

If at any point the node executing replication leaves the cluster, the replication process will automatically be transferred to a different node from the list specified in the execution-nodes parameter. If none of the specified nodes are available in the cluster, replication will fail.
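
The failover rule can be illustrated with a small model. The following Python sketch is not GridGain's implementation; it assumes that the replacement node is the first available node in the order given in execution-nodes, which this page does not specify. The function `pick_execution_node` and the node names are hypothetical.

```python
def pick_execution_node(execution_nodes, available_nodes, current=None):
    """Return the node that should run the replication.

    Keeps the current node while it is still in the cluster; otherwise
    picks the first listed node that is available (an assumed order).
    Raises RuntimeError when none of the listed nodes is available.
    """
    if current is not None and current in available_nodes:
        return current
    for node in execution_nodes:
        if node in available_nodes:
            return node
    raise RuntimeError("replication failed: no execution node is available")


nodes = ["node1", "node2", "node3"]

# node1 has left the cluster, so the replication moves to another listed node.
print(pick_execution_node(nodes, {"node2", "node3"}, current="node1"))
```

If the set of available nodes contains none of the listed nodes, the function raises an error, mirroring the replication failure described above.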

Stopping Replication

To stop replication, use the cdc replication stop command:

cdc replication stop --name gg_to_iceberg

Once the replication is stopped, the data will no longer be transferred.

Configuring CDC with Iceberg

This section describes how to configure Change Data Capture with Apache Iceberg as the data sink.

Overview

CDC with Iceberg allows you to replicate data from GridGain tables to Iceberg tables stored in various backends (S3, HDFS, local filesystem). The configuration involves creating a CDC source, an Iceberg sink, and a replication between them.

Configuration Steps

Create CDC Source

Configure the GridGain source that will be monitored for changes:

cdc source create --name gridgain_source --type gridgain --tables PUBLIC.MY_TABLE1 --parameters "poll-interval-ms"=1000 --parameters "page-size"=1024

Create Iceberg Sink

Configure the Iceberg sink with the target storage backend:

cdc sink create --name iceberg_sink --type iceberg --parameters "catalog.type"="iceberg" --parameters "catalog.warehouse"="s3://my-bucket/my/key/prefix" --parameters "catalog-impl"="org.apache.iceberg.aws.glue.GlueCatalog" --parameters "io-impl"="org.apache.iceberg.aws.s3.S3FileIO" --parameters "s3.access-key-id"="my-secret-key-id" --parameters "s3.secret-access-key"="my-secret-access-key"

Configuration parameters:

Parameter	Description
catalog.type	Type of Iceberg catalog (usually `iceberg`).
catalog.warehouse	Storage location.
catalog-impl	Iceberg catalog implementation class.
io-impl	File I/O implementation for the storage backend.

You can also pass Iceberg table properties as additional parameters.

Create and Start Replication

Create the replication connecting source and sink:

cdc replication create --name gg_to_iceberg --source gridgain_source --sink iceberg_sink --mode ALL

Start the replication process:

cdc replication start --name gg_to_iceberg
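
Since a replication refers to its source and sink by name, a mistyped name is an easy mistake to make when scripting these steps. The following Python sketch checks a replication definition against the names you have created before any command is issued. The function `check_replication` and the dictionary layout are hypothetical and are not part of GridGain.

```python
VALID_MODES = {"ALL", "NEW_DATA"}


def check_replication(replication, source_names, sink_names):
    """Return a list of problems found in a replication definition."""
    problems = []
    if replication["source"] not in source_names:
        problems.append(f"unknown source: {replication['source']}")
    if replication["sink"] not in sink_names:
        problems.append(f"unknown sink: {replication['sink']}")
    if replication["mode"] not in VALID_MODES:
        problems.append(f"unsupported mode: {replication['mode']}")
    return problems


replication = {
    "name": "gg_to_iceberg",
    "source": "gridgain_source",
    "sink": "iceberg_sink",
    "mode": "ALL",
}
problems = check_replication(replication, {"gridgain_source"}, {"iceberg_sink"})
print(problems or "replication definition looks consistent")
```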


Last updated on May 08, 2026