Change Data Capture
GridGain provides a Change Data Capture (CDC) implementation that you can use to integrate with external databases and propagate changes between them.
Preparing CDC
Before starting replication, configure a CDC source and a CDC sink. You will reference both by name when you create the replication.
Prepare Data Schemas
For replication to succeed, the table schemas in the source and the sink must match. If the sink does not have the tables yet, GridGain automatically creates tables with matching schemas when replication starts.
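If the sink tables already exist, it can help to compare column definitions before starting replication. The following is a minimal sketch, assuming each schema is already available as a mapping of column name to type name. The `schema_diff` function and the sample columns are hypothetical, and how you read the schemas from each system is outside the scope of this sketch.

```python
def schema_diff(source_cols, sink_cols):
    """Return a list of differences between two column -> type mappings."""
    problems = []
    for name, col_type in source_cols.items():
        if name not in sink_cols:
            problems.append(f"missing in sink: {name}")
        elif sink_cols[name] != col_type:
            problems.append(
                f"type mismatch for {name}: {col_type} vs {sink_cols[name]}"
            )
    # Columns that exist only in the sink also make the schemas differ.
    for name in sink_cols:
        if name not in source_cols:
            problems.append(f"extra in sink: {name}")
    return problems


# Hypothetical column definitions, for illustration only.
source = {"ID": "INT", "NAME": "VARCHAR"}
sink = {"ID": "INT", "NAME": "VARCHAR"}
differences = schema_diff(source, sink)
print(differences or "schemas match")
```

An empty result means the two mappings are identical and the schemas can be treated as matching for the purposes of this check.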
Configuring CDC Source
cdc source create --name <source_name> --type <source_type> --tables <table_name1>[,<table_name2>,...] [--parameters <property_name1>=<property_value1>] [--parameters <property_name2>=<property_value2>]...
Command arguments:
| Property | Default | Description |
|---|---|---|
| name | | The name of the CDC source. This name will be used to reference it. |
| type | | Data source type. Currently, only |
| tables | | Comma-separated list of tables that will be replicated. |
| create-table-if-not-exists | true | Optional. If |
| parameters | | Optional additional parameters. Currently, the following parameters are supported: |
Below is an example of a CDC data source:
cdc source create --name gridgain_source --type gridgain --tables PUBLIC.MY_TABLE1 --parameters "poll-interval-ms"=1000 --parameters "page-size"=1024
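When sources are created from scripts, the command line can be assembled from structured settings. The sketch below is illustrative only: `build_source_create` is a hypothetical helper, not part of GridGain, and the quoting of property names simply mirrors the example above rather than any documented quoting rule. It shows that `--tables` takes a single comma-separated list while each property gets its own `--parameters` flag.

```python
def build_source_create(name, source_type, tables, parameters=None):
    """Assemble a `cdc source create` command line as a single string."""
    parts = [
        "cdc source create",
        f"--name {name}",
        f"--type {source_type}",
        # --tables takes one comma-separated list.
        f"--tables {','.join(tables)}",
    ]
    # Each extra property gets its own --parameters flag, in insertion order.
    for key, value in (parameters or {}).items():
        parts.append(f'--parameters "{key}"={value}')
    return " ".join(parts)


command = build_source_create(
    name="gridgain_source",
    source_type="gridgain",
    tables=["PUBLIC.MY_TABLE1"],
    parameters={"poll-interval-ms": 1000, "page-size": 1024},
)
print(command)
```

With these inputs, the helper reproduces the example command shown above.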
Updating CDC Source
You can update the CDC source when the relevant replication is not running by using the cdc source update command. The command uses the same arguments as the cdc source create command. For example:
cdc source update --name gridgain_source --type gridgain --parameters "poll-interval-ms"=1000 --tables PUBLIC.MY_TABLE1
Removing CDC Source
When you no longer need the CDC source, you can delete it with the cdc source delete command:
cdc source delete --name gridgain_source
Configuring CDC Sink
cdc sink create --name <sink_name> --type <sink_type> [--flush-policy <flush_policy>] [--parameters <property_name1>=<property_value1>] [--parameters <property_name2>=<property_value2>]...
Command parameters:
| Property | Default | Description |
|---|---|---|
| name | | The name of the CDC sink. This name will be used to reference it. |
| type | | Data sink type. Currently, only |
| flush-policy | DEFAULT | How data is written to the sink. Possible values: |
| parameters | | Optional additional parameters. The specific list of parameters depends on the type of the data sink. For the Iceberg data sink, Iceberg configuration properties are supported. |
Below is an example of a CDC data sink:
cdc sink create --name iceberg_sink --type iceberg --parameters "catalog.type"="iceberg" --parameters "catalog.warehouse"="s3://my-bucket/my/key/prefix" --parameters "catalog-impl"="org.apache.iceberg.aws.glue.GlueCatalog" --parameters "io-impl"="org.apache.iceberg.aws.s3.S3FileIO" --parameters "s3.access-key-id"="my-secret-key-id" --parameters "s3.secret-access-key"="my-secret-access-key"
Updating CDC Sink
You can update the CDC sink when the relevant replication is not running by using the cdc sink update command. The command uses the same arguments as the cdc sink create command. For example:
cdc sink update --name iceberg_sink --type iceberg --parameters "catalog.type"="iceberg"
Removing CDC Sink
When you no longer need the CDC sink, you can delete it with the cdc sink delete command:
cdc sink delete --name iceberg_sink
Running CDC
Creating Replication
To run CDC, you need to create a replication that connects the previously configured CDC source with the CDC sink. A newly created replication does not start automatically, but you can already review the tables it involves by using the cdc replication status command.
The example below shows how you can create a replication:
cdc replication create --name <replication_name> --source <gridgain_source> --sink <target_sink> --mode (ALL|NEW_DATA) [--execution-nodes <node_id1>[,<node_id2>,...]]
Command parameters:
| Property | Default | Description |
|---|---|---|
| name | | The name of the replication. This name will be used to reference it. |
| source | | Data source to use in replication. |
| sink | | Data sink to use in replication. |
| send-existing-data | | Determines if historic data will be transferred. Possible values: |
| execution-nodes | | The list of nodes that will be used to run replication. If not specified, the node the command is executed on is used. |
Starting Replication
To start the replication process, use the cdc replication start command. This will initiate the transfer process.
The example below shows how you can start a replication created in the previous step:
cdc replication start --name <replication_name>
Monitoring Replication
You can get the list of existing replications with the cdc replication list command:
cdc replication list
You can also get detailed information about a replication by using the cdc replication status command. The command will return the replication status and progress for each table involved in the replication.
For example, the following command returns the status of the gg_to_iceberg replication:
cdc replication status --name gg_to_iceberg
Replication Failover
If at any point the node executing replication leaves the cluster, the replication process will automatically be transferred to a different node from the list specified in the execution-nodes parameter. If none of the specified nodes are available in the cluster, replication will fail.
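The failover rule above can be modeled in a few lines. This is a sketch of the described behavior, not GridGain code: `pick_failover_node` and the node names are hypothetical, and choosing the first available node in list order is an assumption, since the actual selection order is not stated here.

```python
def pick_failover_node(execution_nodes, alive_nodes, failed_node):
    """Return a node to continue replication on, or raise if none is available."""
    for node in execution_nodes:
        # Assumption: candidates are tried in the order they were listed.
        if node != failed_node and node in alive_nodes:
            return node
    # None of the nodes from execution-nodes is in the cluster: replication fails.
    raise RuntimeError("replication failed: no execution node is available")


# Hypothetical node names: node1 has left the cluster, node2 and node3 remain.
next_node = pick_failover_node(
    execution_nodes=["node1", "node2", "node3"],
    alive_nodes={"node2", "node3"},
    failed_node="node1",
)
print(next_node)
```

The model also illustrates why listing more than one node in execution-nodes matters: with a single node, there is no candidate to fail over to.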
Stopping Replication
To stop replication, use the cdc replication stop command:
cdc replication stop --name gg_to_iceberg
Once the replication is stopped, the data will no longer be transferred.
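The lifecycle described on this page can be summarized as a small state model. This is a toy sketch for reasoning about the workflow, not GridGain code: the class, method, and state names are hypothetical, and the actual status values reported by cdc replication status are not listed here.

```python
from enum import Enum


class ReplicationState(Enum):
    # Hypothetical state names, chosen for this illustration only.
    CREATED = "created"
    RUNNING = "running"
    STOPPED = "stopped"


class ReplicationModel:
    """Toy model of a replication's lifecycle."""

    def __init__(self, name):
        self.name = name
        # A new replication exists but does not transfer data yet.
        self.state = ReplicationState.CREATED

    def start(self):
        # Mirrors `cdc replication start`.
        self.state = ReplicationState.RUNNING

    def stop(self):
        # Mirrors `cdc replication stop`: data is no longer transferred.
        self.state = ReplicationState.STOPPED

    def can_update_source_or_sink(self):
        # The source and sink can be updated only while replication is not running.
        return self.state is not ReplicationState.RUNNING


replication = ReplicationModel("gg_to_iceberg")
print(replication.state, replication.can_update_source_or_sink())
replication.start()
print(replication.state, replication.can_update_source_or_sink())
replication.stop()
print(replication.state, replication.can_update_source_or_sink())
```

In this model, a source or sink update is allowed right after creation and again after the replication is stopped, but not while it is running.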