Change Data Capture

GridGain provides a Change Data Capture (CDC) implementation that can be used to integrate with external databases and propagate changes between them.

Preparing CDC

Before starting replication, you need to configure a CDC source and sink. These will be used to start replication later.

Prepare Data Schemas

For replication to succeed, the table schemas in the source and the sink must match. If the sink does not yet contain the tables, GridGain automatically creates tables with matching schemas when replication starts.

Configuring CDC Source

cdc source create --name <source_name> --type <source_type>  --tables <table_name1>[,<table_name2>,...] [--parameters <property_name1>=<property_value1>] [--parameters <property_name2>=<property_value2>]...

Command arguments:

| Property | Default | Description |
|---|---|---|
| name | | The name of the CDC source. This name is used to reference the source. |
| type | | Data source type. Currently, only gridgain is supported. |
| tables | | Comma-separated list of tables that will be replicated. |
| create-table-if-not-exists | true | Optional. If true, missing tables are automatically created in the sink before replication starts. Otherwise, replication with missing tables fails and returns the list of missing tables. |
| parameters | | Optional additional parameters. Currently, the following parameters are supported: page-size (the size of each page sent via replication) and poll-interval (the interval at which GridGain checks for updates in source tables). |

Below is an example of a CDC data source:

cdc source create --name gridgain_source --type gridgain --tables PUBLIC.MY_TABLE1 --parameters "poll-interval-ms"=1000 --parameters "page-size"=1024
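Because the tables argument takes a comma-separated list, a single source can cover several tables. The following is a minimal sketch that uses only the arguments described above; the second table name, PUBLIC.MY_TABLE2, is hypothetical and stands in for one of your own tables. The line starting with # is an annotation and is not part of the command.

```text
# PUBLIC.MY_TABLE2 is a placeholder table name used for illustration only
cdc source create --name gridgain_source --type gridgain --tables PUBLIC.MY_TABLE1,PUBLIC.MY_TABLE2
```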

Updating CDC Source

You can update the CDC source when the relevant replication is not running by using the cdc source update command. The command uses the same arguments as the cdc source create command. For example:

cdc source update --name gridgain_source --type gridgain --tables PUBLIC.MY_TABLE1 --parameters "poll-interval-ms"=1000

Removing CDC Source

When you no longer need the CDC source, you can delete it with the cdc source delete command:

cdc source delete --name gridgain_source

Configuring CDC Sink

cdc sink create --name <sink_name> --type <sink_type> [--flush-policy <flush_policy>] [--parameters <property_name1>=<property_value1>] [--parameters <property_name2>=<property_value2>]...

Command parameters:

| Property | Default | Description |
|---|---|---|
| name | | The name of the CDC sink. This name is used to reference the sink. |
| type | | Data sink type. Currently, only Iceberg is supported. |
| flush-policy | DEFAULT | How data is written to the sink. Possible values: PER_PAGE, DEFAULT. |
| parameters | | Optional additional parameters. The specific list of parameters depends on the type of the data sink. For the Iceberg data sink, Iceberg configuration properties are supported. |

Below is an example of a CDC data sink:

cdc sink create --name iceberg_sink --type iceberg --parameters "catalog.type"="iceberg" --parameters "catalog.warehouse"="s3://my-bucket/my/key/prefix" --parameters "catalog-impl"="org.apache.iceberg.aws.glue.GlueCatalog" --parameters "io-impl"="org.apache.iceberg.aws.s3.S3FileIO" --parameters "s3.access-key-id"="my-secret-key-id" --parameters "s3.secret-access-key"="my-secret-access-key"
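The example above relies on the default flush policy. As a sketch of how a non-default policy might be selected, the command below repeats the same sink definition and adds the flush-policy argument from the syntax line. This page does not describe how PER_PAGE differs from DEFAULT in practice, so treat the choice of value as an illustration rather than a recommendation. The line starting with # is an annotation and is not part of the command.

```text
# Same sink as in the example above, with the flush policy set explicitly (illustrative choice)
cdc sink create --name iceberg_sink --type iceberg --flush-policy PER_PAGE --parameters "catalog.type"="iceberg" --parameters "catalog.warehouse"="s3://my-bucket/my/key/prefix" --parameters "catalog-impl"="org.apache.iceberg.aws.glue.GlueCatalog" --parameters "io-impl"="org.apache.iceberg.aws.s3.S3FileIO" --parameters "s3.access-key-id"="my-secret-key-id" --parameters "s3.secret-access-key"="my-secret-access-key"
```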

Updating CDC Sink

You can update the CDC sink when the relevant replication is not running by using the cdc sink update command. The command uses the same arguments as the cdc sink create command. For example:

cdc sink update --name iceberg_sink --type iceberg --parameters "catalog.type"="iceberg"

Removing CDC Sink

When you no longer need the CDC sink, you can delete it with the cdc sink delete command:

cdc sink delete --name iceberg_sink

Running CDC

Creating Replication

To run CDC, you need to create a replication that connects the previously configured CDC source with the CDC sink. A newly created replication does not start automatically. Until you start it, you can review the tables involved by using the cdc replication status command.

The example below shows how you can create a replication:

cdc replication create --name <replication_name> --source <gridgain_source> --sink <target_sink> --mode (ALL|NEW_DATA) [--execution-nodes <node_id1>[,<node_id2>,...]]

Command parameters:

| Property | Default | Description |
|---|---|---|
| name | | The name of the replication. This name is used to reference the replication. |
| source | | Data source to use in replication. |
| sink | | Data sink to use in replication. |
| send-existing-data | | Determines if historic data will be transferred. Possible values: ALL (all existing data is replicated first, then updates are transferred) and NEW_DATA (only new data is transferred; previously existing data remains in the source). |
| execution-nodes | | The list of nodes that will be used to run replication. If not specified, the node the command is executed on is used. |
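As a concrete illustration of the syntax above, the sketch below connects the gridgain_source source and the iceberg_sink sink from the earlier examples. The replication name gg_to_iceberg matches the name used in the monitoring examples on this page; the choice of ALL for the mode is an assumption made for the example. The line starting with # is an annotation and is not part of the command.

```text
# Assumes gridgain_source and iceberg_sink were created as in the earlier examples
cdc replication create --name gg_to_iceberg --source gridgain_source --sink iceberg_sink --mode ALL
```

With ALL, existing rows are replicated before updates are streamed. Use NEW_DATA instead if only changes made from now on should reach the sink.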

Starting Replication

To start the replication process, use the cdc replication start command. This will initiate the transfer process.

The example below shows how you can start the replication created in the previous step:

cdc replication start --name <replication_name>

Monitoring Replication

You can get the list of existing replications with the cdc replication list command:

cdc replication list

You can also get detailed information about a replication by using the cdc replication status command. The command will return the replication status and progress for each table involved in the replication.

For example, the following command returns the status of the gg_to_iceberg replication:

cdc replication status --name gg_to_iceberg

Replication Failover

If at any point the node executing replication leaves the cluster, the replication process will automatically be transferred to a different node from the list specified in the execution-nodes parameter. If none of the specified nodes are available in the cluster, replication will fail.
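Since failover only considers the nodes listed in the execution-nodes parameter, listing more than one node when creating the replication gives it somewhere to move. The sketch below assumes three nodes; node1, node2, and node3 are hypothetical identifiers, so substitute the node IDs from your own cluster. The line starting with # is an annotation and is not part of the command.

```text
# node1, node2, node3 are placeholder node identifiers, not real defaults
cdc replication create --name gg_to_iceberg --source gridgain_source --sink iceberg_sink --mode NEW_DATA --execution-nodes node1,node2,node3
```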

Stopping Replication

To stop replication, use the cdc replication stop command:

cdc replication stop --name gg_to_iceberg

Once the replication is stopped, the data will no longer be transferred.
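Putting the replication commands together, a full session might look like the sketch below. It assumes that the gridgain_source source and the iceberg_sink sink already exist, as created in the earlier examples, and it uses ALL as an arbitrary mode. Lines starting with # are annotations and are not part of the commands.

```text
# Create the replication (it does not start automatically)
cdc replication create --name gg_to_iceberg --source gridgain_source --sink iceberg_sink --mode ALL

# Review the tables involved before starting
cdc replication status --name gg_to_iceberg

# Start the transfer, then check progress
cdc replication start --name gg_to_iceberg
cdc replication status --name gg_to_iceberg

# List all replications, then stop this one when it is no longer needed
cdc replication list
cdc replication stop --name gg_to_iceberg
```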