GridGain Developers Hub

Kafka Connector Data Schema

GridGain Kafka Connectors support data schemas. This lets existing non-Ignite Sink Connectors understand data injected by the Ignite Source Connector, and lets the Ignite Sink Connector understand data injected by non-Ignite Source Connectors.

Ignite Type Support

Source and Sink Connectors work with Ignite data in Ignite Binary format.

The table below maps Kafka schema types, including the known logical types, to Ignite Binary types.

| Kafka Type | Ignite Type |
|---|---|
| INT8 | BYTE |
| INT16 | SHORT, CHAR |
| INT32 | INT |
| INT64 | LONG |
| FLOAT32 | FLOAT |
| FLOAT64 | DOUBLE |
| BOOLEAN | BOOLEAN |
| STRING | STRING, UUID, CLASS |
| BYTES | BYTE_ARR |
| ARRAY(valueSchema) | COL, SHORT_ARR, INT_ARR, LONG_ARR, FLOAT_ARR, DOUBLE_ARR, CHAR_ARR, BOOLEAN_ARR, DECIMAL_ARR, STRING_ARR, UUID_ARR, DATE_ARR, OBJ_ARR, ENUM_ARR, TIME_ARR, TIMESTAMP_ARR |
| MAP | MAP |
| STRUCT | OBJ, BINARY_OBJ |
| Date (Logical Type) | DATE |
| Time (Logical Type) | TIME |
| Timestamp (Logical Type) | TIMESTAMP |
| Decimal (Logical Type) | DECIMAL |
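One consequence of the table is that several Ignite types share a single Kafka type, so the Kafka type alone does not always identify the Ignite type. The sketch below encodes the scalar and logical rows as a plain dictionary to make that visible. The names `KAFKA_TO_IGNITE` and `IGNITE_TO_KAFKA` are hypothetical and not part of any connector API, and the array, map, and struct rows are left out for brevity. How the connectors choose between the candidate Ignite types is not described in this section.

```python
from typing import Dict, Tuple

# Scalar and logical-type rows of the table above.
# Hypothetical lookup table for illustration, not connector code.
KAFKA_TO_IGNITE: Dict[str, Tuple[str, ...]] = {
    "INT8": ("BYTE",),
    "INT16": ("SHORT", "CHAR"),
    "INT32": ("INT",),
    "INT64": ("LONG",),
    "FLOAT32": ("FLOAT",),
    "FLOAT64": ("DOUBLE",),
    "BOOLEAN": ("BOOLEAN",),
    "STRING": ("STRING", "UUID", "CLASS"),
    "BYTES": ("BYTE_ARR",),
    "Date": ("DATE",),
    "Time": ("TIME",),
    "Timestamp": ("TIMESTAMP",),
    "Decimal": ("DECIMAL",),
}

# Inverting the table gives exactly one Kafka type per Ignite type.
IGNITE_TO_KAFKA: Dict[str, str] = {
    ignite: kafka
    for kafka, ignite_types in KAFKA_TO_IGNITE.items()
    for ignite in ignite_types
}

# Kafka types that correspond to more than one Ignite type.
ambiguous = sorted(k for k, v in KAFKA_TO_IGNITE.items() if len(v) > 1)
print(ambiguous)
```

Going from Ignite to Kafka is unambiguous for these rows, while the reverse direction needs extra information for INT16 and STRING.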

Updates and Removals

By default, the Source Connector does not process removed Ignite cache entries. Set the shallProcessRemovals configuration setting to true to make the Source Connector process removals. In that case, the Source Connector injects a record with a null value into Kafka to indicate that the key was removed, and the Sink Connector removes keys with null values from the cache. Using null to mark a removed entry works because Ignite does not support null cache values.

For performance reasons, the Sink Connector does not update existing cache entries by default. Set the shallProcessUpdates configuration setting to true to make the Sink Connector update existing entries.
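The removal and update rules can be sketched with a plain dictionary standing in for the Ignite cache. This is a minimal illustration, not the connector's implementation: `apply_record` and its parameters are hypothetical names. It also assumes that, with updates disabled, a record for an existing key is simply left unapplied, which this section does not state explicitly.

```python
from typing import Any, Dict, Optional


def apply_record(cache: Dict[Any, Any], key: Any, value: Optional[Any],
                 shall_process_updates: bool = False) -> str:
    """Apply one Kafka record to a dict that stands in for an Ignite cache.

    Hypothetical helper for illustration; returns the action taken.
    """
    if value is None:
        # A null value marks a removed key, so the key is dropped from the cache.
        cache.pop(key, None)
        return "removed"
    if key in cache and not shall_process_updates:
        # Assumption: with updates disabled, an existing entry is left as-is.
        return "skipped"
    action = "updated" if key in cache else "inserted"
    cache[key] = value
    return action


cache: Dict[int, str] = {}
results = [
    apply_record(cache, 1, "a"),                              # new key
    apply_record(cache, 1, "b"),                              # existing key, updates off
    apply_record(cache, 1, "c", shall_process_updates=True),  # existing key, updates on
    apply_record(cache, 1, None),                             # null value: removal
]
print(results)
print(cache)
```

Because a live entry can never hold null, the null-valued record is unambiguous: it can only mean that the key was removed.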

Schema Migration

Schema migration is implicit for GridGain Connectors. Both the Source and Sink Connectors pull and push cache entries in the cross-platform Ignite Binary format, which intrinsically supports changing schemas. Ignite cache keys and values are dynamic objects whose sets of fields can differ from one entry to another.

For performance reasons, the Source Connector caches key and value schemas. The schemas are created when the first cache entry is pulled and are reused for all subsequent entries. This works only if the schemas never change. Set isSchemaDynamic to true to support schema changes.
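The trade-off between a cached schema and a dynamic one can be illustrated with a toy schema provider. Everything here is a hypothetical sketch: `SchemaProvider`, `build_schema`, and the tuple-based schema are invented for the example, and the dynamic mode simply rebuilds the schema for every entry. The way the real connector detects and handles schema changes is not described in this section.

```python
from typing import Any, Dict, Optional, Tuple

# Toy schema: a tuple of (field name, type name) pairs.
Schema = Tuple[Tuple[str, str], ...]


def build_schema(entry: Dict[str, Any]) -> Schema:
    """Derive a toy schema from an entry's fields, sorted by field name."""
    return tuple(sorted((name, type(value).__name__) for name, value in entry.items()))


class SchemaProvider:
    """Hypothetical stand-in for the Source Connector's schema handling."""

    def __init__(self, is_schema_dynamic: bool = False) -> None:
        self.is_schema_dynamic = is_schema_dynamic
        self._cached: Optional[Schema] = None

    def schema_for(self, entry: Dict[str, Any]) -> Schema:
        if self.is_schema_dynamic:
            # Assumption: dynamic mode derives the schema from each entry.
            return build_schema(entry)
        if self._cached is None:
            # Built once from the first entry, then reused for every later entry.
            self._cached = build_schema(entry)
        return self._cached


first = {"id": 1, "name": "Ann"}
changed = {"id": 2, "name": "Bob", "email": "bob@example.com"}

static = SchemaProvider(is_schema_dynamic=False)
static.schema_for(first)
stale = static.schema_for(changed)   # still the schema of the first entry

dynamic = SchemaProvider(is_schema_dynamic=True)
dynamic.schema_for(first)
fresh = dynamic.schema_for(changed)  # includes the new "email" field

print(stale)
print(fresh)
```

With caching, the second entry is described by a schema that lacks its new field, which is why the cached mode is safe only when the schemas never change.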

Schemaless Operation

The Source Connector does not generate schemas if the isSchemaless configuration setting is true.

Disabling schemas improves performance because the connectors neither build schemas nor convert keys and values into Kafka format. The cost is that non-Ignite Sink converters cannot understand the data, which is injected into Kafka in Ignite Binary format.

Disabling Source schemas makes sense in cases such as the following:

  • You are prepared to write code that extends a non-Ignite converter to process Ignite Binary objects, in exchange for higher performance.

  • You are running the Ignite Data Replication example, which does not need schemas because both the Source and the Sink are GridGain connectors.
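For the replication case, the settings named on this page might be combined as in the sketch below. Only the setting names isSchemaless, shallProcessRemovals, and shallProcessUpdates come from this page. The connector names are hypothetical, the `connector.class` values are placeholders to be filled in from the connector reference, and whether a given replication setup needs removals and updates is an assumption. The `name` plus `config` layout is the JSON shape that the Kafka Connect REST API accepts when a connector is created, with all config values given as strings.

```python
import json

# Hypothetical connector definitions; class names are placeholders.
source_connector = {
    "name": "ignite-source-example",
    "config": {
        "connector.class": "<GridGain source connector class>",
        "isSchemaless": "true",          # skip schema generation and conversion
        "shallProcessRemovals": "true",  # emit null-valued records for removed keys
    },
}

sink_connector = {
    "name": "ignite-sink-example",
    "config": {
        "connector.class": "<GridGain sink connector class>",
        "shallProcessUpdates": "true",   # overwrite entries that already exist
    },
}

source_json = json.dumps(source_connector, indent=2)
sink_json = json.dumps(sink_connector, indent=2)
print(source_json)
print(sink_json)
```

isSchemaDynamic is left out because it governs schema caching, which does not apply when no schemas are generated.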