Essential Core Architecture in Apache Ignite and GridGain

Apache Ignite 3 delivers a modernized architecture for faster, simpler, and more reliable data processing. In this session, GridGain Lead Software Engineer Ivan Bessonov explains how Ignite 3 redefines data distribution and transactions.

Highlights:

  • New topology model: Physical and logical topologies replace legacy clustering
  • Distribution zones: Simplify and automate data placement
  • Storage profiles: Flexible, reusable storage configurations
  • Stronger transactions: Strict serializable isolation by default
  • No partition map exchange: No pauses or slowdowns
  • Unified APIs: Use SQL and key-value seamlessly

Ignite 3 transforms how clusters scale, manage data, and maintain performance under heavy load.

Ivan Bessonov
Lead Software Engineer at GridGain

Hello everyone, my name is Ivan. I’m a Lead Software Engineer at GridGain, primarily responsible for storage implementations. Today, I want to give you a short presentation about the core architecture in Apache Ignite, focusing mainly on data distribution within a cluster. Our plan is to discuss what exists in Ignite 2, what has changed in Ignite 3, and compare the two to show the main differences.

Let’s start with cluster topologies, which are one of the most fundamental aspects of Ignite 2. Ignite 2 consists of multiple nodes that together form a cluster. The “cluster topology” is the collection of nodes currently alive—not shut down or in the process of joining—just the regular Ignite nodes. There is also another concept called “baseline topology,” which has a different meaning. The baseline topology is a logical collection of Ignite nodes that may be used to store data. The main difference between cluster topology and baseline topology is that some baseline nodes might be temporarily offline, while some cluster nodes might not be included in the baseline topology. In general, these two topologies intersect but are not identical.

The baseline topology can be managed automatically by the system using the auto-adjust setting, defined in milliseconds. This means that after a cluster topology change, the system waits for the defined time before including or excluding a node from the baseline. You can also modify the baseline manually in two ways: by setting an explicit list of node IDs or by using known topology versions.
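As a rough sketch, these settings map to the Ignite 2 Java API along the following lines (an embedded server node is assumed, the cluster is activated explicitly, and the 30-second timeout is just an example value):

```java
import java.util.Collection;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCluster;
import org.apache.ignite.Ignition;
import org.apache.ignite.cluster.BaselineNode;
import org.apache.ignite.cluster.ClusterState;

public class BaselineExample {
    public static void main(String[] args) {
        // Start an embedded Ignite 2 server node and make sure the cluster is active.
        Ignite ignite = Ignition.start();
        IgniteCluster cluster = ignite.cluster();
        cluster.state(ClusterState.ACTIVE);

        // Cluster topology: the server nodes that are currently alive.
        System.out.println("Cluster nodes: " + cluster.forServers().nodes().size());

        // Baseline topology: the logical set of nodes that may own data (null if not set yet).
        Collection<BaselineNode> baseline = cluster.currentBaselineTopology();
        System.out.println("Baseline nodes: " + (baseline == null ? 0 : baseline.size()));

        // Option 1: automatic management, adjust 30 seconds after a topology change.
        cluster.baselineAutoAdjustEnabled(true);
        cluster.baselineAutoAdjustTimeout(30_000); // milliseconds

        // Option 2: manual management, pin the baseline to the current topology version
        // (auto-adjust must be disabled before changing the baseline by hand).
        cluster.baselineAutoAdjustEnabled(false);
        cluster.setBaselineTopology(cluster.topologyVersion());
    }
}
```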

In Ignite 3, the discovery service was completely redesigned to provide a more flexible and modern approach. There are now two concepts: the physical topology and the logical topology. The physical topology includes all live Ignite nodes, even those currently starting, recovering, or undergoing maintenance. The logical topology, equivalent to the cluster topology in Ignite 2, contains only the nodes that have already joined the cluster and can store data. In Ignite 3, there is no baseline topology as such. There is a similar concept, but it’s no longer called “topology” to avoid confusion.

Now let’s move on to what operates within these topologies. In Ignite 2, we have the concept of “cache groups.” These are often underrated, as most people use individual caches instead. A cache group is a collection of caches that share the same properties related to data distribution—such as the same affinity function, partition distribution, backup configuration, and other fine-tuning settings. Caches in the same group also share the same data region, meaning data is stored and rebalanced together, keeping everything collocated.
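A minimal sketch of what a cache group looks like in the Ignite 2 Java API; the cache names, group name, and tuning values here are hypothetical:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
import org.apache.ignite.configuration.CacheConfiguration;

public class CacheGroupExample {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start();

        // Two caches in the same cache group: they share the affinity function,
        // partition distribution, backup count and data region, so their data
        // stays collocated and is rebalanced together.
        CacheConfiguration<Integer, String> orders = new CacheConfiguration<Integer, String>("orders")
            .setGroupName("commerce")
            .setAffinity(new RendezvousAffinityFunction(false, 1024))
            .setBackups(2);

        CacheConfiguration<Integer, String> payments = new CacheConfiguration<Integer, String>("payments")
            .setGroupName("commerce")
            .setAffinity(new RendezvousAffinityFunction(false, 1024))
            .setBackups(2);

        ignite.getOrCreateCache(orders);
        ignite.getOrCreateCache(payments);
    }
}
```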

However, there are some challenges with cache groups. Each cache group is a subset of the baseline nodes, but the baseline management is shared across all groups. If auto-adjustment occurs, it affects every cache group. Another issue is that fine-grained cache group configuration can be difficult to achieve with thin clients. In many cases, you have to use Java predicates or class names in configuration, which means that some options can only be defined in static XML files or Java code in thick clients or embedded servers.

In Ignite 3, caches and cache groups have been replaced by tables and distribution zones. A distribution zone is essentially a collection of tables stored together across the same set of nodes. It defines how data is distributed, including settings such as the number of partitions and replicas. A distribution zone also serves a similar purpose to the baseline topology, with automatic adjustment settings that manage which nodes are included in the zone.

In Ignite 3, replicas replace the term “backups.” A replica is simply a copy of the data, and the number of replicas equals the number of backups plus one. Distribution zones also include scale-up and scale-down timeouts. The scale-up timeout defines how long the system waits after a new node joins before adding it to the zone, while the scale-down timeout defines how long it waits to remove a node after it leaves the topology. These settings are now measured in seconds instead of milliseconds for clarity. Distribution zones also support node filters in JSON path syntax, which allow you to include or exclude nodes based on attributes such as region or storage type.
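As a sketch, such a zone can be created with SQL DDL from the Ignite 3 Java client. The option names below follow the Ignite 3.0 syntax and may differ in later releases; the zone name, numbers, and region filter are illustrative:

```java
import org.apache.ignite.client.IgniteClient;

public class CreateZoneExample {
    public static void main(String[] args) throws Exception {
        try (IgniteClient client = IgniteClient.builder()
                .addresses("127.0.0.1:10800")
                .build()) {

            // A zone with 25 partitions and 3 replicas (i.e. 2 backups) that adds a
            // joining node after 300 seconds, removes a departed node after 300 seconds,
            // and only uses nodes whose "region" attribute equals "US".
            client.sql().execute(null,
                "CREATE ZONE IF NOT EXISTS EXAMPLE_ZONE WITH "
                    + "PARTITIONS=25, "
                    + "REPLICAS=3, "
                    + "DATA_NODES_AUTO_ADJUST_SCALE_UP=300, "
                    + "DATA_NODES_AUTO_ADJUST_SCALE_DOWN=300, "
                    + "DATA_NODES_FILTER='$[?(@.region == \"US\")]', "
                    + "STORAGE_PROFILES='default'");
        }
    }
}
```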

In Ignite 2, data storage was handled through data regions, which are chunks of memory used to store data. There were only two types—persistent and non-persistent—controlled by a simple flag. If persistence was enabled, data was stored in both memory and on disk, surviving node restarts. If it was disabled, data existed only in memory and would be lost after a reboot.
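A minimal sketch of the two kinds of data regions in an Ignite 2 node configuration; the region names and sizes are arbitrary:

```java
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class DataRegionExample {
    public static void main(String[] args) {
        // One persistent and one purely in-memory data region; the flag is the
        // only difference between the two storage modes in Ignite 2.
        DataRegionConfiguration persistent = new DataRegionConfiguration()
            .setName("persistent-region")
            .setMaxSize(4L * 1024 * 1024 * 1024)  // 4 GB
            .setPersistenceEnabled(true);         // data survives restarts

        DataRegionConfiguration inMemory = new DataRegionConfiguration()
            .setName("in-memory-region")
            .setMaxSize(1024L * 1024 * 1024)      // 1 GB
            .setPersistenceEnabled(false);        // data is lost on reboot

        IgniteConfiguration cfg = new IgniteConfiguration()
            .setDataStorageConfiguration(new DataStorageConfiguration()
                .setDataRegionConfigurations(persistent, inMemory));

        Ignition.start(cfg);
    }
}
```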

In Ignite 3, data regions have been replaced by “storage profiles.” Each distribution zone can have multiple storage profiles, and a node that lacks a required profile is excluded from the zone. Storage profiles are configured locally and can define parameters such as size and storage engine type. For example, a profile named “default” might specify a certain size and use the “AIPersist” engine, equivalent to a data region with persistence enabled. The new approach allows multiple storage engines, meaning different types of profiles can coexist within the same cluster. A single storage profile can also be reused across zones.

When creating tables, you can specify which distribution zone and storage profile to use. If no zone or profile is specified, the default ones are used automatically, and the default zone always exists in the cluster.
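A sketch of creating a table bound to the zone and profile from the earlier example. The WITH-style clause reflects the Ignite 3.0 DDL and may differ in newer releases; the table itself is hypothetical:

```java
import org.apache.ignite.client.IgniteClient;

public class CreateTableExample {
    public static void main(String[] args) throws Exception {
        try (IgniteClient client = IgniteClient.builder()
                .addresses("127.0.0.1:10800")
                .build()) {

            // Binds the table to a specific distribution zone and storage profile.
            // Omitting the WITH clause would place the table in the default zone
            // with the default profile.
            client.sql().execute(null,
                "CREATE TABLE IF NOT EXISTS accounts ("
                    + "  id INT PRIMARY KEY, "
                    + "  balance BIGINT"
                    + ") WITH PRIMARY_ZONE='EXAMPLE_ZONE', STORAGE_PROFILE='default'");
        }
    }
}
```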

One of the biggest architectural changes from Ignite 2 to Ignite 3 involves removing the old “partition map exchange” mechanism. In Ignite 2, this legacy process paused all transactions whenever the cluster topology changed—for example, when a node joined or left the cluster or when a new cache was created. This caused temporary “stop-the-world” pauses and spikes in transactional load. In Ignite 3, this mechanism has been completely removed. Distribution zones no longer affect one another. Creating or removing zones and tables doesn’t interrupt running transactions, and updates to cluster state are far less invasive. Only in rare cases, such as when the node holding a transaction’s volatile lock table leaves the cluster, is that transaction rolled back.

Ignite 2 supported two cache types: atomic and transactional. In Ignite 3, these modes have been unified into one consistent model using a strict serializable isolation level. This ensures strong consistency without significantly affecting performance. Single operations like “put” remain as fast as before. Both the key-value and SQL APIs use the same transactional model and can be used together in a single transaction.
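A minimal sketch of mixing the two APIs in one transaction through the Ignite 3 Java client, assuming the hypothetical “accounts” table from the earlier example exists:

```java
import org.apache.ignite.client.IgniteClient;
import org.apache.ignite.table.KeyValueView;
import org.apache.ignite.table.Tuple;

public class UnifiedTxExample {
    public static void main(String[] args) throws Exception {
        try (IgniteClient client = IgniteClient.builder()
                .addresses("127.0.0.1:10800")
                .build()) {

            KeyValueView<Tuple, Tuple> accounts =
                client.tables().table("accounts").keyValueView();

            // Key-value writes and SQL statements participate in the same
            // transaction and commit (or roll back) atomically.
            client.transactions().runInTransaction(tx -> {
                accounts.put(tx,
                    Tuple.create().set("id", 1),
                    Tuple.create().set("balance", 100L));

                client.sql().execute(tx,
                    "UPDATE accounts SET balance = balance - 10 WHERE id = ?", 2);
            });
        }
    }
}
```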

Ignite 3 also distinguishes between read-write and read-only transactions. Read-write transactions acquire locks and can face deadlocks, especially with large operations, while read-only transactions are lock-free and much faster. Because Ignite 3 supports multi-version storage, read-only transactions can continue even if the underlying table is dropped—they work on a consistent snapshot until they finish.
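A sketch of an explicit read-only transaction with the same client, again assuming the hypothetical “accounts” table:

```java
import org.apache.ignite.client.IgniteClient;
import org.apache.ignite.table.Tuple;
import org.apache.ignite.tx.Transaction;
import org.apache.ignite.tx.TransactionOptions;

public class ReadOnlyTxExample {
    public static void main(String[] args) throws Exception {
        try (IgniteClient client = IgniteClient.builder()
                .addresses("127.0.0.1:10800")
                .build()) {

            // A read-only transaction takes no locks and reads a consistent
            // snapshot as of the moment it began.
            Transaction tx = client.transactions()
                .begin(new TransactionOptions().readOnly(true));
            try {
                Tuple balance = client.tables().table("accounts")
                    .keyValueView()
                    .get(tx, Tuple.create().set("id", 1));
                System.out.println("Account #1: " + balance);

                try (var rows = client.sql().execute(tx, "SELECT count(*) FROM accounts")) {
                    System.out.println("Accounts total: " + rows.next().longValue(0));
                }
            } finally {
                tx.commit(); // no locks are held, so finishing the transaction is cheap
            }
        }
    }
}
```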

In GridGain, there’s an additional mode for analytical queries. While Ignite 3 is primarily OLTP-oriented, some analytical workloads benefit from specialized storage engines optimized for read-heavy operations. These can be configured to improve query performance for large-scale analytics.

To summarize, the cluster topology in Ignite 2 is now split into physical and logical topologies in Ignite 3, providing better visibility and flexibility. The baseline topology has evolved into distribution zones with independent auto-adjustment. Cache groups with custom affinity functions have been replaced by distribution zones with configurable partitions and replicas. Filters that were once Java-based are now defined in structured SQL configurations. Data regions have become storage profiles, allowing multiple storage engines within a single cluster. Transactions now follow a strict serializable model with clear separation between read-write and read-only operations. And best of all, the old partition map exchange is gone—transactional load is smoother and unaffected by node join or leave events.

Thank you very much.