What's New and Cool in Ignite 3

Ignite 3.0 is a big step forward for the Apache Ignite community. It brings overarching improvements in all areas, plus some specific new features. You’ll get information on those new features as well as changes to nomenclature, transactional specifics, SQL improvements, split-brain and replication capabilities, and a lot more. Not only will you learn about the new features, but you’ll see them in action in a live demo!

Come to this talk to learn about:

  • Modern APIs
  • Performance
  • Transactional SQL engine
  • New CLI and REST management tools
  • Architecture insights
  • Too much to list!
Pavel Tupitsyn
Software Engineer at GridGain Systems

Hello everyone, my name is Ivan Bof. I’m a Lead Software Engineer at GridGain, responsible for storage implementations. Today I want to give a short talk about the core architecture of Apache Ignite, mainly how data is distributed across your cluster. The plan is to look at what we have in Ignite 2, what we have in Ignite 3, and compare the two to highlight the main differences.

Let’s start with cluster topologies, the most fundamental concept in Ignite 2. An Ignite 2 deployment consists of multiple Ignite nodes that form a cluster. The term “cluster topology” refers to the collection of nodes that are currently alive: not shut down, not in the process of joining, just regular running Ignite nodes. Another term, “baseline topology,” has a completely different meaning. It’s a logical collection of Ignite nodes that may be used to store data. The main difference between the cluster topology and the baseline topology is that some baseline nodes might be temporarily offline, and some cluster nodes might not be included in the baseline. In general, the two sets intersect but don’t have to coincide.

The baseline topology is a single collection of nodes that the system can manage automatically via the auto-adjust setting, a timeout defined in milliseconds. After a cluster topology change, the system waits that many milliseconds before adding a node to, or removing it from, the baseline. You can also change the baseline manually in two ways: by providing an explicit list of node IDs, or by pointing it at a known topology version.
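As a rough illustration, here is a minimal Java sketch of both approaches against the Ignite 2 API (baselineAutoAdjustEnabled, baselineAutoAdjustTimeout, topologyVersion and setBaselineTopology are the relevant IgniteCluster methods, available from roughly Ignite 2.8 on; the timeout value is a placeholder, and the calls assume an active, persistence-enabled cluster):

    import java.util.concurrent.TimeUnit;

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;

    public class BaselineExample {
        public static void main(String[] args) {
            Ignite ignite = Ignition.start();

            // Automatic management: adjust the baseline 60 seconds after a topology change.
            ignite.cluster().baselineAutoAdjustEnabled(true);
            ignite.cluster().baselineAutoAdjustTimeout(TimeUnit.SECONDS.toMillis(60));

            // Manual management: pin the baseline to a known topology version...
            long topVer = ignite.cluster().topologyVersion();
            ignite.cluster().setBaselineTopology(topVer);

            // ...or to an explicit collection of nodes.
            ignite.cluster().setBaselineTopology(ignite.cluster().forServers().nodes());
        }
    }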

In Ignite 3, the approach to the discovery service has been completely reworked. This provides a more flexible way to understand what’s happening in your cluster. There is now a “physical topology,” which includes all live Ignite nodes, including those currently starting up or performing maintenance tasks. The other concept is the “logical topology,” which is equivalent to the cluster topology in Ignite 2: it only contains live nodes that have already joined the cluster and can store your data. There is no longer a concept of a baseline topology. There is something similar to it, but we don’t call it a topology anymore to avoid confusion. I’ll talk about that a little later.

Now that we’ve covered topologies, let’s move on to the things that live inside them. In Ignite 2, we have the concept of “cache groups,” which are often underrated: most people use only caches, not cache groups. A cache group is a collection of caches that share the same properties related to data distribution, such as the same affinity function (which determines the distribution of partitions and backups), backup filters, and other fine-tuning settings. Caches that belong to the same cache group store data in the same data region, and all of their data is rebalanced together. Everything is stored together and colocated.
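For example, a minimal Ignite 2 sketch of two caches sharing one cache group could look like this (cache names, the group name and the tuning values are purely illustrative):

    import java.util.Arrays;

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
    import org.apache.ignite.configuration.CacheConfiguration;

    public class CacheGroupExample {
        public static void main(String[] args) {
            Ignite ignite = Ignition.start();

            // Two caches in the same cache group: they share the affinity function and
            // backup count, live in the same data region, and are rebalanced together.
            CacheConfiguration<Integer, String> orders = new CacheConfiguration<>("orders");
            orders.setGroupName("commerce");
            orders.setBackups(1);
            orders.setAffinity(new RendezvousAffinityFunction(false, 1024));

            CacheConfiguration<Integer, String> payments = new CacheConfiguration<>("payments");
            payments.setGroupName("commerce");
            payments.setBackups(1);
            payments.setAffinity(new RendezvousAffinityFunction(false, 1024));

            ignite.getOrCreateCaches(Arrays.asList(orders, payments));
        }
    }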

One implementation detail is that each cache group occupies a subset of the baseline nodes, while the baseline itself is managed globally and shared by all cache groups. If the baseline auto-adjusts, the change affects every cache group. Another detail is that fine-grained configuration of cache groups is hard to achieve from thin clients. Sometimes you have to use Java predicates or class names in the configuration, which means that some things can only be configured in static XML files or through Java code in thick clients or embedded servers.
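To illustrate why a thin client can’t express this, here is a hedged sketch of an Ignite 2 node filter written as a Java predicate (the class name and the “storage” attribute are made up for the example; the filter class must be deployed on the server classpath and referenced from XML or thick-client configuration):

    import org.apache.ignite.cluster.ClusterNode;
    import org.apache.ignite.lang.IgnitePredicate;

    // Keeps the cache group only on nodes started with the user attribute storage=SSD.
    public class SsdNodeFilter implements IgnitePredicate<ClusterNode> {
        @Override public boolean apply(ClusterNode node) {
            return "SSD".equals(node.attribute("storage"));
        }
    }

    // Usage in a thick client or embedded server:
    //     CacheConfiguration<Integer, String> cfg = new CacheConfiguration<>("orders");
    //     cfg.setNodeFilter(new SsdNodeFilter());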

In Ignite 3, cache groups are replaced by a new concept. It’s not caches anymore; it’s tables, so we have to get used to the new terminology. A “distribution zone” is simply a collection of tables, much like a cache group was a collection of caches. Tables in the same zone are always stored together on the same set of nodes. A distribution zone defines settings such as partitions and replicas, and it also plays a role similar to the baseline topology, because it can automatically adjust which nodes are included in the zone.

The configuration options include the number of partitions; the number of replicas, which is the total number of data copies (previously called backups, so replicas equal backups plus one); and the auto-adjust timeouts. In Ignite 3, there are two timeouts instead of one: the scale-up timeout (the delay between a node joining and being added to a distribution zone) and the scale-down timeout (the delay between a node leaving and being removed from a zone). These timeouts are now defined in seconds, not milliseconds, which makes more sense. There’s also a node filter setting in JSONPath syntax that defines which nodes from the logical topology are included in the distribution zone. The filter is based on node attributes, which are key-value pairs such as region=US or storage=SSD. Each Ignite node can have different attributes, allowing for flexible configuration.
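Putting these options together, a hedged sketch using the Ignite 3 Java client might look like the following; the zone name and values are illustrative, and the exact option names (PARTITIONS, REPLICAS, DATA_NODES_AUTO_ADJUST_SCALE_UP, DATA_NODES_AUTO_ADJUST_SCALE_DOWN, DATA_NODES_FILTER, STORAGE_PROFILES) should be verified against the SQL reference for your Ignite 3 build:

    import org.apache.ignite.client.IgniteClient;

    public class CreateZoneExample {
        public static void main(String[] args) throws Exception {
            try (IgniteClient client = IgniteClient.builder()
                    .addresses("127.0.0.1:10800")
                    .build()) {

                // 25 partitions, 3 copies of each partition, auto-adjust timeouts in
                // seconds, and a JSONPath filter over node attributes.
                client.sql().execute(null,
                    "CREATE ZONE IF NOT EXISTS us_ssd_zone WITH "
                        + "PARTITIONS=25, "
                        + "REPLICAS=3, "
                        + "DATA_NODES_AUTO_ADJUST_SCALE_UP=300, "
                        + "DATA_NODES_AUTO_ADJUST_SCALE_DOWN=300, "
                        + "DATA_NODES_FILTER='$[?(@.region == \"US\" && @.storage == \"SSD\")]', "
                        + "STORAGE_PROFILES='default'");
            }
        }
    }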

Let’s look back at Ignite 2 and the concept of data regions. Each cache is located in a data region, which is a chunk of memory used to store data. In Ignite 2, there are two types of data regions, configured by a single Boolean flag. If persistence is enabled, data is stored both in memory and on disk, meaning it survives node restarts. If persistence is disabled, data is only stored in memory and is lost if the cluster is rebooted. This is the classic cache scenario.
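As a minimal Ignite 2 sketch, one persistent and one in-memory data region can be configured like this (region names and sizes are arbitrary):

    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.DataRegionConfiguration;
    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class DataRegionExample {
        public static void main(String[] args) {
            DataRegionConfiguration persistent = new DataRegionConfiguration()
                .setName("persistent-region")
                .setMaxSize(4L * 1024 * 1024 * 1024)   // 4 GB; data also goes to disk
                .setPersistenceEnabled(true);          // the single Boolean flag

            DataRegionConfiguration inMemory = new DataRegionConfiguration()
                .setName("in-memory-region")
                .setMaxSize(2L * 1024 * 1024 * 1024)   // 2 GB; lost on cluster restart
                .setPersistenceEnabled(false);

            IgniteConfiguration cfg = new IgniteConfiguration()
                .setDataStorageConfiguration(new DataStorageConfiguration()
                    .setDataRegionConfigurations(persistent, inMemory));

            // Caches pick a region via CacheConfiguration.setDataRegionName(...).
            Ignition.start(cfg);
        }
    }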

In Ignite 3, data regions have been replaced by “storage profiles.” For each distribution zone, you can configure multiple storage profiles instead of just one. All nodes within a distribution zone must have these profiles configured. If a node lacks a required profile, it’s not included in the distribution zone. This means that not every node needs every data region, saving configuration effort and resources. Storage profiles are configured locally on each node. For example, a “default” profile might have a defined size and use an engine called “AIPersist,” which is equivalent to having persistence enabled. The idea is that Ignite 3 can support multiple storage engines in the future, allowing for more flexible storage configurations. You can also reuse the same storage profile across multiple distribution zones. It’s fine to have a single storage profile for the entire cluster, even with multiple distribution zones.

When creating tables, you can assign them to specific zones and storage profiles. If you don’t specify a zone, the default zone is used. If you don’t specify a storage profile, the default storage profile is used. The default zone always exists in your cluster, so you don’t need to create it manually.
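For instance, a table could be placed into a specific zone roughly like this (a sketch only: the zone-assignment clause has varied across Ignite 3 builds, with earlier betas using WITH PRIMARY_ZONE='...', so check the SQL reference for your version; us_ssd_zone is the hypothetical zone from the earlier example):

    import org.apache.ignite.client.IgniteClient;

    public class CreateTableExample {
        public static void main(String[] args) throws Exception {
            try (IgniteClient client = IgniteClient.builder()
                    .addresses("127.0.0.1:10800")
                    .build()) {

                // Omitting the zone (and storage profile) falls back to the defaults.
                client.sql().execute(null,
                    "CREATE TABLE IF NOT EXISTS person ("
                        + "id INT PRIMARY KEY, "
                        + "name VARCHAR) "
                        + "ZONE us_ssd_zone");
            }
        }
    }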

Let’s talk about one of the more infamous parts of Ignite 2’s architecture: the partition map exchange. This was a legacy “stop-the-world” mechanism that paused all transactions to perform topology updates (like when creating or destroying caches, or when nodes joined or left). This caused pauses and spikes in transaction load, which were undesirable. Sometimes, even unrelated caches would be affected by operations on other caches. For example, creating a new cache could impact the performance of existing caches.

Ignite 3, being a new project with mostly new code, completely abandons this legacy mechanism. The new approach to data distribution management is much more efficient. Distribution zones are independent, meaning that adding or removing nodes doesn’t affect transactional load (except in special cases, such as when a node holding the locks of an in-flight transaction leaves the cluster). Creating or dropping tables or zones no longer impacts unrelated transactions. DDL operations and cluster state updates are now far less invasive and no longer disrupt transactions.

In Ignite 2, there were two cache modes: atomic and transactional. In Ignite 3, these are replaced with a single model — strict serializable transactions. This ensures the highest level of consistency. Don’t worry about performance — single operations like “put” are still as fast as before. Both the key-value and SQL APIs share the same transaction model and can be used together within a single transaction.
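As a rough sketch, assuming a person table with id and name columns already exists, one transaction spanning both APIs could look like this (runInTransaction, keyValueView and the Tuple-based put come from the Ignite 3 Java client; treat the details as approximate):

    import org.apache.ignite.client.IgniteClient;
    import org.apache.ignite.table.KeyValueView;
    import org.apache.ignite.table.Tuple;

    public class MixedTxExample {
        public static void main(String[] args) throws Exception {
            try (IgniteClient client = IgniteClient.builder()
                    .addresses("127.0.0.1:10800")
                    .build()) {

                KeyValueView<Tuple, Tuple> kv =
                    client.tables().table("PERSON").keyValueView();

                // One strict serializable transaction spanning the key-value and SQL APIs.
                client.transactions().runInTransaction(tx -> {
                    kv.put(tx, Tuple.create().set("id", 1),
                               Tuple.create().set("name", "Ann"));

                    client.sql().execute(tx,
                        "UPDATE person SET name = ? WHERE id = ?", "Bob", 2);
                });
            }
        }
    }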

There are two main types of transactions now. Write transactions, which modify data, require locks and are heavier, as they can lead to potential deadlocks or rollbacks if nodes leave. Read-only transactions, on the other hand, are faster, lock-free, and can even continue if a table is dropped mid-query. This is possible because Ignite 3 uses multi-version storage, where each read-only transaction has its own snapshot of the data.
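And a hedged sketch of an explicit read-only transaction (same hypothetical person table; TransactionOptions.readOnly is the relevant switch in the Ignite 3 client API, but verify the exact names against your version):

    import org.apache.ignite.client.IgniteClient;
    import org.apache.ignite.sql.ResultSet;
    import org.apache.ignite.sql.SqlRow;
    import org.apache.ignite.tx.Transaction;
    import org.apache.ignite.tx.TransactionOptions;

    public class ReadOnlyTxExample {
        public static void main(String[] args) throws Exception {
            try (IgniteClient client = IgniteClient.builder()
                    .addresses("127.0.0.1:10800")
                    .build()) {

                // Reads from a consistent snapshot; takes no locks and cannot block writers.
                Transaction tx = client.transactions()
                    .begin(new TransactionOptions().readOnly(true));

                try (ResultSet<SqlRow> rs =
                         client.sql().execute(tx, "SELECT count(*) FROM person")) {
                    while (rs.hasNext()) {
                        System.out.println(rs.next().longValue(0));
                    }
                }

                tx.commit();
            }
        }
    }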

GridGain adds another mode for analytical queries, allowing optimized storage engines designed for analytical workloads, which are different from typical OLTP operations. For more details, there’s a separate talk by my colleague on this topic.

To summarize: In Ignite 3, the cluster topology is now split into physical and logical topologies, making monitoring easier and more logical. The baseline topology is replaced by distribution zones with independent auto-adjustment. Cache groups with custom affinity functions are replaced by distribution zones with configurable partitions and replicas, all managed through SQL. Cache group filters written as Java predicates are now structured configurations using SQL and JSON. Cache groups with a single data region become distribution zones with multiple storage profiles, meaning you can have different tables in the same zone using different storage engines. There’s no longer a separate atomic or transactional mode — only strict serializable transactions, with a clear distinction between read-write and read-only. And best of all, there’s no more partition map exchange. Transaction load is smoother and unaffected by node join or leave events.

Thank you very much.