Apache Ignite’s Baseline Topology explained

When the Ignite project emerged in the Apache Software Foundation, it was conceived as a pure in-memory solution: a distributed cache that keeps data in memory to speed up access. Then, in 2017, Apache® Ignite™ 2.1 debuted Ignite's Native Persistence module, which lets Ignite be treated as a full-blown distributed database. Since then, Ignite has no longer depended on an external persistent store, nor on the pitfalls of database configuration and administration that came with it.

 

Ignite native persistence is a distributed, ACID- and SQL-compliant disk store that transparently integrates with Ignite's durable memory. Ignite persistence is optional and can be turned on or off; when it is turned off, Ignite becomes a pure in-memory store.
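
For reference, turning persistence on amounts to marking a data region as persistent in the node configuration. Below is a minimal Java sketch, assuming Ignite 2.3 or later (where DataStorageConfiguration is available); the class name is just for illustration.

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class PersistentNodeStartup {
        public static void main(String[] args) {
            // Mark the default data region as persistent: its data is kept
            // both in durable memory and in Ignite native persistence on disk.
            DataStorageConfiguration storageCfg = new DataStorageConfiguration();
            storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);

            IgniteConfiguration cfg = new IgniteConfiguration();
            cfg.setDataStorageConfiguration(storageCfg);

            // Start a server node with native persistence enabled.
            Ignite ignite = Ignition.start(cfg);
        }
    }

With persistenceEnabled left at its default of false, the same configuration starts a pure in-memory node.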

 

However, Ignite’s persistent mode raises new questions for some users, such as: “How can I prevent intractable data conflicts in a split-brain case?” “Can I skip a partition rebalance if data won’t actually be lost when a node fails?” “How can I automate additional actions, such as cluster activation?”

 

Baseline Topology to the rescue!

 

Apache Ignite’s Baseline Topology is a set of server nodes intended to store data both in memory and in Ignite persistence. The nodes from the baseline topology are not limited in terms of functionality, and behave as regular server nodes that act as a container for data and computations in Ignite.

 

If Ignite persistence is enabled, Ignite enforces the baseline topology concept which represents a set of server nodes in the cluster that will persist data on disk.

 

Usually, when a cluster with Ignite persistence enabled is started for the first time, it is considered inactive and disallows any CRUD operations. For example, if you try to execute a SQL or key-value operation, an exception will be thrown.

 


 

This is done to avoid possible performance, usability and consistency issues that might occur if the cluster is restarted and applications start modifying data that may be persisted on nodes that have not yet rejoined the cluster. Therefore, you need to define the baseline topology for a cluster with Ignite persistence enabled; after that, the topology can be maintained either manually or automatically by Ignite. Let's dive into the details of this concept and see how to work with it.
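
In code, the manual path boils down to a single activation call. The sketch below uses the IgniteCluster.active(boolean) method that shipped together with Baseline Topology in Ignite 2.4 (later versions offer an equivalent cluster-state API); the helper method is illustrative only.

    import org.apache.ignite.Ignite;

    public class ClusterActivation {
        /** Activates an inactive persistent cluster; the first activation fixes the Baseline Topology. */
        public static void activateIfNeeded(Ignite ignite) {
            // With native persistence enabled, a freshly started cluster is inactive
            // and rejects cache and SQL operations until it is activated.
            if (!ignite.cluster().active())
                // The very first activation records the current set of online
                // server nodes as the Baseline Topology and persists it to disk.
                ignite.cluster().active(true);
        }
    }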
 

Baseline Topology: first steps

 

Conceptually, a pure in-memory cluster is simple: there are no dedicated nodes, all peers are equal, and each of them can be assigned a cache partition, receive a compute job, or host a deployed service. If a node drops out of the topology, user requests are served by the other nodes, but the data from the dropped node is no longer available.

 

In persistent mode, nodes remain stateful even after a restart: during bootstrap a node's data is read from disk and its state is restored. So restarting a node doesn't require a complete replication of data from the other cluster nodes (also known as rebalancing), and the data that was present in the cluster at the moment of failure is restored from the local drive.

 

Baseline Topology is the mechanism that distinguishes this set of stateful nodes (which can be restored after a restart) from all other nodes. Baseline Topology (from now on, “BLT”) is essentially a collection of identifiers of the nodes that have been configured for data storage.

 

Persisted data: access denied

 

The distributed-systems problem known as “split brain” is difficult enough on its own, but it becomes even more treacherous in the context of persistence. Consider a simple example: we have a cluster and a replicated cache.

 

Perform some simple operations on it as follows:

  1. Stop the cluster and start a subset of its nodes; call it subset A.
  2. Update some keys in the cache.
  3. Stop subset A and start a different subset, B.
  4. Apply different updates to the same keys.

 

Because Ignite now works as a database, the updates are not lost when the nodes of a subset are stopped; the data becomes available again as soon as that subset is restarted. So after the initial cluster is restored, different nodes can contain different values for the same key.

 

By merely stopping and starting nodes, we have brought the cluster’s data into a conflicting state that cannot be resolved automatically. BLT exists to prevent exactly this kind of situation.

 

The idea is as follows: in persistent mode the cluster goes through an additional stage, activation.

 

During the very first activation, the first Baseline Topology is created and saved on disk; it contains information about every node present in the cluster at the moment of activation.

 

This data also includes a hash computed from the IDs of the online nodes. If, during the next activation, the topology lacks some nodes (for example, the cluster was restarted and one node went out of service), the hash is computed again and the previous value is saved in the activation history within the same BLT.

 

So Baseline Topology maintains a chain of hashes describing the cluster composition at every activation.

 

At steps 1 and 3 above, having started a node subset, the user has to activate an incomplete cluster, and each online node then updates its local BLT by adding a new hash. All nodes within a subset compute the same hash, but the hashes differ from one subset to the other.

 

You can probably guess what follows: if a node tries to join a “foreign” group, the system determines that the node was activated outside of this group and denies it entry.

 

Note, however, that this validation mechanism offers no absolute protection against collisions in a split-brain scenario. If the cluster splits into two halves such that each half retains at least one copy of every partition, and neither half is reactivated, each half can accept conflicting changes to the same data. BLT doesn’t refute the CAP theorem, but it does protect against collisions caused by obvious administration errors.

 

Goodies

 

Apart from preventing collisions, BLT enables a few optional (but nice) extras.

 

Goodie №1: one less manual action. Previously, the activation described above had to be performed manually after every cluster restart; there was no out-of-the-box automation. With BLT, the cluster decides on its own whether activation is needed.

 

Although an Ignite cluster is an elastic system in which nodes can be added or removed dynamically, BLT is built on the premise that in database mode the user maintains a stable cluster configuration.


When the cluster is activated for the first time, the newly created Baseline Topology remembers which nodes should be present in the topology. After a restart, each node checks the status of the other BLT nodes, and as soon as all of them are back online, the cluster is activated automatically.
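
To illustrate, an application that restarts nodes can simply wait for this auto-activation instead of triggering activation itself. The polling helper below is an assumption of mine rather than anything Ignite requires; it only reads the cluster's active flag.

    import org.apache.ignite.Ignite;

    public class AwaitAutoActivation {
        /** Waits until the cluster auto-activates after all BLT nodes rejoin. */
        public static void awaitActive(Ignite ignite, long timeoutMs) throws InterruptedException {
            long deadline = System.currentTimeMillis() + timeoutMs;

            // Once every node from the Baseline Topology is back online,
            // Ignite flips the cluster to the active state on its own.
            while (!ignite.cluster().active()) {
                if (System.currentTimeMillis() > deadline)
                    throw new IllegalStateException("Cluster did not auto-activate in time");

                Thread.sleep(1_000);
            }
        }
    }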

 

Goodie №2: less network traffic. This idea, again, rests on the premise that the topology remains stable in the long run. Before BLT, a node dropping out of the topology even for ten minutes triggered a rebalancing of cache partitions to maintain the required number of backups. But why spend network resources and slow down the whole cluster if the node comes back healthy within minutes? Baseline Topology lets you optimize this behavior.

 

By default, the cluster assumes that the faulty node will soon be back up. During that window some caches operate with fewer backups, but this does not cause the service to slow down or stall.
 

Managing Baseline Topology

 

We already know one approach: Baseline Topology is set up automatically on the very first cluster activation, and it includes every server node that is online at the moment of activation.

 

Manual BLT administration is done with the control script shipped in the Ignite distribution; a detailed description is given on the documentation page about cluster activation.

 

The script exposes a very simple API and supports just three operations: “add node,” “remove node,” and “set a new Baseline Topology.”

 

While adding a node is a relatively simple operation without major pitfalls, removing a node from the BLT is a much trickier task. Doing it under load can cause race conditions, and in the worst case the entire cluster can become unresponsive. Removal therefore requires one more condition to be met: the node being removed must be offline. If you try to remove an active node, the script reports an error and the operation is not executed.

 

So when removing a node from the BLT, you still have to do one thing manually: stop that node. Such a scenario is unlikely to be common, though, and the extra overhead is negligible.

 

The Java interface for BLT management is even simpler: it exposes just a single method that lets you set up a Baseline Topology from a list of nodes.
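
As a sketch of that API, resetting the baseline to whatever server nodes are currently online could look like the following (using IgniteCluster.setBaselineTopology, available since Ignite 2.4; the helper class is illustrative).

    import java.util.Collection;

    import org.apache.ignite.Ignite;
    import org.apache.ignite.cluster.ClusterNode;

    public class BaselineReset {
        /** Replaces the Baseline Topology with the server nodes that are online right now. */
        public static void resetBaseline(Ignite ignite) {
            // Collect the server nodes currently present in the topology.
            Collection<ClusterNode> serverNodes = ignite.cluster().forServers().nodes();

            // Set a new Baseline Topology based on this node list.
            ignite.cluster().setBaselineTopology(serverNodes);
        }
    }

This call is effectively the programmatic counterpart of the control script's “set a new Baseline Topology” operation.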

 

Conclusion

 

Ensuring data integrity is an essential task that every data store must solve. For a distributed DBMS, and Apache Ignite in particular, the task becomes much more challenging.

 

Baseline Topology lets us account for some real-world scenarios in which data integrity could otherwise be broken.

 

Ignite also gives high priority to performance, and BLT contributes here as well, delivering significant resource savings and shorter response times.

 

Native Persistence was implemented only recently and will surely evolve, becoming more robust, performant, and easier to use. The Baseline Topology concept will evolve along with it.