Apache Ignite Transactions Architecture: Failover and Recovery
In the previous article in this series, we looked at concurrency modes and isolation levels. Here are topics we will cover in the rest of this series:
- Failover and recovery
- Transaction handling at the level of Ignite persistence (WAL, checkpointing, and more)
- Transaction handling at the level of 3rd party persistence
In this article, we will focus on how Apache Ignite handles failover and recovery during transaction execution.
In a distributed cluster consisting of a Transaction coordinator, Primary Nodes and Backup Nodes, there are a number of scenarios where we may lose some or all parts of the cluster, as follows, in increasing order of severity:
- Backup Node failure
- Primary Node failure
- Transaction coordinator failure
Let’s look at each of these in turn and see how Ignite manages failure situations. We’ll begin with Backup Node failure.
Backup Node Failure
Recalling the discussion in the first article in this series, we know that in the two-phase commit protocol, there are the prepare and commit phases. In the event that we lose a Backup Node in either of these phases, no action is required by Ignite, since any transactions will still continue to be applied to the remaining Primary and Backup Nodes in the cluster. This is shown in Figure 1.
Figure 1. Backup Node Failure
After all the active transactions including this one are completed, Ignite will update its cluster topology version due to the node failure, will elect a new node or nodes that will hold a copy of the data previously stored on the lost Backup Node and will start rebalancing in the background to meet the desired data replication level.
Next, let's look at how Ignite manages Primary Node failure.
Primary Node Failure
A Primary Node failure requires different handling depending upon whether the failure occurs at the prepare or the commit phase.
If the failure occurs at the prepare phase, the Transaction coordinator will raise an exception as shown in Figure 2 (3 exception). It is then the responsibility of the application to handle this exception and decide what action to take. For example, whether to restart the transaction or to use additional exception handling.
Figure 2. Primary Node Failure on prepare phase
If the failure occurs at the commit phase, as shown in Figure 3, the Transaction coordinator will be waiting for a special message (4 ACK) from the appropriate Backup Nodes.
Figure 3. Primary Node Failure on commit phase
When the Backup Nodes detect the failure, they will notify the Transaction coordinator that they committed the transaction successfully. In this scenario, there is no data loss because the data are backed up and can still be accessed and used by applications.
After the Transaction coordinator completes the transaction, Ignite will rebalance the cluster due to the loss of the Primary Node. An election will take place to assign a new Primary Node for the partitions that were previously stored on the failed Primary Node.
Next, let's look at how Ignite manages Transaction coordinator failure.
Transaction Coordinator Failure
A worse-case scenario is if we lose the Transaction coordinator. This is because all Primary and Backup Nodes are only aware of the transaction state locally, and not of the transaction state globally. We may have some cluster nodes that received a commit message and others that did not, as shown in Figure 4.
Figure 4. Transaction Coordinator Failure
The solution to this failure scenario is for the nodes to exchange their local transaction status with each other, as shown in Figure 4. This allows them to see the global transaction status.
In this scenario, Ignite initiates a Recovery Protocol . This works as follows. All nodes participating in the transaction send messages to all other nodes participating in the transaction asking whether they received the prepare message. If any node replies that it did not receive the prepare message, the transaction will be rolled-back, otherwise the transaction will be committed. However, some of the nodes may have already committed the transaction before receiving the Recovery Protocol message. For this type of situation, all nodes keep completed transaction ID information for a period of time. If no ongoing transaction is found with a given ID, the backlog is checked. If the backlog does not contain the transaction, then the transaction was never started. Therefore, the failure occurred before the prepare phase was completed, so the transaction can be rolled-back. The Recovery Protocol also works if any of the Primary or Backup Nodes also crashed with the Transaction coordinator.
Various types of cluster failures can occur at a number of different phases. Through examples, we have seen how Ignite can gracefully manage these failures and provide recovery.