Pluggable Fault Tolerance
Failover management and resulting fault tolerance is a key property of any grid computing infrastructure. Based on its SPI-based architecture GridGain provides totally pluggable failover logic with several popular implementations available out-of-the-box. Unlike other grid computing frameworks GridGain allows to failover the logic and not only the data.
With grid task being the atomic unit of execution on the grid the fully customizable and pluggable failover logic enables developer to choose specific policy much the same way as one would choose concurrency policy in RDBMS transactions.
Moreover, GridGain allows to customize the failover logic for all tasks, for group of tasks or even for every individual task. Using meta-programming techniques the developer can even customize the failover logic for each task execution.
This allows to fine tune how grid task reacts to the failure, for example:
- Fail entire task immediately upon failure of any of its jobs (fail-fast approach)
- Failover failed job to other nodes until topology is exhausted (fail-slow approach)