Full and Incremental Snapshots

Overview

GridGain can create snapshots of the data stored across the cluster, which can later be used for cluster recovery. Snapshots taken from one cluster can also be applied to a different cluster. Snapshotting is tightly coupled with Ignite Native Persistence and can be used only when persistence is enabled.

Essentially, GridGain snapshots are similar to RDBMS backups. They are called snapshots rather than backups to avoid confusion with GridGain backup copies of data stored in the cluster.

You can create full snapshots, offline or online, as well as incremental ones. A snapshot can then be used to recover the cluster to the state recorded in that snapshot. Furthermore, the cluster being recovered can have a different topology (a different number of data nodes) than the original one. For instance, you can create snapshots of a production cluster and use them in a testing environment with a different number of cluster nodes.

Enabling Snapshots

Snapshots are built on Ignite Native Persistence and the overall Durable Memory architecture.

To enable the snapshots component, Native Persistence has to be activated first, as shown below:

<bean class="org.apache.ignite.configuration.IgniteConfiguration">

  <!-- Enabling the Ignite Native Persistence. -->
  <property name="dataStorageConfiguration">
    <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
      <property name="defaultDataRegionConfiguration">
        <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
          <property name="persistenceEnabled" value="true"/>
        </bean>
      </property>
    </bean>
  </property>

  <!-- Enabling the snapshots. -->
  <property name="pluginConfigurations">
    <bean class="org.gridgain.grid.configuration.GridGainConfiguration">
      <property name="snapshotConfiguration">
        <bean class="org.gridgain.grid.configuration.SnapshotConfiguration"/>
      </property>
    </bean>
  </property>
</bean>

IgniteConfiguration cfg = new IgniteConfiguration();

// Enabling the Persistent Store.
DataStorageConfiguration dataStorageCfg = new DataStorageConfiguration();
dataStorageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);
cfg.setDataStorageConfiguration(dataStorageCfg);

GridGainConfiguration ggCfg = new GridGainConfiguration();

SnapshotConfiguration snapshotCfg = new SnapshotConfiguration();

// Enabling the snapshots.
ggCfg.setSnapshotConfiguration(snapshotCfg);

cfg.setPluginConfigurations(ggCfg);

var cfg = new IgniteConfiguration
{
    // Enabling the Persistent Store.
    DataStorageConfiguration = new DataStorageConfiguration
    {
        DefaultDataRegionConfiguration = new DataRegionConfiguration
        {
            Name = "Default_Region",
            PersistenceEnabled = true
        }
    },

    // Enabling the snapshots.
    PluginConfigurations = new[]
    {
        new GridGainPluginConfiguration()
        {
            SnapshotConfiguration = new SnapshotConfiguration()
        }
    }
};

By default, snapshots are stored on the local file system of each cluster node, in the IGNITE_HOME\work\snapshot folder. To change the snapshot location, use the SnapshotConfiguration.setSnapshotsPath(…) method as shown below:

<bean class="org.apache.ignite.configuration.IgniteConfiguration">

  <!-- Enabling the Ignite Native Persistence. -->
  <property name="dataStorageConfiguration">
    <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
      <property name="defaultDataRegionConfiguration">
        <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
          <property name="persistenceEnabled" value="true"/>
        </bean>
      </property>
    </bean>
  </property>

  <!-- Enabling the snapshots. -->
  <property name="pluginConfigurations">
    <bean class="org.gridgain.grid.configuration.GridGainConfiguration">
      <property name="snapshotConfiguration">
        <bean class="org.gridgain.grid.configuration.SnapshotConfiguration">
          <property name="snapshotsPath" value="/etc/snapshots/"/>
        </bean>
      </property>
    </bean>
  </property>
</bean>

IgniteConfiguration cfg = new IgniteConfiguration();

// Enabling the Persistent Store.
DataStorageConfiguration dataStorageCfg = new DataStorageConfiguration();
dataStorageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);
cfg.setDataStorageConfiguration(dataStorageCfg);

GridGainConfiguration ggCfg = new GridGainConfiguration();

SnapshotConfiguration snapshotCfg = new SnapshotConfiguration();

// Changing the default path.
snapshotCfg.setSnapshotsPath("/local/snapshot/store/path");

// Enabling the snapshots.
ggCfg.setSnapshotConfiguration(snapshotCfg);

cfg.setPluginConfigurations(ggCfg);

var cfg = new IgniteConfiguration
{
    // Enabling the Persistent Store.
    DataStorageConfiguration = new DataStorageConfiguration
    {
        DefaultDataRegionConfiguration = new DataRegionConfiguration
        {
            Name = "Default_Region",
            PersistenceEnabled = true
        }
    },

    // Enabling the snapshots.
    PluginConfigurations = new[]
    {
        new GridGainPluginConfiguration()
        {
            SnapshotConfiguration = new SnapshotConfiguration()
            {
                // Changing the default path.
                SnapshotsPath = "/local/snapshot/store/path"
            }
        }
    }
};

Snapshots APIs and Tools

Once you’ve enabled snapshotting, use one of the following approaches for snapshot creation, management, and recovery:

  • Java API - The snapshot-related API is provided through the GridSnapshot interface. To see how to use this API in practice, refer to org.gridgain.examples.snapshots.SnapshotsExample, included with the GridGain Ultimate Edition examples.

  • Snapshots Management Tool

  • Web Console Snapshots Management

Full Snapshots

A full snapshot holds a complete copy of the cluster data at a given point in time. By default, the data of all caches is stored in the snapshot. However, you can also create a full snapshot that includes the content of specific caches only.

Creating Full Snapshots

To create a full snapshot programmatically, use the GridSnapshot.createFullSnapshot(…) method:

Full Snapshot

// Get a reference to GridGain plugin.
GridGain gg = ignite.plugin(GridGain.PLUGIN_NAME);

// Get a reference to the Snapshots.
GridSnapshot storage = gg.snapshot();

// Create the snapshot. Data of all the caches will be added to the snapshot.
SnapshotFuture snapshotFut = storage.createFullSnapshot(null,
    "Snapshot has been created!");

// Wait while the snapshot is being created.
snapshotFut.get();

var ignite = Ignition.Start(cfg);

// Get a reference to grid snapshot API.
var snap = ignite.GetSnapshot();

// Create the snapshot. Data of all the caches will be added to the snapshot.
var task = snap.CreateFullSnapshotAsync(null, "Snapshot has been created!");

// Wait while the snapshot is being created.
task.Task.Wait();

Full Snapshot for Specific Caches

IgniteCache<Integer, String> orgCache = ignite.getOrCreateCache("organization");
orgCache.put(1, "first_org");

// Get a reference to GridGain plugin.
GridGain gg = ignite.plugin(GridGain.PLUGIN_NAME);

// Get a reference to the Snapshots.
GridSnapshot storage = gg.snapshot();

// Create the snapshot. Only the "organization" cache will be added to the snapshot.
SnapshotFuture snapshotFut = storage.createFullSnapshot(Collections.singleton("organization"),
    "Snapshot has been created!");

// Wait while the snapshot is being created.
snapshotFut.get();

var ignite = Ignition.Start(cfg);

var orgCache = ignite.GetOrCreateCache<int, string>("organization");
orgCache.Put(1, "first_org");

// Get a reference to grid snapshot API.
var snap = ignite.GetSnapshot();

// Create the snapshot. Only the "organization" cache will be added to the snapshot.
var task = snap.CreateFullSnapshotAsync(new[] { orgCache.Name }, "Snapshot has been created!");

// Wait while the snapshot is being created.
task.Task.Wait();

Once this method is called on one of the cluster nodes, GridGain takes a cluster-wide snapshot, asking every node to provide its part of the overall snapshot data set.

Alternatively, the Snapshots Management Tool can be used for the same purpose. At the implementation layer, the tool uses the same Java API shown above.

For instance, to create a full snapshot with the Snapshots Management Tool, run the following command:

Full Snapshot

{gridgain}/bin/snapshot-utility.sh snapshot -type=full

Full Snapshot for Specific Caches

{gridgain}/bin/snapshot-utility.sh snapshot -type=full -caches=organization

Restoring From Full Snapshot

To restore the content of all or specific caches from a previously created snapshot, the snapshot has to be available to the GridSnapshot interface.

The default implementation assumes that the snapshot content is spread across the cluster, with every node holding the part of the snapshot that covers the partitions for which the node is primary. Removing nodes from the cluster can therefore break snapshots. If you plan to remove nodes from your cluster, first create a network backup.

If the snapshot is going to be restored in a cluster with the same topology (the same number of nodes and the same partition distribution) as when the snapshot was taken, you can use the existing APIs and tools. Restoring to a larger cluster (if you have added nodes to the existing cluster) is also supported.

If the number of nodes or the partition distribution differs in the cluster where the snapshot is to be applied, run the CHECK command before restoring.

If there is a problem, the CHECK command shows the issue. For example:

snapshot-utility.sh check -id=1565692737236 -caches=cache1,corruptedCache
Command [CHECK -id=1565692737236 -caches=cache1,corruptedCache] started at [2019-08-13 13:38:58]...
Snapshot ID 1565692737236 is broken. Found 1 issues:
 Cache: corruptedCache. Found 1 issues:
   Partition ID: 1. Issue: Partition was not found!
Command [CHECK] failed with error: 6510 - snapshot is broken.
GridGain Snapshots utility [ver. 2.7.127-SNAPSHOT#20190813-sha1:DEV]
2019 Copyright(C) GridGain Systems

Example of a valid snapshot:

snapshot-utility.sh check -id=1565692737236
Command [CHECK -id=1565692737236] started at [2019-08-13 13:38:58]...
Snapshot ID 1565692737236 is valid
Command [CHECK] successfully finished in 0 seconds.
GridGain Snapshots utility [ver. 2.7.127-SNAPSHOT#20190813-sha1:DEV]
2019 Copyright(C) GridGain Systems
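
The "Partition was not found" issue above can be pictured with a small model. The sketch below is not the logic of the real CHECK command, which is not described here; the class and method names are hypothetical, and the only assumption is the one stated earlier, that every node holds the snapshot part for some subset of partitions. It simply reports which partition IDs are not held by any node.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collection;
import java.util.List;
import java.util.Set;

// Hypothetical model for illustration only; not the logic of the real CHECK command.
final class PartitionCoverageSketch {
    /**
     * Returns the IDs of partitions that no node holds in its local part of the snapshot.
     *
     * @param totalPartitions number of partitions configured for the cache
     * @param partsPerNode partition IDs stored in the snapshot part of each node
     */
    static List<Integer> missingPartitions(int totalPartitions, Collection<Set<Integer>> partsPerNode) {
        BitSet present = new BitSet(totalPartitions);

        for (Set<Integer> nodeParts : partsPerNode) {
            for (int partId : nodeParts)
                present.set(partId);
        }

        List<Integer> missing = new ArrayList<>();

        for (int p = present.nextClearBit(0); p < totalPartitions; p = present.nextClearBit(p + 1))
            missing.add(p);

        return missing;
    }

    public static void main(String[] args) {
        // Three of four partitions survive, for example after a node was removed.
        List<Set<Integer>> parts = List.of(Set.of(0, 2), Set.of(3));

        System.out.println(missingPartitions(4, parts)); // prints [1]
    }
}
```

An empty result corresponds to the "snapshot is valid" case; any missing partition means the snapshot cannot be fully restored from the parts at hand.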

Once the snapshot content is properly distributed across the cluster, the cluster can be restored from the snapshot programmatically via GridSnapshot.restoreSnapshot(…) or with the Snapshots Management Tool, as shown below:

Restoring All Caches

# Getting a list of all the available snapshots.
# The output will include snapshots' IDs.
{gridgain}/bin/snapshot-utility.sh list

# Restoring the cluster to a specific snapshot passing its ID.
{gridgain}/bin/snapshot-utility.sh restore -id=1483663276482

Restoring Specific Caches

# Getting a list of all the available snapshots.
# The output will include snapshots' IDs.
{gridgain}/bin/snapshot-utility.sh list

# Restoring only the specified caches from a snapshot, passing its ID.
{gridgain}/bin/snapshot-utility.sh restore -id=1483663276482 -caches=organization

Incremental Snapshots

An incremental snapshot contains only the data that has changed since the preceding full or incremental snapshot. A full recovery requires the last full snapshot plus all the incremental snapshots up to the point of restoration.
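
The dependency between snapshots can be illustrated with a small self-contained sketch. The `SnapshotInfo` class and the `restoreChain` method are hypothetical names made up for this illustration and are not part of the GridGain API. The sketch assumes that snapshot IDs grow with creation time and only models which snapshots would have to be available for a given restore point.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model for illustration only; not GridGain API.
final class SnapshotInfo {
    final long id;      // Snapshot ID; assumed to grow with creation time.
    final boolean full; // true for a full snapshot, false for an incremental one.

    SnapshotInfo(long id, boolean full) {
        this.id = id;
        this.full = full;
    }
}

final class RestoreChainSketch {
    /**
     * Returns the IDs needed to restore to targetId: the last full snapshot
     * at or before the target, followed by every incremental snapshot up to it.
     * The history list is assumed to be sorted by ascending ID.
     */
    static List<Long> restoreChain(List<SnapshotInfo> history, long targetId) {
        List<Long> chain = new ArrayList<>();

        // Walk backwards from the restore point to the last full snapshot.
        for (int i = history.size() - 1; i >= 0; i--) {
            SnapshotInfo s = history.get(i);

            if (s.id > targetId)
                continue; // Taken after the restore point; not needed.

            if (chain.isEmpty() && s.id != targetId)
                throw new IllegalArgumentException("Unknown snapshot ID: " + targetId);

            chain.add(s.id);

            if (s.full) {
                Collections.reverse(chain);
                return chain;
            }
        }

        throw new IllegalStateException("No full snapshot at or before ID: " + targetId);
    }

    public static void main(String[] args) {
        List<SnapshotInfo> history = List.of(
            new SnapshotInfo(100, true),
            new SnapshotInfo(110, false),
            new SnapshotInfo(120, false),
            new SnapshotInfo(200, true),
            new SnapshotInfo(210, false));

        System.out.println(restoreChain(history, 120)); // prints [100, 110, 120]
        System.out.println(restoreChain(history, 210)); // prints [200, 210]
    }
}
```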

Creating Incremental Snapshots

To create an incremental snapshot programmatically, use GridSnapshot.createSnapshot(…):

Incremental Snapshot

// Get a reference to GridGain plugin.
GridGain gg = ignite.plugin(GridGain.PLUGIN_NAME);

// Get a reference to the Snapshots.
GridSnapshot storage = gg.snapshot();

// Create the incremental snapshot. Changes from all the caches will be added to the snapshot.
SnapshotFuture snapshotFut = storage.createSnapshot(null,
    "Snapshot has been created!");

// Wait while the snapshot is being created.
snapshotFut.get();

var ignite = Ignition.Start(cfg);

// Get a reference to grid snapshot API.
var snap = ignite.GetSnapshot();

// Create the incremental snapshot. Changes from all the caches will be added to the snapshot.
var task = snap.CreateSnapshotAsync(null, "Snapshot has been created!");

// Wait while the snapshot is being created.
task.Task.Wait();

Incremental Snapshot for Specific Caches

IgniteCache<Integer, String> orgCache = ignite.getOrCreateCache("organization");
orgCache.put(1, "first_org");

// Get a reference to GridGain plugin.
GridGain gg = ignite.plugin(GridGain.PLUGIN_NAME);

// Get a reference to the Snapshots.
GridSnapshot storage = gg.snapshot();

// Create the incremental snapshot. Only changes from the "organization" cache will be added.
SnapshotFuture snapshotFut = storage.createSnapshot(Collections.singleton("organization"),
    "Snapshot has been created!");

// Wait while the snapshot is being created.
snapshotFut.get();

var ignite = Ignition.Start(cfg);

var orgCache = ignite.GetOrCreateCache<int, string>("organization");
orgCache.Put(1, "first_org");

// Get a reference to grid snapshot API.
var snap = ignite.GetSnapshot();

// Create the incremental snapshot. Only changes from the "organization" cache will be added.
var task = snap.CreateSnapshotAsync(new[] { orgCache.Name }, "Snapshot has been created!");

// Wait while the snapshot is being created.
task.Task.Wait();

Alternatively, the Snapshots Management Tool can be used for the same purpose.

For instance, to create an incremental snapshot with the Snapshots Management Tool, run the following command:

Incremental Snapshot

{gridgain}/bin/snapshot-utility.sh snapshot

Incremental Snapshot for Specific Caches

{gridgain}/bin/snapshot-utility.sh snapshot -caches=organization

Restoring From Incremental Snapshot

The same API methods and commands described for full snapshots above are used to restore a cluster from an incremental snapshot. The only extra requirement is that both the last full snapshot and all the incremental snapshots up to the point of restoration must be available to the snapshotting API or tool.

Snapshots Creation Flow

This section provides low-level details regarding how snapshot creation is handled cluster-wide.

Full Snapshots Creation

  • The originating node — the node where a snapshot creation is triggered — starts the partition exchange process, which waits for all ongoing transactions to finish and blocks all new transactions.

  • When all ongoing transactions are finished, each node starts a special kind of checkpoint which is known as a "snapshot checkpoint". It involves all the actions needed for the regular checkpointing process (acquiring the checkpointing write lock, preparing the collection of dirty pages, etc.) plus it captures the total number of updated and newly allocated pages for each partition.

  • After the collection of dirty pages (updated or newly allocated but not yet synched to disk) is acquired on all the nodes, the partition exchange process is completed and new pending transactions are unblocked and allowed to execute.

  • A new snapshot session is started using DatabaseSnapshotSpi.

  • During this snapshot checkpointing session, all the dirty page buffers are written both to disk and to the snapshot session.

  • After the snapshot checkpointing is finished, the snapshot worker is activated. It copies the rest of the pages that were synched to disk earlier to the snapshot session.

  • If a new checkpoint process begins before the snapshot creation has been finished, the previous page’s content is forcibly propagated to the snapshot session by the snapshot worker out-of-order before a new dirty page is written to disk.

After the full snapshot creation is initiated, every subsequent page modification is also tracked in special tracking pages used for incremental snapshots.
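
The last step in the list above, where an old page image is propagated to the snapshot before a newer checkpoint overwrites it, can be sketched as a simple copy-before-write model. This is a deliberately simplified, single-node illustration with hypothetical names, not GridGain code: pages are plain strings, and the snapshot checkpoint phase and the partition exchange are left out.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical single-node model for illustration only; not GridGain code.
final class CopyBeforeWriteSketch {
    private final Map<Integer, String> disk = new TreeMap<>();     // Page images on disk.
    private final Map<Integer, String> snapshot = new TreeMap<>(); // Snapshot session contents.
    private final Set<Integer> pending = new TreeSet<>();          // Pages not copied yet.

    CopyBeforeWriteSketch(Map<Integer, String> pagesOnDisk) {
        disk.putAll(pagesOnDisk);
    }

    /** Starts a snapshot: every page currently on disk must end up in it. */
    void startSnapshot() {
        snapshot.clear();
        pending.clear();
        pending.addAll(disk.keySet());
    }

    /** A later checkpoint overwrites a page while the snapshot is still running. */
    void checkpointWrite(int pageIdx, String newContent) {
        // Propagate the old image out of order before it is lost.
        if (pending.remove(pageIdx))
            snapshot.put(pageIdx, disk.get(pageIdx));

        disk.put(pageIdx, newContent);
    }

    /** The snapshot worker copies the pages that were not propagated early. */
    Map<Integer, String> finishSnapshot() {
        for (int pageIdx : pending)
            snapshot.put(pageIdx, disk.get(pageIdx));

        pending.clear();

        return new TreeMap<>(snapshot);
    }

    /** Returns the current on-disk image of a page. */
    String pageOnDisk(int pageIdx) {
        return disk.get(pageIdx);
    }

    public static void main(String[] args) {
        CopyBeforeWriteSketch node = new CopyBeforeWriteSketch(Map.of(0, "a", 1, "b", 2, "c"));

        node.startSnapshot();
        node.checkpointWrite(1, "b2"); // Old image "b" is saved to the snapshot first.
        node.checkpointWrite(3, "d");  // New page; not part of the running snapshot.

        System.out.println(node.finishSnapshot()); // prints {0=a, 1=b, 2=c}
    }
}
```

The snapshot keeps the state as of the moment it was started, while the disk moves on to the newer page versions.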

Incremental Snapshots Creation

Incremental snapshot creation is similar to the full snapshot creation process described above, except that only pages changed since the last full or incremental snapshot are written to the new incremental snapshot.
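
As a rough illustration of change tracking, the sketch below records modified page indexes in a bit set and hands them over when the next incremental snapshot is taken. The class and method names are hypothetical, and GridGain's actual tracking pages are more involved than a single in-memory bit set; this only shows the idea that an incremental snapshot writes nothing but the pages touched since the previous snapshot.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical model for illustration only; not GridGain's tracking page format.
final class ChangeTrackerSketch {
    private final BitSet changed = new BitSet();

    /** Records that a page was modified after the previous snapshot. */
    void markModified(int pageIdx) {
        changed.set(pageIdx);
    }

    /** Returns the pages for the next incremental snapshot and resets the tracking. */
    int[] takeIncremental() {
        int[] pages = changed.stream().toArray(); // Ascending page indexes.

        changed.clear();

        return pages;
    }

    public static void main(String[] args) {
        ChangeTrackerSketch tracker = new ChangeTrackerSketch();

        tracker.markModified(7);
        tracker.markModified(3);
        tracker.markModified(7); // Repeated changes to a page are recorded once.

        System.out.println(Arrays.toString(tracker.takeIncremental())); // prints [3, 7]

        tracker.markModified(5);

        System.out.println(Arrays.toString(tracker.takeIncremental())); // prints [5]
    }
}
```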

Snapshots Restoration Flow

This section provides low-level details regarding how snapshot restoration is handled cluster-wide.

Regardless of whether a full or incremental snapshot is restored, the grid first determines the partition distribution across all nodes. After each node determines the set of local partitions, each node fetches the history of incremental snapshots starting with the requested snapshot back to the last full snapshot in the cluster.

Then, given the number of pages in the snapshot for each partition, the incremental and full snapshots are merged into the target working directory to assemble a consistent snapshot state. Pages from the last incremental snapshot are copied first to avoid unnecessary copies of pages that changed multiple times.

Once each node has completed the assembly process, the cache being restored is created cluster-wide.
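
The newest-first merge described above can be sketched as follows. This is a minimal model under stated assumptions, not GridGain code: each snapshot is represented as a map from page index to page content for a single partition, and the names are hypothetical. Because newer snapshots are applied first, a page that changed several times is copied only once, from its most recent version.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model for illustration only; not GridGain code.
final class SnapshotMergeSketch {
    /**
     * Assembles the pages of one partition. The list is ordered from the requested
     * (newest) snapshot back to the last full snapshot. A page already taken from
     * a newer snapshot is never overwritten by an older copy.
     */
    static Map<Integer, String> merge(List<Map<Integer, String>> newestFirst) {
        Map<Integer, String> assembled = new TreeMap<>();

        for (Map<Integer, String> snapshotPages : newestFirst) {
            for (Map.Entry<Integer, String> page : snapshotPages.entrySet())
                assembled.putIfAbsent(page.getKey(), page.getValue());
        }

        return assembled;
    }

    public static void main(String[] args) {
        Map<Integer, String> full = Map.of(0, "a0", 1, "b0", 2, "c0");
        Map<Integer, String> inc1 = Map.of(1, "b1");
        Map<Integer, String> inc2 = Map.of(1, "b2", 2, "c2");

        // Page 1 changed in both incremental snapshots; only "b2" is copied.
        System.out.println(merge(List.of(inc2, inc1, full))); // prints {0=a0, 1=b2, 2=c2}
    }
}
```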

Snapshot Security

Snapshot Security helps ensure the integrity of each snapshot file and the overall snapshot process.

When enabled, Snapshot Security provides an additional layer of protection, guarding data against modification and corruption and against snapshot file version mismatch. It helps eliminate the possibility of accidentally mixing files from different snapshots. For example, suppose you have two snapshots, A and B. Without the check, you could mistakenly use file1.dat and file2.dat from snapshot A together with file3.dat from snapshot B.

Snapshot Security creates a file registry that contains a list of all of the files included in the snapshot, and their cryptographic hashes. Before being restored, snapshots are compared against this registry to ensure they contain the proper files.
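
The idea of a file registry can be sketched in a few lines. The example below is an assumption-laden illustration, not the actual registry format or GridGain code: files are modeled as in-memory byte arrays, SHA-256 is chosen only as an example of a cryptographic hash, and all class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model for illustration only; not the GridGain registry format.
final class FileRegistrySketch {
    /** Returns the SHA-256 hash of the content as a lowercase hex string. */
    static String sha256Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder hex = new StringBuilder();

            for (byte b : digest)
                hex.append(String.format("%02x", b));

            return hex.toString();
        }
        catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    /** Builds a registry that maps every file name to the hash of its content. */
    static Map<String, String> buildRegistry(Map<String, byte[]> files) {
        Map<String, String> registry = new TreeMap<>();

        for (Map.Entry<String, byte[]> file : files.entrySet())
            registry.put(file.getKey(), sha256Hex(file.getValue()));

        return registry;
    }

    /** Checks that the files match the registry exactly: same names, same hashes. */
    static boolean matches(Map<String, byte[]> files, Map<String, String> registry) {
        return buildRegistry(files).equals(registry);
    }

    public static void main(String[] args) {
        Map<String, byte[]> snapshotA = new TreeMap<>();
        snapshotA.put("file1.dat", "A1".getBytes(StandardCharsets.UTF_8));
        snapshotA.put("file2.dat", "A2".getBytes(StandardCharsets.UTF_8));
        snapshotA.put("file3.dat", "A3".getBytes(StandardCharsets.UTF_8));

        Map<String, String> registryA = buildRegistry(snapshotA);

        // Simulate picking up file3.dat from another snapshot by mistake.
        Map<String, byte[]> mixed = new TreeMap<>(snapshotA);
        mixed.put("file3.dat", "B3".getBytes(StandardCharsets.UTF_8));

        System.out.println(matches(snapshotA, registryA)); // prints true
        System.out.println(matches(mixed, registryA));     // prints false
    }
}
```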

Snapshot Security protects files from substitution; however, it does not protect against deliberate malicious actions.

Enable Snapshot Security via the GG_SNAPSHOT_SECURITY_LEVEL system property.

The property accepts the following values:

  • DISABLED - The snapshot file registry is not created and, if present, is never validated (default).

  • IGNORE_EXISTING - The snapshot file registry is included in new snapshots. Existing file registries in snapshots are not verified.

  • IGNORE_MISSING - The snapshot file registry is included in new snapshots. Existing file registries in snapshots are verified. Missing file registries in snapshots are tolerated.

  • REQUIRE - The snapshot file registry is included in new snapshots. The presence and validity of file registries are required in all snapshots.
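
The four levels described above can be summarized as a small decision table. The enum below is a hypothetical model written for this page, not a GridGain class; it only encodes the behavior listed for each value. One point is an assumption rather than something stated above: that IGNORE_EXISTING tolerates a missing registry, since it performs no verification at all.

```java
// Hypothetical model for illustration only; not a GridGain class.
enum SecurityLevelSketch {
    //              writes  verifies  tolerates missing
    DISABLED(        false,  false,    true),
    IGNORE_EXISTING( true,   false,    true), // Tolerating a missing registry is an assumption.
    IGNORE_MISSING(  true,   true,     true),
    REQUIRE(         true,   true,     false);

    final boolean writesRegistry;   // Is a registry included in new snapshots?
    final boolean verifiesExisting; // Is an existing registry validated?
    final boolean toleratesMissing; // Is a snapshot without a registry accepted?

    SecurityLevelSketch(boolean writesRegistry, boolean verifiesExisting, boolean toleratesMissing) {
        this.writesRegistry = writesRegistry;
        this.verifiesExisting = verifiesExisting;
        this.toleratesMissing = toleratesMissing;
    }

    /** Decides whether a snapshot passes the registry check at this level. */
    boolean allowsRestore(boolean registryPresent, boolean registryValid) {
        if (!registryPresent)
            return toleratesMissing;

        return !verifiesExisting || registryValid;
    }

    /** Reads the level from the system property named above, defaulting to DISABLED. */
    static SecurityLevelSketch fromSystemProperty() {
        return valueOf(System.getProperty("GG_SNAPSHOT_SECURITY_LEVEL", "DISABLED"));
    }

    public static void main(String[] args) {
        // A snapshot without a registry: tolerated by IGNORE_MISSING, rejected by REQUIRE.
        System.out.println(IGNORE_MISSING.allowsRestore(false, false)); // prints true
        System.out.println(REQUIRE.allowsRestore(false, false));        // prints false

        // A snapshot with a registry that does not match its files.
        System.out.println(IGNORE_EXISTING.allowsRestore(true, false)); // prints true
        System.out.println(IGNORE_MISSING.allowsRestore(true, false));  // prints false
    }
}
```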

Example

Refer to org.gridgain.examples.snapshots.SnapshotsExample delivered as part of the GridGain Ultimate Edition to see how to work with snapshots.