GridGain Software Documentation

Control Script

Apache Ignite provides a command-line script, control.sh|bat, that you can use to monitor and control a cluster's state. It is located in the /bin/ folder of the Apache Ignite installation directory.

Activation, Deactivation and Topology Management

First, control.sh|bat can activate and deactivate a cluster and manage the set of nodes that make up the baseline topology. See the Baseline Topology documentation for details.

Cache State Monitoring

One of the most important commands that control.sh|bat provides is --cache list, which is used for cache monitoring. The command lists the deployed caches with their affinity parameters and their distribution within cache groups. The same command can also list existing atomic sequences.

# Displays list of all caches with affinity parameters.
control.sh|bat --cache list .*

# Displays list of caches, with affinity parameters, whose names start with "account-".
control.sh|bat --cache list account-.*

# Displays info about cache groups distribution of all caches.
control.sh|bat --cache list .* groups

# Displays info about cache groups distribution of caches whose names start with "account-".
control.sh|bat --cache list account-.* groups

# Displays info about all atomic sequences.
control.sh|bat --cache list .* seq

# Displays info about atomic sequences whose names start with "counter-".
control.sh|bat --cache list counter-.* seq
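The cache-name argument in the commands above is a regular expression, but on Unix-like shells an unquoted pattern such as .* is also a valid filename glob, so the shell may expand it before control.sh ever sees it. The following sketch shows the difference without touching a cluster. The count_args helper and the temporary directory are illustrative assumptions, not part of Apache Ignite.

```shell
#!/bin/sh
# Hypothetical helper: reports how many arguments it received.
count_args() { echo "$#"; }

# Create a scratch directory containing two hidden files.
demo_dir=$(mktemp -d)
touch "$demo_dir/.env" "$demo_dir/.profile"

# Unquoted: the shell replaces .* with the matching file names first.
unquoted=$(cd "$demo_dir" && count_args .*)

# Quoted: the pattern is passed through unchanged as a single argument.
quoted=$(cd "$demo_dir" && count_args '.*')

echo "unquoted .* -> $unquoted argument(s); quoted '.*' -> $quoted argument(s)"

rm -rf "$demo_dir"
```

Under this assumption, a safer form of the first command on Linux or macOS would be control.sh --cache list '.*', with the pattern in single quotes.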

Contention Detection in Transactions

The contention command detects when multiple transactions are contending to acquire a lock on the same key. The command is useful if you have long-running or hanging transactions. Example:

# Reports all keys that are a point of contention for at least 5 transactions on all cluster nodes.
control.sh|bat --cache contention 5

# Reports all keys that are a point of contention for at least 5 transactions on a specific server node.
control.sh|bat --cache contention 5 f2ea-5f56-11e8-9c2d-fa7a

If any keys are highly contended, the utility dumps detailed information, including the keys, the transactions, and the nodes where the contention took place. Example:

[node=TcpDiscoveryNode [id=d9620450-eefa-4ab6-a821-644098f00001, addrs=[127.0.0.1], sockAddrs=[/127.0.0.1:47501], discPort=47501, order=2, intOrder=2, lastExchangeTime=1527169443913, loc=false, ver=2.5.0#20180518-sha1:02c9b2de, isClient=false]]

// No contention on node d9620450-eefa-4ab6-a821-644098f00001.

[node=TcpDiscoveryNode [id=03379796-df31-4dbd-80e5-09cef5000000, addrs=[127.0.0.1], sockAddrs=[/127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1527169443913, loc=false, ver=2.5.0#20180518-sha1:02c9b2de, isClient=false]]
    TxEntry [cacheId=1544803905, key=KeyCacheObjectImpl [part=0, val=0, hasValBytes=false], queue=10, op=CREATE, val=UserCacheObjectImpl [val=0, hasValBytes=false], tx=GridNearTxLocal[xid=e9754629361-00000000-0843-9f61-0000-000000000001, xidVersion=GridCacheVersion [topVer=138649441, order=1527169439646, nodeOrder=1], concurrency=PESSIMISTIC, isolation=REPEATABLE_READ, state=ACTIVE, invalidate=false, rollbackOnly=false, nodeId=03379796-df31-4dbd-80e5-09cef5000000, timeout=0, duration=1247], other=[]]
    TxEntry [cacheId=1544803905, key=KeyCacheObjectImpl [part=0, val=0, hasValBytes=false], queue=10, op=READ, val=null, tx=GridNearTxLocal[xid=8a754629361-00000000-0843-9f61-0000-000000000001, xidVersion=GridCacheVersion [topVer=138649441, order=1527169439656, nodeOrder=1], concurrency=PESSIMISTIC, isolation=REPEATABLE_READ, state=ACTIVE, invalidate=false, rollbackOnly=false, nodeId=03379796-df31-4dbd-80e5-09cef5000000, timeout=0, duration=1175], other=[]]
    TxEntry [cacheId=1544803905, key=KeyCacheObjectImpl [part=0, val=0, hasValBytes=false], queue=10, op=READ, val=null, tx=GridNearTxLocal[xid=6a754629361-00000000-0843-9f61-0000-000000000001, xidVersion=GridCacheVersion [topVer=138649441, order=1527169439654, nodeOrder=1], concurrency=PESSIMISTIC, isolation=REPEATABLE_READ, state=ACTIVE, invalidate=false, rollbackOnly=false, nodeId=03379796-df31-4dbd-80e5-09cef5000000, timeout=0, duration=1175], other=[]]
    TxEntry [cacheId=1544803905, key=KeyCacheObjectImpl [part=0, val=0, hasValBytes=false], queue=10, op=READ, val=null, tx=GridNearTxLocal[xid=7a754629361-00000000-0843-9f61-0000-000000000001, xidVersion=GridCacheVersion [topVer=138649441, order=1527169439655, nodeOrder=1], concurrency=PESSIMISTIC, isolation=REPEATABLE_READ, state=ACTIVE, invalidate=false, rollbackOnly=false, nodeId=03379796-df31-4dbd-80e5-09cef5000000, timeout=0, duration=1175], other=[]]
    TxEntry [cacheId=1544803905, key=KeyCacheObjectImpl [part=0, val=0, hasValBytes=false], queue=10, op=READ, val=null, tx=GridNearTxLocal[xid=4a754629361-00000000-0843-9f61-0000-000000000001, xidVersion=GridCacheVersion [topVer=138649441, order=1527169439652, nodeOrder=1], concurrency=PESSIMISTIC, isolation=REPEATABLE_READ, state=ACTIVE, invalidate=false, rollbackOnly=false, nodeId=03379796-df31-4dbd-80e5-09cef5000000, timeout=0, duration=1175], other=[]]

// Node 03379796-df31-4dbd-80e5-09cef5000000 is the point of contention for key KeyCacheObjectImpl [part=0, val=0, hasValBytes=false].
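Because the raw dump is verbose, it can help to reduce it to a count of contended transaction entries per node. The function below is a minimal sketch that assumes the output has the same shape as the sample above: a line starting with [node=TcpDiscoveryNode for each node, followed by indented TxEntry lines. The function name summarize_contention is hypothetical, and the output format may differ between Ignite versions.

```shell
#!/bin/sh
# summarize_contention reads "--cache contention" output on stdin and prints
# "<node id> <number of TxEntry lines>" for every node header it encounters.
summarize_contention() {
  awk '
    /^\[node=TcpDiscoveryNode/ {
      # Extract the node ID from the first "id=<uuid>" in the header line.
      if (match($0, /id=[0-9a-f-]+/)) {
        node = substr($0, RSTART + 3, RLENGTH - 3)
        order[++n] = node
        count[node] = 0
      }
      next
    }
    # Every TxEntry line belongs to the most recently seen node header.
    /TxEntry \[/ && node != "" { count[node]++ }
    END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }
  '
}
```

On a Unix-like system, the function could be fed directly from the utility, for example control.sh --cache contention 5 | summarize_contention. A node that reports a count of zero had no contended keys.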

Consistency Check Commands

control.sh|bat includes a set of consistency check commands that help with verifying internal data consistency invariants.

First, the commands can be used for debugging and troubleshooting, especially during active development.

Second, if you suspect that a query (for example, a SQL query) returns an incomplete or wrong result set, the commands can verify whether the data is actually inconsistent.

Finally, the consistency check commands can be used as part of regular cluster health monitoring.

Let’s review these usage scenarios in more detail.

Verification of Partition Checksums

Even if the update counters and sizes are equal on the primary and backup nodes, the primary and backup copies of a partition can still diverge because of a critical failure. The idle_verify command in control.sh|bat calculates partition hashes, compares them across the whole cluster, and reports any that differ. You can specify a list of caches to verify, as follows:

# Checks that the partitions of all caches actually contain the same data.
control.sh|bat --cache idle_verify

# Checks that the partitions of specific caches actually contain the same data.
control.sh|bat --cache idle_verify cache1,cache2,cache3

If any partitions diverge, a list of conflict partitions will be printed out, as follows:

idle_verify check has finished, found 2 conflict partitions.

Conflict partition: PartitionKey [grpId=1544803905, grpName=default, partId=5]
Partition instances: [PartitionHashRecord [isPrimary=true, partHash=97506054, updateCntr=3, size=3, consistentId=bltTest1], PartitionHashRecord [isPrimary=false, partHash=65957380, updateCntr=3, size=2, consistentId=bltTest0]]
Conflict partition: PartitionKey [grpId=1544803905, grpName=default, partId=6]
Partition instances: [PartitionHashRecord [isPrimary=true, partHash=97595430, updateCntr=3, size=3, consistentId=bltTest1], PartitionHashRecord [isPrimary=false, partHash=66016964, updateCntr=3, size=2, consistentId=bltTest0]]
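For regular health monitoring, the report can be reduced to a list of affected partitions plus an exit status that a scheduler or alerting tool can act on. The sketch below relies only on the "Conflict partition:" lines shown in the sample above. The function name list_conflicts is hypothetical, and the exact output format is an assumption that should be checked against your Ignite version.

```shell
#!/bin/sh
# list_conflicts reads idle_verify output on stdin, prints one
# "<group name> <partition id>" line per conflict, and returns 1 if any
# conflict line was seen (0 otherwise).
list_conflicts() {
  awk '
    /^Conflict partition:/ {
      grp = ""; part = ""
      # grpName runs up to the next comma or closing bracket.
      if (match($0, /grpName=[^],]+/)) grp = substr($0, RSTART + 8, RLENGTH - 8)
      if (match($0, /partId=[0-9]+/))  part = substr($0, RSTART + 7, RLENGTH - 7)
      print grp, part
      found = 1
    }
    END { exit (found ? 1 : 0) }
  '
}
```

A possible use on a Unix-like system is control.sh --cache idle_verify | list_conflicts, followed by a check of the exit status to decide whether to raise an alert.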

SQL Indexes Consistency Validation

The validate_indexes command validates the indexes of the given caches. Each cluster node performs the validation locally.

The following is checked by the validation process:

  1. All the key-value entries that are referenced from a primary index have to be reachable from the secondary SQL indexes as well, if there are any.

  2. All the key-value entries that are referenced from a primary index have to be reachable. A reference from the primary index must not lead nowhere.

  3. All the key-value entries that are referenced from secondary SQL indexes have to be reachable from the primary index as well.

# Checks indexes of all caches on all cluster nodes.
control.sh|bat --cache validate_indexes

# Checks indexes of specific caches on all cluster nodes.
control.sh|bat --cache validate_indexes cache1,cache2

# Checks indexes of specific caches on the node with the given node ID.
control.sh|bat --cache validate_indexes cache1,cache2 f2ea-5f56-11e8-9c2d-fa7a

If indexes refer to non-existing entries (or some entries are not indexed), errors will be dumped to the output, as follows:

PartitionKey [grpId=-528791027, grpName=persons-cache-vi, partId=0] ValidateIndexesPartitionResult [updateCntr=313, size=313, isPrimary=true, consistentId=bltTest0]
IndexValidationIssue [key=0, cacheName=persons-cache-vi, idxName=_key_PK], class org.apache.ignite.IgniteCheckedException: Key is present in CacheDataTree, but can't be found in SQL index.
IndexValidationIssue [key=0, cacheName=persons-cache-vi, idxName=PERSON_ORGID_ASC_IDX], class org.apache.ignite.IgniteCheckedException: Key is present in CacheDataTree, but can't be found in SQL index.
validate_indexes has finished with errors (listed above).
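When many entries are affected, a per-index count is often easier to read than the full list of issues. The following sketch assumes that each problem is reported on a line starting with IndexValidationIssue and containing an idxName= field, as in the sample above. The function name summarize_index_issues is hypothetical.

```shell
#!/bin/sh
# summarize_index_issues reads validate_indexes output on stdin and prints
# "<index name> <number of issues>" in the order the indexes first appear.
summarize_index_issues() {
  awk '
    /^IndexValidationIssue/ {
      # idxName runs up to the next comma or closing bracket.
      if (match($0, /idxName=[^],]+/)) {
        idx = substr($0, RSTART + 8, RLENGTH - 8)
        if (!(idx in count)) order[++n] = idx
        count[idx]++
      }
    }
    END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }
  '
}
```

On a Unix-like system, it could be combined with the utility as control.sh --cache validate_indexes cache1,cache2 | summarize_index_issues. Empty output means no IndexValidationIssue lines were found.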