GridGain Developers Hub
GridGain Software Documentation

Control Script

GridGain provides a command-line script, control.sh|bat, that you can use to monitor and control the state of a cluster. The script is located in the /bin/ folder of the installation directory.

The control script syntax is as follows:

control.sh <connection parameters> <command> <arguments>

Connecting to Cluster

To connect to a specific node, specify the connection parameters:

| Parameter | Description | Default Value |
| --- | --- | --- |
| --host HOST_OR_IP | The host name or IP address of the node. | localhost |
| --port PORT | The port to connect to. | 11211 |
| --user USER | The user name. | |
| --password PASSWORD | The user password. | |
| --ping-interval PING_INTERVAL | The ping interval. | 5000 |
| --ping-timeout PING_TIMEOUT | Ping response timeout. | 30000 |
| --ssl-protocol PROTOCOL1, PROTOCOL2… | A list of SSL protocols to try when connecting to the cluster. See the list of supported protocols. | TLS |
| --ssl-cipher-suites CIPHER1,CIPHER2… | A list of SSL ciphers. See the list of supported ciphers. | |
| --ssl-key-algorithm ALG | The SSL key algorithm. | SunX509 |
| --keystore-type KEYSTORE_TYPE | The keystore type. | JKS |
| --keystore KEYSTORE_PATH | The path to the keystore. Specify a keystore to enable SSL for the control script. | |
| --keystore-password KEYSTORE_PWD | The password to the keystore. | |
| --truststore-type TRUSTSTORE_TYPE | The type of the truststore. | JKS |
| --truststore TRUSTSTORE_PATH | The path to the truststore. | |
| --truststore-password TRUSTSTORE_PWD | The password to the truststore. | |
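
To illustrate where the connection parameters go in the syntax above, here is a minimal sketch that assembles a command line with the parameters placed before the command. The function name ctl_cmd and the sample address are hypothetical. The sketch only prints the command and does not run it.

```shell
# ctl_cmd is a hypothetical helper, not part of GridGain.
# Usage: ctl_cmd HOST PORT COMMAND [ARGUMENTS...]
# It prints a control.sh invocation with the connection parameters first.
ctl_cmd() {
    host=$1
    port=$2
    shift 2
    echo "control.sh --host $host --port $port $*"
}

# Prints the command that would query the cluster state through the node at 10.0.0.5.
ctl_cmd 10.0.0.5 11211 --state
```

The call prints control.sh --host 10.0.0.5 --port 11211 --state. Parameters that are left out fall back to the defaults listed in the table.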

Activation, Deactivation and Topology Management

You can use control.sh|bat to activate or deactivate the cluster and to manage the set of nodes that make up the baseline topology.

Activating Cluster

Activation will set the baseline topology of the cluster to the set of nodes available at the moment of activation. Activation is required only if you use native persistence.

To activate the cluster, run the following command:

control.sh --activate
control.bat --activate

Deactivating Cluster

To deactivate the cluster, run the following command:

control.sh --deactivate [--yes]
control.bat --deactivate [--yes]

Getting Cluster State

The state of the cluster refers to whether it is activated or not.

To get the state of the cluster, run the following command:

control.sh --state
control.bat --state

Getting Nodes Included in Baseline Topology

To get the list of nodes included in the baseline topology, run the following command:

control.sh --baseline
control.bat --baseline

The output will contain the current topology version, the list of consistent IDs of the nodes included in the baseline topology, and the list of nodes that joined the cluster but were not added to the baseline topology.

Command [BASELINE] started
Arguments: --baseline
--------------------------------------------------------------------------------
Cluster state: active
Current topology version: 3

Current topology version: 3 (Coordinator: ConsistentId=dd3d3959-4fd6-4dc2-8199-bee213b34ff1, Order=1)

Baseline nodes:
    ConsistentId=7d79a1b5-cbbd-4ab5-9665-e8af0454f178, State=ONLINE, Order=2
    ConsistentId=dd3d3959-4fd6-4dc2-8199-bee213b34ff1, State=ONLINE, Order=1
--------------------------------------------------------------------------------
Number of baseline nodes: 2

Other nodes:
    ConsistentId=30e16660-49f8-4225-9122-c1b684723e97, Order=3
Number of other nodes: 1
Command [BASELINE] finished with code: 0
Control utility has completed execution at: 2019-12-24T16:53:08.392865
Execution time: 333 ms

Adding Nodes to Baseline Topology

To add a node to the baseline topology, run the command given below. This will start the rebalancing process.

control.sh --baseline add consistentId1,consistentId2,... [--yes]
control.bat --baseline add consistentId1,consistentId2,... [--yes]
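
The consistent IDs that the --baseline output lists under "Other nodes" are the ones you would typically pass to the add command. The following sketch extracts them from that output. It assumes the output format shown in the sample above, and the function name other_nodes is hypothetical.

```shell
# other_nodes is a hypothetical helper, not part of GridGain.
# It reads `control.sh --baseline` output on stdin and prints the consistent IDs
# listed under "Other nodes:" as a comma-separated list.
other_nodes() {
    awk '
        /^Other nodes:/           { in_other = 1; next }
        /^Number of other nodes:/ { in_other = 0 }
        in_other && match($0, /ConsistentId=[^,]+/) {
            # Skip the 13-character "ConsistentId=" prefix.
            id = substr($0, RSTART + 13, RLENGTH - 13)
            ids = (ids == "" ? id : ids "," id)
        }
        END { print ids }
    '
}

# Against a live cluster this would be: control.sh --baseline | other_nodes
# Here a fragment of the sample output is used instead.
new_ids=$(other_nodes <<'EOF'
Baseline nodes:
    ConsistentId=7d79a1b5-cbbd-4ab5-9665-e8af0454f178, State=ONLINE, Order=2
Other nodes:
    ConsistentId=30e16660-49f8-4225-9122-c1b684723e97, Order=3
Number of other nodes: 1
EOF
)
echo "control.sh --baseline add $new_ids"
```

The last line prints the add command for the one node that has joined the cluster but is not yet in the baseline topology.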

Removing Nodes from Baseline Topology

To remove a node from the baseline topology, use the remove command. Only offline nodes can be removed from the baseline topology: shut down the node first and then use the remove command. This operation will start the rebalancing process, which will redistribute the data across the nodes remaining in the baseline topology.

control.sh --baseline remove consistentId1,consistentId2,... [--yes]
control.bat --baseline remove consistentId1,consistentId2,... [--yes]

Setting Baseline Topology

You can set the baseline topology by either providing a list of nodes (consistent IDs) or specifying the desired version of the baseline topology.

To set a list of nodes as the baseline topology, use the following command:

control.sh --baseline set consistentId1,consistentId2,... [--yes]
control.bat --baseline set consistentId1,consistentId2,... [--yes]

To restore a specific version of the baseline topology, use the following command:

control.sh --baseline version topologyVersion [--yes]
control.bat --baseline version topologyVersion [--yes]

Enabling Baseline Topology Autoadjustment

Baseline topology autoadjustment is the automatic update of the baseline topology after the cluster topology has been stable for a specific amount of time.

For in-memory clusters, autoadjustment is enabled by default with the timeout set to 0. This means that the baseline topology changes immediately after server nodes join or leave the cluster. For clusters with persistence, autoadjustment is disabled by default. To enable it, use the following command:

control.sh --baseline auto_adjust enable timeout 30000
control.bat --baseline auto_adjust enable timeout 30000

The timeout is set in milliseconds. The baseline will be set to the current topology when a given number of milliseconds has passed after the last JOIN/LEFT/FAIL event. Every new JOIN/LEFT/FAIL event restarts the timeout countdown.
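
The countdown arithmetic can be sketched as follows. This is only a model of the rule described above, not GridGain code. The function name adjust_at is hypothetical, and the model assumes that every event arrives before the countdown started by the previous event has expired.

```shell
# adjust_at is a hypothetical helper that models the countdown described above.
# Usage: adjust_at TIMEOUT_MS EVENT_TIME_MS...
# Prints the time at which the baseline would be adjusted, assuming every event
# arrives before the countdown started by the previous one has expired.
adjust_at() {
    timeout=$1
    shift
    last=0
    for t in "$@"; do
        # Each newer event restarts the countdown, so only the latest one matters.
        if [ "$t" -gt "$last" ]; then
            last=$t
        fi
    done
    echo $((last + timeout))
}

# Events at 0 ms, 10000 ms and 25000 ms with a 30000 ms timeout.
adjust_at 30000 0 10000 25000
```

With these inputs the model prints 55000: the adjustment happens 30000 ms after the last event, not after the first one.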

To disable baseline autoadjustment, use the following command:

control.sh --baseline auto_adjust disable
control.bat --baseline auto_adjust disable

Transaction Management

The control script allows you to get information about the transactions being executed in the cluster. You can also cancel specific transactions.

The following command returns a list of transactions that satisfy a given filter (or all transactions if no filter is provided):

control.sh --tx <transaction filter> --info
control.bat --tx <transaction filter> --info

The transaction filter parameters are listed in the following table.

| Parameter | Description |
| --- | --- |
| --xid XID | Transaction ID. |
| --min-duration SECONDS | Minimum number of seconds a transaction has been executing. |
| --min-size SIZE | Minimum size of a transaction. |
| --label LABEL | User label for transactions. You can use a regular expression. |
| --servers\|--clients | Limit the scope of the operation to either server or client nodes. |
| --nodes nodeId1,nodeId2… | The list of consistent IDs of the nodes you want to get transactions from. |
| --limit NUMBER | Limit the number of transactions to the given value. |
| --order DURATION\|SIZE\|START_TIME | The parameter that will be used to sort the output. |

To cancel transactions, use the following command:

control.sh --tx <transaction filter> --kill
control.bat --tx <transaction filter> --kill

For example, to cancel the transactions that have been running for more than 100 seconds, execute the following command:

control.sh --tx --min-duration 100 --kill
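
Because --info and --kill take the same filter, one cautious workflow is to review the matching transactions with --info before cancelling them with --kill. The sketch below builds both commands from one filter. The function name tx_cmd is hypothetical, and combining --min-duration, --order, and --limit in one filter is an assumption based on the parameter table above. The sketch only prints the commands.

```shell
# tx_cmd is a hypothetical helper, not part of GridGain.
# Usage: tx_cmd ACTION MIN_DURATION_SECONDS LIMIT
# ACTION must be "info" or "kill"; anything else is rejected.
tx_cmd() {
    case $1 in
        info|kill) ;;
        *) echo "unknown action: $1" >&2; return 1 ;;
    esac
    echo "control.sh --tx --min-duration $2 --order DURATION --limit $3 --$1"
}

# First review up to ten transactions that have been running for at least 100 seconds...
tx_cmd info 100 10
# ...then build the command that cancels the same set.
tx_cmd kill 100 10
```

Keeping the filter in one place reduces the risk that the kill command matches a different set of transactions than the one you reviewed.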

Cache State Monitoring

One of the most important commands that control.sh|bat provides is --cache list, which is used for cache monitoring. The command provides a list of deployed caches with their affinity parameters and distribution within cache groups. There is also a command for viewing existing atomic sequences.

# Displays a list of all caches
control.sh|bat --cache list .

# Displays a list of caches whose names start with "account-".
control.sh|bat --cache list account-.*

# Displays info about cache group distribution for all caches.
control.sh|bat --cache list . --groups

# Displays info about cache group distribution for the caches whose names start with "account-".
control.sh|bat --cache list account-.* --groups

# Displays info about all atomic sequences.
control.sh|bat --cache list . --seq

# Displays info about the atomic sequences whose names start with "counter-".
control.sh|bat --cache list counter-.* --seq
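
When you run control.sh from a POSIX shell, note that a pattern such as account-.* is also a valid shell glob. If a file in the current directory happens to match it, the shell replaces the pattern before the script sees it. The following sketch demonstrates the effect with echo in a temporary directory. The file name account-.bak is made up for the demonstration.

```shell
# Demonstrates why the regular expression should be quoted in a POSIX shell.
demo_dir=$(mktemp -d)
touch "$demo_dir/account-.bak"   # a file name that matches the glob account-.*

# Unquoted: the shell expands the pattern to the matching file name.
unquoted=$(cd "$demo_dir" && echo account-.*)
# Quoted: the regular expression reaches the command unchanged.
quoted=$(cd "$demo_dir" && echo 'account-.*')

echo "unquoted argument: $unquoted"
echo "quoted argument:   $quoted"
rm -rf "$demo_dir"
```

The unquoted form yields account-.bak, while the quoted form yields account-.* as intended. Writing the pattern in single quotes, as in control.sh --cache list 'account-.*', avoids the problem.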

Contention Detection in Transactions

The contention command detects when multiple transactions are in contention to create a lock for the same key. The command is useful if you have long-running or hanging transactions. Example:

# Reports all keys that are points of contention for at least 5 transactions on all cluster nodes.
control.sh|bat --cache contention 5

# Reports all keys that are points of contention for at least 5 transactions on a specific server node.
control.sh|bat --cache contention 5 f2ea-5f56-11e8-9c2d-fa7a

If there are any highly contended keys, the utility will dump extensive information including the keys, transactions, and nodes where the contention took place. Example:

[node=TcpDiscoveryNode [id=d9620450-eefa-4ab6-a821-644098f00001, addrs=[127.0.0.1], sockAddrs=[/127.0.0.1:47501], discPort=47501, order=2, intOrder=2, lastExchangeTime=1527169443913, loc=false, ver=2.5.0#20180518-sha1:02c9b2de, isClient=false]]

// No contention on node d9620450-eefa-4ab6-a821-644098f00001.

[node=TcpDiscoveryNode [id=03379796-df31-4dbd-80e5-09cef5000000, addrs=[127.0.0.1], sockAddrs=[/127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1527169443913, loc=false, ver=2.5.0#20180518-sha1:02c9b2de, isClient=false]]
    TxEntry [cacheId=1544803905, key=KeyCacheObjectImpl [part=0, val=0, hasValBytes=false], queue=10, op=CREATE, val=UserCacheObjectImpl [val=0, hasValBytes=false], tx=GridNearTxLocal[xid=e9754629361-00000000-0843-9f61-0000-000000000001, xidVersion=GridCacheVersion [topVer=138649441, order=1527169439646, nodeOrder=1], concurrency=PESSIMISTIC, isolation=REPEATABLE_READ, state=ACTIVE, invalidate=false, rollbackOnly=false, nodeId=03379796-df31-4dbd-80e5-09cef5000000, timeout=0, duration=1247], other=[]]
    TxEntry [cacheId=1544803905, key=KeyCacheObjectImpl [part=0, val=0, hasValBytes=false], queue=10, op=READ, val=null, tx=GridNearTxLocal[xid=8a754629361-00000000-0843-9f61-0000-000000000001, xidVersion=GridCacheVersion [topVer=138649441, order=1527169439656, nodeOrder=1], concurrency=PESSIMISTIC, isolation=REPEATABLE_READ, state=ACTIVE, invalidate=false, rollbackOnly=false, nodeId=03379796-df31-4dbd-80e5-09cef5000000, timeout=0, duration=1175], other=[]]
    TxEntry [cacheId=1544803905, key=KeyCacheObjectImpl [part=0, val=0, hasValBytes=false], queue=10, op=READ, val=null, tx=GridNearTxLocal[xid=6a754629361-00000000-0843-9f61-0000-000000000001, xidVersion=GridCacheVersion [topVer=138649441, order=1527169439654, nodeOrder=1], concurrency=PESSIMISTIC, isolation=REPEATABLE_READ, state=ACTIVE, invalidate=false, rollbackOnly=false, nodeId=03379796-df31-4dbd-80e5-09cef5000000, timeout=0, duration=1175], other=[]]
    TxEntry [cacheId=1544803905, key=KeyCacheObjectImpl [part=0, val=0, hasValBytes=false], queue=10, op=READ, val=null, tx=GridNearTxLocal[xid=7a754629361-00000000-0843-9f61-0000-000000000001, xidVersion=GridCacheVersion [topVer=138649441, order=1527169439655, nodeOrder=1], concurrency=PESSIMISTIC, isolation=REPEATABLE_READ, state=ACTIVE, invalidate=false, rollbackOnly=false, nodeId=03379796-df31-4dbd-80e5-09cef5000000, timeout=0, duration=1175], other=[]]
    TxEntry [cacheId=1544803905, key=KeyCacheObjectImpl [part=0, val=0, hasValBytes=false], queue=10, op=READ, val=null, tx=GridNearTxLocal[xid=4a754629361-00000000-0843-9f61-0000-000000000001, xidVersion=GridCacheVersion [topVer=138649441, order=1527169439652, nodeOrder=1], concurrency=PESSIMISTIC, isolation=REPEATABLE_READ, state=ACTIVE, invalidate=false, rollbackOnly=false, nodeId=03379796-df31-4dbd-80e5-09cef5000000, timeout=0, duration=1175], other=[]]

// Node 03379796-df31-4dbd-80e5-09cef5000000 is the point of contention for key KeyCacheObjectImpl [part=0, val=0, hasValBytes=false].
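
Because the dump is verbose, it can help to reduce it to a per-node count of contended entries. The following sketch does that. It assumes the output format shown in the sample above, with one node header line followed by its TxEntry lines, and the function name contention_summary is hypothetical.

```shell
# contention_summary is a hypothetical helper, not part of GridGain.
# It reads the contention dump on stdin and prints "<node id> <TxEntry count>"
# for every node header, in the order the nodes appear.
contention_summary() {
    awk '
        /^\[node=TcpDiscoveryNode/ {
            # Extract the value of the first id=... field in the header line.
            match($0, /id=[^,]+/)
            node = substr($0, RSTART + 3, RLENGTH - 3)
            nodes[++n] = node
            count[node] = 0
            next
        }
        /TxEntry \[/ { count[node]++ }
        END { for (i = 1; i <= n; i++) print nodes[i], count[nodes[i]] }
    '
}

# Against a live cluster this would be:
#   control.sh --cache contention 5 | contention_summary
```

For the sample dump above, the summary would contain a count of 0 for node d9620450-eefa-4ab6-a821-644098f00001 and a count of 5 for node 03379796-df31-4dbd-80e5-09cef5000000.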

Consistency Check Commands

control.sh|bat includes a set of consistency check commands that help with verifying internal data consistency invariants.

First, the commands can be used for debugging and troubleshooting, especially during active development.

Second, if you suspect that a query (such as a SQL query) returns an incomplete or wrong result set, the commands can verify whether the data is actually inconsistent.

Finally, the consistency check commands can be utilized as part of regular cluster health monitoring.

Let’s review these usage scenarios in more detail.

Verification of Partition Checksums

Even if the update counters and sizes are equal on the primary and backup nodes, a primary partition and its backup can still diverge because of a critical failure. The idle_verify command in control.sh|bat calculates partition hashes, compares them across the whole cluster, and reports any differences. You can specify a list of caches to verify, as follows:

# Verifies that the partitions of all caches contain the same data.
control.sh|bat --cache idle_verify

# Verifies that the partitions of the specified caches contain the same data.
control.sh|bat --cache idle_verify cache1,cache2,cache3

If any partitions diverge, a list of conflict partitions will be printed out, as follows:

idle_verify check has finished, found 2 conflict partitions.

Conflict partition: PartitionKey [grpId=1544803905, grpName=default, partId=5]
Partition instances: [PartitionHashRecord [isPrimary=true, partHash=97506054, updateCntr=3, size=3, consistentId=bltTest1], PartitionHashRecord [isPrimary=false, partHash=65957380, updateCntr=3, size=2, consistentId=bltTest0]]
Conflict partition: PartitionKey [grpId=1544803905, grpName=default, partId=6]
Partition instances: [PartitionHashRecord [isPrimary=true, partHash=97595430, updateCntr=3, size=3, consistentId=bltTest1], PartitionHashRecord [isPrimary=false, partHash=66016964, updateCntr=3, size=2, consistentId=bltTest0]]
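
If you run idle_verify as part of regular health monitoring, you may want to reduce the report to the affected cache groups and partitions. The following sketch does that. It assumes the "Conflict partition" line format shown in the sample above, and the function name list_conflicts is hypothetical.

```shell
# list_conflicts is a hypothetical helper, not part of GridGain.
# It reads idle_verify output on stdin and prints "<group name> <partition id>"
# for every "Conflict partition" line.
list_conflicts() {
    sed -n 's/^Conflict partition: PartitionKey \[grpId=[^,]*, grpName=\([^,]*\), partId=\([0-9]*\)]$/\1 \2/p'
}

# Example health check against a live cluster (left commented out):
#   conflicts=$(control.sh --cache idle_verify | list_conflicts)
#   [ -z "$conflicts" ] || { echo "$conflicts"; exit 1; }
```

For the sample report above, the helper would print "default 5" and "default 6", one per line.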

SQL Indexes Consistency Validation

The validate_indexes command validates the indexes of the given caches locally on all cluster nodes.

The following is checked by the validation process:

  1. All the key-value entries that are referenced from a primary index have to be reachable from the secondary SQL indexes as well, if there are any.

  2. All the key-value entries that are referenced from a primary index have to be reachable. A reference from the primary index must not point to a nonexistent entry.

  3. All the key-value entries that are referenced from secondary SQL indexes have to be reachable from the primary index as well.

# Checks indexes of all caches on all cluster nodes.
control.sh|bat --cache validate_indexes

# Checks indexes of specific caches on all cluster nodes.
control.sh|bat --cache validate_indexes cache1,cache2

# Checks indexes of specific caches on the node with the given node ID.
control.sh|bat --cache validate_indexes cache1,cache2 f2ea-5f56-11e8-9c2d-fa7a

If indexes refer to non-existing entries (or some entries are not indexed), errors will be dumped to the output, as follows:

PartitionKey [grpId=-528791027, grpName=persons-cache-vi, partId=0] ValidateIndexesPartitionResult [updateCntr=313, size=313, isPrimary=true, consistentId=bltTest0]
IndexValidationIssue [key=0, cacheName=persons-cache-vi, idxName=_key_PK], class org.apache.ignite.IgniteCheckedException: Key is present in CacheDataTree, but can't be found in SQL index.
IndexValidationIssue [key=0, cacheName=persons-cache-vi, idxName=PERSON_ORGID_ASC_IDX], class org.apache.ignite.IgniteCheckedException: Key is present in CacheDataTree, but can't be found in SQL index.
validate_indexes has finished with errors (listed above).
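
To see at a glance which indexes are affected, you can reduce the report to cache and index names. The following sketch assumes the IndexValidationIssue line format shown in the sample above. The function name index_issues is hypothetical.

```shell
# index_issues is a hypothetical helper, not part of GridGain.
# It reads validate_indexes output on stdin and prints "<cache name> <index name>"
# for every IndexValidationIssue line, in the order the issues appear.
index_issues() {
    sed -n 's/^IndexValidationIssue \[key=[^,]*, cacheName=\([^,]*\), idxName=\([^]]*\)].*$/\1 \2/p'
}

# Against a live cluster this would be:
#   control.sh --cache validate_indexes cache1,cache2 | index_issues
```

For the sample report above, the helper would print "persons-cache-vi _key_PK" followed by "persons-cache-vi PERSON_ORGID_ASC_IDX".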