Monitoring Specific Metrics

This page describes how to set up Dashboard widgets to monitor some of the most useful metrics. The metrics you are most likely to want to monitor are listed in the Metrics section of the GridGain documentation. That list is not exhaustive, and you may want to monitor other metrics, depending on your use case.

RAM Usage

Heap Memory

The amount of heap memory used on the node.

Metric Name:

Scope: Node

Metric: Heap Used

Off-Heap Memory

The amount of off-heap memory available per data region. If you have multiple data regions, you should monitor each region.

The following metric reports the amount of RAM occupied by the default data region:

Metric Name:

Scope: Node

Metric: default Physical Memory Size

If you have custom data regions, add metrics for each of the custom data regions.

Metric Name:

Scope: Node

Metric: <data_region> Physical Memory Size

Data Storage Size

Data storage includes application data (including indexes) and WAL files (including the WAL archive).

Persistent Storage Size

If your cluster uses persistent storage, monitor the size of the storage to avoid running out of disk space. The value does not include the size of the WAL or the WAL archive.

Metric Name:

Scope: Node

Metric: Storage Size

WAL Size

The following metric returns the size of the WAL data (including the size of the WAL archive):

Metric Name:

Scope: Node

Metric: WAL Total Size
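
Because Storage Size excludes the WAL while WAL Total Size covers the WAL and its archive, the two values can be added to estimate how much disk the node's data storage occupies. The following is a minimal sketch of that arithmetic, not part of GridGain: the function name, the sample values, and the assumption that the data files and the WAL share one disk of known capacity are all hypothetical.

```python
def disk_headroom(storage_bytes: int, wal_total_bytes: int, capacity_bytes: int) -> float:
    """Return the fraction of disk capacity left after data and WAL are counted.

    storage_bytes   -- reading of the Storage Size metric (excludes the WAL)
    wal_total_bytes -- reading of the WAL Total Size metric (includes the archive)
    capacity_bytes  -- assumed capacity of the disk that holds both
    """
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    used = storage_bytes + wal_total_bytes
    # Clamp at zero in case the readings exceed the assumed capacity.
    return max(0.0, 1.0 - used / capacity_bytes)


GIB = 1024 ** 3

# Hypothetical readings: 60 GiB of data, 20 GiB of WAL, 100 GiB disk.
free_fraction = disk_headroom(60 * GIB, 20 * GIB, 100 * GIB)
print(f"Free disk fraction: {free_fraction:.0%}")
```

If the WAL or the WAL archive is stored on a separate disk, the same calculation would need to be done per disk with the matching metric.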

Rebalancing Progress

A dedicated widget shows the progress of the rebalancing process. See Rebalance Widget.

Checkpoint Duration

Lengthy checkpoint operations can slow down other operations in the cluster. If you observe an increase in the duration of checkpoint creation, try tweaking checkpoint parameters. See Pages Writes Throttling and Checkpointing Buffer Size for performance tips.

You can use the following metric to monitor the duration of the last checkpoint operation:

Metric Name:

Scope: Node

Metric: Last Checkpoint Duration
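
One way to decide whether checkpoint duration is increasing is to compare the latest reading against the median of earlier readings. The sketch below illustrates that idea only. The function name, the sample durations, and the `factor` threshold are assumptions made for this example and are not GridGain settings.

```python
from statistics import median
from typing import Sequence


def checkpoint_slowdown(durations_ms: Sequence[float], factor: float = 2.0) -> bool:
    """Return True if the latest duration exceeds `factor` times the median of earlier ones.

    durations_ms -- hypothetical Last Checkpoint Duration readings, oldest first
    factor       -- assumed tolerance; tune it to your workload
    """
    if len(durations_ms) < 2:
        return False  # no history to compare against
    *history, latest = durations_ms
    return latest > factor * median(history)


# Hypothetical readings in milliseconds.
print(checkpoint_slowdown([120, 135, 110, 128, 410]))  # True: 410 > 2 * 124
print(checkpoint_slowdown([120, 135, 110, 128, 140]))  # False: 140 <= 2 * 124
```

The median is used as the baseline so that a single slow checkpoint in the history does not shift it much.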

Monitoring Transactions

The following metrics provide statistics about transactions. See Monitoring Transactions for details.

| Scope | Metric | Description |
|---|---|---|
| Node | Locked Keys Number | The number of keys locked on the node. |
| Node | TX TX Commits | The number of transactions that were committed on the node. |
| Node | TX TX Rollbacks | The number of transactions that were rolled back. |
| Node | Owner Transactions Number | The number of transactions initiated on the node. |
| Node | Transactions Holding Lock Number | The number of open transactions that hold a lock on at least one key on the node. |

Communication Message Queue

If the size of the communication queue is increasing, there might be communication problems.

Metric Name:

Scope: Node

Metric: Outbound Messages Queue Size
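
To make "the queue is increasing" concrete, you could check whether several consecutive samples of the metric each exceed the previous one. The following is a minimal sketch under stated assumptions: the function name, the sample values, and the `min_run` threshold are hypothetical, and collecting the samples from the cluster is outside its scope.

```python
from typing import Sequence


def is_steadily_growing(samples: Sequence[int], min_run: int = 3) -> bool:
    """Return True if each of the last `min_run` steps between samples was an increase.

    samples -- hypothetical Outbound Messages Queue Size readings, oldest first
    min_run -- assumed number of consecutive increases that counts as a trend
    """
    if min_run < 1 or len(samples) < min_run + 1:
        return False  # not enough data to judge a trend
    recent = samples[-(min_run + 1):]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


# Hypothetical readings taken at a fixed interval.
print(is_steadily_growing([4, 2, 5, 9, 14]))  # True: 2 < 5 < 9 < 14
print(is_steadily_growing([4, 9, 3, 5, 2]))   # False: the queue drains between bursts
```

A short burst that drains again does not trigger the check, which matches the intent of watching for sustained growth rather than momentary spikes.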