Generic Metrics
This page describes generic GridGain metrics.
Metric Registers
Metrics are grouped into categories called registers. Each register has a name. The full name of a specific metric within the register consists of the register name followed by a dot, followed by the name of the metric: <register_name>.<metric_name>.
For example, the register for data storage metrics is called io.datastorage. The metric that returns the storage size is called io.datastorage.StorageSize.
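The naming rule can be sketched in a few lines of plain Java. The class and method names here (MetricNames, fullName) are hypothetical helpers for illustration and are not part of the GridGain API.

```java
// Minimal sketch: composing a full metric name from a register name and a metric name.
// MetricNames and fullName are illustrative names, not GridGain classes.
final class MetricNames {
    // Joins the register name and the metric name with a dot.
    static String fullName(String registerName, String metricName) {
        return registerName + "." + metricName;
    }

    public static void main(String[] args) {
        System.out.println(fullName("io.datastorage", "StorageSize"));
        System.out.println(fullName("cache.myCache", "CacheGets"));
    }
}
```

Note that both register names and metric names may themselves contain dots (for example, the sys register contains memory.heap.used), so a full name cannot always be split back unambiguously at a single dot.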
Metric Exporters
If you want to enable metrics, configure one or multiple metric exporters in the node configuration. This is a node-specific configuration, which means it enables metrics only on the node where it is specified.
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
<property name="metricExporterSpi">
<list>
<bean class="org.apache.ignite.spi.metric.jmx.JmxMetricExporterSpi"/>
<bean class="org.apache.ignite.spi.metric.log.LogExporterSpi"/>
<bean class="org.apache.ignite.spi.metric.opencensus.OpenCensusMetricExporterSpi"/>
</list>
</property>
</bean>
IgniteConfiguration cfg = new IgniteConfiguration();
cfg.setMetricExporterSpi(new JmxMetricExporterSpi());
Ignite ignite = Ignition.start(cfg);
This API is not presently available for C++. You can use XML configuration.
The following sections describe the exporters available in Ignite by default.
JMX
org.apache.ignite.spi.metric.jmx.JmxMetricExporterSpi
exposes metrics via JMX beans.
IgniteConfiguration cfg = new IgniteConfiguration();
JmxMetricExporterSpi jmxExporter = new JmxMetricExporterSpi();
//export cache metrics only
jmxExporter.setExportFilter(mreg -> mreg.name().startsWith("cache."));
cfg.setMetricExporterSpi(jmxExporter);
This API is not presently available for C++.
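To see what a prefix-based export filter selects, the same condition can be tried on plain strings. This is only a sketch: in the real API the filter receives a register object and calls name() on it, while here register names are represented as strings and the class RegisterFilterSketch is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Sketch of the prefix condition used by the export filter, applied to plain register names.
final class RegisterFilterSketch {
    // Same condition as in the filter above: keep registers whose name starts with "cache.".
    static final Predicate<String> CACHE_ONLY = name -> name.startsWith("cache.");

    // Returns the register names that the condition lets through.
    static List<String> exported(List<String> registerNames) {
        return registerNames.stream().filter(CACHE_ONLY).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("sys", "cache.myCache", "cacheGroups.myGroup", "io.datastorage");
        System.out.println(exported(names)); // [cache.myCache]
    }
}
```

Because the prefix ends with a dot, registers such as cacheGroups.myGroup are not matched. A separate condition would be needed to include them.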
SQL View
SqlViewMetricExporterSpi is enabled by default. It exposes metrics via the SYS.METRICS view.
Each metric is displayed as a single record.
You can use any supported SQL tool to view the metrics:
> select name, value from SYS.METRICS where name LIKE 'cache.myCache.%';
+-----------------------------------+--------------------------------+
| NAME | VALUE |
+-----------------------------------+--------------------------------+
| cache.myCache.CacheTxRollbacks | 0 |
| cache.myCache.OffHeapRemovals | 0 |
| cache.myCache.QueryCompleted | 0 |
| cache.myCache.QueryFailed | 0 |
| cache.myCache.EstimatedRebalancingKeys | 0 |
| cache.myCache.CacheEvictions | 0 |
| cache.myCache.CommitTime | [J@2eb66498 |
....
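The CommitTime value in the sample output ([J@2eb66498) has the form that Java produces when a long[] array is converted to a string with its default toString(). This suggests, as an inference rather than a documented fact, that histogram metrics are shown in the view as an unformatted array of bucket counts. The sketch below only demonstrates the Java behavior; the bucket values are made up.

```java
import java.util.Arrays;

// Sketch: why a long[] value can show up as "[J@..." and how to print it readably.
final class HistogramValueSketch {
    // Formats the bucket counts as a readable list.
    static String readable(long[] buckets) {
        return Arrays.toString(buckets);
    }

    public static void main(String[] args) {
        long[] buckets = {3, 0, 1}; // hypothetical bucket counts
        System.out.println(buckets.toString()); // default form: "[J@" followed by a hash code
        System.out.println(readable(buckets));  // [3, 0, 1]
    }
}
```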
Log
org.apache.ignite.spi.metric.log.LogExporterSpi
prints the metrics to the log file at regular intervals (1 min by default) at INFO level.
<beans xmlns="http://www.springframework.org/schema/beans" xmlns:util="http://www.springframework.org/schema/util" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation=" http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd http://www.springframework.org/schema/util http://www.springframework.org/schema/util/spring-util.xsd">
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
<property name="metricExporterSpi">
<list>
<bean class="org.apache.ignite.spi.metric.log.LogExporterSpi"/>
</list>
</property>
</bean>
</beans>
If you use programmatic configuration, you can change the print frequency as follows:
IgniteConfiguration cfg = new IgniteConfiguration();
LogExporterSpi logExporter = new LogExporterSpi();
//print metrics every 10 minutes (the period is set in milliseconds)
logExporter.setPeriod(600_000);
//export cache metrics only
logExporter.setExportFilter(mreg -> mreg.name().startsWith("cache."));
cfg.setMetricExporterSpi(logExporter);
Ignite ignite = Ignition.start(cfg);
OpenCensus
org.apache.ignite.spi.metric.opencensus.OpenCensusMetricExporterSpi
adds integration with the OpenCensus library.
To use the OpenCensus exporter:
- Add org.apache.ignite.spi.metric.opencensus.OpenCensusMetricExporterSpi to the list of exporters in the node configuration.
- Configure OpenCensus StatsCollector to export to a specific system. See OpenCensusMetricsExporterExample.java for an example and OpenCensus documentation for additional information.
Configuration parameters:
- filter: predicate that filters metrics.
- period: export period.
- sendInstanceName: if enabled, a tag with the Ignite instance name is added to each metric.
- sendNodeId: if enabled, a tag with the Ignite node ID is added to each metric.
- sendConsistentId: if enabled, a tag with the Ignite node consistent ID is added to each metric.
Available Metrics
System
System metrics such as JVM or CPU metrics.
Register name: sys
Name | Type | Description |
---|---|---|
CpuLoad |
double |
CPU load. |
CurrentThreadCpuTime |
long |
ThreadMXBean.getCurrentThreadCpuTime() |
CurrentThreadUserTime |
long |
ThreadMXBean.getCurrentThreadUserTime() |
DaemonThreadCount |
integer |
ThreadMXBean.getDaemonThreadCount() |
GcCpuLoad |
double |
GC CPU load. |
PeakThreadCount |
integer |
ThreadMXBean.getPeakThreadCount() |
SystemLoadAverage |
java.lang.Double |
OperatingSystemMXBean.getSystemLoadAverage() |
ThreadCount |
integer |
ThreadMXBean.getThreadCount() |
TotalExecutedTasks |
long |
Total executed tasks. |
TotalStartedThreadCount |
long |
ThreadMXBean.getTotalStartedThreadCount() |
UpTime |
long |
RuntimeMXBean.getUptime() |
memory.heap.committed |
long |
MemoryMXBean.getHeapMemoryUsage().getCommitted() |
memory.heap.init |
long |
MemoryMXBean.getHeapMemoryUsage().getInit() |
memory.heap.used |
long |
MemoryMXBean.getHeapMemoryUsage().getUsed() |
memory.nonheap.committed |
long |
MemoryMXBean.getNonHeapMemoryUsage().getCommitted() |
memory.nonheap.init |
long |
MemoryMXBean.getNonHeapMemoryUsage().getInit() |
memory.nonheap.max |
long |
MemoryMXBean.getNonHeapMemoryUsage().getMax() |
memory.nonheap.used |
long |
MemoryMXBean.getNonHeapMemoryUsage().getUsed() |
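Several descriptions in the table above refer to standard JDK management beans. As a rough illustration of what those calls return, the following sketch reads a few of them directly from the JDK, without GridGain. Whether a node reports exactly the same values at the same moment is an assumption; the class name JvmBeansSketch is hypothetical.

```java
import java.lang.management.ManagementFactory;

// Sketch: reading the JDK management beans that the sys metric descriptions refer to.
final class JvmBeansSketch {
    // Corresponds to ThreadMXBean.getThreadCount().
    static int threadCount() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    // Corresponds to MemoryMXBean.getHeapMemoryUsage().getUsed(), in bytes.
    static long heapUsedBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    // Corresponds to RuntimeMXBean.getUptime(), in milliseconds.
    static long uptimeMillis() {
        return ManagementFactory.getRuntimeMXBean().getUptime();
    }

    // Corresponds to OperatingSystemMXBean.getSystemLoadAverage();
    // the JDK returns a negative value when the load average is not available.
    static double systemLoadAverage() {
        return ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage();
    }

    public static void main(String[] args) {
        System.out.println("ThreadCount: " + threadCount());
        System.out.println("memory.heap.used: " + heapUsedBytes());
        System.out.println("UpTime: " + uptimeMillis());
        System.out.println("SystemLoadAverage: " + systemLoadAverage());
    }
}
```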
Caches
Cache metrics.
Register name: cache.{cache_name}.{near}
Name | Type | Description |
---|---|---|
CacheEvictions |
long |
The total number of evictions from the cache. |
CacheGets |
long |
The total number of gets to the cache. |
CacheHits |
long |
The number of get requests that were satisfied by the cache. |
CacheMisses |
long |
A miss is a get request that is not satisfied. |
CachePuts |
long |
The total number of puts to the cache. |
CacheRemovals |
long |
The total number of removals from the cache. |
CacheTxCommits |
long |
Total number of transaction commits. |
CacheTxRollbacks |
long |
Total number of transaction rollbacks. |
CacheSize |
long |
Local cache size. |
CommitTime |
histogram |
Commit time in nanoseconds. |
CommitTimeTotal |
long |
The total time of commit, in nanoseconds. |
EntryProcessorHits |
long |
The total number of invocations on keys that exist in the cache. |
EntryProcessorInvokeTimeNanos |
long |
The total time of cache invocations for which this node is the initiator, in nanoseconds. |
EntryProcessorMaxInvocationTime |
long |
The maximum time so far to execute cache invocations for which this node is the initiator. |
EntryProcessorMinInvocationTime |
long |
The minimum time so far to execute cache invocations for which this node is the initiator. |
EntryProcessorMisses |
long |
The total number of invocations on keys that do not exist in the cache. |
EntryProcessorPuts |
long |
The total number of cache invocations that caused an update. |
EntryProcessorReadOnlyInvocations |
long |
The total number of cache invocations that caused no updates. |
EntryProcessorRemovals |
long |
The total number of cache invocations that caused a removal. |
EstimatedRebalancingKeys |
long |
Estimated number of keys to be rebalanced. |
GetAllTime |
histogram |
GetAll time for which this node is the initiator, in nanoseconds. |
GetTime |
histogram |
Get time for which this node is the initiator, in nanoseconds. |
GetTimeTotal |
long |
The total time of cache gets for which this node is the initiator, in nanoseconds. |
HeapEntriesCount |
long |
Onheap entries count. |
IndexRebuildKeysProcessed |
long |
The number of keys with rebuilt indexes. |
IsIndexRebuildInProgress |
boolean |
True if index build or rebuild is in progress. |
OffHeapBackupEntriesCount |
long |
Offheap backup entries count. |
OffHeapEntriesCount |
long |
Offheap entries count. |
OffHeapEvictions |
long |
The total number of evictions from the off-heap memory. |
OffHeapGets |
long |
The total number of get requests to the off-heap memory. |
OffHeapHits |
long |
The number of get requests that were satisfied by the off-heap memory. |
OffHeapMisses |
long |
A miss is a get request that is not satisfied by off-heap memory. |
OffHeapPrimaryEntriesCount |
long |
Offheap primary entries count. |
OffHeapPuts |
long |
The total number of put requests to the off-heap memory. |
OffHeapRemovals |
long |
The total number of removals from the off-heap memory. |
PutAllTime |
histogram |
PutAll time for which this node is the initiator, in nanoseconds. |
PutTime |
histogram |
Put time for which this node is the initiator, in nanoseconds. |
PutTimeTotal |
long |
The total time of cache puts for which this node is the initiator, in nanoseconds. |
QueryCompleted |
long |
Count of completed queries. |
QueryExecuted |
long |
Count of executed queries. |
QueryFailed |
long |
Count of failed queries. |
QueryMaximumTime |
long |
Maximum query execution time. |
QueryMinimalTime |
long |
Minimum query execution time. |
QuerySumTime |
long |
Sum of query execution times. |
RebalanceClearingPartitionsLeft |
long |
Number of partitions that need to be cleared before the actual rebalance starts. |
RebalanceStartTime |
long |
Rebalance start time. |
RebalancedKeys |
long |
Number of already rebalanced keys. |
RebalancingBytesRate |
long |
Estimated rebalancing speed in bytes. |
RebalancingKeysRate |
long |
Estimated rebalancing speed in keys. |
RemoveAllTime |
histogram |
RemoveAll time for which this node is the initiator, in nanoseconds. |
RemoveTime |
histogram |
Remove time for which this node is the initiator, in nanoseconds. |
RemoveTimeTotal |
long |
The total time of cache removal, in nanoseconds. |
RollbackTime |
histogram |
Rollback time in nanoseconds. |
RollbackTimeTotal |
long |
The total time of rollback, in nanoseconds. |
TotalRebalancedBytes |
long |
Number of already rebalanced bytes. |
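A common use of the counters above is to derive a cache hit ratio. The sketch below assumes that CacheHits and CacheMisses together account for all get requests; the class CacheHitRatioSketch and the sample values are hypothetical, and the counter values would have to be read from whichever exporter you use.

```java
// Sketch: deriving a hit ratio from the CacheHits and CacheMisses counters.
final class CacheHitRatioSketch {
    // Returns hits / (hits + misses), or 0.0 when no get requests were recorded.
    static double hitRatio(long cacheHits, long cacheMisses) {
        long total = cacheHits + cacheMisses;
        return total == 0 ? 0.0 : (double) cacheHits / total;
    }

    public static void main(String[] args) {
        // Hypothetical counter values.
        System.out.println(hitRatio(75, 25)); // 0.75
    }
}
```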
Cache Groups
Register name: cacheGroups.{group_name}
Name | Type | Description |
---|---|---|
AffinityPartitionsAssignmentMap |
java.util.Map |
Affinity partitions assignment map. |
Caches |
java.util.ArrayList |
List of caches |
IndexBuildCountPartitionsLeft |
long |
Number of partitions that still need to be processed to finish index creation or rebuilding. |
LocalNodeMovingPartitionsCount |
integer |
Count of partitions with state MOVING for this cache group located on this node. |
LocalNodeOwningPartitionsCount |
integer |
Count of partitions with state OWNING for this cache group located on this node. |
LocalNodeRentingEntriesCount |
long |
Number of entries that remain to be evicted from RENTING partitions located on this node for this cache group. |
LocalNodeRentingPartitionsCount |
integer |
Count of partitions with state RENTING for this cache group located on this node. |
MaximumNumberOfPartitionCopies |
integer |
Maximum number of partition copies for all partitions of this cache group. |
MinimumNumberOfPartitionCopies |
integer |
Minimum number of partition copies for all partitions of this cache group. |
MovingPartitionsAllocationMap |
java.util.Map |
Allocation map of partitions with state MOVING in the cluster. |
OwningPartitionsAllocationMap |
java.util.Map |
Allocation map of partitions with state OWNING in the cluster. |
PartitionIds |
java.util.ArrayList |
Local partition ids. |
SparseStorageSize |
long |
Storage space allocated for group adjusted for possible sparsity, in bytes. |
StorageSize |
long |
Storage space allocated for group, in bytes. |
TotalAllocatedPages |
long |
Cache group total allocated pages. |
TotalAllocatedSize |
long |
Total size of memory allocated for group, in bytes. |
Transactions
Transaction metrics.
Register name: tx
Name | Type | Description |
---|---|---|
AllOwnerTransactions |
java.util.HashMap |
Map of local node owning transactions. |
LockedKeysNumber |
long |
The number of keys locked on the node. |
OwnerTransactionsNumber |
long |
The number of active transactions for which this node is the initiator. |
TransactionsHoldingLockNumber |
long |
The number of active transactions holding at least one key lock. |
LastCommitTime |
long |
Last commit time. |
nodeSystemTimeHistogram |
histogram |
Transactions system times on node represented as histogram. |
nodeUserTimeHistogram |
histogram |
Transactions user times on node represented as histogram. |
LastRollbackTime |
long |
Last rollback time. |
totalNodeSystemTime |
long |
Total transactions system time on node. |
totalNodeUserTime |
long |
Total transactions user time on node. |
txCommits |
integer |
Number of transaction commits. |
txRollbacks |
integer |
Number of transaction rollbacks. |
Partition Map Exchange
Partition map exchange metrics.
Register name: pme
Name | Type | Description |
---|---|---|
CacheOperationsBlockedDuration |
long |
Current PME cache operations blocked duration in milliseconds. |
CacheOperationsBlockedDurationHistogram |
histogram |
Histogram of cache operations blocked PME durations in milliseconds. |
Duration |
long |
Current PME duration in milliseconds. |
DurationHistogram |
histogram |
Histogram of PME durations in milliseconds. |
Compute Jobs
Register name: compute.jobs
Name | Type | Description |
---|---|---|
compute.jobs.Active |
long |
Number of active jobs currently executing. |
compute.jobs.Canceled |
long |
Number of cancelled jobs that are still running. |
compute.jobs.ExecutionTime |
long |
Total execution time of jobs. |
compute.jobs.Finished |
long |
Number of finished jobs. |
compute.jobs.Rejected |
long |
Number of jobs rejected after the most recent collision resolution operation. |
compute.jobs.Started |
long |
Number of started jobs. |
compute.jobs.Waiting |
long |
Number of currently queued jobs waiting to be executed. |
compute.jobs.WaitingTime |
long |
Total time jobs spent on waiting queue. |
Thread Pools
Register name: threadPools.{thread_pool_name}
Name | Type | Description |
---|---|---|
ActiveCount |
long |
Approximate number of threads that are actively executing tasks. |
CompletedTaskCount |
long |
Approximate total number of tasks that have completed execution. |
CorePoolSize |
long |
The core number of threads. |
KeepAliveTime |
long |
Thread keep-alive time, which is the amount of time which threads in excess of the core pool size may remain idle before being terminated. |
LargestPoolSize |
long |
Largest number of threads that have ever simultaneously been in the pool. |
MaximumPoolSize |
long |
The maximum allowed number of threads. |
PoolSize |
long |
Current number of threads in the pool. |
QueueSize |
long |
Current size of the execution queue. |
RejectedExecutionHandlerClass |
string |
Class name of current rejection handler. |
Shutdown |
boolean |
True if this executor has been shut down. |
TaskCount |
long |
Approximate total number of tasks that have been scheduled for execution. |
Terminated |
boolean |
True if all tasks have completed following shut down. |
Terminating |
boolean |
True if terminating but not yet terminated. |
ThreadFactoryClass |
string |
Class name of thread factory used to create new threads. |
Cache Group IO
Register name: io.statistics.cacheGroups.{group_name}
Name | Type | Description |
---|---|---|
LOGICAL_READS |
long |
Number of logical reads |
PHYSICAL_READS |
long |
Number of physical reads |
grpId |
integer |
Group id |
name |
string |
Name of the cache group |
startTime |
long |
Statistics collection start time |
Sorted Indexes
Register name: io.statistics.sortedIndexes.{cache_name}.{index_name}
Name | Type | Description |
---|---|---|
LOGICAL_READS_INNER |
long |
Number of logical reads for inner tree node |
LOGICAL_READS_LEAF |
long |
Number of logical reads for leaf tree node |
PHYSICAL_READS_INNER |
long |
Number of physical reads for inner tree node |
PHYSICAL_READS_LEAF |
long |
Number of physical reads for leaf tree node |
indexName |
string |
Name of the index |
name |
string |
Name of the cache |
startTime |
long |
Statistics collection start time |
Hash Indexes
Register name: io.statistics.hashIndexes.{cache_name}.{index_name}
Name | Type | Description |
---|---|---|
LOGICAL_READS_INNER |
long |
Number of logical reads for inner tree node |
LOGICAL_READS_LEAF |
long |
Number of logical reads for leaf tree node |
PHYSICAL_READS_INNER |
long |
Number of physical reads for inner tree node |
PHYSICAL_READS_LEAF |
long |
Number of physical reads for leaf tree node |
indexName |
string |
Name of the index |
name |
string |
Name of the cache |
startTime |
long |
Statistics collection start time |
Communication IO
Register name: io.communication
Name | Type | Description |
---|---|---|
ActiveSessionsCount |
integer |
Active TCP sessions count. |
OutboundMessagesQueueSize |
integer |
Outbound messages queue size. |
SentMessagesCount |
integer |
Sent messages count. |
SentBytesCount |
long |
Sent bytes count. |
ReceivedBytesCount |
long |
Received bytes count. |
ReceivedMessagesCount |
integer |
Received messages count. |
RejectedSslSessionsCount |
integer |
Number of TCP sessions rejected due to SSL errors (the metric is exported only if SSL is enabled). |
SslEnabled |
boolean |
Indicates whether SSL is enabled. |
SslHandshakeDurationHistogram |
histogram |
Histogram of SSL handshake duration in milliseconds (metric is exported only if SSL is enabled). |
Ignite Thin Client Connector
Register name: client.connector
Name | Type | Description |
---|---|---|
ActiveSessionsCount |
integer |
Active TCP sessions count. |
ReceivedBytesCount |
long |
Received bytes count. |
RejectedSslSessionsCount |
integer |
Number of TCP sessions rejected due to SSL errors (the metric is exported only if SSL is enabled). |
RejectedSessionsTimeout |
integer |
Number of TCP sessions rejected due to a handshake timeout. |
RejectedSessionsAuthenticationFailed |
integer |
Number of TCP sessions rejected due to failed authentication. |
RejectedSessionsTotal |
integer |
Total number of rejected TCP connections. |
{clientType}.AcceptedSessions |
integer |
Number of successfully established sessions for the client type. |
{clientType}.ActiveSessions |
integer |
Number of active sessions for the client type. |
SentBytesCount |
long |
Sent bytes count. |
SslEnabled |
boolean |
Indicates whether SSL is enabled. |
SslHandshakeDurationHistogram |
histogram |
Histogram of SSL handshake duration in milliseconds (metric is exported only if SSL is enabled). |
Ignite REST Client Connector
Register name: rest.client
Name | Type | Description |
---|---|---|
ActiveSessionsCount |
integer |
Active TCP sessions count. |
ReceivedBytesCount |
long |
Received bytes count. |
RejectedSslSessionsCount |
integer |
Number of TCP sessions rejected due to SSL errors (the metric is exported only if SSL is enabled). |
SentBytesCount |
long |
Sent bytes count. |
SslEnabled |
boolean |
Indicates whether SSL is enabled. |
SslHandshakeDurationHistogram |
histogram |
Histogram of SSL handshake duration in milliseconds (metric is exported only if SSL is enabled). |
Discovery IO
Register name: io.discovery
Name | Type | Description |
---|---|---|
CoordinatorSince |
long |
Timestamp at which the local node became the coordinator (metric is exported only from server nodes). |
Coordinator |
UUID |
Coordinator ID (metric is exported only from server nodes). |
CurrentTopologyVersion |
long |
Current topology version. |
JoinedNodes |
integer |
Joined nodes count. |
LeftNodes |
integer |
Left nodes count. |
MessageWorkerQueueSize |
integer |
Current message worker queue size. |
PendingMessagesRegistered |
integer |
Pending registered messages count. |
RejectedSslConnectionsCount |
integer |
Number of TCP discovery connections rejected due to SSL errors. |
SslEnabled |
boolean |
Indicates whether SSL is enabled. |
TotalProcessedMessages |
integer |
Total processed messages count. |
TotalReceivedMessages |
integer |
Total received messages count. |
Data Region IO
Register name: io.dataregion.{data_region_name}
Name | Type | Description |
---|---|---|
AllocationRate |
long |
Allocation rate (pages per second) averaged across rateTimeInterval. |
CheckpointBufferSize |
long |
Checkpoint buffer size in bytes. |
DirtyPages |
long |
Number of pages in memory not yet synchronized with persistent storage. |
EmptyDataPages |
long |
Number of empty data pages in the region. Only completely free pages that can be reused are counted (e.g., pages contained in the reuse bucket of the free list). |
EvictionRate |
long |
Eviction rate (pages per second). |
LargeEntriesPagesCount |
long |
Number of pages that are fully occupied by large entries that exceed the page size. |
OffHeapSize |
long |
Offheap size in bytes. |
OffheapUsedSize |
long |
Offheap used size in bytes. |
PagesFillFactor |
double |
The percentage of the used space. |
PagesRead |
long |
Number of pages read since the last restart. |
PagesReplaceAge |
long |
Average age at which pages in memory are replaced with pages from persistent storage (milliseconds). |
PagesReplaceRate |
long |
Rate at which pages in memory are replaced with pages from persistent storage (pages per second). |
PagesReplaced |
long |
Number of pages replaced since the last restart. |
PagesWritten |
long |
Number of pages written since the last restart. |
PhysicalMemoryPages |
long |
Number of pages residing in physical RAM. |
PhysicalMemorySize |
long |
Total size of pages loaded into RAM, in bytes. |
TotalAllocatedPages |
long |
Total number of allocated pages. |
TotalAllocatedSize |
long |
Total size of memory allocated in the data region, in bytes. |
TotalThrottlingTime |
long |
Total time threads spent throttled, in milliseconds. Ignite throttles threads that generate dirty pages during an ongoing checkpoint. |
UsedCheckpointBufferSize |
long |
Used checkpoint buffer size, in bytes. |
Data Storage
Data Storage metrics.
Register name: io.datastorage
Name | Type | Description |
---|---|---|
CheckpointBeforeLockHistogram |
histogram |
Histogram of the duration of checkpoint actions performed before the write lock is taken, in milliseconds. |
CheckpointFsyncHistogram |
histogram |
Histogram of checkpoint fsync duration in milliseconds. |
CheckpointHistogram |
histogram |
Histogram of checkpoint duration in milliseconds. |
CheckpointListenersExecuteHistogram |
histogram |
Histogram of the duration of checkpoint listener execution under the write lock, in milliseconds. |
CheckpointLockHoldHistogram |
histogram |
Histogram of checkpoint lock hold duration in milliseconds. |
CheckpointLockWaitHistogram |
histogram |
Histogram of checkpoint lock wait duration in milliseconds. |
CheckpointMarkHistogram |
histogram |
Histogram of checkpoint mark duration in milliseconds. |
CheckpointPagesWriteHistogram |
histogram |
Histogram of checkpoint pages write duration in milliseconds. |
CheckpointSplitAndSortPagesHistogram |
histogram |
Histogram of splitting and sorting checkpoint pages duration in milliseconds. |
CheckpointTotalTime |
long |
Total duration of checkpoint |
CheckpointWalRecordFsyncHistogram |
histogram |
Histogram of the duration of the WAL fsync after logging CheckpointRecord at the beginning of a checkpoint, in milliseconds. |
CheckpointWriteEntryHistogram |
histogram |
Histogram of entry buffer writing to file duration in milliseconds. |
LastCheckpointBeforeLockDuration |
long |
Duration of the checkpoint actions performed before the write lock is taken, in milliseconds. |
LastCheckpointCopiedOnWritePagesNumber |
long |
Number of pages copied to a temporary checkpoint buffer during the last checkpoint. |
LastCheckpointDataPagesNumber |
long |
Total number of data pages written during the last checkpoint. |
LastCheckpointDuration |
long |
Duration of the last checkpoint in milliseconds. |
LastCheckpointFsyncDuration |
long |
Duration of the sync phase of the last checkpoint in milliseconds. |
LastCheckpointListenersExecuteDuration |
long |
Duration of checkpoint listener execution under the write lock, in milliseconds. |
LastCheckpointLockHoldDuration |
long |
Duration of the checkpoint lock hold in milliseconds. |
LastCheckpointLockWaitDuration |
long |
Duration of the checkpoint lock wait in milliseconds. |
LastCheckpointMarkDuration |
long |
Duration of the checkpoint mark in milliseconds. |
LastCheckpointPagesWriteDuration |
long |
Duration of the checkpoint pages write in milliseconds. |
LastCheckpointTotalPagesNumber |
long |
Total number of pages written during the last checkpoint. |
LastCheckpointSplitAndSortPagesDuration |
long |
Duration of splitting and sorting checkpoint pages of the last checkpoint in milliseconds. |
LastCheckpointStart |
long |
Start timestamp of the last checkpoint. |
LastCheckpointWalRecordFsyncDuration |
long |
Duration of the WAL fsync after logging CheckpointRecord on the start of the last checkpoint in milliseconds. |
LastCheckpointWriteEntryDuration |
long |
Duration of entry buffer writing to file of the last checkpoint in milliseconds. |
SparseStorageSize |
long |
Storage space allocated adjusted for possible sparsity, in bytes. |
StorageSize |
long |
Storage space allocated, in bytes. |
WalArchiveSegments |
integer |
Current number of WAL segments in the WAL archive. |
WalBuffPollSpinsRate |
long |
WAL buffer poll spins number over the last time interval. |
WalFsyncTimeDuration |
long |
Total duration of fsync |
WalFsyncTimeNum |
long |
Total count of fsync |
WalLastRollOverTime |
long |
Time of the last WAL segment rollover. |
WalLoggingRate |
long |
Average number of WAL records per second written during the last time interval. |
WalTotalSize |
long |
Total size of the storage WAL files, in bytes. |
WalWritingRate |
long |
Average number of bytes per second written during the last time interval. |
Cluster
Cluster metrics.
Register name: cluster
Name | Type | Description |
---|---|---|
ActiveBaselineNodes |
integer |
Active baseline nodes count. |
Rebalanced |
boolean |
True if the cluster has reached the fully rebalanced state. Note that for an inactive cluster this metric is always False, regardless of the actual partition state. |
TotalBaselineNodes |
integer |
Total baseline nodes count. |
TotalClientNodes |
integer |
Client nodes count. |
TotalServerNodes |
integer |
Server nodes count. |
SQL
SQL metrics.
Memory Quotas
Register name: sql.memory.quotas
Name | Type | Description |
---|---|---|
OffloadedQueriesNumber |
number |
Number of queries that were offloaded to disk locally |
OffloadingRead |
bytes |
Number of bytes read from the disk during SQL query offloading |
OffloadingWritten |
bytes |
Number of bytes written to the disk during SQL query offloading |
freeMem |
bytes |
Amount of memory left available for the queries on this node, in bytes (negative value if SQL memory quotas are disabled) |
maxMem |
bytes |
Total amount of memory available for all queries on the current node (negative value if SQL memory quotas are disabled) |
requests |
number |
Total number of times memory quota has been requested on the current node by all the queries |
Parser Cache
Register name: sql.parser.cache
Name | Type | Description |
---|---|---|
hits |
number |
Number of hits for queries cache |
misses |
number |
Number of misses for queries cache |
User Queries
Register name: sql.queries.user
Name | Type | Description |
---|---|---|
canceled |
number |
Number of canceled queries initiated by the current node. This number is included in the general 'failed' metric. |
failed |
number |
Number of failed queries (including OOME) initiated by the current node |
failedByOOM |
number |
Number of queries failed due to out of memory protection initiated by the current node. This number is included in the general 'failed' metric. |
success |
number |
Number of successfully executed queries initiated by the current node |
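Because the canceled and failedByOOM counters are both included in failed, the number of queries that failed for other reasons can be derived by subtraction. The sketch below shows only this arithmetic; the class name and the sample values are hypothetical.

```java
// Sketch: separating "other" failures from the failed, canceled, and failedByOOM counters.
final class UserQueryFailuresSketch {
    // failed already includes canceled and failedByOOM, so subtract both.
    static long otherFailures(long failed, long canceled, long failedByOom) {
        return failed - canceled - failedByOom;
    }

    public static void main(String[] args) {
        // Hypothetical counter values: 10 failed, of which 3 canceled and 2 failed by OOM protection.
        System.out.println(otherFailures(10, 3, 2)); // 5
    }
}
```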
Throttling
Throttling metrics for the Write operation. Speed-based throttling protects the checkpoint buffer and the clean pages in the region. The checkpoint buffer needs stronger protection because an overflow of this buffer crashes the node. When a Write operation is performed, the checkpoint buffer is checked first. If it is in danger of overflowing, the exponential backoff algorithm is used to protect it: each subsequent "sleep" time is K times longer than the previous one. If the checkpoint buffer is not in danger, the "sleep" time is calculated using the speed-based algorithm to protect the pool of clean pages.
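The exponential backoff rule mentioned above (each "sleep" is K times longer than the previous one) can be illustrated with a short sketch. The starting park time and the value of K used here are hypothetical placeholders, not the values used by the product, and BackoffSketch is not a GridGain class.

```java
// Sketch of exponential backoff: the n-th consecutive park time is start * K^n.
final class BackoffSketch {
    // Park time in nanoseconds for the given consecutive attempt (0-based).
    static long parkNanos(long startNanos, double k, int attempt) {
        return (long) (startNanos * Math.pow(k, attempt));
    }

    public static void main(String[] args) {
        long start = 1_000; // hypothetical initial park time, in nanoseconds
        double k = 2.0;     // hypothetical backoff factor K
        for (int attempt = 0; attempt < 4; attempt++)
            System.out.println("attempt " + attempt + ": " + parkNanos(start, k, attempt) + " ns");
    }
}
```

With these placeholder values, the park times grow as 1000, 2000, 4000, and 8000 nanoseconds, which shows how quickly consecutive throttles slow down a writer that keeps endangering the checkpoint buffer.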
Register name: io.dataregion
Name | Type | Description |
---|---|---|
SpeedBasedThrottlingPercentage |
double |
Fraction of throttling time within average marking time (e.g., "quarter" = 0.25). |
MarkDirtySpeed |
long |
Speed of marking pages dirty, in pages/second. Value is averaged over the last 3 fragments, 0.25 sec each, plus the current fragment, 0-0.25 sec (0.75-1.0 sec total). |
CpWriteSpeed |
long |
Checkpoint write speed, in pages/second. Value is averaged over the last 3 checkpoints plus the current one. |
LastEstimatedSpeedForMarkAll |
long |
Last estimated speed of marking all clean pages dirty to the end of a checkpoint, in pages/second. |
CurrDirtyRatio |
double |
Current ratio of dirty pages (dirty vs total), expressed as a fraction. The fraction is computed for each segment in the current region, and the highest value becomes "current." |
TargetDirtyRatio |
double |
Ratio of dirty pages (dirty vs total), expressed as a fraction. Throttling starts when this ratio is reached. |
ThrottleParkTime |
long |
Park (sleep) time for the Write operation, in nanoseconds. Value is averaged over the last 3 fragments, 0.25 sec each, plus the current fragment, 0-0.25 sec (0.75-1.0 sec total). It defines park periods for either the checkpoint buffer protection or the clean page pool protection. |
CpTotalPages |
int |
Number of pages in the current checkpoint. |
CpEvictedPages |
int |
Number of evicted pages in the current checkpoint. |
CpWrittenPages |
int |
Number of written pages in the current checkpoint. |
CpSyncedPages |
int |
Number of fsynced pages in the current checkpoint. |
CheckpointBufferPagesCount |
int |
Number of occupied pages in the checkpoint buffer. |
CheckpointBufferPagesSize |
int |
Total number of pages in the checkpoint buffer. |