Setting Up Prometheus for Apache Ignite and GridGain

Overview

Prometheus is a popular monitoring tool hosted by the Cloud Native Computing Foundation, the same foundation that hosts Kubernetes. We often see Apache Ignite and GridGain users trying to connect Prometheus to their clusters, so this post offers some hints on how to do it.

Prometheus Configuration

This post isn’t about how best to configure Prometheus, so we’ll use the defaults as much as we can. I use the following configuration for the rest of this article.

First, I shortened the scrape interval to 15 seconds. I prefer metrics that update more often than once per minute, but a shorter interval is not required.

The second change is the one that makes everything work: I added two scrape configs, one for each of the two Apache Ignite nodes running on my local machine. The nodes expose their metrics on ports 9000 and 9001.

global:
  scrape_interval:     15s # Set the scrape interval to 15 seconds. The default is 1 minute.
  evaluation_interval: 15s # Evaluate the rules every 15 seconds. The default is every 1 minute.

alerting:
  alertmanagers:
  - static_configs:
    - targets:

rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

scrape_configs:
  - job_name: 'ignite node 1'
    static_configs:
    - targets: ['localhost:9000']
  - job_name: 'ignite node 2'
    static_configs:
    - targets: ['localhost:9001']
  - job_name: 'prometheus'
    static_configs:
    - targets: ['localhost:9090']

Viewing Apache Ignite Metrics via JMX

After Prometheus is up and running, we need an exporter that exposes Apache Ignite's metrics over HTTP so that Prometheus can scrape them. Because Apache Ignite is a Java application, people often look first at its JMX beans. JMX (Java Management Extensions) is the Java standard for viewing the metrics of a running JVM.

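The Prometheus project provides a JMX Exporter that you can use as a bridge between Prometheus and Apache Ignite or GridGain.

You can run the JMX Exporter as a Java agent inside the node's JVM. The agent argument gives the port to serve metrics on (9000) followed by the path to the exporter's own configuration file: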

$ bin/ignite.sh -v $PROJECT/config/myconfig.xml -J-javaagent:$MYLIBS/jmx_prometheus_javaagent-0.14.0.jar=9000:$PROJECT/config/prometheus.yml
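The file after =9000: in that command is the JMX Exporter's own rule file, not the Prometheus server configuration shown earlier. Because the Prometheus configuration scrapes two nodes, each node needs its own exporter port. The sketch below assumes a minimal catch-all rule file and two hypothetical variables, JMX_AGENT_JAR and JMX_EXPORTER_CONFIG (the file name jmx_exporter.yml is also my assumption); adjust the paths to your setup.

```shell
#!/usr/bin/env bash
# Sketch: generate a minimal JMX Exporter rule file and build the
# -J-javaagent option that bin/ignite.sh passes through to the JVM.
# JMX_AGENT_JAR and JMX_EXPORTER_CONFIG are hypothetical variables.

JMX_AGENT_JAR="${JMX_AGENT_JAR:-$MYLIBS/jmx_prometheus_javaagent-0.14.0.jar}"
JMX_EXPORTER_CONFIG="${JMX_EXPORTER_CONFIG:-$PROJECT/config/jmx_exporter.yml}"

# Write a rule file with a single catch-all rule that matches every bean.
write_jmx_exporter_config() {
  cat > "$1" <<'EOF'
rules:
  - pattern: ".*"
EOF
}

# Print the agent option for one node. The port must match a target in the
# Prometheus scrape_configs (9000 for node 1, 9001 for node 2).
jmx_agent_opt() {
  printf '%s\n' "-J-javaagent:${JMX_AGENT_JAR}=$1:${JMX_EXPORTER_CONFIG}"
}
```

For example, after write_jmx_exporter_config "$JMX_EXPORTER_CONFIG", the second node could be started with bin/ignite.sh -v $PROJECT/config/myconfig.xml "$(jmx_agent_opt 9001)".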

Alternatively, you can use the standalone jmx_prometheus_httpserver, which runs in its own process. In that case, you need to add the "hostPort" parameter to your JMX Exporter configuration and point it at the JMX port of your Apache Ignite or GridGain node.
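In standalone mode, the exporter connects to the node over remote JMX, so the rule file must say where to connect. Here is a sketch, assuming the node accepts remote JMX connections on localhost:49112 (a placeholder; use the port your node's JMX listener actually uses) and a hypothetical HTTPSERVER_JAR variable that points at the downloaded jmx_prometheus_httpserver JAR.

```shell
#!/usr/bin/env bash
# Sketch: rule file for the standalone jmx_prometheus_httpserver.
# The hostPort value passed in is a placeholder for the node's remote JMX address.
write_httpserver_config() {
  local file="$1" host_port="$2"
  cat > "$file" <<EOF
hostPort: ${host_port}
rules:
  - pattern: ".*"
EOF
}

# Start the standalone exporter. It serves metrics on the given HTTP port
# and reads JMX data from the node named in hostPort.
# HTTPSERVER_JAR is a hypothetical variable holding the exporter JAR path.
start_httpserver_exporter() {
  local http_port="$1" config="$2"
  java -jar "$HTTPSERVER_JAR" "$http_port" "$config"
}
```

For example, write_httpserver_config jmx_httpserver.yml localhost:49112 followed by start_httpserver_exporter 9000 jmx_httpserver.yml would serve the node's metrics where the first Prometheus scrape target expects them.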

One last thing about using the JMX Exporter: this approach works correctly only with GridGain and Apache Ignite versions up to 2.7.x. Apache Ignite 2.8.x significantly reworked metrics reporting, and the exporter cannot parse the format of some of the new JMX beans. A fix was merged to the exporter's master branch after its most recent release, but the fix did not work for me. Before the connection worked, I needed to add some extra error handling.

So, although you can use JMX beans to connect Prometheus to Apache Ignite, you might run into a few wrinkles. There might be a better way.

Viewing Apache Ignite Metrics via OpenCensus

OpenCensus is a vendor-neutral set of libraries for collecting metrics and traces, with exporters for several backends, including Prometheus. Our commitment to open standards and open source led GridGain to implement OpenCensus support in Apache Ignite too. Indeed, we use OpenCensus in our Control Center monitoring and management tool.

Before you can use OpenCensus, you must enable it by moving the $IGNITE_HOME/libs/optional/ignite-opencensus folder up one level, to $IGNITE_HOME/libs. In recent versions of GridGain, the folder is already at the $IGNITE_HOME/libs level, but check to be sure that it does not need to be moved.
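The Ignite startup scripts put each folder under libs on the classpath but skip libs/optional, which is why the move is needed. The sketch below performs the move only when the module is still under optional/, so it is safe to run on a GridGain install that already has the folder in place. The function name is my own.

```shell
#!/usr/bin/env bash
# Sketch: enable the ignite-opencensus module by moving it from
# libs/optional up to libs, but only if it has not been moved already.
enable_opencensus_module() {
  local home="${1:-$IGNITE_HOME}"
  local src="$home/libs/optional/ignite-opencensus"
  local dst="$home/libs/ignite-opencensus"

  if [ -d "$dst" ]; then
    echo "ignite-opencensus is already in $home/libs"
  elif [ -d "$src" ]; then
    mv "$src" "$home/libs/"
    echo "moved ignite-opencensus to $home/libs"
  else
    echo "ignite-opencensus not found under $home/libs" >&2
    return 1
  fi
}
```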

In addition to enabling the ignite-opencensus module, you also need to register the statistics collector and start a transport for the metrics (a web server).

The easiest way to do this is to add two extra beans to your Apache Ignite configuration file:

    <bean id="opencensusWrapper" class="org.springframework.beans.factory.config.MethodInvokingBean">
        <property name="staticMethod" value="io.opencensus.exporter.stats.prometheus.PrometheusStatsCollector.createAndRegister"/>
    </bean>

    <bean id="httpServer" class="io.prometheus.client.exporter.HTTPServer">
        <constructor-arg type="java.lang.String" value="localhost"/>
        <constructor-arg type="int" value="9000"/>
        <constructor-arg type="boolean" value="true"/>
    </bean>

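The first bean calls PrometheusStatsCollector.createAndRegister(), which registers a collector that makes the OpenCensus metrics available to the Prometheus client library. The second bean starts an HTTP server on localhost, port 9000; the final true argument runs it as a daemon thread. This web server is the link between Apache Ignite and Prometheus: it is the endpoint that Prometheus scrapes. If you run a second node on the same machine, give its HTTPServer bean a different port, such as 9001, to match the second target in the Prometheus configuration.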

Before you start your node, you'll also need to add the Prometheus client and OpenCensus exporter JAR files to your USER_LIBS environment variable. In my sample project on GitHub, I show an alternative approach: building all the required libraries into a single, large JAR file.
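Here is a sketch of the USER_LIBS approach. It assumes you have downloaded the JARs that provide the two classes referenced in the beans into one directory. Typically these are opencensus-exporter-stats-prometheus for PrometheusStatsCollector, plus simpleclient, simpleclient_common, and simpleclient_httpserver for HTTPServer, but check the versions against your setup. The directory and function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: append every JAR in a directory to USER_LIBS, which the Ignite
# startup scripts add to the node's classpath (colon-separated on Unix).
add_jars_to_user_libs() {
  local dir="$1" jar
  for jar in "$dir"/*.jar; do
    [ -e "$jar" ] || continue   # skip the literal pattern when no JARs match
    USER_LIBS="${USER_LIBS:+$USER_LIBS:}$jar"
  done
  export USER_LIBS
}
```

For example, run add_jars_to_user_libs "$MYLIBS/prometheus" in the same shell, then start the node with bin/ignite.sh as usual.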

You can confirm that the metrics are being exported correctly by opening http://localhost:9000 in a web browser. If you see a long list of metrics, then the exporter is working. You may need to wait a minute or two before the full list is available. After the list is displayed, you can open Prometheus (http://localhost:9090/) and create graphs and alerts that are based on the metrics.
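If you prefer the command line to a browser, the sketch below fetches the exporter endpoint with curl and counts the sample lines in the Prometheus text format, meaning lines that are neither blank nor # comments. A count that stays at zero suggests the exporter is not publishing anything yet. The function name is my own.

```shell
#!/usr/bin/env bash
# Sketch: fetch an exporter endpoint and count the metric sample lines,
# ignoring "# HELP" / "# TYPE" comments and blank lines.
count_metric_samples() {
  local url="${1:-http://localhost:9000}"
  curl -fsS "$url" | grep -cv -e '^#' -e '^[[:space:]]*$'
}
```

On the Prometheus side, curl -s 'http://localhost:9090/api/v1/query?query=up' returns the built-in up series for every target. A value of 1 for each ignite job means its last scrape succeeded.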

Summary

Cluster monitoring is a task that is often addressed too late in the development process. As we've seen, connecting your GridGain or Apache Ignite cluster to Prometheus, alongside Control Center or other specialised management tools, is straightforward.