Tracing

A number of APIs in GridGain are instrumented for tracing with OpenCensus. You can collect distributed traces of various tasks executed in your cluster and use this information to diagnose latency problems.

The following Ignite APIs are instrumented for tracing:

  • Discovery

  • Communication

  • Exchange

  • Transactions

To view traces, export them to an external system (see the Control Center documentation). You can use one of the OpenCensus exporters or write your own; in either case, you must write code that registers the exporter in Ignite.

Configuring Tracing

Enable OpenCensus tracing in the node configuration. All nodes in the cluster must use the same tracing configuration. The module is enabled by default if you use the GridGain binary distribution. If you use Maven to start GridGain nodes, add the following property to your cluster configuration:

XML:

<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="tracingSpi">
        <bean class="org.apache.ignite.spi.tracing.opencensus.OpenCensusTracingSpi"/>
    </property>
</bean>

Java:

IgniteConfiguration cfg = new IgniteConfiguration();
cfg.setTracingSpi(new org.apache.ignite.spi.tracing.opencensus.OpenCensusTracingSpi());
Ignite ignite = Ignition.start(cfg);

C#/.NET: This API is not presently available for C#/.NET. You can use XML configuration.

C++: This API is not presently available for C++. You can use XML configuration.

Enabling Trace Sampling

When you start your cluster with the above configuration, Ignite does not collect traces. You have to enable trace sampling for a specific API at runtime. You can turn trace sampling on and off at will, for example, only for the period when you are troubleshooting a problem.

You can do this in two ways:

  • Via the control script from the command line

  • Programmatically

Traces are collected at a given probabilistic sampling rate. The rate is specified as a value between 0.0 and 1.0 inclusive: 0 means no sampling, 1 means always sampling. When the sampling rate is set to a value greater than 0, Ignite collects traces. To disable trace collection, set the sampling rate to 0.
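To make the meaning of the rate concrete, here is a minimal Java sketch of a probabilistic sampling decision. It is an illustration only and does not reproduce the sampler that Ignite or OpenCensus actually uses; the class `SamplingSketch` and its methods are hypothetical names introduced for this example.

```java
import java.util.Random;

/** Illustrative only: a hypothetical probabilistic sampler, not Ignite's implementation. */
final class SamplingSketch {
    /** Decides whether a single operation is traced at the given sampling rate. */
    static boolean shouldSample(double samplingRate, Random rnd) {
        // Reject NaN and anything outside the documented range [0.0, 1.0].
        if (!(samplingRate >= 0.0 && samplingRate <= 1.0))
            throw new IllegalArgumentException("Sampling rate must be between 0.0 and 1.0: " + samplingRate);

        // nextDouble() returns a value in [0.0, 1.0), so a rate of 0 never samples
        // and a rate of 1 always samples.
        return rnd.nextDouble() < samplingRate;
    }

    /** Counts how many of the given number of operations would be traced. */
    static int countSampled(double samplingRate, int operations, long seed) {
        Random rnd = new Random(seed);
        int sampled = 0;
        for (int i = 0; i < operations; i++) {
            if (shouldSample(samplingRate, rnd))
                sampled++;
        }
        return sampled;
    }

    public static void main(String[] args) {
        System.out.println("rate 0.0 -> " + countSampled(0.0, 1000, 42L) + " of 1000 traced");
        System.out.println("rate 0.1 -> " + countSampled(0.1, 1000, 42L) + " of 1000 traced");
        System.out.println("rate 1.0 -> " + countSampled(1.0, 1000, 42L) + " of 1000 traced");
    }
}
```

At an intermediate rate such as 0.1, roughly one operation in ten is traced; the exact count depends on the random draws.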

The following sections describe the two ways of enabling trace sampling.

Using Control Script

Go to the {IGNITE_HOME}/bin directory of your Ignite installation. Enable experimental commands in the control script:

Unix (control.sh):

export IGNITE_ENABLE_EXPERIMENTAL_COMMAND=true

Windows (control.bat):

set IGNITE_ENABLE_EXPERIMENTAL_COMMAND=true

Enable tracing for a specific API:

control.sh --tracing-configuration set --scope TX --sampling-rate 1
control.bat --tracing-configuration set --scope TX --sampling-rate 1

The --scope parameter specifies the API you want to trace. The following APIs are instrumented for tracing:

  • DISCOVERY — discovery events

  • EXCHANGE — exchange events

  • COMMUNICATION — communication events

  • TX — transactions

  • CACHE_API_WRITE — write events

  • CACHE_API_READ — read events

  • SQL — SQL events

The --sampling-rate is the probabilistic sampling rate, a number between 0 and 1:

  • 0 — no sampling,

  • 1 — always sampling.

See the Control Script section for the list of all parameters.
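If you generate control script invocations from your own tooling, it can help to validate the scope and sampling rate before calling the script. The following is a minimal sketch under that assumption: the class `TracingCommandSketch` and its `Scope` enum are hypothetical helpers written for this example (they only mirror the scope names listed above), and the sketch assumes the script accepts a decimal value such as `1.0` for `--sampling-rate`.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: a hypothetical helper that assembles control script arguments. */
final class TracingCommandSketch {
    /** Scope names as listed in this section. */
    enum Scope { DISCOVERY, EXCHANGE, COMMUNICATION, TX, CACHE_API_WRITE, CACHE_API_READ, SQL }

    /** Returns the arguments to pass to control.sh or control.bat. */
    static List<String> setCommand(Scope scope, double samplingRate) {
        // Reject NaN and anything outside the documented range [0, 1].
        if (!(samplingRate >= 0.0 && samplingRate <= 1.0))
            throw new IllegalArgumentException("Sampling rate must be between 0 and 1: " + samplingRate);

        return Arrays.asList(
            "--tracing-configuration", "set",
            "--scope", scope.name(),
            "--sampling-rate", Double.toString(samplingRate));
    }

    public static void main(String[] args) {
        // Arguments that enable tracing for transactions.
        System.out.println("control.sh " + String.join(" ", setCommand(Scope.TX, 1.0)));
        // Arguments that disable it again (rate 0 means no sampling).
        System.out.println("control.sh " + String.join(" ", setCommand(Scope.TX, 0.0)));
    }
}
```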

Programmatically

Once you start the node, you can enable trace sampling as follows:

Ignite ignite = Ignition.start();

ignite.tracingConfiguration().set(
        new TracingConfigurationCoordinates.Builder(Scope.TX).build(),
        new TracingConfigurationParameters.Builder().withSamplingRate(1).build());

Analyzing Trace Data

A trace is recorded information about the execution of a specific event. Each trace consists of a tree of spans. A span is an individual unit of work performed by the system to process the event.

Because of the distributed nature of Ignite, an operation usually involves multiple nodes. Therefore, a trace can include spans from multiple nodes. Each span always contains the information about the node where the corresponding operation was executed.

In the image of the transaction trace presented above, you can see that the trace contains the spans associated with the following operations:

  • acquire locks (transactions.colocated.lock.map),

  • get (transactions.near.enlist.read),

  • put (transactions.near.enlist.write),

  • commit (transactions.commit), and

  • close (transactions.close).

The commit operation, in turn, consists of two operations: prepare and finish. You can click on each span to view the annotations and tags attached to it.
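The trace structure described in this section can be modeled as a simple tree. The sketch below is an illustration, not the OpenCensus or Ignite data model: the class `SpanSketch`, the root span name `tx`, the child names `prepare` and `finish`, the node IDs, and the exact parent/child placement of each span are all assumptions made for the example. Only the `transactions.*` span names come from the list above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative only: a hypothetical model of a trace as a tree of spans. */
final class SpanSketch {
    final String name;
    final String nodeId;
    final List<SpanSketch> children = new ArrayList<>();

    SpanSketch(String name, String nodeId) {
        this.name = name;
        this.nodeId = nodeId;
    }

    /** Adds a child span and returns it so that nested spans can be attached. */
    SpanSketch child(String childName, String childNodeId) {
        SpanSketch span = new SpanSketch(childName, childNodeId);
        children.add(span);
        return span;
    }

    /** Total number of spans in this subtree, including this span. */
    int spanCount() {
        int count = 1;
        for (SpanSketch c : children)
            count += c.spanCount();
        return count;
    }

    /** Sorted IDs of all nodes that contributed at least one span to this subtree. */
    Set<String> nodeIds() {
        Set<String> ids = new TreeSet<>();
        ids.add(nodeId);
        for (SpanSketch c : children)
            ids.addAll(c.nodeIds());
        return ids;
    }

    /** Builds a tree shaped like the transaction trace described above. */
    static SpanSketch sampleTransactionTrace() {
        SpanSketch root = new SpanSketch("tx", "node-A"); // hypothetical root span name
        root.child("transactions.colocated.lock.map", "node-B");
        root.child("transactions.near.enlist.read", "node-A");
        root.child("transactions.near.enlist.write", "node-A");
        SpanSketch commit = root.child("transactions.commit", "node-A");
        commit.child("prepare", "node-B"); // placeholder name for the prepare step
        commit.child("finish", "node-B");  // placeholder name for the finish step
        root.child("transactions.close", "node-A");
        return root;
    }

    public static void main(String[] args) {
        SpanSketch trace = sampleTransactionTrace();
        System.out.println(trace.spanCount() + " spans from nodes " + trace.nodeIds());
    }
}
```

Because every span carries the ID of the node that executed it, walking the tree is enough to tell which nodes took part in an operation.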