Using Tracing to Resolve Performance Issues in Apache Ignite
Performance troubleshooting becomes more challenging when an operation spans multiple processes, and every operation executed against an Apache Ignite cluster does. The operation starts in the application and is processed by one or more Ignite server nodes. Tracing, a well-known technique for performance troubleshooting, records a request's journey from start to finish and provides the timing details that you need to quickly resolve bottlenecks and hot spots in your cluster.
Enable Tracing for Transactions
In Ignite, tracing is disabled by default so that the cluster does not waste resources collecting traces that are of no interest to you. The assumption is that you turn tracing on manually for a specific type of operation whenever you observe performance degradation.
Now, go ahead and enable the tracing of distributed transactions in Ignite:
Switch to the Tracing view in Nebula.
Click Configure Tracing.
Enable Transactions tracing, select the Cache API Write scope, and change the Sampling Rate field to
Click Ok to accept the changes.
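To illustrate what a sampling rate controls, here is a minimal conceptual sketch in Python. It is not Ignite's or Nebula's implementation. The function names `should_trace` and `count_traced` are hypothetical, and the sketch assumes the rate is expressed as a fraction between 0.0 (trace nothing) and 1.0 (trace every operation).

```python
import random


def should_trace(sampling_rate: float, rng: random.Random) -> bool:
    """Decide whether one operation gets traced (hypothetical helper).

    sampling_rate is assumed to be a fraction: 0.0 traces nothing,
    1.0 traces every operation.
    """
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be between 0.0 and 1.0")
    # rng.random() returns a value in [0.0, 1.0), so a rate of 1.0
    # always passes and a rate of 0.0 never does.
    return rng.random() < sampling_rate


def count_traced(total: int, sampling_rate: float, seed: int = 42) -> int:
    """Count how many of `total` operations would be traced."""
    rng = random.Random(seed)
    return sum(should_trace(sampling_rate, rng) for _ in range(total))


if __name__ == "__main__":
    for rate in (0.0, 0.1, 1.0):
        print(rate, count_traced(1000, rate))
```

A lower rate reduces the tracing overhead on the cluster at the cost of recording only a subset of the transactions.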
When tracing is enabled, GridGain Nebula intercepts transactions' traces. Open the Tracing screen of Nebula to see a list of already recorded traces:
Analyze Tracing Sample
Ignite uses the 2-phase-commit protocol (2PC) for its transactional engine. With tracing in place, you can observe how a transaction spans multiple nodes, how the commit splits into several steps and, most importantly, how much time is required to complete each step.
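The idea of timing each step of a two-phase commit can be sketched in a few lines of Python. This is a simplified, single-process model under stated assumptions, not Ignite's transactional engine. The `Participant` class, the `timed` helper, and the step names are hypothetical and exist only for illustration.

```python
import time
from dataclasses import dataclass


@dataclass
class Participant:
    """A node taking part in the transaction (hypothetical model)."""
    name: str
    can_commit: bool = True
    state: str = "idle"

    def prepare(self) -> bool:
        # Phase 1: the participant votes on whether it can commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        # Phase 2 (success path): make the change durable.
        self.state = "committed"

    def rollback(self) -> None:
        # Phase 2 (failure path): undo the prepared change.
        self.state = "rolled_back"


def timed(step_name, timings, fn):
    """Run fn and record (step_name, elapsed_seconds), like a tiny span."""
    start = time.perf_counter()
    result = fn()
    timings.append((step_name, time.perf_counter() - start))
    return result


def two_phase_commit(participants):
    """Return (committed, timings) for one transaction."""
    timings = []
    votes = [timed(f"prepare:{p.name}", timings, p.prepare) for p in participants]
    if all(votes):
        for p in participants:
            timed(f"commit:{p.name}", timings, p.commit)
        return True, timings
    for p in participants:
        timed(f"rollback:{p.name}", timings, p.rollback)
    return False, timings


if __name__ == "__main__":
    ok, steps = two_phase_commit([Participant("primary"), Participant("backup")])
    print("committed:", ok)
    for name, seconds in steps:
        print(f"{name}: {seconds * 1000:.3f} ms")
```

In a real cluster, each step crosses the network, which is why per-step timings recorded by tracing are useful for locating the slow participant.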
In Nebula, select a trace from the list of traces and unfold the transactions.commit sub-tree as shown in the following screenshot:
The screenshot highlights three steps (also known, in tracing terminology, as “spans”) of the commit phase:
transactions.commit span: The commit is initiated by the application that is connected to the cluster via a client node. The id of the client node is
tx.near.process.prepare.request: The 2PC protocol began its prepare phase, and the request arrived at the primary node, which holds the record that is being updated by the transaction. The id of the primary node is
tx.dht.process.prepare.request: Because the cluster keeps a backup copy of every record, the primary node (D6CEAFA2) sent the prepare request to the backup node that keeps the copy. The id of the backup node is
At present, there is no performance degradation, and all transactions are committed in a timely manner. However, if a hot spot or bottleneck appears in production, you will know how to find it with tracing.
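As an illustration of how a hot spot can be located in a trace, the following Python sketch models the three spans described above as a tree and picks the span with the largest self time (its own duration minus the duration of its children). The `Span` class, the durations, and the node identifiers other than D6CEAFA2 are hypothetical placeholders, not values taken from Nebula. The sketch also assumes that child spans run sequentially inside their parent.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Span:
    """One timed step of a trace (hypothetical model, not a Nebula type)."""
    name: str
    node_id: str
    duration_ms: float
    children: List["Span"] = field(default_factory=list)

    def self_time_ms(self) -> float:
        # Time spent in this span itself, excluding its child spans.
        return self.duration_ms - sum(c.duration_ms for c in self.children)


def walk(span: Span) -> Iterator[Span]:
    """Yield the span and all of its descendants, depth first."""
    yield span
    for child in span.children:
        yield from walk(child)


def hottest_span(root: Span) -> Span:
    """Return the span with the largest self time in the trace."""
    return max(walk(root), key=lambda s: s.self_time_ms())


def example_trace() -> Span:
    # Durations are made up for illustration only.
    backup = Span("tx.dht.process.prepare.request", "backup-node", 4.0)
    primary = Span("tx.near.process.prepare.request", "D6CEAFA2", 12.0, [backup])
    return Span("transactions.commit", "client-node", 15.0, [primary])


if __name__ == "__main__":
    hot = hottest_span(example_trace())
    print(hot.name, hot.node_id, hot.self_time_ms())
```

Ranking by self time rather than total duration avoids always blaming the root span, which by definition includes the time of every step beneath it.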
Disable Tracing for Transactions
Before you proceed with the tutorial’s next part, disable tracing for transactions:
Go to the Tracing view.
Click Configure Tracing.
Disable Transactions tracing.
Complete the next part of the tutorial, where you learn how to back up and restore the cluster: