Best Practices for Monitoring Distributed In-Memory Computing Systems

This IMCS Europe 2019 video covers best practices for monitoring distributed in-memory computing systems, including how to monitor applications, cluster logs, cluster metrics, operating systems, and networks. It also offers guidance on tools such as Elasticsearch, Grafana, and GridGain Web Console.

When you add a distributed cluster between existing systems or new APIs, you introduce many moving parts that can be almost impossible to track when troubleshooting performance issues or failures. Learn how veterans monitor the components of a distributed cluster for network, memory, and node-specific issues, and how they troubleshoot and resolve them. By the end of this session, you'll have a handy checklist and a set of tools to consider for your own deployments. This session covers:

How to monitor applications, cluster node logs and metrics, JVM, operating system, and the network
Which tools work best in different scenarios, including:

Log-based monitoring, including Logstash, Elasticsearch, Kibana, or Splunk
Grafana
Application monitoring (throughput, latency, GC)
Local node metrics monitoring (memory, GC, CPU)
Network issue monitoring (checking node connectivity and latency)
GridGain Web Console
Tips and tricks for configuring and optimizing monitoring
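
As a rough illustration of the "checking node connectivity and latency" item above, the following minimal Python sketch times a TCP connection to each node. It is not taken from the session: the function names, the node addresses, and the port numbers are hypothetical placeholders, and a TCP connect time is only a coarse proxy for real network latency.

```python
import socket
import time
from typing import Dict, Iterable, Optional, Tuple

Node = Tuple[str, int]  # (host, port)


def tcp_connect_latency_ms(host: str, port: int, timeout_s: float = 1.0) -> Optional[float]:
    """Return the TCP connect time in milliseconds, or None if the node is unreachable."""
    start = time.perf_counter()
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout_s):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        return None


def check_nodes(nodes: Iterable[Node], timeout_s: float = 1.0) -> Dict[Node, Optional[float]]:
    """Probe each (host, port) pair and map it to its latency, or None if unreachable."""
    return {node: tcp_connect_latency_ms(node[0], node[1], timeout_s) for node in nodes}


if __name__ == "__main__":
    # Hypothetical node addresses: replace with your own cluster's hosts and ports.
    cluster_nodes = [("127.0.0.1", 47500), ("127.0.0.1", 47501)]
    for (host, port), latency in check_nodes(cluster_nodes).items():
        status = "unreachable" if latency is None else f"{latency:.2f} ms"
        print(f"{host}:{port} -> {status}")
```

A probe like this could run on a schedule and feed its results into whichever dashboard or alerting tool you already use. It complements the cluster's own metrics rather than replacing them.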

Denis Mekhanikov
Client Service Lead