GridGain Software Documentation

Memory and JVM Tuning

This article provides best practices for memory tuning that are relevant for deployments with and without native persistence or an external storage. Even though GridGain stores data and indexes off the Java heap in its own memory-centric storage, Java heap is still used to store objects generated by queries and operations executed by your applications. Thus, certain recommendations should be considered for JVM and garbage collection (GC) related optimizations.

Tune Swappiness Setting

An operating system starts swapping pages from RAM to disk when overall RAM usage hits a certain threshold. Swapping can impact GridGain cluster performance. You can adjust the operating system’s setting to prevent this from happening. For Unix, the best option is to either decrease the vm.swappiness parameter to 10, or set it to 0 if native persistence is enabled:

sysctl -w vm.swappiness=0

An inappropriate swappiness value can prolong GC pauses as well. For instance, if your GC logs show low user time, high system time, and long GC pauses, the cause might be Java heap pages being swapped in and out. To address this, use the swappiness settings above.
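As a hedged sketch of the recommendation above, the snippet below picks the suggested value (0 with native persistence, 10 without) via a small illustrative helper, `recommended_swappiness` (not a GridGain utility), and shows how the value could be applied and persisted. The persisting commands are left commented out because they require root privileges.

```shell
# Sketch: choose and persist the recommended swappiness value.
# Assumes a Linux host with sysctl available.

# Returns 0 when native persistence is enabled, 10 for pure in-memory mode.
recommended_swappiness() {
    if [ "$1" = "with-persistence" ]; then
        echo 0
    else
        echo 10
    fi
}

value=$(recommended_swappiness with-persistence)

# Apply for the current session (run as root):
#   sysctl -w vm.swappiness="$value"
# Persist across reboots:
#   echo "vm.swappiness=$value" >> /etc/sysctl.d/99-gridgain.conf

echo "vm.swappiness=$value"
```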

Share RAM with OS and Apps

An individual machine’s RAM is shared among the operating system, GridGain, and other applications. As a general recommendation, if a GridGain cluster is deployed in pure in-memory or in-memory data grid mode (no native persistence usage), then you should not allocate more than 90% of RAM capacity per machine to GridGain nodes.

On the other hand, if native persistence is used, the OS requires extra RAM for its page cache in order to sync data to disk optimally. If the page cache is not disabled, you should not give more than 70% of the server’s RAM to GridGain.

Refer to memory configuration for configuration examples.
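To translate these percentages into concrete numbers on a given machine, the sketch below (assuming a Linux host exposing /proc/meminfo) computes the 90% in-memory and 70% persistence budgets; `ram_budget_kb` is an illustrative helper, not a GridGain utility.

```shell
# Sketch: compute the RAM budget for GridGain nodes on this machine.
# 90% applies to pure in-memory mode, 70% when native persistence is used.

# Budget in kilobytes, given total RAM in kB and a percentage.
ram_budget_kb() {
    total_kb=$1
    percent=$2
    echo $(( total_kb * percent / 100 ))
}

# Read total RAM from /proc/meminfo (Linux-specific path).
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)

echo "In-memory budget:   $(ram_budget_kb "$total_kb" 90) kB"
echo "Persistence budget: $(ram_budget_kb "$total_kb" 70) kB"
```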

In addition, because native persistence can cause high page cache utilization, the kswapd daemon, which reclaims pages for the page cache in the background, might not keep up. As a result, allocations can fall back to direct page reclamation, causing high latencies and long GC pauses.

To work around the effects of page memory reclamation on Linux, reserve extra free memory between wmark_min and wmark_low with /proc/sys/vm/extra_free_kbytes:

sysctl -w vm.extra_free_kbytes=1240000

Refer to this resource for more insight into the relationship between page cache settings, high latencies, and long GC pauses.

Java Heap and GC Tuning

Even though GridGain and Ignite keep data in their own off-heap memory regions invisible to Java garbage collectors, the Java heap is still used for objects generated by your applications' workloads. For instance, whenever you run SQL queries against a GridGain cluster, the queries access data and indexes stored in off-heap memory, while the result sets of those queries are kept in the Java heap until your application reads them back. Thus, depending on the throughput and type of your operations, the Java heap can still be utilized heavily, and this might require JVM and GC tuning for your workloads.

We’ve included some common recommendations and best practices below. Feel free to start with them and make further adjustments as necessary, depending on the specifics of your applications.

Generic GC Settings

Below are example JVM configurations for applications that can heavily utilize the Java heap on server nodes, triggering long (or frequent short) stop-the-world (STW) GC pauses.

For JDK 1.8+ deployments, use the G1 garbage collector. The settings below are a good starting point if a 10 GB heap is sufficient for your server nodes:

-server
-Xms10g
-Xmx10g
-XX:+AlwaysPreTouch
-XX:+UseG1GC
-XX:+ScavengeBeforeFullGC
-XX:+DisableExplicitGC
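If you start server nodes with the bundled ignite.sh script, it picks up JVM options from the JVM_OPTS environment variable. The sketch below shows one way to pass the G1 settings; the configuration file path is hypothetical, and the 10 GB heap is the same example size as above.

```shell
# Sketch: pass the G1 settings to a node started via ignite.sh,
# which reads JVM options from the JVM_OPTS environment variable.
# Heap size (10g) is an example; size it for your own workload.
export JVM_OPTS="-server -Xms10g -Xmx10g -XX:+AlwaysPreTouch \
-XX:+UseG1GC -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC"

# Hypothetical launch command and config path:
#   bin/ignite.sh config/my-cluster.xml

echo "$JVM_OPTS"
```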

If G1 does not work for you, consider using the CMS collector, starting with the following settings. Note that the 10 GB heap is an example; a smaller heap can be enough for your use case:

-server
-Xms10g
-Xmx10g
-XX:+AlwaysPreTouch
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSClassUnloadingEnabled
-XX:+CMSPermGenSweepingEnabled
-XX:+ScavengeBeforeFullGC
-XX:+CMSScavengeBeforeRemark
-XX:+DisableExplicitGC

Advanced Memory Tuning

In Linux and Unix environments, an application can face long GC pauses or degraded performance because of I/O or memory starvation caused by kernel-specific settings. This section provides guidelines on how to modify kernel settings to overcome long GC pauses.

If GC logs show low user time, high system time, and long GC pauses, then most likely memory constraints are triggering swapping or scanning of free memory space.

  • Check and adjust the swappiness settings.

  • Add -XX:+AlwaysPreTouch to JVM settings on startup.

  • Disable NUMA zone-reclaim optimization.

sysctl -w vm.zone_reclaim_mode=0

  • Turn off Transparent Huge Pages if a RedHat distribution is used.

echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
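The RedHat-specific path above has a mainline-kernel counterpart under /sys/kernel/mm/transparent_hugepage/. As a sketch, the snippet below checks both locations and reports the active mode, which the kernel marks with brackets (e.g. "always madvise [never]"); `thp_active_mode` is an illustrative helper.

```shell
# Sketch: report the active Transparent Huge Pages mode.
# Mainline kernels expose it under /sys/kernel/mm/transparent_hugepage/,
# RedHat kernels under /sys/kernel/mm/redhat_transparent_hugepage/.

# The active mode is the bracketed word, e.g. "always madvise [never]" -> "never".
thp_active_mode() {
    echo "$1" | sed 's/.*\[\(.*\)\].*/\1/'
}

for f in /sys/kernel/mm/transparent_hugepage/enabled \
         /sys/kernel/mm/redhat_transparent_hugepage/enabled; do
    [ -r "$f" ] && echo "THP ($f): $(thp_active_mode "$(cat "$f")")"
done
```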

Advanced I/O Tuning

If GC logs show low user time, low system time, and long GC pauses, then GC threads might be spending too much time in kernel space, blocked by various I/O activities. For instance, this can be caused by journal commits, gzip, or log rollover procedures.

As a solution, you might need to flush dirty pages to disk more frequently by changing the default 30-second interval to 5 seconds:

sysctl -w vm.dirty_writeback_centisecs=500
sysctl -w vm.dirty_expire_centisecs=500
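Both settings are expressed in centiseconds (500 centiseconds = 5 seconds). As a small sketch, the snippet below reads the current kernel values (Linux-specific paths) and converts them for a quick check; `centisecs_to_secs` is an illustrative helper.

```shell
# Sketch: inspect the current dirty-page flushing settings in seconds.
# The vm.dirty_* sysctls are expressed in centiseconds.
centisecs_to_secs() {
    echo $(( $1 / 100 ))
}

wb=$(cat /proc/sys/vm/dirty_writeback_centisecs)
ex=$(cat /proc/sys/vm/dirty_expire_centisecs)

echo "writeback interval: $(centisecs_to_secs "$wb")s, expiry: $(centisecs_to_secs "$ex")s"
```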