Memory and JVM Tuning
This article provides best practices for memory tuning that apply to deployments with or without native persistence or external storage. Even though GridGain stores data and indexes off the Java heap, the Java heap is still used to store objects generated by the queries and operations that your applications execute. For that reason, some JVM and garbage collection (GC) optimizations are worth considering.
Tune Swappiness Setting
An operating system starts swapping pages from RAM to disk when overall RAM usage hits a certain threshold.
Swapping can impact GridGain cluster performance.
You can adjust the operating system’s setting to prevent this from happening.
For Unix, the best option is to either decrease the vm.swappiness parameter to 10, or set it to 0 if native persistence is enabled:
sysctl -w vm.swappiness=0
The value of this setting can prolong GC pauses as well. For instance, if your GC logs show low user time, high system time, long GC pause records, the cause might be Java heap pages being swapped in and out. To address this, use the swappiness settings above.
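The following is a minimal sketch of how the swappiness recommendation could be checked before anything is changed. The helper names (recommended_swappiness, current_swappiness) are hypothetical and not part of GridGain; the script only prints the command it would suggest and does not modify the system.

```shell
#!/bin/sh
# Hypothetical helper: pick the swappiness value suggested above.
# $1 = "yes" if native persistence is enabled, anything else otherwise.
recommended_swappiness() {
    if [ "$1" = "yes" ]; then
        echo 0
    else
        echo 10
    fi
}

# Read the current value without changing it (Linux only).
current_swappiness() {
    cat /proc/sys/vm/swappiness 2>/dev/null || echo "unknown"
}

target=$(recommended_swappiness yes)
echo "current vm.swappiness: $(current_swappiness)"
echo "to apply now:          sysctl -w vm.swappiness=$target"
echo "to persist on reboot:  add 'vm.swappiness=$target' to /etc/sysctl.conf"
```

A value set with sysctl -w is lost on reboot, which is why the last line points at /etc/sysctl.conf.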
Share RAM with OS and Apps
An individual machine’s RAM is shared among the operating system, GridGain, and other applications. As a general recommendation, if a GridGain cluster is deployed in pure in-memory mode (native persistence is disabled), then you should not allocate more than 90% of RAM capacity to GridGain nodes.
On the other hand, if native persistence is used, then the OS requires extra RAM for its page cache in order to optimally sync up data to disk. If the page cache is not disabled, then you should not give more than 70% of the server’s RAM to GridGain.
Refer to memory configuration for configuration examples.
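As a rough illustration of the 90% and 70% guidelines above, the sketch below computes an upper bound for the RAM given to GridGain on the current host. The function name gridgain_ram_budget_mb and its mode argument are assumptions made for this example; the result is a ceiling to plan against, not a value GridGain reads.

```shell
#!/bin/sh
# Hypothetical helper: upper bound (in MB) of RAM to hand to GridGain.
# $1 = total RAM in MB, $2 = "persistence" or "in-memory".
gridgain_ram_budget_mb() {
    case "$2" in
        persistence) percent=70 ;;  # leave room for the OS page cache
        *)           percent=90 ;;  # pure in-memory mode
    esac
    echo $(( $1 * percent / 100 ))
}

# Total RAM from /proc/meminfo (kB -> MB); Linux only, empty elsewhere.
host_mb=$(awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo 2>/dev/null || true)
if [ -n "$host_mb" ]; then
    echo "in-memory budget:   $(gridgain_ram_budget_mb "$host_mb" in-memory) MB"
    echo "persistence budget: $(gridgain_ram_budget_mb "$host_mb" persistence) MB"
fi
```

The budget covers everything GridGain uses on that machine, so the Java heap and the off-heap data regions together should stay below it.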
In addition, because native persistence can cause high page cache utilization, the kswapd daemon might not keep up with the background page reclamation that the page cache relies on.
As a result, this can cause high latencies due to direct page reclamation and lead to long GC pauses.
To work around the effects of page memory reclamation on Linux, reserve extra free memory through the vm.extra_free_kbytes setting:
sysctl -w vm.extra_free_kbytes=1240000
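The vm.extra_free_kbytes key may not be exposed by every kernel build, so it can be worth checking for it before applying the setting. The sketch below assumes the usual mapping of a sysctl key to a file under /proc/sys; the helper name sysctl_key_path is hypothetical, and the script only reads the value.

```shell
#!/bin/sh
# Map a sysctl key such as vm.extra_free_kbytes to its /proc/sys path.
sysctl_key_path() {
    echo "/proc/sys/$(echo "$1" | tr '.' '/')"
}

key=vm.extra_free_kbytes
path=$(sysctl_key_path "$key")
if [ -e "$path" ]; then
    echo "$key is currently $(cat "$path")"
else
    echo "$key is not exposed by this kernel; skipping"
fi
```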
Refer to this resource for more insight into the relationship between page cache settings, high latencies, and long GC pauses.
Java Heap and GC Tuning
Even though GridGain and Ignite keep data in their own off-heap memory regions, which are invisible to Java garbage collectors, the Java heap is still used for objects generated by your application workloads. For instance, whenever you run SQL queries against a GridGain cluster, the queries access data and indexes stored in off-heap memory, while the result sets of those queries are kept in the Java heap until your application reads them. Thus, depending on the throughput and type of operations, the Java heap can still be utilized heavily, and your workloads might require JVM and GC tuning.
We’ve included some common recommendations and best practices below. Feel free to start with them and make further adjustments as necessary, depending on the specifics of your applications.
Generic GC Settings
Below are sets of example JVM configurations for applications that can utilize the Java heap on server nodes heavily, thus triggering either long or frequent short stop-the-world GC pauses.
For JDK 1.8+ deployments, you should use the G1 garbage collector. The settings below are a good starting point if a 10GB heap is more than enough for your server nodes:
-server -Xms10g -Xmx10g -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC
If G1 does not work for you, consider using the CMS collector and starting with the following settings. Note that CMS was deprecated in JDK 9 and removed in JDK 14, so this option applies only to older JDKs. The 10GB heap is used as an example; a smaller heap can be enough for your use case:
-server -Xms10g -Xmx10g -XX:+AlwaysPreTouch -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled -XX:+ScavengeBeforeFullGC -XX:+CMSScavengeBeforeRemark -XX:+DisableExplicitGC
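One possible way to keep the heap size in a single place is to generate the G1 flag set from a parameter, as sketched below. The function g1_jvm_opts is hypothetical, and the sketch assumes that your node start script picks up JVM options from a JVM_OPTS environment variable; verify this against the script you actually use.

```shell
#!/bin/sh
# Hypothetical helper: emit the G1 flag set shown above for a heap size in GB.
g1_jvm_opts() {
    heap_gb=$1
    echo "-server -Xms${heap_gb}g -Xmx${heap_gb}g -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC"
}

# Assumption: the node start script reads JVM_OPTS from the environment.
JVM_OPTS=$(g1_jvm_opts 10)
export JVM_OPTS
echo "JVM_OPTS=$JVM_OPTS"
```

Setting -Xms and -Xmx to the same value, as the original settings do, avoids heap resizing at runtime.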
Advanced Memory Tuning
In Linux and Unix environments, kernel-specific settings can cause I/O or memory starvation, which in turn leads to long GC pauses or lower application performance. This section provides some guidelines on how to modify kernel settings in order to overcome long GC pauses.
If GC logs show low user time, high system time, long GC pause, then most likely memory constraints are triggering swapping or scanning for free memory space.
Check and adjust the swappiness settings.
Add -XX:+AlwaysPreTouch to JVM settings on startup.
Disable NUMA zone-reclaim optimization.
sysctl -w vm.zone_reclaim_mode=0
Turn off Transparent Huge Pages if a Red Hat distribution is used.
echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
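Before changing the kernel settings listed above, it can help to see what they are currently set to. The read-only sketch below checks both the Red Hat specific Transparent Huge Pages path used above and the mainline kernel path /sys/kernel/mm/transparent_hugepage; the helper name thp_active_mode is hypothetical.

```shell
#!/bin/sh
# Extract the active mode from a THP "enabled" file, whose content looks
# like "always madvise [never]" with the active value in brackets.
thp_active_mode() {
    echo "$1" | sed -n 's/.*\[\([a-z]*\)\].*/\1/p'
}

# Report current values; nothing is modified.
for f in /sys/kernel/mm/redhat_transparent_hugepage/enabled \
         /sys/kernel/mm/transparent_hugepage/enabled; do
    if [ -r "$f" ]; then
        echo "THP mode ($f): $(thp_active_mode "$(cat "$f")")"
    fi
done
if [ -r /proc/sys/vm/zone_reclaim_mode ]; then
    echo "vm.zone_reclaim_mode: $(cat /proc/sys/vm/zone_reclaim_mode)"
fi
```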
Advanced I/O Tuning
If GC logs show low user time, low system time, long GC pause, then GC threads might be spending too much time in the kernel space being blocked by various I/O activities.
For instance, this can be caused by journal commits, gzip, or log roll over procedures.
As a solution, you can try changing the page flushing interval from the default 30 seconds to 5 seconds:
sysctl -w vm.dirty_writeback_centisecs=500
sysctl -w vm.dirty_expire_centisecs=500
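To tell the two GC log patterns described in this article apart, a small filter such as the one below can be a starting point. It assumes JDK 8 style GC log lines that contain "[Times: user=U sys=S, real=R secs]"; the function name classify_gc_pauses, the labels, and the thresholds (a pause counts as long from 1.0 second, and as I/O-suspect when user plus system time is under half of real time) are illustrative assumptions, not values from GridGain.

```shell
#!/bin/sh
# Hypothetical classifier: reads GC log lines on stdin.
# $1 = minimum "real" time in seconds for a pause to count as long.
classify_gc_pauses() {
    awk -v min_real="${1:-1.0}" '
    match($0, /user=[0-9.]+ sys=[0-9.]+, real=[0-9.]+/) {
        # Tokens: user, U, sys, S, real, R
        split(substr($0, RSTART, RLENGTH), kv, /[ ,=]+/)
        u = kv[2] + 0; s = kv[4] + 0; r = kv[6] + 0
        if (r < min_real) next                 # short pause, ignore
        if (s > u)              print "memory-suspect: " $0
        else if (u + s < r / 2) print "io-suspect: " $0
    }'
}

# Usage (gc.log is a placeholder file name):
#   classify_gc_pauses 1.0 < gc.log
```

Lines tagged memory-suspect match the low user time, high system time pattern; lines tagged io-suspect match the low user time, low system time pattern.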