OS Tuning
CPU power management
Make sure your CPUs are in performance power mode:
# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
performance
Disable CPU clock throttling in BIOS if necessary.
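On machines with many cores it can be easier to list only the CPUs that are not in performance mode. The following is a minimal sketch that reads the same sysfs files as the command above; the function name and its optional directory argument are hypothetical and exist only for illustration and testing.

```shell
# Print every CPU whose scaling governor is not "performance".
# $1 is an optional sysfs root (hypothetical hook, defaults to the real path).
non_performance_cpus() {
    root="${1:-/sys/devices/system/cpu}"
    for gov_file in "$root"/cpu*/cpufreq/scaling_governor; do
        # Skip entries without a readable governor file (for example, no cpufreq support).
        [ -r "$gov_file" ] || continue
        governor=$(cat "$gov_file")
        if [ "$governor" != "performance" ]; then
            cpu_dir=${gov_file%/cpufreq/scaling_governor}
            echo "${cpu_dir##*/}: $governor"
        fi
    done
}

non_performance_cpus
```

No output means that every CPU exposing a governor file is already in performance mode.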
Ulimits
You can configure user limits for GridGain in the /etc/security/limits.conf file. In it, you set the limits for each user in the following format:
<domain> <type> <item> <value>
For GridGain, you can set both soft and hard limit types on the maximum number of open file descriptors (nofile) and maximum number of processes (nproc). The recommended values for environments with several distribution zones:
gridgain - nofile 65536
gridgain - nproc 65536
If the system-wide number of file descriptors is low, setting the nofile limit to a high value will not eliminate the "Too many open files" error. You can resolve this issue by modifying /proc/sys/fs/file-max.
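A quick way to see whether the system-wide limit could become the bottleneck is to compare it with the requested nofile value. The sketch below is only a rough check under that assumption; the helper name is hypothetical, and the fs.file-max value in the final comment is illustrative rather than a recommendation.

```shell
# Succeeds when a requested nofile limit does not exceed the system-wide maximum.
nofile_fits() {
    requested=$1
    system_max=$2
    [ "$requested" -le "$system_max" ]
}

if [ -r /proc/sys/fs/file-max ]; then
    system_max=$(cat /proc/sys/fs/file-max)
    if nofile_fits 65536 "$system_max"; then
        echo "fs.file-max ($system_max) can accommodate nofile=65536"
    else
        echo "fs.file-max ($system_max) is lower than nofile=65536; consider raising it"
    fi
fi

# Raising the limit at runtime requires root; the value below is illustrative only:
#   sysctl -w fs.file-max=2097152
```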
Linux Kernel Panic Delay
In case of a kernel panic, GridGain needs some time to create data dumps. It is recommended to set the kernel.panic parameter, which is the number of seconds the kernel waits before rebooting after a panic, to 3. You can set Linux kernel properties in the /etc/sysctl.conf file.
Utilizing Linux Virtual Memory
Linux leaves space for an in-memory cache to handle I/O operations. You can manually configure its parameters to improve performance and safety for larger operations. To configure the parameters below, set them in the /etc/sysctl.conf file and execute the sudo sysctl -p command, or set them at runtime with the sysctl -w command.
vm.dirty_background_ratio = 1
vm.dirty_ratio = 20
vm.dirty_expire_centisecs = 500
vm.dirty_writeback_centisecs = 100
vm.overcommit_memory = 2
vm.overcommit_ratio = 100
vm.swappiness = 0
vm.max_map_count = 262144
Below is the expanded information on these parameters:
- The vm.dirty_background_ratio parameter sets the percentage of memory that "dirty" pages can occupy before the kernel starts flushing them to disk in the background. Having too many dirty pages in memory can be detrimental to performance, so a low percentage is recommended.
- The vm.dirty_ratio parameter sets the maximum percentage of memory that dirty pages can occupy. When this threshold is reached, processes that write data are forced to flush dirty pages to disk themselves. Setting it low keeps the memory clean.
- The vm.dirty_expire_centisecs parameter sets how long dirty pages can stay in memory before they are due to be flushed, in hundredths of a second. We recommend 500 (5 seconds).
- The vm.dirty_writeback_centisecs parameter configures how often the system checks whether any data needs to be flushed to disk, in hundredths of a second. We recommend 100 (a check every second).
- The vm.overcommit_memory parameter sets whether applications can commit more memory than is available. Set it to 2 to never overcommit.
- The vm.overcommit_ratio parameter sets how much memory can be committed, as a percentage of physical RAM. Set it to 100 to never commit more memory than you have available.
- The vm.swappiness parameter configures how aggressively the kernel uses swap. We recommend setting it to 0 to keep swapping from interfering with persistence.
- The vm.max_map_count parameter controls the maximum number of memory map areas a single process can have. Low values can cause crashes and out-of-memory errors.
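To confirm that the values listed above are actually in effect, the current settings can be read back from /proc/sys, where each sysctl key maps to a file path with dots replaced by slashes. The sketch below only reads values and changes nothing; the function name and its optional third argument are hypothetical.

```shell
# Compare the current value of a sysctl key with a recommended value.
# Usage: check_sysctl KEY EXPECTED [PROC_ROOT]
# PROC_ROOT is a hypothetical hook for testing and defaults to /proc/sys.
check_sysctl() {
    key=$1
    expected=$2
    proc_root=${3:-/proc/sys}
    path="$proc_root/$(echo "$key" | tr . /)"
    if [ ! -r "$path" ]; then
        echo "$key: not readable"
        return 0
    fi
    actual=$(cat "$path")
    if [ "$actual" = "$expected" ]; then
        echo "$key: ok ($actual)"
    else
        echo "$key: $actual (recommended $expected)"
    fi
}

# Recommended values copied from the list above.
while read -r name value; do
    check_sysctl "$name" "$value"
done <<'EOF'
vm.dirty_background_ratio 1
vm.dirty_ratio 20
vm.dirty_expire_centisecs 500
vm.dirty_writeback_centisecs 100
vm.overcommit_memory 2
vm.overcommit_ratio 100
vm.swappiness 0
vm.max_map_count 262144
EOF
```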
Tune Swappiness Setting
An operating system starts swapping pages from RAM to disk when overall RAM usage hits a certain threshold.
Swapping can impact GridGain cluster performance.
You can adjust the operating system’s setting to prevent this from happening.
For Unix, the best option is to either decrease the vm.swappiness parameter to 10, or set it to 0 if native persistence is enabled:
sysctl -w vm.swappiness=0
The value of this setting can prolong GC pauses as well. For instance, if your GC logs show low user time, high system time, and long GC pauses, the cause might be Java heap pages being swapped in and out. To address this, use the swappiness settings above.
Share RAM with OS and Apps
An individual machine’s RAM is shared among the operating system, GridGain, and other applications. As a general recommendation, if a GridGain cluster is deployed in pure in-memory mode (native persistence is disabled), then you should not allocate more than 90% of RAM capacity to GridGain nodes.
On the other hand, if native persistence is used, then the OS requires extra RAM for its page cache in order to optimally sync up data to disk. If the page cache is not disabled, then you should not give more than 70% of the server’s RAM to GridGain.
See an example configuration for more details.
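The percentages above translate into an upper bound that can be computed from the machine's total memory. The following sketch assumes the 70% figure applies, that is, the page cache is not disabled; the function name and the mode labels are hypothetical.

```shell
# Upper bound on RAM for GridGain nodes, in kB:
# 90% of total RAM in pure in-memory mode, 70% with native persistence.
# The mode labels "in-memory" and "persistence" are hypothetical.
gridgain_ram_limit_kb() {
    total_kb=$1
    mode=$2
    case "$mode" in
        in-memory)   percent=90 ;;
        persistence) percent=70 ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    echo $(( total_kb * percent / 100 ))
}

if [ -r /proc/meminfo ]; then
    mem_total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    if [ -n "$mem_total_kb" ]; then
        echo "in-memory mode:     $(gridgain_ram_limit_kb "$mem_total_kb" in-memory) kB"
        echo "native persistence: $(gridgain_ram_limit_kb "$mem_total_kb" persistence) kB"
    fi
fi
```

The result is a ceiling for the combined memory of all GridGain nodes on the machine, not a sizing recommendation for any single data region.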
In addition, because native persistence can cause high page cache utilization, the kswapd daemon might not keep up with reclaiming pages used by the page cache in the background.
As a result, the kernel falls back to direct page reclamation, which can cause high latencies and lead to long GC pauses.
To work around this effect, increase the amount of free memory that the kernel keeps in reserve:
sysctl -w vm.min_free_kbytes=1240000
Advanced Memory Tuning
In Linux and Unix environments, an application can face long GC pauses or lower performance because of I/O or memory starvation caused by kernel-specific settings. This section provides some guidelines on how to modify kernel settings in order to overcome long GC pauses.
If GC logs show low user time, high system time, and long GC pauses, then most likely memory constraints are triggering swapping or scanning for free memory space.
- Add -XX:+AlwaysPreTouch to JVM settings on startup.
- Disable NUMA zone-reclaim optimization.
  sysctl -w vm.zone_reclaim_mode=0
- Enable Transparent Huge Pages.
  THP can reduce memory management overhead in the operating system, especially for large heaps and data regions. Make sure THP is properly configured:
  $ cat /sys/kernel/mm/transparent_hugepage/enabled
  [always] madvise never
  $ cat /sys/kernel/mm/transparent_hugepage/defrag
  always defer [defer+madvise] madvise never
  $ cat /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none
  0
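In the THP files shown above, the kernel marks the active option with square brackets. A small helper can extract just that option, which is convenient in health-check scripts. This is a sketch; the function name is hypothetical.

```shell
# Print the active (bracketed) option from a THP sysfs file.
thp_active() {
    sed -n 's/.*\[\(.*\)\].*/\1/p' "$1"
}

for name in enabled defrag; do
    thp_file="/sys/kernel/mm/transparent_hugepage/$name"
    # Skip silently on systems without THP support.
    [ -r "$thp_file" ] || continue
    echo "$name: $(thp_active "$thp_file")"
done

# Changing a mode requires root, for example:
#   echo always > /sys/kernel/mm/transparent_hugepage/enabled
```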
Advanced I/O Tuning
If GC logs show low user time, low system time, and long GC pauses, then GC threads might be spending too much time in kernel space, blocked by various I/O activities.
For instance, this can be caused by journal commits, gzip, or log rollover procedures.
As a solution, you can try changing the page flushing interval from the default 30 seconds to 5 seconds:
sysctl -w vm.dirty_expire_centisecs=500
sysctl -w vm.dirty_writeback_centisecs=100
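Both parameters are expressed in centiseconds (hundredths of a second), so 500 corresponds to 5 seconds and 100 to 1 second. If you prefer to reason in seconds, the conversion can be sketched as follows; the helper name is hypothetical.

```shell
# Convert whole seconds to the centisecond units used by the vm.dirty_*_centisecs keys.
seconds_to_centisecs() {
    echo $(( $1 * 100 ))
}

# Print the settings in the key=value form accepted by sysctl -w.
echo "vm.dirty_expire_centisecs=$(seconds_to_centisecs 5)"
echo "vm.dirty_writeback_centisecs=$(seconds_to_centisecs 1)"
```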
Clock tuning
GridGain relies on loosely synchronized clocks. It is very important to set up a source of time synchronization between cluster nodes, such as chrony.
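If chrony is used, the current clock offset of a node can be read from the output of the chronyc tracking command. The sketch below assumes the usual "System time" line of that output and extracts its numeric value; the function name is hypothetical, and no particular offset threshold is implied.

```shell
# Extract the clock offset, in seconds, from `chronyc tracking` output on stdin.
# The relevant line is assumed to look like:
#   System time     : 0.000006523 seconds slow of NTP time
chrony_offset_seconds() {
    awk -F': ' '/^System time/ { split($2, parts, " "); print parts[1] }'
}

# Query the local chrony daemon only if the client is installed.
if command -v chronyc >/dev/null 2>&1; then
    chronyc tracking | chrony_offset_seconds
fi
```

Running this on every node gives a quick view of how far each clock has drifted from its time source.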
© 2025 GridGain Systems, Inc. All Rights Reserved. GridGain® is a registered trademark of GridGain Systems, Inc.
Apache, Apache Ignite, the Apache feather and the Apache Ignite logo are either registered trademarks or trademarks of The Apache Software Foundation.