GridGain Developers Hub

GridGain.NET Troubleshooting

Overview

This page covers several troubleshooting techniques and commonly-known issues you can come across while building and using your GridGain.NET applications in production.

Troubleshooting With Console

Ignite produces console output (stdout): information, metrics, warnings, error details. If your app does not open console, you may redirect the console output to a string or a file:

var sw = new StringWriter();
Console.SetOut(sw);

// Examine output:
sw.ToString();

Getting More Insights On Exceptions

When you are getting an IgniteException, always make sure to examine the InnerException property that often contains more details on the root cause of the issue. You can do that in Visual Studio debugger or by calling ToString() on the exception object:

Visual Studio Debugger
try {
    IQueryCursor<List> cursor = cache.QueryFields(query);
}
catch (IgniteException e) {
    // Printing out the whole exception meesage.
    Console.WriteLine(e.ToString());
}

Commonly-Known Issues

The following section covers several issues you can come across while designing your GridGain.NET applications.

Failed to load jvm.dll

Make sure that Java Development Kit is installed, and the JAVA_HOME variable is set and points to a JDK installation directory.

The errorCode=193 code is ERROR_BAD_EXE_FORMAT, which is often caused by x64/x86 mismatch. Make sure that the installed JDK and your application have the same x64/x86 platform target. Ignite detects proper JDK automatically when JAVA_HOME is not set, so if you have x86 AND x64 JDK installed, it will work in any mode.

The 126 ERROR_MOD_NOT_FOUND code can occur due to missing dependencies:

Java class is not found

Check your the IGNITE_HOME environment variable, IgniteConfiguration.IgniteHome and IgniteConfiguration.JvmClasspath properties. Refer to Deployment section for more details. ASP.NET/IIS scenarios require additional steps.

Freeze on Ignition.Start

Examine console output. Most often this is caused by a topology join failure:

  • Ignite DiscoverySpi settings are incorrect

  • ClientMode is true, but there are no servers nodes that form the cluster.

Failed to start manager : GridManagerAdapter

Examine console output. Most often this is caused by an invalid or incompatible configuration:

  • Some configuration property has an invalid value (out of range and the like).

  • Some configuration property is incompatible with a value in other cluster nodes. In particular, BinaryConfiguration properties, such as CompactFooter, IdMapper, and NameMapper should be the same on all nodes.

The latter problem often arises when building a mixed cluster (Java + .NET nodes), because default configuration on these platforms is different. .NET only supports BinaryBasicIdMapper and BinaryBasicNameMapper. Java configuration has to be fixed the following way to enable .NET nodes connectivity:

<property name="binaryConfiguration">
    <bean class="org.apache.ignite.configuration.BinaryConfiguration">
        <property name="compactFooter" value="true"/>
        <property name="idMapper">
            <bean class="org.apache.ignite.binary.BinaryBasicIdMapper">
                <constructor-arg value="true"/>
            </bean>
        </property>
        <property name="nameMapper">
            <bean class="org.apache.ignite.binary.BinaryBasicNameMapper">
                <constructor-arg value="true"/>
            </bean>
        </property>
    </bean>
</property>

Could not load file or assembly 'MyAssembly' or one of its dependencies. The system cannot find the file specified.

This exception can occur due to missing assemblies on remote nodes. See Standalone Nodes: Loading User Assemblies for details.

Stack smashing detected: dotnet terminated

This happens on Linux with .NET Core when NullReferenceException occurs in user code. The reason is that both .NET and Java use SIGSEGV to handle certain exceptions, including NullPointerException and NullReferenceException, and when JVM runs in the same process as .NET, it overrides that handler, breaking .NET exception handling (see 1, 2).

The fix for this issue exists in .NET Core 3.0 (#25972. by setting the COMPlus_EnableAlternateStackCheck environment variable to 1.

Zombie processes on Linux

On Linux, both .NET and Java install SIGCHLD handler to deal with child process termination.

  • Handlers are installed lazily (when a Process is first started)

  • Only one handler can exist at a time

Therefore, it is possible that Java overwrites .NET handler, or vice versa, making it impossible to clean up child processes on one of the platforms, resulting in zombie processes.

GridGain uses child processes on Java side in one particular case: when Persistence is enabled and direct-io module is used. In this case .NET System.Diagnostics.Process API should not be used.

Workaround

To work around the issue, make sure that child processes are created either only on Java side, or only on .NET side.

For example, when direct-io is used, and .NET code requires starting a child process, move the process handling logic to Java side and invoke it with Compute ExecuteJavaTask API. Alternatively, use Services API to call Java service from .NET.

DllNotFoundException: Unable to load shared library 'libcoreclr.so' or one of its dependencies

Occurs on .NET 5 in a single-file publish mode (e.g. dotnet publish --self-contained true -r linux-x64 -p:PublishSingleFile=true).

Workaround

Add the following code before starting the Ignite node:

NativeLibrary.SetDllImportResolver(
typeof(Ignition).Assembly,
(lib, _, _) => lib == "libcoreclr.so" ? (IntPtr) (-1) : IntPtr.Zero);