Hadoop Acceleration with the GridGain In-Memory Data Platform

3x Faster MapReduce and HIVE Jobs with Hadoop® Acceleration Using In-Memory Computing

The GridGain® in-memory computing platform can provide Hadoop acceleration of up to 3x. GridGain is a plug-and-play solution that can be downloaded and installed in minutes to accelerate MapReduce and HIVE jobs.

GridGain acceleration for Hadoop requires zero code changes and works with any commercial or open source distribution of Hadoop 2.x. It includes a dual-mode, high-performance in-memory file system that is 100% compatible with Hadoop HDFS.

The GridGain Data Lake Accelerator Improves Real-Time Analytics

The GridGain Data Lake Accelerator, built on the GridGain in-memory computing platform, accelerates data lake analytics and access by providing bi-directional integration with Hadoop. This integration brings the historical data into the same in-memory computing layer as the real-time operational data, enabling real-time analytics and computing on the combined data.

GridGain Features for Hadoop Acceleration

The GridGain In-Memory Accelerator for Hadoop, available for free download, enables fast data processing with the tools and technology your organization already uses. It eliminates the trade-off between performance and complexity when adding real-time capabilities to existing Hadoop systems.

GridGain uses the Apache Ignite File System (IGFS) to support a dual mode: it can work either as a standalone primary file system in the Hadoop cluster or as an intelligent caching layer with HDFS configured as the primary file system. As a caching layer, it provides highly tunable read-through and write-through logic, and users can select which files or directories to cache and how.

GridGain's in-memory implementation of MapReduce lets you parallelize the processing of in-memory data stored in IGFS with zero changes to your Hadoop MapReduce code. It eliminates the overhead associated with the job tracker and task trackers in a standard Hadoop architecture while providing low-latency, HPC-style distributed processing.
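The dual-mode behavior described above can be pictured with a small toy model. The sketch below is not the IGFS API: the class name `DualModeFileSystem`, its fields, and the prefix-based cache selection are all hypothetical, and plain dictionaries stand in for both the in-memory layer and HDFS. It only illustrates the difference between running as a standalone primary file system and running as a read-through/write-through cache over a backing store.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DualModeFileSystem:
    """Toy model of a dual-mode in-memory file system (illustrative only)."""

    # Stand-in for HDFS. None means "standalone primary" mode.
    backing: Optional[dict] = None
    # Hypothetical selection rule: only paths under these prefixes are cached.
    cached_prefixes: tuple = ("/",)
    # The in-memory layer.
    memory: dict = field(default_factory=dict)

    def _cacheable(self, path: str) -> bool:
        return any(path.startswith(prefix) for prefix in self.cached_prefixes)

    def write(self, path: str, data: bytes) -> None:
        if self.backing is None:
            # Primary mode: memory is the only store.
            self.memory[path] = data
            return
        # Caching mode, write-through: the backing store is always updated.
        self.backing[path] = data
        if self._cacheable(path):
            self.memory[path] = data

    def read(self, path: str) -> bytes:
        if path in self.memory:
            return self.memory[path]
        if self.backing is None or path not in self.backing:
            raise FileNotFoundError(path)
        # Caching mode, read-through: fetch from the backing store on a miss.
        data = self.backing[path]
        if self._cacheable(path):
            self.memory[path] = data
        return data


# Caching mode: only files under /hot/ are kept in memory.
hdfs = {"/logs/a.txt": b"old"}
cache_fs = DualModeFileSystem(backing=hdfs, cached_prefixes=("/hot/",))
cache_fs.write("/hot/b.txt", b"new")

# Standalone primary mode: no backing store at all.
primary_fs = DualModeFileSystem()
primary_fs.write("/tmp/x", b"1")
```

In this model, a write in caching mode always reaches the backing store, while reads of paths outside the selected prefixes pass through without populating memory. A real deployment is configured rather than coded, and the available options should be taken from the product documentation.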

Hadoop Acceleration
An easy-to-install, plug-and-play solution that can accelerate MapReduce and HIVE jobs by up to 3x, featuring a dual-mode, high-performance in-memory file system.

The GridGain solution for Hadoop acceleration is an architecture free from extract, transform, load (ETL). With the GridGain In-Memory Accelerator for Hadoop, you can process live data in Hadoop and gain the performance of in-memory computing without offloading the data to downstream systems. The GridGain In-Memory Accelerator for Hadoop avoids duplication of data and eliminates the unnecessary data movement that typically clogs the network and I/O subsystems.

HDFS Performance with GridGain

The GridGain solution for Hadoop acceleration delivers up to a 3x performance gain for any existing I/O-, network- or CPU-intensive Hadoop MapReduce job or HDFS operation. It delivers instant acceleration to existing Hadoop-based systems and products.

IGFS is 100% compatible with HDFS and can act as a drop-in replacement or extension for HDFS. The following benchmarks compare raw IGFS and HDFS performance on the same set of operations:

Benchmark             IGFS (ms)   HDFS (ms)   Boost (%)
File Scan                    27         667        2470
File Create                  96         961        1001
File Random Access          413        2931         710
File Delete                 185        1234         667

The tests were performed on a 10-node cluster of Dell R610 blades with dual 8-core CPUs, running Ubuntu 12.04, a 10GbE network fabric and a stock, unmodified Apache Hadoop 2.x distribution.
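The source does not define the Boost column. The figures are consistent with reading it as HDFS time divided by IGFS time, expressed as a percentage, but that reading is an inference from the numbers. The short sketch below recomputes the column under that assumption, using timings copied from the benchmark table above; the function names are illustrative.

```python
# (benchmark, IGFS ms, HDFS ms) as listed in the benchmark table.
BENCHMARKS = [
    ("File Scan", 27, 667),
    ("File Create", 96, 961),
    ("File Random Access", 413, 2931),
    ("File Delete", 185, 1234),
]


def boost_percent(igfs_ms: float, hdfs_ms: float) -> int:
    """Assumed definition: HDFS time as a percentage of IGFS time."""
    return round(hdfs_ms / igfs_ms * 100)


def boost_table() -> dict:
    """Recompute the Boost column for every benchmark row."""
    return {name: boost_percent(igfs, hdfs) for name, igfs, hdfs in BENCHMARKS}


if __name__ == "__main__":
    for name, boost in boost_table().items():
        print(f"{name}: {boost}%")
```

Under this reading, a boost of 2470% for File Scan means the HDFS run took roughly 24.7 times as long as the IGFS run.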

The GridGain solution for Hadoop acceleration relies on an in-memory optimized YARN-based MapReduce implementation that eliminates standard overhead associated with typical Hadoop job tracker polling, task tracker process creation, deployment and provisioning.

Plug-and-Play Hadoop with GridGain

The GridGain In-Memory Accelerator for Hadoop provides out-of-the-box support for Hadoop 2.x (YARN) and can be downloaded and installed in less than ten minutes. It is a free Hadoop plugin that works with your choice of open source or commercial Hadoop distributions. IGFS is 100% HDFS-compatible and provides plug-and-play integration that works out of the box with hundreds of projects in the Hadoop ecosystem. In-memory MapReduce and HIVE support requires zero code changes to existing application logic and provides dramatic performance boosts for CPU-intensive tasks.

In addition, GridGain is a certified member of the Hortonworks Partnerworks ISV/IHV program. This means that all GridGain in-memory computing products are guaranteed to work with Hortonworks Connected Data Platforms.

For further information, contact GridGain Systems.