3x Faster MapReduce and HIVE Jobs with Hadoop® Acceleration Using In-Memory Computing

The GridGain® in-memory computing platform can provide Hadoop acceleration of up to 3x. GridGain is a plug-and-play solution that can be downloaded and installed in minutes to accelerate MapReduce and HIVE jobs.

GridGain acceleration for Hadoop requires zero code changes and works with any commercial or open source distribution of Hadoop 2.x. It includes a dual-mode, high-performance in-memory file system that is 100% compatible with Hadoop HDFS.

GridGain Features for Hadoop Acceleration

The GridGain In-Memory Accelerator for Hadoop, available as a free download, enables fast data processing using the tools and technology your organization already uses. It eliminates the trade-offs of adding real-time capabilities to existing Hadoop systems.

GridGain uses the Apache Ignite File System (IGFS), which supports a dual mode: it can work either as a standalone primary file system in the Hadoop cluster or as an intelligent caching layer with HDFS configured as the primary file system. As a caching layer, it provides highly tunable read-through and write-through logic, and users can select which files or directories to cache and how.

GridGain’s in-memory implementation of MapReduce lets you parallelize the processing of in-memory data stored in IGFS with zero changes to your Hadoop MapReduce code. It eliminates the overhead associated with the job tracker and task trackers in a standard Hadoop architecture while providing low-latency, HPC-style distributed processing.
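
To make the caching-layer mode described above more concrete, here is a minimal conceptual sketch in Python of read-through and write-through behavior with per-directory cache selection. It is a toy model only and not GridGain or IGFS code: the class `CachingLayer`, its `cached_prefixes` parameter, and the dictionary standing in for HDFS are all hypothetical names chosen for illustration.

```python
class CachingLayer:
    """Toy in-memory cache in front of a slower primary store (illustrative only)."""

    def __init__(self, primary, cached_prefixes):
        self.primary = primary                      # dict standing in for HDFS
        self.cached_prefixes = tuple(cached_prefixes)
        self.memory = {}                            # stands in for the in-memory tier
        self.primary_reads = 0                      # counts reads that reach the primary

    def _is_cached_path(self, path):
        # Only paths under the selected directories are kept in memory.
        return path.startswith(self.cached_prefixes)

    def read(self, path):
        if path in self.memory:
            return self.memory[path]                # served from memory
        self.primary_reads += 1
        data = self.primary[path]
        if self._is_cached_path(path):
            self.memory[path] = data                # read-through: populate on a miss
        return data

    def write(self, path, data):
        self.primary[path] = data                   # write-through: primary is updated too
        if self._is_cached_path(path):
            self.memory[path] = data


primary = {"/hot/a.csv": b"1,2,3", "/cold/b.csv": b"4,5,6"}
fs = CachingLayer(primary, cached_prefixes=["/hot/"])

fs.read("/hot/a.csv")
fs.read("/hot/a.csv")      # second read is served from memory
fs.read("/cold/b.csv")
fs.read("/cold/b.csv")     # not under a cached prefix, so it hits the primary again
fs.write("/hot/c.csv", b"7,8,9")
```

In this sketch only the `/hot/` directory is cached, so repeated reads of `/cold/b.csv` keep going to the primary store, while every write still reaches the primary regardless of whether the path is cached.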

The GridGain solution for Hadoop acceleration is an ETL-free architecture. It enables you to process live data in Hadoop without offloading it to other downstream systems to gain the advantage of in-memory computing performance. The GridGain In-Memory Accelerator for Hadoop avoids duplication of data and eliminates unnecessary data movement that typically clogs the network and I/O subsystems.

HDFS Performance with GridGain

The GridGain solution for Hadoop acceleration delivers up to a 3x performance gain for existing I/O-, network-, or CPU-intensive Hadoop MapReduce jobs and HDFS operations. It delivers instant acceleration to existing Hadoop-based systems and products.

IGFS is 100% compatible with HDFS and can act as a drop-in replacement or extension for HDFS. The following benchmarks compare raw IGFS and HDFS performance against the same set of operations:

Benchmark            IGFS (ms)   HDFS (ms)   Boost (%)
File Scan                   27         667        2470
File Create                 96         961        1001
File Random Access         413        2931         710
File Delete                185        1234         667

The tests were performed on a 10-node cluster of Dell R610 blades with dual 8-core CPUs, running Ubuntu 12.04, with a 10GbE network fabric and a stock, unmodified Apache Hadoop 2.x distribution.
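
The Boost column in the benchmark table above is not defined in the text, but its values are consistent with the HDFS time divided by the IGFS time, expressed as a percentage. The short Python sketch below recomputes the column under that assumption; the function name `boost_percent` is hypothetical.

```python
# Timings copied from the benchmark table: (IGFS ms, HDFS ms).
timings_ms = {
    "File Scan": (27, 667),
    "File Create": (96, 961),
    "File Random Access": (413, 2931),
    "File Delete": (185, 1234),
}


def boost_percent(igfs_ms, hdfs_ms):
    # Assumed definition: how many times longer HDFS takes, as a rounded percentage.
    return round(hdfs_ms / igfs_ms * 100)


boosts = {name: boost_percent(igfs, hdfs) for name, (igfs, hdfs) in timings_ms.items()}

for name, boost in boosts.items():
    print(f"{name}: {boost}%")
```

Read this way, a boost of 2470% for File Scan means the HDFS operation took roughly 24.7 times as long as the same operation on IGFS.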

The GridGain solution for Hadoop acceleration relies on an in-memory optimized YARN-based MapReduce implementation that eliminates standard overhead associated with typical Hadoop job tracker polling, task tracker process creation, deployment and provisioning.
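
As a rough illustration of the idea, the following Python sketch runs a small word-count MapReduce entirely inside one process over data already held in memory, with no per-task process creation or deployment step. It is a conceptual toy and not GridGain's implementation: the function names `map_block` and `reduce_counts` and the sample data are made up for this example, and Python threads are used only to show the structure, not to demonstrate real speedups.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_block(block):
    # Map step: count the words in one in-memory block.
    return Counter(block.split())


def reduce_counts(partials):
    # Reduce step: merge the per-block counts into one result.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Data is already resident in memory, so no file system reads are needed.
blocks = ["in memory data", "memory resident data", "data"]

# Tasks run on a reusable pool of workers inside the current process.
with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(map_block, blocks))

word_counts = reduce_counts(partials)
print(word_counts.most_common())
```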

Plug-and-Play Hadoop with GridGain

The GridGain In-Memory Accelerator for Hadoop provides out-of-the-box support for Hadoop 2.x (YARN) and can be downloaded and installed in less than 10 minutes. It is a free Hadoop plugin that works with your choice of open source or commercial Hadoop distributions. IGFS is 100% HDFS-compatible and provides plug-and-play integration that works out-of-the-box with hundreds of projects in the Hadoop ecosystem. In-memory MapReduce and HIVE support require zero code changes for existing application logic and provide dramatic performance boosts for CPU-intensive tasks.

In addition, GridGain is a certified member of the Hortonworks Partnerworks ISV/IHV program. This means that all GridGain in-memory computing products are guaranteed to work with Hortonworks Connected Data Platforms.