Hadoop Acceleration with the GridGain In-Memory Data Fabric

Accelerate MapReduce and Hive Jobs by 10x in 10 Minutes

The GridGain In-Memory Accelerator for Hadoop® is built on the GridGain In-Memory Data Fabric. It is a plug-and-play solution that can be downloaded and installed in 10 minutes to accelerate MapReduce and Hive jobs by up to 10x.

The GridGain In-Memory Accelerator for Hadoop requires zero code changes and works with any commercial or open-source distribution of Hadoop 2.x. At its core is a dual-mode, high-performance in-memory file system that is 100% compatible with Hadoop HDFS.

GridGain Features for Hadoop Acceleration

The GridGain In-Memory Accelerator for Hadoop enables fast data processing using the tools and technology your organization already uses. It eliminates the trade-offs usually involved in adding real-time capabilities to existing Hadoop systems.

GridGain uses the Apache Ignite File System (IGFS) in one of two modes: as a standalone primary file system in the Hadoop cluster, or as an intelligent caching layer with HDFS configured as the primary file system. As a caching layer, it provides highly tunable read-through and write-through logic, and users can select which files or directories to cache and how.

GridGain’s in-memory implementation of MapReduce allows you to parallelize the processing of in-memory data stored in IGFS with zero changes to your Hadoop MapReduce code. It eliminates the overhead associated with the job tracker and task trackers in a standard Hadoop architecture while providing low-latency, HPC-style distributed processing.
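The caching-layer mode described above can be illustrated with a toy model. The sketch below is not IGFS code and uses no GridGain or Hadoop API; the class names `BackingStore` and `CachingLayer` and the `cached_prefixes` parameter are hypothetical, chosen only to show what read-through, write-through, and per-directory cache selection mean in general.

```python
from typing import Dict, Iterable


class BackingStore:
    """Stands in for the slower primary file system (HDFS in the text)."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}
        self.reads = 0   # number of reads that reached the backing store
        self.writes = 0  # number of writes that reached the backing store

    def read(self, path: str) -> bytes:
        self.reads += 1
        return self.files[path]

    def write(self, path: str, data: bytes) -> None:
        self.writes += 1
        self.files[path] = data


class CachingLayer:
    """Toy in-memory layer in front of a BackingStore (illustrative only)."""

    def __init__(self, store: BackingStore, cached_prefixes: Iterable[str]) -> None:
        self.store = store
        # Only paths under these prefixes are kept in memory.
        # Prefixes should end with "/" so "/hot/" does not match "/hotel/".
        self.cached_prefixes = tuple(cached_prefixes)
        self.memory: Dict[str, bytes] = {}

    def _is_cached(self, path: str) -> bool:
        return path.startswith(self.cached_prefixes)

    def read(self, path: str) -> bytes:
        # Serve from memory when possible.
        if path in self.memory:
            return self.memory[path]
        # Read-through: on a miss, fetch from the backing store and,
        # if the path is selected for caching, keep a copy in memory.
        data = self.store.read(path)
        if self._is_cached(path):
            self.memory[path] = data
        return data

    def write(self, path: str, data: bytes) -> None:
        # Write-through: the backing store is updated on every write,
        # and the in-memory copy is refreshed for cached paths.
        self.store.write(path, data)
        if self._is_cached(path):
            self.memory[path] = data


store = BackingStore()
fs = CachingLayer(store, cached_prefixes=["/hot/"])
fs.write("/hot/a.txt", b"alpha")
fs.write("/cold/b.txt", b"beta")
for _ in range(2):
    fs.read("/hot/a.txt")   # served from memory
    fs.read("/cold/b.txt")  # goes to the backing store each time
print(store.reads)  # 2: only the two reads of the uncached path
```

In this model the backing store always holds the authoritative copy, which mirrors the arrangement in which HDFS remains the primary file system and the in-memory layer only accelerates access to selected paths.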

Figure: GridGain MapReduce

The GridGain In-Memory Accelerator for Hadoop is an ETL-free architecture. It enables you to process live data in Hadoop without offloading it to other downstream systems to gain the advantage of in-memory computing performance. The GridGain In-Memory Accelerator for Hadoop avoids duplication of data and eliminates unnecessary data movement that typically clogs the network and I/O subsystems. It comes with a comprehensive GUI-based management and monitoring tool (GridGain Visor).

Hadoop Performance with GridGain

The GridGain In-Memory Accelerator for Hadoop delivers up to a 10x performance gain for existing I/O-, network-, or CPU-intensive Hadoop MapReduce jobs and HDFS operations. It delivers instant acceleration to existing Hadoop-based systems and products.

IGFS is 100% compatible with HDFS and can act as a drop-in replacement or extension for HDFS. The following benchmarks compare raw IGFS and HDFS performance on the same set of operations:

Benchmark            IGFS (ms)   HDFS (ms)   Boost
File Scan                   27         667   2470%
File Create                 96         961   1001%
File Random Access         413        2931    710%
File Delete                185        1234    667%
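The Boost column appears to be the HDFS time expressed as a percentage of the IGFS time. That reading is inferred from the numbers rather than stated in the text. The short check below recomputes the column from the two timing columns under that assumption; the function name `boost_percent` is illustrative.

```python
# Timings in milliseconds, copied from the benchmark table: (IGFS, HDFS).
timings_ms = {
    "File Scan": (27, 667),
    "File Create": (96, 961),
    "File Random Access": (413, 2931),
    "File Delete": (185, 1234),
}


def boost_percent(igfs_ms: float, hdfs_ms: float) -> int:
    """HDFS time as a percentage of IGFS time, rounded to a whole percent."""
    return round(hdfs_ms / igfs_ms * 100)


for name, (igfs, hdfs) in timings_ms.items():
    print(f"{name}: {boost_percent(igfs, hdfs)}%")
# File Scan: 2470%
# File Create: 1001%
# File Random Access: 710%
# File Delete: 667%
```

Read this way, a boost of 2470% means the HDFS file scan took roughly 24.7 times as long as the IGFS file scan.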

The tests were performed on a 10-node cluster of Dell R610 blades with dual 8-core CPUs, running Ubuntu 12.04, with a 10GbE network fabric and a stock, unmodified Apache Hadoop 2.x distribution.

The GridGain In-Memory Accelerator for Hadoop relies on an in-memory optimized YARN-based MapReduce implementation that eliminates standard overhead associated with typical Hadoop job tracker polling, task tracker process creation, deployment and provisioning.

Plug-and-Play Hadoop with GridGain

The GridGain In-Memory Accelerator for Hadoop provides out-of-the-box support for Hadoop 2.x (YARN) and can be downloaded and installed in less than 10 minutes. It is a free Hadoop plugin that works with your choice of open-source or commercial Hadoop distributions. IGFS is 100% HDFS-compatible and provides plug-and-play integration that works out of the box with hundreds of projects in the Hadoop ecosystem. In-memory MapReduce and Hive support requires zero code changes to existing application logic and provides dramatic performance boosts for CPU-intensive tasks.