The GridGain In-Memory Accelerator for Hadoop® is built on the GridGain In-Memory Data Fabric. It is a plug-and-play solution that can be downloaded and installed in 10 minutes to accelerate MapReduce and Hive jobs by up to 10x.
The GridGain In-Memory Accelerator for Hadoop requires zero code changes and works with any commercial or open source distribution of Hadoop 2.x. It is a dual-mode, high-performance in-memory file system that is 100% compatible with Hadoop HDFS.
GridGain Features for Hadoop Acceleration
The GridGain In-Memory Accelerator for Hadoop enables fast data processing using the tools and technology your organization already uses. It eliminates the trade-offs of adding real-time capabilities to existing Hadoop systems. GridGain’s in-memory file system (GGFS) supports a dual mode: it can work either as a standalone primary file system in the Hadoop cluster or as an intelligent caching layer with HDFS configured as the primary file system. As a caching layer, it provides highly tunable read-through and write-through logic, and users can select which files or directories to cache and how. GridGain’s in-memory implementation of MapReduce allows you to parallelize the processing of in-memory data stored in GGFS with zero changes to your Hadoop MapReduce code. It eliminates the overhead associated with the job tracker and task trackers in a standard Hadoop architecture while providing low-latency, HPC-style distributed processing.
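The caching-layer mode described above can be pictured with a small conceptual model. The sketch below is not the GGFS API: the `BackingStore` and `CachingLayer` classes, the `should_cache` predicate, and the `/hot/` and `/cold/` paths are all hypothetical names chosen for illustration. It only shows the general idea of read-through and write-through caching with a per-path selection rule.

```python
from typing import Callable, Dict


class BackingStore:
    """Hypothetical stand-in for the slower primary file system (HDFS in the text)."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}
        self.reads = 0  # counts how often the slow store is actually read

    def read(self, path: str) -> bytes:
        self.reads += 1
        return self.files[path]

    def write(self, path: str, data: bytes) -> None:
        self.files[path] = data


class CachingLayer:
    """Hypothetical in-memory layer placed in front of a backing store."""

    def __init__(self, store: BackingStore, should_cache: Callable[[str], bool]) -> None:
        self.store = store
        self.should_cache = should_cache  # selects which paths are kept in memory
        self.memory: Dict[str, bytes] = {}

    def read(self, path: str) -> bytes:
        if path in self.memory:
            return self.memory[path]      # served from memory
        data = self.store.read(path)      # read-through on a cache miss
        if self.should_cache(path):
            self.memory[path] = data
        return data

    def write(self, path: str, data: bytes) -> None:
        self.store.write(path, data)      # write-through: the store is always updated
        if self.should_cache(path):
            self.memory[path] = data


store = BackingStore()
store.files["/hot/events.log"] = b"recent"
store.files["/cold/archive.log"] = b"old"

# Assumption for this sketch: only paths under /hot/ are cached.
layer = CachingLayer(store, lambda p: p.startswith("/hot/"))

layer.read("/hot/events.log")
layer.read("/hot/events.log")     # second read is served from memory
layer.read("/cold/archive.log")
layer.read("/cold/archive.log")   # uncached path hits the store again
layer.write("/hot/new.log", b"fresh")

print(store.reads)  # 3
```

In this model, the cached path reaches the backing store once while the uncached path reaches it on every read, and every write lands in the backing store regardless of the caching rule.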
The GridGain In-Memory Accelerator for Hadoop is an ETL-free architecture. It enables you to process live data in Hadoop without offloading it to other downstream systems to gain the advantage of in-memory computing performance. The GridGain In-Memory Accelerator for Hadoop avoids duplication of data and eliminates unnecessary data movement that typically clogs the network and I/O subsystems. It comes with a comprehensive GUI-based management and monitoring tool (GridGain Visor).
Hadoop Performance with GridGain
The GridGain In-Memory Accelerator for Hadoop delivers up to a 10x performance gain for any existing I/O-, network-, or CPU-intensive Hadoop MapReduce job or HDFS operation. It delivers instant acceleration to existing Hadoop-based systems and products.
The GridGain File System (GGFS) is 100% compatible with HDFS and can act as a drop-in replacement or extension for HDFS. The following benchmarks compare raw GGFS and HDFS performance against the same set of operations:
|Benchmark|GGFS, ms|HDFS, ms|Boost, %|
|File Random Access|413|2931|710%|
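The Boost figure appears to be the HDFS time expressed as a percentage of the GGFS time, which is an inference from the numbers rather than something the table states. Under that assumption, a short check reproduces the value to within rounding (`boost_percent` is a helper name introduced here for illustration):

```python
def boost_percent(hdfs_ms: float, ggfs_ms: float) -> float:
    """Return the HDFS time as a percentage of the GGFS time."""
    return hdfs_ms / ggfs_ms * 100.0


# Values from the File Random Access row above.
boost = boost_percent(2931, 413)
print(f"{boost:.1f}%")  # 709.7%
```

Read this way, 710% corresponds to GGFS completing the benchmark roughly 7.1 times faster than HDFS.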
The tests were performed on a 10-node cluster of Dell R610 blades with dual 8-core CPUs, running Ubuntu 12.04, a 10GbE network fabric, and a stock, unmodified Apache Hadoop 2.x distribution.
The GridGain In-Memory Accelerator for Hadoop relies on an in-memory optimized YARN-based MapReduce implementation that eliminates standard overhead associated with typical Hadoop job tracker polling, task tracker process creation, deployment and provisioning.
Plug-and-Play Hadoop with GridGain
The GridGain In-Memory Accelerator for Hadoop provides out-of-the-box support for Hadoop 2.x (YARN) and can be downloaded and installed in less than 10 minutes. It is a free Hadoop plugin that works with your choice of open source or commercial Hadoop distributions. The GridGain in-memory file system (GGFS) is 100% HDFS-compatible and provides plug-and-play integration that works out of the box with hundreds of projects in the Hadoop ecosystem. In-memory MapReduce and Hive support require zero code changes to existing application logic and provide dramatic performance boosts for CPU-intensive tasks.