In-Memory Accelerator for Apache Hadoop®

Your Hadoop – 100x Faster

Boost HDFS Performance
Eliminate MapReduce Overhead
Any Hadoop & Minimal Integration

Hadoop Accelerator key innovations:

  • GridGain In-Memory File System
  • GridGain In-Memory MapReduce

GridGain’s In-Memory Accelerator for Hadoop is based on the industry’s first dual-mode, high-performance in-memory file system that is 100% compatible with Hadoop HDFS, together with an in-memory optimized MapReduce implementation. In-memory HDFS and in-memory MapReduce provide an easy-to-use extension to disk-based HDFS and traditional MapReduce, delivering up to 100x faster performance.

In-Memory Accelerator for Hadoop requires minimal or no integration and works with any commercial or open source version of Hadoop, including Cloudera, Hortonworks, MapR, Apache, Intel, and AWS, as well as any other Hadoop 1.x or Hadoop 2.x distribution.


Hadoop Accelerator enhances existing Hadoop technology to enable fast data processing with the tools and technology your organization already uses. It was designed to eliminate the trade-offs of adding real-time capabilities to existing Hadoop systems.

GridGain’s in-memory file system (GGFS) is dual-mode: it can work either as a standalone primary file system in the Hadoop cluster, or in tandem with HDFS, serving as an intelligent caching layer with HDFS configured as the primary file system.

As a caching layer, it provides highly tunable read-through and write-through logic, and users can freely select which files or directories are cached and how. In either case, GGFS can be used as a drop-in alternative for, or an extension of, standard HDFS, providing an instant performance increase.
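The caching-layer mode described above can be illustrated with a minimal sketch. This is a toy model of read-through and write-through caching with per-path selection, not GGFS code: the class names `DiskStore` and `CachingLayer` and the prefix-based selection rule are assumptions made for illustration only.

```python
class DiskStore:
    """Hypothetical stand-in for the disk-based primary file system."""

    def __init__(self):
        self.files = {}
        self.reads = 0  # counts how often the slow store is actually read

    def read(self, path):
        self.reads += 1
        return self.files[path]

    def write(self, path, data):
        self.files[path] = data


class CachingLayer:
    """Toy in-memory layer: read-through and write-through for chosen prefixes."""

    def __init__(self, backing, cached_prefixes):
        self.backing = backing
        self.cached_prefixes = tuple(cached_prefixes)
        self.memory = {}

    def _is_cached(self, path):
        # Only paths under a selected prefix are kept in memory.
        return path.startswith(self.cached_prefixes)

    def read(self, path):
        if path in self.memory:
            return self.memory[path]
        data = self.backing.read(path)  # read-through: miss falls back to disk
        if self._is_cached(path):
            self.memory[path] = data
        return data

    def write(self, path, data):
        self.backing.write(path, data)  # write-through: disk is always updated
        if self._is_cached(path):
            self.memory[path] = data


disk = DiskStore()
fs = CachingLayer(disk, cached_prefixes=["/hot/"])
fs.write("/hot/a", b"1")   # stored on disk and in memory
fs.write("/cold/b", b"2")  # stored on disk only
```

In this sketch, reading `/hot/a` never touches the disk store, while every read of `/cold/b` does, which mirrors the idea of choosing which files or directories are cached.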

GridGain’s in-memory MapReduce lets you effectively parallelize the processing of in-memory data stored in GGFS. It eliminates the overhead associated with the job tracker and task trackers in a standard Hadoop architecture while providing low-latency, HPC-style distributed processing.

Hadoop Accelerator provides three distinct advantages:

  • Minimal or no integration. GGFS is 100% HDFS-compatible, provides plug-and-play integration, and works out of the box with hundreds of projects in the Hadoop ecosystem. In-memory MapReduce provides dramatic performance boosts for CPU-intensive tasks while requiring only minimal changes to existing applications.
  • Works with any Hadoop distribution, open source or commercial, with no need to roll out yet another proprietary distribution. You can continue to use your existing Apache, Pivotal, Intel, Cloudera, Hortonworks, MapR, AWS, or any other Hadoop 1.x or Hadoop 2.x (YARN) distribution.
  • Up to 100x performance increase for any existing I/O-, network-, or CPU-intensive Hadoop MapReduce job or HDFS operation, delivering easy acceleration to existing Hadoop-based systems and products with minimal or no integration.

Key Features

Some of the key features of GridGain In-Memory Accelerator for Hadoop are:

No ETL Required

GridGain Hadoop Accelerator’s unique In-Memory File System allows it to work with data that is stored directly in Hadoop. Whether the In-Memory File System is used in primary mode, or in secondary mode acting as an intelligent caching layer over the primary disk-based HDFS, it completely eliminates the time-consuming and costly process of extracting, transforming, and loading (ETL) data to and from Hadoop.

The ETL-free architecture of GridGain’s In-Memory Hadoop Accelerator enables companies to process live data in Hadoop without offloading it to other downstream systems to gain the in-memory computing performance advantage. GridGain’s accelerator avoids duplication of data and eliminates the unnecessary data movement that typically clogs the network and I/O subsystems.

Boost HDFS Performance

GridGain File System (GGFS) is 100% compatible with HDFS and can act as a drop-in replacement or extension for HDFS. The benchmarks below compare raw GGFS and HDFS performance on the same set of operations:

Benchmark            GGFS (ms)   HDFS (ms)   Boost (%)
File Scan                   27         667        2470
File Create                 96         961        1001
File Random Access         413        2931         710
File Delete                185        1234         667

The above tests were performed on a 10-node cluster of Dell R610 blades with dual 8-core CPUs, running Ubuntu 12.04, with a 10 GbE network fabric and a stock, unmodified Apache Hadoop 2.x distribution.
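The boost column in the benchmark above is consistent with the HDFS time divided by the GGFS time, expressed as a percentage. The short sketch below recomputes it from the published timings; the function name `boost_percent` is introduced here for illustration only.

```python
# Timings in milliseconds as published in the benchmark: (GGFS, HDFS).
timings_ms = {
    "File Scan": (27, 667),
    "File Create": (96, 961),
    "File Random Access": (413, 2931),
    "File Delete": (185, 1234),
}


def boost_percent(ggfs_ms, hdfs_ms):
    """Ratio of HDFS time to GGFS time, as a whole-number percentage."""
    return round(hdfs_ms / ggfs_ms * 100)


boosts = {name: boost_percent(g, h) for name, (g, h) in timings_ms.items()}

for name, pct in boosts.items():
    # A boost of 2470% means the operation ran about 24.7 times faster.
    print(f"{name}: {pct}% (about {pct / 100:.1f}x)")
```

Read this way, a boost of 2470% for File Scan means GGFS completed the scan roughly 24.7 times faster than HDFS in this test.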

Eliminate Hadoop MapReduce Overhead

For CPU-intensive and real-time use cases, In-Memory Accelerator for Hadoop relies on an in-memory MapReduce implementation that eliminates the standard overhead associated with Hadoop’s job tracker polling and with task tracker process creation, deployment, and provisioning. GridGain’s in-memory MapReduce is a highly optimized, HPC-based implementation of the MapReduce concept that enables true low-latency processing of data stored in GGFS.
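To make the MapReduce concept concrete, here is a minimal single-process sketch that runs the map, group, and reduce phases over data already held in memory. It illustrates the programming model only and says nothing about how GridGain’s implementation is built; `map_reduce`, `word_mapper`, and `sum_reducer` are hypothetical names.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run map, group-by-key, and reduce over an in-memory list of records."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):   # map phase
            groups[key].append(value)       # group values by key (shuffle)
    # reduce phase: one reducer call per distinct key
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    """Emit (word, 1) for every word in a line."""
    for word in line.split():
        yield word.lower(), 1


def sum_reducer(key, values):
    """Add up all counts emitted for one word."""
    return sum(values)


lines = ["in memory file system", "in memory MapReduce"]
counts = map_reduce(lines, word_mapper, sum_reducer)
```

Because everything runs inside one process on in-memory data, there is no job submission, polling, or per-task process startup in this toy version, which is the kind of overhead the paragraph above refers to.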


Any Hadoop 1.x & 2.x Distributions

Hadoop Accelerator provides out-of-the-box support for both legacy Hadoop 1.x and new Hadoop 2.x (YARN) distributions. Organizations that use older Hadoop 1.x distributions get the full benefits of In-Memory Accelerator for Hadoop while preserving their investment when they move to the latest Hadoop 2.x distributions.

As mentioned above, Hadoop Accelerator isn’t another proprietary Hadoop distribution. Rather, it is a first-of-its-kind Hadoop plugin that works with your choice of commercial or open source Hadoop distribution and provides the same performance benefits whether you run Cloudera, Hortonworks, MapR, Apache, Intel, AWS, or any other distribution.

Speed Up Java/Scala/C/C++/Python MapReduce Jobs

Hadoop Accelerator’s architecture allows it to speed up MapReduce jobs written in any Hadoop-supported language, not only in native Java or Scala. Developers can easily reuse existing C/C++/Python or any other existing MapReduce code with In-Memory Accelerator for Hadoop to gain a significant performance boost.

GUI Management

Hadoop Accelerator comes with a comprehensive GUI-based management and monitoring tool called GridGain Visor. It provides deep DevOps capabilities, including an operations and telemetry dashboard, data grid and compute grid management, and GGFS management, which covers GGFS monitoring and file management across HDFS, local, and GGFS file systems.


Hadoop Accelerator also comes with a GUI-based file system profiler, which keeps track of all operations your GGFS or HDFS file systems perform and identifies potential hot spots. The GGFS profiler tracks the speed and throughput of reads, writes, and various directory operations for all files, and displays these metrics in a convenient view that you can sort by any profiled criterion, e.g. from slowest write to fastest.

The profiler also makes suggestions whenever performance could be gained by loading file data into in-memory GGFS.
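The "slowest write to fastest" view mentioned above amounts to sorting per-file metrics by one field. The sketch below shows that idea on made-up sample data; the `FileMetrics` fields, the paths, and the numbers are all hypothetical and are not taken from the profiler.

```python
from dataclasses import dataclass


@dataclass
class FileMetrics:
    """Hypothetical per-file metrics a profiler view might display."""
    path: str
    read_mb_per_s: float
    write_mb_per_s: float


def slowest_writes_first(metrics):
    """Order files from slowest to fastest write throughput."""
    return sorted(metrics, key=lambda m: m.write_mb_per_s)


# Made-up sample values, for illustration only.
sample = [
    FileMetrics("/logs/a.log", read_mb_per_s=120.0, write_mb_per_s=45.0),
    FileMetrics("/data/b.bin", read_mb_per_s=300.0, write_mb_per_s=12.5),
    FileMetrics("/tmp/c.tmp", read_mb_per_s=80.0, write_mb_per_s=90.0),
]

ranked = slowest_writes_first(sample)
ranked_paths = [m.path for m in ranked]
```

Files at the top of such a ranking are the natural candidates to consider for loading into the in-memory file system.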



Hadoop is a registered trademark of the Apache Software Foundation and/or its affiliates. Other names may be trademarks of their respective owners.
