In-Memory Accelerator for Apache Hadoop®

10x Faster – Plug-n-Play

Zero Code Change
Boost HDFS Performance
Eliminate MapReduce Overhead

Hadoop Accelerator key features:

  • GridGain In-Memory File System
  • GridGain In-Memory MapReduce

GridGain’s In-Memory Accelerator for Hadoop is based on the industry’s first dual-mode, high-performance in-memory file system that is 100% compatible with Hadoop HDFS, together with a plug-n-play MapReduce implementation optimized for in-memory processing. In-memory HDFS and in-memory MapReduce provide an easy-to-use extension to disk-based HDFS and traditional MapReduce, delivering up to 10x faster performance.

In-Memory Accelerator for Hadoop requires zero code change and works with any available commercial or open source distribution of Hadoop 2.x, including Cloudera, HortonWorks, MapR, Apache, Intel, and AWS.

Technology

In-Memory Accelerator for Hadoop enhances existing Hadoop technology to enable fast data processing with the tools and technology your organization already uses. It was designed to eliminate the trade-offs involved in adding real-time capabilities to existing Hadoop systems.

GridGain’s in-memory file system (GGFS) supports two modes of operation: it can work either as a standalone primary file system in the Hadoop cluster, or in tandem with HDFS, serving as an intelligent caching layer with HDFS configured as the primary file system.

As a caching layer, it provides highly tunable read-through and write-through logic, and users can freely select which files or directories are cached and how. In either mode, GGFS can be used as a drop-in alternative to, or an extension of, standard HDFS, providing an instant performance increase.
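To make the read-through and write-through idea concrete, here is a minimal, generic sketch in Python. It is not GGFS code and does not use any GridGain API: the class names `BackingStore` and `CachingLayer`, the `cached_prefixes` parameter, and the path-prefix selection rule are all hypothetical, chosen only to illustrate how a caching layer might sit in front of a slower primary file system and cache only selected directories.

```python
from typing import Dict, Optional, Tuple


class BackingStore:
    """Hypothetical stand-in for the slower, disk-based primary file system."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}
        self.reads = 0  # counts how often the slow store is actually read

    def read(self, path: str) -> Optional[bytes]:
        self.reads += 1
        return self.files.get(path)

    def write(self, path: str, data: bytes) -> None:
        self.files[path] = data


class CachingLayer:
    """Hypothetical in-memory layer in front of a BackingStore."""

    def __init__(self, store: BackingStore, cached_prefixes: Tuple[str, ...]) -> None:
        self.store = store
        self.cached_prefixes = cached_prefixes  # directories selected for caching
        self.memory: Dict[str, bytes] = {}

    def _is_cached(self, path: str) -> bool:
        return path.startswith(self.cached_prefixes)

    def read(self, path: str) -> Optional[bytes]:
        # Serve from memory when possible.
        if path in self.memory:
            return self.memory[path]
        # Read-through: on a miss, fetch from the primary store and keep a copy
        # in memory if the path belongs to a cached directory.
        data = self.store.read(path)
        if data is not None and self._is_cached(path):
            self.memory[path] = data
        return data

    def write(self, path: str, data: bytes) -> None:
        # Write-through: the primary store is updated synchronously,
        # and the in-memory copy is refreshed for cached directories.
        self.store.write(path, data)
        if self._is_cached(path):
            self.memory[path] = data
```

In this sketch, a file written under a cached prefix is served from memory on later reads, while files outside the selected prefixes always go to the backing store.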

GridGain’s in-memory MapReduce allows users to effectively parallelize the processing of in-memory data stored in GGFS. It requires zero code change to the users’ Hadoop MapReduce code, and eliminates the overhead associated with the job tracker and task trackers in a standard Hadoop architecture while providing low-latency, HPC-style distributed processing.

In-Memory Accelerator for Hadoop provides three distinct advantages:

  • Zero integration cost. GGFS is 100% HDFS-compatible and provides plug-n-play integration that works out-of-the-box with hundreds of projects in the Hadoop ecosystem. In-memory MapReduce requires zero code change for user MapReduce logic and provides dramatic performance boosts for CPU-intensive tasks.
  • Works with any Hadoop 2.x distribution, open source or commercial – no need to roll out yet another proprietary distribution. You can continue to use your existing Apache, Pivotal, Intel, Cloudera, HortonWorks, MapR, or AWS distribution.
  • Up to 10x performance increase for any existing IO, network or CPU-intensive Hadoop MapReduce job and/or HDFS operation, delivering instant acceleration to existing Hadoop-based systems and products.

Key Features

Some of the key features of GridGain In-Memory Accelerator for Hadoop are:

No ETL Required

The unique In-Memory File System in GridGain In-Memory Accelerator for Hadoop allows it to work with data that is stored directly in Hadoop. Whether the In-Memory File System is used in primary mode, or in secondary mode as an intelligent caching layer over the primary disk-based HDFS, it completely eliminates the time-consuming and costly process of extracting, transforming and loading (ETL) data to and from Hadoop.

The ETL-free architecture of GridGain’s In-Memory Accelerator for Hadoop enables companies to process live data in Hadoop without offloading it to other downstream systems to gain the performance advantage of in-memory computing. It avoids duplication of data and eliminates the unnecessary data movement that typically clogs the network and I/O subsystems.

Boost HDFS Performance

GridGain File System (GGFS) is 100% compatible with HDFS and can act as a drop-in replacement or extension for HDFS. The benchmarks below compare raw GGFS and HDFS performance on the same set of operations:

Benchmark            GGFS (ms)   HDFS (ms)   Boost (%)
File Scan                   27         667        2470
File Create                 96         961        1001
File Random Access         413        2931         710
File Delete                185        1234         667

The above tests were performed on a 10-node cluster of Dell R610 blades with dual 8-core CPUs, running Ubuntu 12.04, with a 10GbE network fabric and a stock, unmodified Apache Hadoop 2.x distribution.
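The page does not define how the Boost column is calculated. Judging from the published figures, it appears to be the HDFS time expressed as a percentage of the GGFS time, rounded to a whole number; this is an inference from the numbers, not a stated definition. The short Python sketch below reproduces the column under that assumption (the names `timings_ms` and `boost_percent` are illustrative only).

```python
# Timings copied from the benchmark table: (GGFS ms, HDFS ms).
timings_ms = {
    "File Scan": (27, 667),
    "File Create": (96, 961),
    "File Random Access": (413, 2931),
    "File Delete": (185, 1234),
}


def boost_percent(ggfs_ms: float, hdfs_ms: float) -> int:
    """Assumed definition: HDFS time as a percentage of GGFS time."""
    return round(hdfs_ms / ggfs_ms * 100)


boosts = {name: boost_percent(g, h) for name, (g, h) in timings_ms.items()}

for name, value in boosts.items():
    print(f"{name}: {value}%")
```

Under this reading, a boost of 2470% for File Scan means the operation took roughly 24.7 times as long on HDFS as on GGFS.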


Eliminate Hadoop MapReduce Overhead

For CPU-intensive and real-time use cases, In-Memory Accelerator for Hadoop relies on an in-memory optimized, YARN-based MapReduce implementation that eliminates the standard overhead of Hadoop’s job tracker polling and of task tracker process creation, deployment and provisioning. GridGain’s in-memory MapReduce requires no code change to the user’s MapReduce logic and is based on a highly optimized, HPC-based implementation of MapReduce, enabling true low-latency processing of data stored in GGFS.


Any Hadoop 2.x (YARN) Distribution

In-Memory Accelerator for Hadoop provides out-of-the-box support for Hadoop 2.x (YARN) distributions. As mentioned above, In-Memory Accelerator for Hadoop isn’t another proprietary Hadoop distribution. Rather, it’s a first-of-its-kind Hadoop plugin that works with your choice of Hadoop 2.x distribution, commercial or open source, and provides the same performance benefits whether you run Cloudera, HortonWorks, MapR, Apache, Intel, AWS, or any other distribution.


Speed Up Java/Scala/C/C++/Python MapReduce Jobs

The Hadoop Accelerator architecture allows it to speed up MapReduce jobs written in any Hadoop-supported language, not only native Java or Scala. Developers can easily reuse existing MapReduce code written in C, C++, Python or any other supported language to gain a significant performance boost.
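As an illustration of the kind of non-Java job this refers to, here is a minimal word-count mapper and reducer in Python. It follows the tab-separated key/value line convention used by Hadoop Streaming; the page does not say which mechanism the accelerator uses to run non-Java jobs, so treat Streaming as an assumption. The function names `map_lines` and `reduce_lines` are hypothetical.

```python
from itertools import groupby
from typing import Iterable, Iterator


def map_lines(lines: Iterable[str]) -> Iterator[str]:
    """Mapper: emit one 'word<TAB>1' line per word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Reducer: sum counts per word; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Local simulation of map -> sort (shuffle) -> reduce on two input lines.
mapped = sorted(map_lines(["Hadoop GGFS hadoop", "ggfs"]))
for result in reduce_lines(mapped):
    print(result)
```

In a real job, each function would typically be wrapped in a small script that reads from standard input and writes to standard output, with the framework performing the sort between the two phases.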


GUI Management

In-Memory Accelerator for Hadoop comes with a comprehensive GUI-based management and monitoring tool called GridGain Visor. It provides deep DevOps capabilities including an operations and telemetry dashboard, data grid and compute grid management, as well as GGFS management that provides GGFS monitoring and file management between HDFS, local and GGFS file systems.

In-Memory Accelerator for Hadoop also comes with a GUI-based file system profiler, which lets you track all operations performed on your GGFS or HDFS file systems and identifies potential hot spots. The GGFS profiler tracks the speed and throughput of reads, writes, and various directory operations for all files, and displays these metrics in a view that can be sorted by any profiled criterion, e.g. from slowest write to fastest.

The profiler also makes suggestions whenever performance could be gained by loading file data into in-memory GGFS.


TRADEMARK AND COPYRIGHT NOTE:

Hadoop is a registered trademark of the Apache Software Foundation and/or its affiliates. Other names may be trademarks of their respective owners.

If you have any questions, please contact us at sales@gridgain.com.