In-Memory Accelerator for Apache Hadoop®
10x Faster – Plug-n-Play
Hadoop Accelerator key features:
GridGain’s In-Memory Accelerator for Hadoop combines two components: the industry’s first dual-mode, high-performance in-memory file system that is 100% compatible with Hadoop HDFS, and a plug-n-play MapReduce implementation optimized for in-memory processing. In-memory HDFS and in-memory MapReduce provide an easy-to-use extension to disk-based HDFS and traditional MapReduce, delivering up to 10x faster performance.
In-Memory Accelerator for Hadoop requires no code changes and works with any commercial or open source Hadoop 2.x distribution, including Cloudera, Hortonworks, MapR, Apache, Intel, and AWS.
In-Memory Accelerator for Hadoop enhances existing Hadoop technology to enable fast data processing with the tools and technology your organization already uses. It was designed to eliminate the trade-offs involved in adding real-time capabilities to existing Hadoop systems.
GridGain’s in-memory file system (GGFS) supports two modes: it can work as a standalone primary file system in the Hadoop cluster, or in tandem with HDFS, serving as an intelligent caching layer with HDFS configured as the primary file system.
As a caching layer, it provides highly tunable read-through and write-through logic, and users can freely select which files or directories are cached and how. In either mode, GGFS can be used as a drop-in alternative to, or an extension of, standard HDFS, providing an instant performance increase.
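The read-through and write-through behavior described above can be illustrated with a small toy model. This is a conceptual sketch only, not the GGFS API: the `CachingLayer` class, the `cached_prefixes` parameter, and the plain dictionary standing in for HDFS are all hypothetical names chosen for illustration.

```python
class CachingLayer:
    """Toy in-memory cache in front of a slower backing store (a dict here)."""

    def __init__(self, backing, cached_prefixes):
        self.backing = backing                         # stands in for the primary file system
        self.cached_prefixes = tuple(cached_prefixes)  # directories selected for caching
        self.memory = {}                               # in-memory copies of cached files
        self.backing_reads = 0                         # reads that reached the backing store

    def _is_cached_path(self, path):
        return path.startswith(self.cached_prefixes)

    def read(self, path):
        # Read-through: serve from memory if present, otherwise fetch from
        # the backing store and keep a copy when the path is selected.
        if path in self.memory:
            return self.memory[path]
        self.backing_reads += 1
        data = self.backing[path]
        if self._is_cached_path(path):
            self.memory[path] = data
        return data

    def write(self, path, data):
        # Write-through: always update the backing store, and update the
        # in-memory copy when the path is selected for caching.
        self.backing[path] = data
        if self._is_cached_path(path):
            self.memory[path] = data


store = {"/hot/a.txt": b"alpha", "/cold/b.txt": b"beta"}
fs = CachingLayer(store, cached_prefixes=["/hot/"])

fs.read("/hot/a.txt")    # miss: fetched from the backing store, then cached
fs.read("/hot/a.txt")    # hit: served from memory
fs.read("/cold/b.txt")   # not selected for caching: backing store every time
fs.read("/cold/b.txt")
fs.write("/hot/c.txt", b"gamma")  # lands in both the backing store and memory
```

In this sketch only paths under `/hot/` are held in memory, which mirrors the idea of choosing which files or directories to cache; a real deployment would express that choice through the product's configuration rather than code like this.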
GridGain’s in-memory MapReduce allows users to effectively parallelize the processing of in-memory data stored in GGFS. It requires zero code change to the users’ Hadoop MapReduce code, and eliminates the overhead associated with job tracker and task trackers in a standard Hadoop architecture while providing low-latency, HPC-style distributed processing.
In-Memory Accelerator for Hadoop provides three distinct advantages:
- Zero integration cost. GGFS is 100% HDFS-compatible and provides plug-n-play integration that works out-of-the-box with hundreds of projects in the Hadoop ecosystem. In-memory MapReduce requires zero code change for user MapReduce logic and provides dramatic performance boosts for CPU-intensive tasks.
- Works with any Hadoop 2.x distribution, open source or commercial, so there is no need to roll out yet another proprietary distribution. You can continue to use your existing Apache, Pivotal, Intel, Cloudera, Hortonworks, MapR, or AWS distribution.
- Up to 10x performance increase for any existing IO, network or CPU-intensive Hadoop MapReduce job and/or HDFS operation, delivering instant acceleration to existing Hadoop-based systems and products.
Some of the key features of GridGain In-Memory Accelerator for Hadoop are:
|Benchmark|GGFS, ms|HDFS, ms|Boost, %|
|File Random Access|413|2931|710%|
The above tests were performed on a 10-node cluster of Dell R610 blades with dual 8-core CPUs, running Ubuntu 12.04, with a 10GbE network fabric and a stock, unmodified Apache Hadoop 2.x distribution.
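For readers who want to check the arithmetic, the short calculation below relates the two timings in the File Random Access row. It assumes the "Boost, %" column is the HDFS time divided by the GGFS time, expressed as a percentage; the page does not state the formula, so treat that reading as an inference.

```python
ggfs_ms = 413    # GGFS timing from the benchmark table
hdfs_ms = 2931   # HDFS timing from the benchmark table

# How many times faster GGFS was in this benchmark (about 7.1x).
speedup = hdfs_ms / ggfs_ms

# The same ratio as a percentage, which matches the table's 710% figure.
ratio_percent = round(speedup * 100)

# Share of the original HDFS time that was saved (about 86%).
time_saved_percent = (1 - ggfs_ms / hdfs_ms) * 100

print(f"speedup: {speedup:.2f}x")
print(f"ratio: {ratio_percent}%")
print(f"time saved: {time_saved_percent:.1f}%")
```

Under this reading, 710% means GGFS completed the benchmark in roughly one seventh of the HDFS time.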
For CPU-intensive and real-time use cases, In-Memory Accelerator for Hadoop relies on an in-memory optimized, YARN-based MapReduce implementation that eliminates the standard overhead associated with Hadoop’s job tracker polling, task tracker process creation, deployment and provisioning. GridGain’s in-memory MapReduce requires no code change to the user’s MapReduce logic and is based on a highly optimized HPC-style implementation of MapReduce, enabling true low-latency processing of data stored in GGFS.
In-Memory Accelerator for Hadoop provides out-of-the-box support for Hadoop 2.x (YARN) distributions. As mentioned above, In-Memory Accelerator for Hadoop is not another proprietary Hadoop distribution; rather, it is a first-of-its-kind Hadoop plugin that works with your choice of commercial or open source Hadoop distribution and provides the same performance benefits whether you run Cloudera, Hortonworks, MapR, Apache, Intel, AWS, or any other distribution.
The Hadoop Accelerator architecture allows it to speed up MapReduce jobs written in any Hadoop-supported language, not only native Java or Scala. Developers can easily reuse existing C/C++/Python or other MapReduce code to gain a significant performance boost.
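As an illustration of what non-Java MapReduce code can look like, here is a minimal word-count sketch in the style used by Hadoop Streaming, where the mapper and reducer exchange tab-separated text records. It is an assumption that existing Python jobs take this form; the page does not say which mechanism is used for non-Java code. The `mapper` and `reducer` names and the local simulation at the end are for illustration only.

```python
from itertools import groupby


def mapper(lines):
    """Emit one tab-separated (word, 1) record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records):
    """Sum counts per word; input must be sorted by key, as after a shuffle."""
    pairs = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Local simulation of map -> sort (shuffle) -> reduce on two input lines.
text = ["in memory file system", "in memory MapReduce"]
result = list(reducer(sorted(mapper(text))))
print("\n".join(result))
```

In an actual streaming job, the mapper and reducer would typically be separate scripts that read from standard input and write to standard output, with the framework handling the sort between them.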
In-Memory Accelerator for Hadoop comes with a comprehensive GUI-based management and monitoring tool called GridGain Visor. It provides deep DevOps capabilities including an operations and telemetry dashboard, data grid and compute grid management, as well as GGFS management that provides GGFS monitoring and file management between HDFS, local and GGFS file systems.
In-Memory Accelerator for Hadoop also comes with a GUI-based file system profiler, which lets you track all operations performed on your GGFS or HDFS file systems and identify potential hot spots. The GGFS profiler tracks the speed and throughput of reads, writes, and various directory operations for all files and displays these metrics in a convenient view that you can sort by any profiled criterion, e.g. from slowest write to fastest.
The profiler also makes suggestions whenever performance could be gained by loading file data into in-memory GGFS.
TRADEMARK AND COPYRIGHT NOTE:
Hadoop is a registered trademark of the Apache Software Foundation and/or its affiliates. Other names may be trademarks of their respective owners.
If you have any questions please contact us at firstname.lastname@example.org