In-Memory Accelerator for Apache Hadoop®
Your Hadoop – 100x Faster
Hadoop Accelerator key innovations:
GridGain’s In-Memory Accelerator for Hadoop is based on the industry’s first dual-mode, high-performance in-memory file system that is 100% compatible with Hadoop HDFS, together with an in-memory optimized MapReduce implementation. In-memory HDFS and in-memory MapReduce provide an easy-to-use extension to disk-based HDFS and traditional MapReduce, delivering up to 100x faster performance.
In-Memory Accelerator for Hadoop requires minimal or no integration and works with any commercial or open source version of Hadoop available including Cloudera, HortonWorks, MapR, Apache, Intel, AWS, as well as any other Hadoop 1.x and Hadoop 2.x distributions.
Hadoop Accelerator enhances existing Hadoop technology to enable fast data processing using the tools and technology your organization is already using today. It was designed to eliminate the trade-offs when adding real time capabilities to existing Hadoop systems.
GridGain’s in-memory file system (GGFS) supports a dual mode: it can work either as a standalone primary file system in the Hadoop cluster or in tandem with HDFS, serving as an intelligent caching layer with HDFS configured as the primary file system.
As a caching layer, it provides highly tunable read-through and write-through logic, and users can freely select which files or directories are cached and how. In either mode, GGFS can be used as a drop-in alternative for, or an extension of, standard HDFS, providing an instant performance increase.
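To make the caching-layer mode concrete, here is a minimal sketch of read-through and write-through behavior with per-directory selection. It is a toy model, not the GGFS API: the class `TieredFileStore`, its methods, and the use of a plain dictionary as the primary store are all hypothetical names chosen for illustration.

```python
class TieredFileStore:
    """Toy model of an in-memory tier in front of a slower primary store.

    All names here are hypothetical; this is not the GGFS API.
    """

    def __init__(self, backing, cached_prefixes):
        self.backing = backing                      # dict standing in for the primary store
        self.memory = {}                            # in-memory tier
        self.cached_prefixes = tuple(cached_prefixes)
        self.backing_reads = 0                      # counts trips to the primary store

    def _is_cached_path(self, path):
        # Only paths under a selected prefix are kept in memory.
        return path.startswith(self.cached_prefixes)

    def read(self, path):
        if path in self.memory:
            return self.memory[path]                # served from memory
        self.backing_reads += 1
        data = self.backing[path]                   # raises KeyError if the path is missing
        if self._is_cached_path(path):
            self.memory[path] = data                # read-through: populate on first read
        return data

    def write(self, path, data):
        self.backing[path] = data                   # write-through: primary store is always updated
        if self._is_cached_path(path):
            self.memory[path] = data
```

With `cached_prefixes=["/hot/"]`, repeated reads of a file under `/hot/` reach the primary store only once, while files elsewhere bypass the memory tier entirely.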
GridGain’s in-memory MapReduce effectively parallelizes the processing of in-memory data stored in GGFS. It eliminates the overhead associated with the job tracker and task trackers in a standard Hadoop architecture while providing low-latency, HPC-style distributed processing.
Hadoop Accelerator provides three distinct advantages:
- Minimal or no integration. GGFS is 100% HDFS-compatible, provides plug-and-play integration, and works out of the box with hundreds of projects in the Hadoop ecosystem. In-memory MapReduce provides dramatic performance boosts for CPU-intensive tasks while requiring only minimal changes to existing applications.
- Works with any Hadoop distribution, open source or commercial, so there is no need to roll out yet another proprietary distribution. You can continue to use your existing Apache, Pivotal, Intel, Cloudera, HortonWorks, MapR, AWS, or any other Hadoop 1.x or Hadoop 2.x (YARN) distribution.
- Up to 100x performance increase for any existing I/O-, network-, or CPU-intensive Hadoop MapReduce job or HDFS operation, delivering easy acceleration to existing Hadoop-based systems and products with minimal or no integration.
Some of the key features of GridGain In-Memory Accelerator for Hadoop are:
| Benchmark | GGFS, ms | HDFS, ms | Boost, % |
|---|---|---|---|
| File Random Access | 413 | 2931 | 710% |
The above tests were performed on a 10-node cluster of Dell R610 blades with dual 8-core CPUs, running Ubuntu 12.04, with a 10GbE network fabric and a stock, unmodified Apache Hadoop 2.x distribution.
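The benchmark does not say how the Boost figure is derived. Assuming it is the ratio of the HDFS time to the GGFS time expressed as a percentage (an assumption, though it reproduces the published 710%), the arithmetic can be checked with a few lines:

```python
# Timings taken from the File Random Access row of the benchmark table.
ggfs_ms = 413
hdfs_ms = 2931

# Assumed definition of "Boost": HDFS time divided by GGFS time, as a percentage.
speedup = hdfs_ms / ggfs_ms                          # about 7.1 times faster
boost_percent = round(speedup * 100, -1)             # rounded to the nearest 10

# The same result expressed as the share of elapsed time that is saved.
time_saved_percent = (1 - ggfs_ms / hdfs_ms) * 100

print(f"{speedup:.1f}x faster, boost {boost_percent:.0f}%, "
      f"time saved {time_saved_percent:.1f}%")
```

Read this way, 710% means the GGFS run took roughly one seventh of the HDFS time, which is a reduction in elapsed time of about 86%.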
For CPU-intensive and real-time use cases, In-Memory Accelerator for Hadoop relies on an in-memory MapReduce implementation that eliminates the standard overhead associated with Hadoop’s job tracker polling, task tracker process creation, deployment, and provisioning. GridGain’s in-memory MapReduce is a highly optimized, HPC-based implementation of the MapReduce concept that enables true low-latency processing of data stored in GGFS.
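The general idea of running map and reduce steps over data that is already in memory, without starting a separate process per task, can be sketched as follows. This is only an illustration of the concept within a single process using the Python standard library; it is not GridGain’s implementation, and the function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_partition(records):
    """Map step: sum the values per key within one in-memory partition."""
    totals = Counter()
    for key, value in records:
        totals[key] += value
    return totals


def merge_totals(left, right):
    """Reduce step: combine two partial results into a new Counter."""
    merged = Counter(left)
    merged.update(right)
    return merged


def in_memory_map_reduce(partitions, workers=4):
    # Map tasks run on threads inside the current process;
    # no child process is created per task.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_partition, partitions))
    return reduce(merge_totals, partials, Counter())
```

Calling `in_memory_map_reduce([[("a", 1), ("b", 2)], [("a", 3)]])` yields a total of 4 for `a` and 2 for `b`.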
Hadoop Accelerator provides out-of-the-box support for both legacy Hadoop 1.x and new Hadoop 2.x (YARN) distributions. This allows organizations that use older Hadoop 1.x distributions to get the full benefits of In-Memory Accelerator for Hadoop while preserving their investment when moving to the latest Hadoop 2.x distributions.
As mentioned above, Hadoop Accelerator isn’t another proprietary Hadoop distribution. Rather, it’s a first-of-its-kind Hadoop plugin that works with your choice of commercial or open source Hadoop distribution and provides the same performance benefits whether you run Cloudera, HortonWorks, MapR, Apache, Intel, AWS, or any other distribution.
The Hadoop Accelerator architecture allows it to speed up MapReduce jobs written in any Hadoop-supported language, not only native Java or Scala. Developers can easily reuse existing C, C++, Python, or other MapReduce code with In-Memory Accelerator for Hadoop to gain a significant performance boost.
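As an example of the kind of non-Java code this refers to, here is a word-count mapper and reducer in the style used by Hadoop Streaming, which passes tab-separated key/value lines through standard input and output. The source does not state which mechanism the accelerator uses for non-Java jobs, so treating Hadoop Streaming as the relevant one is an assumption. The functions operate on any iterable of lines so they can be tried without a cluster.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum the counts per word; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"
```

In a real streaming job each function would read from `sys.stdin` and print its output, with the framework performing the sort between the two steps. Locally, `reducer(sorted(mapper(lines)))` reproduces that pipeline.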
Hadoop Accelerator comes with a comprehensive GUI-based management and monitoring tool called GridGain Visor. It provides deep DevOps capabilities including an operations and telemetry dashboard, data grid and compute grid management, as well as GGFS management that provides GGFS monitoring and file management between HDFS, local and GGFS file systems.
Hadoop Accelerator also comes with a GUI-based file system profiler, which keeps track of all operations performed on your GGFS or HDFS file systems and identifies potential hot spots. The GGFS profiler tracks the speed and throughput of reads, writes, and various directory operations for all files and displays these metrics in a convenient view that you can sort by any profiled criterion, e.g. from slowest write to fastest.
The profiler also makes suggestions whenever performance can be gained by loading file data into in-memory GGFS.
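The sorting described above, for instance from slowest write to fastest, can be pictured with a small sketch. The record layout and names (`OpSample`, `slowest_first`) are hypothetical and are not taken from the profiler itself.

```python
from dataclasses import dataclass


@dataclass
class OpSample:
    """One profiled file operation (hypothetical record layout)."""
    path: str
    op: str              # "read" or "write"
    bytes_moved: int
    seconds: float       # must be greater than zero

    @property
    def throughput(self):
        # Bytes per second for this operation.
        return self.bytes_moved / self.seconds


def slowest_first(samples, op):
    """Return the samples for one operation type, lowest throughput first."""
    return sorted((s for s in samples if s.op == op), key=lambda s: s.throughput)
```

Files that appear at the top of `slowest_first(samples, "write")` are natural candidates for the in-memory tier.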
TRADEMARK AND COPYRIGHT NOTE:
Hadoop is a registered trademark of the Apache Software Foundation and/or its affiliates. Other names may be trademarks of their respective owners.
If you have any questions please contact us at firstname.lastname@example.org