Apache Ignite and GridGain: In-Memory Data Management and Acceleration for Apache Spark


Apache Spark is a leading open source engine for fast, general-purpose, large-scale data processing, including the processing of streaming data. Part of its stellar rise has been its adoption as the de facto processing engine for Apache Hadoop™. But while Spark supplanted MapReduce for processing, no solution for real-time data management has emerged. Spark doesn’t have features for managing data or processing state. As a result, developers using Spark often write extensive code to ingest, prepare, enrich, store, and manage data and state.

Apache Ignite™ and GridGain® provide the most extensive in-memory data management and acceleration for Spark. Ignite is an open source in-memory computing platform that provides an in-memory data grid (IMDG), an in-memory database (IMDB), and support for streaming analytics, machine learning, and deep learning. GridGain is the leading in-memory computing platform for real-time business and the only enterprise-grade, commercially supported version of Ignite.

GridGain and Ignite provide the ideal underlying in-memory data management technology for Apache Spark because of their in-memory support for both stored “data at rest” and streaming “data in motion.” Learn how this simplifies many Spark tasks, including stream ingestion, data preparation and storage, stream processing, state management, streaming analytics, and machine and deep learning.