Streaming Analytics Using GridGain

GridGain® is used for streaming analytics to ingest, process, store and publish streaming data for large-scale, mission-critical business applications. Companies have been able to ingest and process streams with millions of events per second on a moderately sized cluster.

GridGain integrates with major streaming technologies, including Apache Camel, Apache Kafka, Apache Spark®, Apache Storm, Java Message Service (JMS) and MQTT, to ingest, process and publish streaming data. Once the streaming data is loaded into the GridGain cluster, companies can use GridGain’s built-in massively parallel processing libraries for concurrent data processing, including concurrent SQL queries and machine and deep learning. Clients can then subscribe to continuous queries, which execute as streams are processed and identify important events.
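
The ingest-then-subscribe flow described above can be illustrated with a small conceptual sketch in plain Python. It is not GridGain's API: the names `StreamingCache`, `subscribe`, `ingest` and `run_demo` are hypothetical, and the single-process dictionary only stands in for a distributed cluster. It shows one idea only: a subscriber registers a filter once and is then notified of each matching event as the event is ingested, instead of polling the store.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical type aliases for this sketch (not part of any GridGain API).
Predicate = Callable[[str, Any], bool]   # decides whether an event is "important"
Listener = Callable[[str, Any], None]    # called for every matching event


@dataclass
class StreamingCache:
    """Toy in-memory key-value store with continuous-query style subscriptions."""

    _data: Dict[str, Any] = field(default_factory=dict)
    _subscriptions: List[Tuple[Predicate, Listener]] = field(default_factory=list)

    def subscribe(self, predicate: Predicate, listener: Listener) -> None:
        """Register a filter and a callback that fires on each matching update."""
        self._subscriptions.append((predicate, listener))

    def ingest(self, key: str, value: Any) -> None:
        """Store the event, then notify every subscriber whose filter matches."""
        self._data[key] = value
        for predicate, listener in self._subscriptions:
            if predicate(key, value):
                listener(key, value)

    def get(self, key: str) -> Any:
        """Return the latest value stored for the key, or None if absent."""
        return self._data.get(key)


def run_demo() -> Tuple[StreamingCache, List[Tuple[str, float]]]:
    """Ingest a few made-up sensor readings and collect those above 100.0."""
    cache = StreamingCache()
    alerts: List[Tuple[str, float]] = []
    cache.subscribe(
        lambda key, value: value > 100.0,            # the "continuous query" filter
        lambda key, value: alerts.append((key, value)),
    )
    for key, value in [("sensor-1", 42.0), ("sensor-2", 180.5), ("sensor-1", 120.0)]:
        cache.ingest(key, value)
    return cache, alerts


if __name__ == "__main__":
    _, alerts = run_demo()
    print(alerts)
```

In this sketch the filter runs inside `ingest`, so subscribers see matching events in arrival order while the store keeps only the latest value per key. In a real cluster the filtering and notification would be distributed across nodes, which this single-process model does not attempt to show.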

Apache Spark Native Integration

GridGain also provides the broadest integration with Apache Spark of any in-memory computing platform for streaming analytics. The integration includes native support for Spark DataFrames, a GridGain RDD API for reading and writing GridGain data as mutable Spark RDDs, optimized SQL, and an in-memory implementation of HDFS, the GridGain File System (GGFS). When deployed together, Spark can:

  • Access all of the GridGain in-memory data, not just data streams
  • Share data and state across all Spark jobs
  • Take advantage of all of the GridGain in-memory loading and processing capabilities, including machine and deep learning, to train models in real time and improve outcomes for in-process hybrid transactional/analytical processing (HTAP) applications