Enabling Real-Time Analytics for Hadoop Data Lakes with GridGain


This webinar discusses how an in-memory computing platform such as GridGain or Apache Ignite can modernize existing data lake architectures, enabling real-time analytics that spans operational, historical, and streaming data sets.

Data lakes, such as those powered by Hadoop, are an excellent choice for analytics and reporting at scale. Hadoop scales horizontally and cost-effectively, and it handles long-running operations that span large data sets well. However, the steady growth of real-time analytics requirements, where operations must complete in seconds rather than minutes, or in milliseconds rather than seconds, has brought new challenges to Hadoop-based solutions.

In this session, Denis Magda, GridGain VP of Product and Apache Ignite PMC Chair, describes:

  • How to choose the right deployment mode and responsibilities when working with GridGain and Hadoop
  • How to determine which operations should be handled by GridGain and which should be sent to Hadoop
  • How to use Spark DataFrames to run federated (also known as cross-database) queries that span GridGain and Hadoop
  • How to perform initial data loading from Hadoop to GridGain
  • How to set up bi-directional synchronization between Hadoop and GridGain

Denis Magda
Former VP, Developer Relations in R&D at GridGain; Apache Ignite committer and PMC member