How to Use Spark With Apache Ignite for Big Data Processing

Apache Ignite and Apache Spark are complementary in-memory computing solutions that can be used together to improve the performance and functionality of SQL data processing.

When Ignite is used as a data source for Spark, Ignite provides mechanisms, such as SQL indexes and data co-location across tables, that optimize SQL execution for Spark. Also, although the order in which data is inserted matters to many data sources, it does not matter to Ignite. As a result, Spark SQL queries that use Ignite as a data source can run much faster than Spark SQL queries that use other data sources, such as Hadoop.
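As a rough illustration of what using Ignite as a Spark data source can look like, here is a minimal PySpark sketch. It assumes the Ignite Spark integration JARs are on the Spark classpath and that the data source is registered under the short name "ignite" with the option keys "config", "table", and "primaryKeyFields" (these are believed to mirror the constants in IgniteDataFrameSettings, but check them against your Ignite version). The helper names, the configuration file path, and the table name are hypothetical.

```python
# Minimal sketch: reading from and writing to Ignite through Spark data frames.
# ASSUMPTION: format name and option keys mirror IgniteDataFrameSettings;
# verify them against the Ignite release you actually use.
FORMAT_IGNITE = "ignite"
OPTION_CONFIG_FILE = "config"                  # path to the Ignite XML configuration
OPTION_TABLE = "table"                         # Ignite SQL table name
OPTION_PRIMARY_KEY_FIELDS = "primaryKeyFields" # needed when Spark creates the table


def ignite_read_options(config_path, table):
    """Build the options needed to read one Ignite SQL table."""
    return {OPTION_CONFIG_FILE: config_path, OPTION_TABLE: table}


def ignite_write_options(config_path, table, primary_key_fields):
    """Build the options needed to save a data frame as an Ignite SQL table."""
    opts = ignite_read_options(config_path, table)
    opts[OPTION_PRIMARY_KEY_FIELDS] = ",".join(primary_key_fields)
    return opts


def read_table(spark, config_path, table):
    """Load an Ignite SQL table as a Spark data frame (hypothetical helper)."""
    reader = spark.read.format(FORMAT_IGNITE)
    for key, value in ignite_read_options(config_path, table).items():
        reader = reader.option(key, value)
    return reader.load()


def write_table(df, config_path, table, primary_key_fields):
    """Append a Spark data frame to an Ignite SQL table (hypothetical helper)."""
    writer = df.write.format(FORMAT_IGNITE).mode("append")
    options = ignite_write_options(config_path, table, primary_key_fields)
    for key, value in options.items():
        writer = writer.option(key, value)
    writer.save()
```

With a running SparkSession named spark, a call such as read_table(spark, "ignite-config.xml", "person") would return a data frame that can be queried with Spark SQL, while filters on indexed columns may be pushed down to Ignite.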

In this webinar, you will learn when to use Ignite and, through real-world examples, how to configure Spark-Ignite integration. You will also learn about optimization techniques and best practices. Topics include the following:

  • Methods for loading data from Spark to Ignite
  • Methods for reading data from Ignite via Spark
  • Methods for working with Ignite data frames
  • The benchmark results for using Ignite-Spark integration to load and query a large amount of data
  • Tips for debugging Ignite-Spark integrations
Andrey Alexandrov
Senior Software Engineer at GridGain