GridGain's booth is seeing a lot of traffic this week at the Spark+AI Summit 2019, which runs April 23-25 in San Francisco. Our experts are fielding many questions from attendees about Apache Spark, GridGain and Apache® Ignite™.
If you're at the conference, be sure to catch GridGain's session at 5:30 p.m. in room 2010. Alexey Zinovyev's talk is titled, "Distributed ML/DL with Ignite ML module using Apache Spark as database."
"The current implementation of Machine Learning (ML) algorithms in Spark has several disadvantages associated with the transition from standard Spark SQL types to ML-specific types," Alexey explained when asked about his upcoming talk.
"Also, Spark ML doesn't support online learning by nature for all algorithms, stacking, boosting and a bunch of approximate ML algorithms that give a significant speedup in many cases."
Apache Ignite integrates closely with Apache Spark through its Ignite RDD and Ignite DataFrame implementations.
"Apache Ignite has the Ignite ML module, which includes a lot of distributed ML algorithms, an NLP package (to be available in the next release, 2.8), a bunch of approximate ML algorithms, and simple integration with TensorFlow via the TensorFlow Ignite Dataset (currently part of the TF.contrib package). Each algorithm also supports model updating, which gives us the ability to do online learning not only for KMeans and LinReg," he added.
"We suggest using the Apache Ignite ML module to speed up your ML training and using Spark + Ignite as a backend for distributed TensorFlow calculations. You will see live demos of ML pipeline building with the Apache Ignite ML module, Apache Spark, Apache Kafka, TensorFlow and more."