GridGain is a Silver Sponsor of next week's Spark+AI Summit 2019 in San Francisco, and our
own Alexey Zinovyev will deliver a session titled "Distributed ML/DL with
Ignite ML module using Apache Spark as database." The conference runs April 23-25.
About his talk: The current implementation of Machine Learning (ML) algorithms in Spark has several disadvantages: the transition from standard Spark SQL types to ML-specific types, a low level of adaptation of the algorithms to distributed computing, and a relatively slow pace of adding new algorithms to the library.
Spark ML also lacks native support for online learning across all of its algorithms, as well as for stacking, boosting, and a range of approximate ML algorithms that give a significant speedup in many cases.
Apache® Ignite™ works closely with Apache Spark thanks to its Ignite RDD and Ignite DataFrame implementations (see https://ignite.apache.org/use-cases/spark/shared-memory-layer.html). Apache Ignite also has the Ignite ML module, which includes many distributed ML algorithms, an NLP package (to be available in the next release, 2.8), a set of approximate ML algorithms, and simple integration with TensorFlow via TensorFlow Ignite Dataset (currently part of the TF.contrib package). In addition, every algorithm supports model updating, which makes online learning possible for more than just KMeans and LinReg.
We suggest using the Apache Ignite ML module to speed up your ML training, and using Spark + Ignite as a backend for distributed TensorFlow calculations. You will see live demos of building ML pipelines with the Apache Ignite ML module, Apache Spark, Apache Kafka, TensorFlow and more.
Follow the conference buzz on Twitter @SparkAISummit.
About Alexey
Just like Charon from the Greek myths, Alexey helps people get from one side to the other, the
sides being Java and Big Data in his case. Or, in simpler words, he is a Java/BigData trainer. He has worked with
Hadoop/Spark and other Big Data projects since 2012, has forked such projects and sent pull requests since 2014,
and has presented talks since 2015. His favorite areas are text data and large graphs. Alexey is also a contributor to
Ignite ML: he hand-wrote SVM, KNN, Logistic Regression and a lot of the preprocessing code, and he is the author of the official
Ignite ML tutorial.