Continuous Learning Framework

The GridGain Continuous Learning Framework supports machine and deep learning model training on your operational data. Built-in machine learning libraries allow you to train your models at any time without moving data to a separate system, enabling real-time updates to your models. A native integration with TensorFlow allows you to easily move your data into TensorFlow for model training or even leverage your GridGain cluster for deep learning model training.

Machine Learning

GridGain® Machine Learning (ML) is a set of simple, scalable and efficient tools which allow users to build predictive machine learning models without costly data transfers. With machine learning capabilities integrated into GridGain, data scientists can avoid two major factors that keep ML from mainstream adoption.

Problem One: Constant Data Movement (ETL)

Machine learning models are typically trained in one system and then, once training is over, deployed in a different one. Data scientists have to wait for ETL or some other data transfer process to move the data into a system like Apache Mahout or Apache Spark for model training. They then have to wait for the process to complete before redeploying the updated model in a production environment. The whole cycle can take hours when terabytes of data move from one system to another. In addition, the delays introduced by ETL mean that the model is trained on a stale data set.

Problem Two: Lack of Horizontal Scalability

Machine learning and deep learning (DL) algorithms may have to process data sets that no longer fit on a single server. This forces data scientists to come up with sophisticated solutions or turn to distributed computing platforms such as Apache Spark and TensorFlow to manage these large data sets. However, such solutions mostly address only one part of the puzzle: model training. Developers must still decide how to deploy the models in production afterward.

Zero ETL and Massive Scalability

GridGain Machine Learning relies on Apache Ignite's memory-centric storage, which provides massive scalability for ML and DL tasks and eliminates the wait imposed by ETL between different systems. For instance, users can run ML/DL training and inference directly on data stored across memory and disk in a GridGain cluster. GridGain provides a host of ML and DL algorithms that are optimized for GridGain collocated distributed processing. These implementations deliver in-memory speed and unlimited horizontal scalability when running in place against massive data sets or incrementally against incoming data streams, without requiring the data to be moved into another store. By eliminating data movement and the long processing wait times, GridGain ML enables continuous learning that can improve decisions based on the latest data as it arrives, in real time.
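
The idea of collocated, incremental training can be illustrated with a minimal Python sketch. It does not use the GridGain or Apache Ignite API; the names Stats, local_stats and train are hypothetical, and the partitions are plain in-memory lists standing in for data held on cluster nodes. Each partition computes small summary statistics where its data lives, only those summaries are merged, and newly arriving rows update the model without retraining from scratch.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    """Sufficient statistics for 1-D least squares: y ~ slope * x + intercept."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def add(self, x, y):
        """Fold one new observation into the statistics (incremental update)."""
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def merge(self, other):
        """Combine statistics from two partitions into a new Stats object."""
        return Stats(self.n + other.n, self.sx + other.sx, self.sy + other.sy,
                     self.sxx + other.sxx, self.sxy + other.sxy)

    def fit(self):
        """Closed-form least-squares fit; returns (slope, intercept)."""
        denom = self.n * self.sxx - self.sx ** 2
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return slope, intercept


def local_stats(partition):
    """Stand-in for work done on the node that owns the partition.

    Only the small Stats object would leave the node, never the raw rows.
    """
    stats = Stats()
    for x, y in partition:
        stats.add(x, y)
    return stats


def train(partitions):
    """Merge the per-partition statistics into one global Stats object."""
    total = Stats()
    for partition in partitions:
        total = total.merge(local_stats(partition))
    return total


# Hypothetical data: two partitions of (x, y) rows lying on y = 2x + 1.
partitions = [[(0, 1), (1, 3)], [(2, 5), (3, 7)]]
model_stats = train(partitions)
print(model_stats.fit())

# A new row arrives from a stream: update in place, then refit.
model_stats.add(4, 9)
print(model_stats.fit())
```

The design choice worth noting is that the merged object has a fixed size regardless of how many rows each partition holds, which is what makes both the distributed step and the streaming update cheap in this sketch.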

Fault Tolerance and Continuous Learning

GridGain Machine Learning tolerates node failures. If a node fails during the learning process, all recovery procedures are transparent to the user, learning processes are not interrupted, and results are available in nearly the same amount of time as when all nodes are working without any issues.
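
One way to picture transparent recovery is the following Python sketch. It is an illustration under stated assumptions, not GridGain's actual recovery mechanism: the names NodeFailure, run_on_node and resilient_mean are hypothetical, and each partition is assumed to have a primary and at least one backup copy on other nodes. When the node holding one copy is down, the per-partition work is simply rerun against the next copy, so the caller still receives the same result.

```python
class NodeFailure(Exception):
    """Raised by a hypothetical node that is down when work is sent to it."""


def partial_sum(rows):
    """Per-partition work: (count, sum), enough to compute a global mean."""
    return len(rows), sum(rows)


def run_on_node(node, rows, failed_nodes):
    """Simulate running the partition's work on one of its owner nodes."""
    if node in failed_nodes:
        raise NodeFailure(node)
    return partial_sum(rows)


def resilient_mean(partitions, failed_nodes=frozenset()):
    """Compute a mean over all partitions, falling back to backup copies.

    partitions: list of (rows, owners), where owners lists the primary node
    first and then the backups (an assumed layout for this sketch).
    """
    count, total = 0, 0.0
    for rows, owners in partitions:
        for node in owners:
            try:
                n, s = run_on_node(node, rows, failed_nodes)
                break
            except NodeFailure:
                continue  # fall back to the next copy without telling the caller
        else:
            raise RuntimeError("all copies of a partition are unavailable")
        count += n
        total += s
    return total / count


# Hypothetical layout: two partitions, each with a primary and one backup.
layout = [([1, 2, 3], ["a", "b"]), ([4, 5], ["b", "c"])]
print(resilient_mean(layout))
print(resilient_mean(layout, failed_nodes={"a"}))
```

Only the work for the affected partition is repeated here, which is why the total running time stays close to the failure-free case.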