
SVM Binary Classification

Support Vector Machines (SVMs) are supervised learning models with associated learning algorithms that analyze data for classification and regression.

Given a set of training examples, each marked as belonging to one or the other of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier.

Only linear SVM is supported in the Apache Ignite Machine Learning module. For more information, see the SVM article on Wikipedia.

Model

An SVM model is represented by the class SVMLinearBinaryClassificationModel. It makes a prediction for a given feature vector as follows:

SVMLinearBinaryClassificationModel model = ...;

double prediction = model.predict(observation);

Presently, Ignite supports the following parameters for SVMLinearBinaryClassificationModel:

  • isKeepingRawLabels - controls the output label format: -1 and +1 labels when false, raw distances from the separating hyperplane when true (default value: false)

  • threshold - the observation is assigned the +1 label if its raw value is greater than this threshold (default value: 0.0)

SVMLinearBinaryClassificationModel model = ...;

double prediction = model
  .withRawLabels(true)
  .withThreshold(5)
  .predict(observation);
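
The way the two parameters interact can be illustrated with a minimal plain-Java sketch. It does not use any Ignite classes: the class name LinearSvmPredictSketch, its methods, and the assumption that the raw value is the unnormalized score w·x + b are all hypothetical and chosen only for illustration.

```java
// Plain-Java illustration only; not the Ignite implementation.
final class LinearSvmPredictSketch {
    /** Assumed raw value: dot(weights, features) + intercept. */
    static double rawScore(double[] weights, double intercept, double[] features) {
        if (weights.length != features.length)
            throw new IllegalArgumentException("Dimension mismatch");
        double sum = intercept;
        for (int i = 0; i < weights.length; i++)
            sum += weights[i] * features[i];
        return sum;
    }

    /** Returns the raw score, or a -1/+1 label obtained by comparing it to the threshold. */
    static double predict(double[] weights, double intercept, double[] features,
                          boolean keepRawLabels, double threshold) {
        double raw = rawScore(weights, intercept, features);
        if (keepRawLabels)
            return raw;
        return raw > threshold ? 1.0 : -1.0;
    }

    public static void main(String[] args) {
        double[] w = {2.0, -1.0};
        double[] x = {3.0, 1.0};
        System.out.println(predict(w, 0.5, x, true, 0.0));  // raw score
        System.out.println(predict(w, 0.5, x, false, 0.0)); // label at threshold 0
        System.out.println(predict(w, 0.5, x, false, 6.0)); // label at threshold 6
    }
}
```

With weights (2, -1), intercept 0.5, and observation (3, 1), the raw score is 5.5, so the label is +1 at the default threshold of 0.0 and -1 at a threshold of 6.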

Trainer

Base class for a soft-margin linear SVM classification trainer based on the communication-efficient distributed dual coordinate ascent algorithm (CoCoA) with a hinge-loss function. This trainer takes a labeled dataset with -1 and +1 labels for the two classes and performs binary classification.

The paper describing this algorithm is available at https://arxiv.org/abs/1409.1458.

Presently, Ignite supports the following parameters for SVMLinearBinaryClassificationTrainer:

  • amountOfIterations - number of outer SDCA algorithm iterations (default value: 200)

  • amountOfLocIterations - number of local SDCA algorithm iterations (default value: 100)

  • lambda - regularization parameter (default value: 0.4)

// Set up the trainer
SVMLinearBinaryClassificationTrainer trainer = new SVMLinearBinaryClassificationTrainer()
  .withAmountOfIterations(AMOUNT_OF_ITERATIONS)
  .withAmountOfLocIterations(AMOUNT_OF_LOC_ITERATIONS)
  .withLambda(LAMBDA);

// Build the model
SVMLinearBinaryClassificationModel mdl = trainer.fit(
  datasetBuilder,
  featureExtractor,
  labelExtractor
);
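
To give a feel for what the three trainer parameters control, here is a minimal single-partition sketch of stochastic dual coordinate ascent (SDCA) for a hinge-loss linear SVM. It is not the Ignite implementation: the class SdcaHingeSketch, the seed argument, and the absence of an intercept term are assumptions made for illustration, and the step in which CoCoA combines updates from several partitions is omitted.

```java
import java.util.Random;

// Single-partition SDCA sketch for a hinge-loss linear SVM (no intercept).
// Illustration only; the distributed CoCoA aggregation step is left out.
final class SdcaHingeSketch {
    static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++)
            s += a[i] * b[i];
        return s;
    }

    /** Labels in y must be -1 or +1. Returns the learned weight vector. */
    static double[] fit(double[][] x, double[] y, int amountOfIterations,
                        int amountOfLocIterations, double lambda, long seed) {
        int n = x.length;
        int d = x[0].length;
        double[] w = new double[d];
        double[] alpha = new double[n]; // dual variables, kept within [0, 1]
        Random rnd = new Random(seed);

        for (int outer = 0; outer < amountOfIterations; outer++) {
            // In CoCoA each partition would run this inner loop on its own rows,
            // and the per-partition weight updates would be combined afterwards.
            for (int loc = 0; loc < amountOfLocIterations; loc++) {
                int i = rnd.nextInt(n);
                double sqNorm = dot(x[i], x[i]);
                if (sqNorm == 0.0)
                    continue;
                double margin = y[i] * dot(w, x[i]);
                // Closed-form coordinate step for hinge loss, clipped to [0, 1].
                double candidate = alpha[i] + (1.0 - margin) * lambda * n / sqNorm;
                double updated = Math.max(0.0, Math.min(1.0, candidate));
                double delta = updated - alpha[i];
                alpha[i] = updated;
                // Keep w equal to sum(alpha_i * y_i * x_i) / (lambda * n).
                double scale = delta * y[i] / (lambda * n);
                for (int j = 0; j < d; j++)
                    w[j] += scale * x[i][j];
            }
        }
        return w;
    }

    static double predict(double[] w, double[] features) {
        return dot(w, features) > 0.0 ? 1.0 : -1.0;
    }

    public static void main(String[] args) {
        double[][] x = {{2, 1}, {3, 2}, {2.5, 0.5}, {-2, -1}, {-3, -2}, {-2.5, -0.5}};
        double[] y = {1, 1, 1, -1, -1, -1};
        double[] w = fit(x, y, 200, 100, 0.4, 42L);
        for (int i = 0; i < x.length; i++)
            System.out.println(y[i] + " predicted as " + predict(w, x[i]));
    }
}
```

In this sketch a larger lambda shrinks the weights more strongly, while the product of the two iteration counts sets the total number of coordinate updates.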

Example

To see how SVM Linear Classifier can be used in practice, try this example, available on GitHub and delivered with every Apache Ignite distribution.

The training dataset is a subset of the Iris dataset (the classes labeled 1 and 2, which form a linearly separable two-class dataset). It can be loaded from the UCI Machine Learning Repository.
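
Because the trainer is described above as expecting -1 and +1 labels, a dataset labeled 1 and 2 would need its labels remapped before training. The helper below is a hypothetical sketch of such a mapping; which class becomes -1 is an arbitrary choice made here, and the linked example may handle labels differently.

```java
// Hypothetical label remapping for a two-class dataset labeled 1 and 2.
final class IrisLabelMappingSketch {
    /** Maps raw label 1 to -1 and raw label 2 to +1; rejects anything else. */
    static double toSvmLabel(double rawLabel) {
        if (rawLabel == 1.0)
            return -1.0;
        if (rawLabel == 2.0)
            return 1.0;
        throw new IllegalArgumentException("Unexpected label: " + rawLabel);
    }

    public static void main(String[] args) {
        double[] rawLabels = {1.0, 2.0, 2.0, 1.0};
        for (double raw : rawLabels)
            System.out.println(raw + " -> " + toSvmLabel(raw));
    }
}
```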