GridGain ML Model Deployment and Running Predictions
Model Deployment
Preparing the Model
You can use models from the following sources:
- Pre-trained models from the DJL Model Zoo.
- Locally trained models.
- Pre-trained models in supported formats with correct model names.
DJL Model Zoo
DJL Model Zoo models come in supported formats and ship with a serving.properties file in which the modelName is preconfigured. You can download them with a short program like the one in the quickstart guide.
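One way to fetch a zoo model locally is DJL's own Criteria API. The sketch below is illustrative rather than GridGain-specific; it assumes the DJL PyTorch dependencies are on the classpath, and exception handling is omitted:
import ai.djl.Application;
import ai.djl.modality.Classifications;
import ai.djl.repository.zoo.Criteria;
import ai.djl.repository.zoo.ZooModel;
// Resolve a sentiment-analysis model from the DJL Model Zoo. DJL downloads the
// archive (including its preconfigured serving.properties) into its local cache,
// from where the files can be copied into a deployment unit directory.
Criteria<String, Classifications> criteria = Criteria.builder()
        .setTypes(String.class, Classifications.class)
        .optApplication(Application.NLP.SENTIMENT_ANALYSIS)
        .optEngine("PyTorch")
        .build();
try (ZooModel<String, Classifications> model = criteria.loadModel()) {
    System.out.println("Model cached at: " + model.getModelPath());
}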
Other Sources
The system automatically searches for model files named model.* (e.g., model.onnx, model.pt). You can override this default by adding .name("your-model-name") to your job parameters, as in the fragment below.
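For example, if the deployed unit contains model_quantized.onnx rather than model.onnx, the job parameters name the model explicitly. This fragment mirrors the builders used in the prediction examples later in this document; the "my-model" ID is a placeholder:
MlSimpleJobParameters<String> jobParams = MlSimpleJobParameters.<String>builder()
        .id("my-model")              // placeholder deployment unit ID
        .version("1.0.0")
        .type(ModelType.ONNX)
        .name("model_quantized")     // overrides the default model.* lookup
        // ... remaining parameters as in the prediction examples below ...
        .build();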
Supported Model Formats
| Framework | Supported Formats | Required Files |
|---|---|---|
| PyTorch | .pt | Model file + configuration files |
| TensorFlow | .pb | Model file + configuration files |
| ONNX | .onnx | Model file + configuration files |
Unsupported Formats
- SafeTensors (.safetensors): Not supported by DJL; convert to PyTorch format.
- PyTorch pickled (.pth): Not supported by DJL; convert to TorchScript (.pt) format.
- Pickle files (.pkl): Raw pickle files need a proper model wrapper.
Deploying the Model
Models are deployed as deployment units using the CLI or REST:
# Deploy model to cluster
cluster unit deploy sentiment-model \
--version 1.0.0 \
--path /path/to/model/directory \
--nodes ALL
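After deployment, you can confirm that the unit and version are present (assuming the standard CLI, which provides a unit listing command):
# List deployed units to verify the model is available
cluster unit list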
For more information, refer to the code deployment documentation.
Automated Deployment
GridGain ML provides utility methods to automate model download and deployment for different scenarios.
Standard Models with Custom Classes
For models with custom input/output types, translators, or marshallers:
List<String> urls = Arrays.asList(
"https://huggingface.co/model-repo/resolve/main/model.onnx",
"https://huggingface.co/model-repo/resolve/main/tokenizer.json"
);
Path tempModelDir = ModelUtils.downloadAndDeployModelAndJar(
urls, // URLs to download model files from
"model-id", // Model identifier
"1.0.0", // Model version
sourceDir // Path to custom classes for JAR packaging
);
This method automatically:
- Downloads all required model files from the specified URLs
- Creates a temporary directory for the model files
- Packages custom classes (translators, marshallers) into a JAR
- Deploys both the model and custom classes to the GridGain cluster
DJL Model Zoo Models
For models from the DJL Model Zoo using djl:// protocol:
String modelUrl = "djl://ai.djl.huggingface.pytorch/deepset/minilm-uncased-squad2";
Path tempModelDir = ModelUtils.downloadAndDeployDJLModelAndJar(
modelUrl, // DJL model URL
PytorchQAInput.class, // Input type class
String.class, // Output type class
"pytorch-qa-model", // Model identifier
"1.0.0", // Model version
sourceDir, // Path to custom classes
PytorchQATranslatorFactory.class.getName() // Translator factory
);
This method automatically:
- Downloads the PyTorch model from the DJL Model Zoo
- Compiles custom classes from the source directory
- Generates a JAR containing the compiled custom classes
- Deploys both the model files and the custom classes JAR to the GridGain cluster
TensorFlow Models with Folder Structure
For TensorFlow SavedModel format that requires specific folder structure:
List<String> urls = Arrays.asList(
"https://huggingface.co/model-repo/resolve/main/saved_model.pb",
"https://huggingface.co/model-repo/resolve/main/variables/variables.data-00000-of-00001",
"https://huggingface.co/model-repo/resolve/main/variables/variables.index"
);
Path tempModelDir = ModelUtils.downloadAndDeployModelAndJarWithFolderStructure(
urls, // URLs to download model files from
"tf-model-id", // Model identifier
"1.0.0", // Model version
sourceDir // Path to custom classes for JAR packaging
);
This method automatically:
- Downloads all required TensorFlow model files
- Creates the proper folder structure (saved_model.pb in the root, variables in the variables/ subdirectory)
- Packages custom classes into a JAR
- Deploys both the model and custom classes to the GridGain cluster
Model Properties Reference
Core Properties
Examples of common model properties that can optionally be set in job parameters:
.property("application", "ai.djl.Application$NLP$SENTIMENT_ANALYSIS")
| Property | Description |
|---|---|
| application | DJL application type |
| groupId | Model zoo group ID |
| artifactId | Model zoo artifact ID |
Translator Arguments
Properties without the option. prefix are passed to the TranslatorFactory.
- For built-in translators, check their source code to see which arguments they accept.
- For custom translators, you can pass ANY properties (without the option. prefix) and they will be available in your TranslatorFactory's newInstance() method via the arguments parameter, as in the sketch below.
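As an illustration, a factory can read such a property from the arguments map. The sketch below follows the factory shape shown later in this document; the threshold argument, the factory name, and the translator it constructs are hypothetical:
import java.io.Serializable;
import java.util.Map;
import ai.djl.Model;
import ai.djl.translate.Translator;
import ai.djl.translate.TranslatorFactory;

public class ThresholdTranslatorFactory implements TranslatorFactory, Serializable {
    @Override
    @SuppressWarnings("unchecked")
    public <I, O> Translator<I, O> newInstance(Class<I> input, Class<O> output,
            Model model, Map<String, ?> arguments) {
        // A property set as .property("threshold", "0.5") in the job parameters
        // (no option. prefix) arrives here as an entry in the arguments map.
        double threshold = Double.parseDouble(
                String.valueOf(arguments.getOrDefault("threshold", "0.5")));
        return (Translator<I, O>) new ThresholdTranslator(threshold); // hypothetical translator
    }
}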
Engine Options
Properties with the option. prefix are passed to the engine as model-loading options. They are set on the job parameter builders like any other property (see the fragment after the examples).
Examples:
- option.mapLocation: (PyTorch) Map tensors to CPU before loading.
- option.ortDevice: (ONNX) Device selection.
- option.Tags: (TensorFlow) Model tags.
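For instance, to make PyTorch map tensors to CPU at load time, add the option to the job parameters (a builder fragment in the style of the .property(...) example above):
// Engine options carry the option. prefix; they go to the engine at
// model-loading time rather than to the translator.
.property("option.mapLocation", "true")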
Running Predictions
Simple Predictions
Execute inference with a single input.
String input = "This movie is absolutely fantastic! I loved every minute of it.";
// Create job parameters
MlSimpleJobParameters<String> jobParams =
MlSimpleJobParameters.<String>builder()
.id("sentiment-model")
.version("1.0.0")
.type(ModelType.PYTORCH)
.name("traced_distilbert_sst_english")
.config(ModelConfig.builder().build())
.inputClass(String.class.getName())
.outputClass(Classifications.class.getName())
.translatorFactory("ai.djl.pytorch.zoo.nlp.sentimentanalysis.PtDistilBertTranslatorFactory")
.input(input)
.build();
// Execute prediction
Classifications result = ml.predict(jobParams);
System.out.println("Prediction: " + result.best().getClassName() +
" (confidence: " + result.best().getProbability() + ")");
Batch Predictions
Execute inference on multiple inputs in a batch.
List<String> batchInputs = Arrays.asList(
"This movie is very good",
"This book is not good",
"This food is very good"
);
MlBatchJobParameters<String> jobParams = MlBatchJobParameters.<String>builder()
.id("sentiment-model")
.version("1.0.0")
.type(ModelType.PYTORCH)
.name("traced_distilbert_sst_english")
.config(ModelConfig.builder().batchSize(16).build())
.inputClass("java.lang.String")
.outputClass("ai.djl.modality.Classifications")
.translatorFactory("ai.djl.pytorch.zoo.nlp.sentimentanalysis.PtDistilBertTranslatorFactory")
.batchInput(batchInputs)
.build();
// Execute batch prediction
List<Classifications> results = ml.batchPredict(jobParams);
// Process results
for (int i = 0; i < results.size(); i++) {
Classifications classification = results.get(i);
System.out.println("Input: " + batchInputs.get(i));
System.out.println("Result: " + classification.best());
}
SQL-Based Predictions
Execute inference using SQL queries on database tables:
// First, set up sample data
sql.execute(null,
"CREATE TABLE IF NOT EXISTS product_reviews (" +
"review_id INT PRIMARY KEY, " +
"product_category VARCHAR(100), " +
"review_text VARCHAR(1000), " +
"sentiment VARCHAR(20)" +
")");
sql.execute(null,
"INSERT INTO product_reviews VALUES " +
"(1, 'Electronics', 'This smartphone is amazing!', null), " +
"(2, 'Books', 'Boring story, could not finish reading it.', null)");
// Execute SQL-based prediction
String sqlQuery = "SELECT review_text FROM product_reviews WHERE sentiment IS NULL";
MlSqlJobParameters jobParams = MlSqlJobParameters.builder()
.id("sentiment-model")
.version("1.0.0")
.type(ModelType.PYTORCH)
.name("traced_distilbert_sst_english")
.config(ModelConfig.builder().build())
.inputClass("java.lang.String")
.outputClass("ai.djl.modality.Classifications")
.translatorFactory("ai.djl.pytorch.zoo.nlp.sentimentanalysis.PtDistilBertTranslatorFactory")
.sqlQuery(sqlQuery)
.inputColumn("review_text")
.build();
List<Classifications> results = ml.predictFromSql(jobParams);
Parameterized Queries
Execute inference using parameterized SQL queries:
String parameterizedQuery = "SELECT review_text FROM product_reviews WHERE product_category = ? AND sentiment IS NULL LIMIT ?";
Serializable[] params = {"Electronics", 10};
MlSqlJobParameters jobParams = MlSqlJobParameters.builder()
// ... other parameters ...
.sqlQuery(parameterizedQuery)
.sqlParams(params)
.limit(10) // Additional limit
.build();
List<Classifications> results = ml.predictFromSql(jobParams);
Colocated Predictions
Execute inference on data stored locally on specific cluster nodes.
MlColocatedJobParameters<String> jobParams =
MlColocatedJobParameters.<String>builder()
.id("sentiment-model")
.version("1.0.0")
.type(ModelType.PYTORCH)
.name("traced_distilbert_sst_english")
.config(ModelConfig.builder().build())
.inputClass("java.lang.String")
.outputClass("ai.djl.modality.Classifications")
.property("application", "ai.djl.Application$NLP$SENTIMENT_ANALYSIS")
.translatorFactory("ai.djl.pytorch.zoo.nlp.sentimentanalysis.PtDistilBertTranslatorFactory")
.tableName("product_reviews")
.key(Tuple.create().set("review_id", 1))
.inputColumn("review_text")
.build();
Classifications result = ml.predictColocated(jobParams);
Asynchronous Operations
All prediction methods support asynchronous execution:
// Async single prediction
CompletableFuture<Classifications> futureResult = ml.predictAsync(jobParams);
// Async batch prediction
CompletableFuture<List<Classifications>> futureBatchResults =
ml.batchPredictAsync(batchJobParams);
// Async SQL prediction
CompletableFuture<List<Classifications>> futureSqlResults =
ml.predictFromSqlAsync(sqlJobParams);
// Async colocated prediction
CompletableFuture<Classifications> futureColocatedResult =
ml.predictColocatedAsync(colocatedJobParams);
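Because these methods return standard CompletableFuture instances, results can be composed without blocking. A minimal usage sketch, reusing the jobParams from the simple-prediction example:
ml.predictAsync(jobParams)
        .thenAccept(result -> System.out.println("Best: " + result.best()))
        .exceptionally(ex -> {
            System.err.println("Prediction failed: " + ex.getMessage());
            return null;
        });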
Execution Modes
GridGain ML examples can run in different modes depending on your deployment scenario. The examples use a base class (MlBaseExample) that supports two execution modes. All four examples support both modes; you can select either MODE.EMBEDDED or MODE.CLIENT_ONLY.
Available Modes
Embedded Mode (MODE.EMBEDDED)
Runs the ML inference directly within the GridGain server process.
Characteristics:
- Single JVM process
- No separate client connection needed
- Lowest latency for inference operations
- Ideal for testing and development
Setup:
start(MODE.EMBEDDED);
mlApi = server.api().ml(); // Direct access to ML API
Client-Only Mode (MODE.CLIENT_ONLY)
Connects to an existing GridGain cluster as a thin client.
Characteristics:
- Requires a running GridGain cluster
- Separate client process connects to the cluster
- Production-ready deployment pattern
- Network overhead for operations
Setup:
start(MODE.CLIENT_ONLY);
mlApi = client.ml(); // ML API accessed via client connection
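Outside the example base class, establishing the thin-client connection looks roughly like the following sketch (assuming the standard client builder and the default client port 10800):
import org.apache.ignite.client.IgniteClient;
// Connect to an existing cluster as a thin client, then obtain the ML API.
try (IgniteClient client = IgniteClient.builder()
        .addresses("127.0.0.1:10800")
        .build()) {
    var mlApi = client.ml();  // ML API accessed via the client connection
    // ... run predictions via mlApi ...
}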
Advanced Usage
When to Use Standard vs Custom Approaches
Use Standard Approach When:
- Standard input/output types: Working with primitives or built-in DJL types (e.g., String, Classifications)
- Built-in DJL translator functionality: The translator provided by DJL meets your pre/post data-processing needs
- Simple data flows: No complex pre/post-processing workflows required
This approach is shown in the basic prediction examples above.
Use Custom Compute Jobs When:
- Custom input/output types: Implementing user-defined types that require specialized marshalling beyond built-in DJL serialization
- Advanced processing logic: Building complex pre/post-processing workflows that extend beyond standard translator capabilities
- Non-standard data flows: Creating advanced examples with specialized data handling requirements
- Fine-grained execution control: Needing direct control over the model inference pipeline and data transformation steps
Custom compute jobs provide the flexibility to define independent input types, output types, translators, and marshallers when the standard framework cannot accommodate your specific use case.
For complete examples of custom compute jobs, see the Advanced Examples documentation.
Download Model
Download the files for a zero-shot classification model from Hugging Face:
curl --ssl-no-revoke -O "https://huggingface.co/MoritzLaurer/deberta-v3-xsmall-zeroshot-v1.1-all-33/resolve/main/onnx/model_quantized.onnx"
curl --ssl-no-revoke -O "https://huggingface.co/MoritzLaurer/deberta-v3-xsmall-zeroshot-v1.1-all-33/resolve/main/tokenizer.json?download=true"
Custom Input/Output Types
For complex scenarios, create custom input and output types.
public class CustomInput implements Serializable {
private String text;
private String[] candidates;
private boolean multiLabel;
// constructors, getters, setters...
}
public class CustomOutput implements Serializable {
private String sequence;
private String[] labels;
private double[] scores;
// constructors, getters, setters...
}
Custom Translator
Create a custom translator and translator factory to pre-process the inputs before prediction and post-process the model output for consumption.
Step 1: Create a Custom Translator
public class CustomTranslator implements NoBatchifyTranslator<CustomInput, CustomOutput> {
    private final HuggingFaceTokenizer tokenizer;
    private final boolean int32;
    private Predictor<NDList, NDList> predictor;

    CustomTranslator(HuggingFaceTokenizer tokenizer, boolean int32) {
        this.tokenizer = tokenizer;
        this.int32 = int32;
    }

    @Override
    public void prepare(TranslatorContext ctx) throws IOException, ModelException {
        // Create a raw pass-through predictor and tie its lifecycle to this translator.
        Model model = ctx.getModel();
        this.predictor = model.newPredictor(new NoopTranslator((Batchifier) null));
        ctx.getPredictorManager().attachInternal(UUID.randomUUID().toString(), this.predictor);
    }

    @Override
    public NDList processInput(TranslatorContext ctx, CustomInput input) {
        ctx.setAttachment("input", input); // keep the input around for processOutput
        // tokenize the text and candidate labels, then create and return an NDList
    }

    @Override
    public CustomOutput processOutput(TranslatorContext ctx, NDList list)
            throws TranslateException {
        CustomInput input = (CustomInput) ctx.getAttachment("input");
        // derive labels and scores from the raw output (NDList) and wrap them
        return new CustomOutput(input.getText(), labels, scores);
    }
}
Step 2: Create a Custom Translator Factory
public class CustomTranslatorFactory implements TranslatorFactory, Serializable {
    @Override
    @SuppressWarnings("unchecked")
    public <I, O> Translator<I, O> newInstance(Class<I> input, Class<O> output,
            Model model, Map<String, ?> arguments) {
        try {
            // Assumes a tokenizer.json was deployed next to the model file.
            HuggingFaceTokenizer tokenizer = HuggingFaceTokenizer.newInstance(
                    model.getModelPath().resolve("tokenizer.json"));
            return (Translator<I, O>) new CustomTranslator(tokenizer, false);
        } catch (Exception e) {
            throw new IllegalStateException("Failed to create translator", e);
        }
    }
}
Step 3: Using Custom Translator
MlSimpleJobParameters<CustomInput> jobParams =
MlSimpleJobParameters.<CustomInput>builder()
.id(MODEL_ID)
.version(MODEL_VERSION)
.type(ModelType.ONNX)
.name("model_quantized")
.config(ModelConfig.builder().build())
.inputClass("org.apache.ignite.example.ml.multiinput.CustomInput")
.outputClass("org.apache.ignite.example.ml.multiinput.CustomOutput")
.translatorFactory(
"org.apache.ignite.example.ml.multiinput.CustomTranslatorFactory")
.input(input)
.build();
CustomOutput result = mlApi.predict(jobParams);
System.out.println("Classification Results:");
System.out.println("Text: " + prompt);
System.out.println("Predictions: " + result.getLabels()[0]);
System.out.println("Scores: " + result.getScores()[0]);
Custom Compute Jobs (Brief Introduction)
When you need full control over the prediction pipeline, including custom marshallers for serialization, you can extend the MlComputeJob class.
Basic Structure
public class CustomComputeJob<I extends MlSimpleJobParameters, O>
extends MlComputeJob<I, O> {
@Override
public CompletableFuture<O> predictAsync(JobExecutionContext context, I arg) {
if (arg == null) {
return CompletableFuture.completedFuture(null);
}
return context.ignite().ml().predictAsync(arg);
}
@Override
public Marshaller<I, byte[]> inputMarshaller() {
return new CustomInputMarshaller<>();
}
@Override
public Marshaller<O, byte[]> resultMarshaller() {
return new CustomOutputMarshaller<>();
}
}
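The custom marshallers referenced above control how inputs and results cross the network. As an illustration only, assuming the Marshaller contract exposes marshal(T) to byte[] and unmarshal(byte[]) back to T, a minimal input marshaller based on plain Java serialization might look like this:
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import org.apache.ignite.marshalling.Marshaller;

public class CustomInputMarshaller<T extends Serializable> implements Marshaller<T, byte[]> {
    @Override
    public byte[] marshal(T object) {
        // Serialize the job input with standard Java serialization.
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
             ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(object);
            oos.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    @SuppressWarnings("unchecked")
    public T unmarshal(byte[] raw) {
        // Restore the object on the receiving node.
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (T) ois.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}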
When to Use Custom Compute Jobs
Custom compute jobs are required when:
- You have custom input/output types that need specialized serialization
- You need to implement custom marshallers for data transfer
- You want to use different prediction methods (predict, batchPredict, predictFromSql, predictColocated) with custom types
Adding to Job Parameters
MlSimpleJobParameters<CustomInput> jobParams =
MlSimpleJobParameters.<CustomInput>builder()
// ... standard parameters ...
.customJobClass(CustomComputeJob.class)
.customInputMarshaller(new CustomInputMarshaller<>())
.customOutputMarshaller(new CustomOutputMarshaller<>())
.build();
For complete examples with full implementations of custom compute jobs, including marshallers for all prediction types, see the Advanced Examples documentation.