public class DatasetFactory extends Object
Dataset construction is based on three major concepts: a partition upstream, a partition context and
partition data. A partition upstream is a data source that is assumed to be available at all times, regardless
of node failures and rebalancing events. A partition context is the part of a partition that is maintained during the
whole computation process and kept in reliable storage, so that, like the upstream, it stays available and
consistent regardless of node failures and rebalancing events. Partition data
is the part of a partition that is maintained during the computation process in unreliable local storage, such as heap, off-heap or
GPU memory, on the node where the current computation is performed. Partition data can therefore be lost as a result
of a node failure or rebalancing, but it can be restored from the upstream and the partition context.
A partition context and partition data are built on top of an upstream by the specified
builders, PartitionContextBuilder and PartitionDataBuilder respectively. To build a generic
dataset, the following approach is used:
Dataset<C, D> dataset = DatasetFactory.create( ignite, cache, partitionContextBuilder, partitionDataBuilder );
In addition to the generic create method, this factory provides methods that create
specific dataset types: createSimpleDataset creates a SimpleDataset and createSimpleLabeledDataset creates a SimpleLabeledDataset.
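The relationship between upstream, context and data described above can be illustrated with a small, self-contained sketch. It does not use the Ignite API: `PartitionSketch`, `loseData` and `dataBuilds` are hypothetical names introduced only for this illustration, and the plain `Function`/`BiFunction` builders merely stand in for the roles of PartitionContextBuilder and PartitionDataBuilder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

/** Toy model of one partition (hypothetical, not an Ignite class). */
class PartitionSketch<U, C, D> {
    private final List<U> upstream;                      // assumed to be always available
    private final C context;                             // built once, kept for the whole computation
    private final BiFunction<List<U>, C, D> dataBuilder; // rebuilds data from upstream + context
    private D data;                                      // local and disposable
    private int dataBuilds;                              // how many times data was (re)built

    PartitionSketch(List<U> upstream,
                    Function<List<U>, C> contextBuilder,
                    BiFunction<List<U>, C, D> dataBuilder) {
        this.upstream = new ArrayList<>(upstream);
        this.context = contextBuilder.apply(this.upstream);
        this.dataBuilder = dataBuilder;
    }

    /** Simulates a node failure or rebalancing: the local data disappears. */
    void loseData() {
        data = null;
    }

    /** Returns the partition data, rebuilding it from upstream and context if it was lost. */
    D data() {
        if (data == null) {
            data = dataBuilder.apply(upstream, context);
            dataBuilds++;
        }
        return data;
    }

    C context() {
        return context;
    }

    int dataBuilds() {
        return dataBuilds;
    }
}

class PartitionSketchDemo {
    public static void main(String[] args) {
        // Context: mean of the upstream values. Data: values centered around that mean.
        PartitionSketch<Double, Double, double[]> part = new PartitionSketch<Double, Double, double[]>(
            Arrays.asList(1.0, 2.0, 3.0),
            up -> up.stream().mapToDouble(Double::doubleValue).average().orElse(0.0),
            (up, mean) -> up.stream().mapToDouble(x -> x - mean).toArray());

        double[] before = part.data();
        part.loseData();                 // data is gone, context and upstream survive
        double[] after = part.data();    // rebuilt on demand

        System.out.println(Arrays.equals(before, after));
        System.out.println(part.dataBuilds()); // prints 2
    }
}
```

In this sketch the context is computed once and survives the simulated failure, while the data is recomputed lazily, which mirrors the recovery behavior the class description attributes to partition data.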
See Also:
Dataset, PartitionContextBuilder, PartitionDataBuilder

| Constructor and Description |
|---|
| DatasetFactory() |
| Modifier and Type | Method and Description |
|---|---|
| static <K,V,C extends Serializable,D extends AutoCloseable> Dataset<C,D> | create(DatasetBuilder<K,V> datasetBuilder, LearningEnvironmentBuilder envBuilder, PartitionContextBuilder<K,V,C> partCtxBuilder, PartitionDataBuilder<K,V,C,D> partDataBuilder, LearningEnvironment environment)<br>Creates a new instance of a distributed dataset using the specified partCtxBuilder and partDataBuilder. |
| static <K,V,C extends Serializable,D extends AutoCloseable> Dataset<C,D> | create(DatasetBuilder<K,V> datasetBuilder, PartitionContextBuilder<K,V,C> partCtxBuilder, PartitionDataBuilder<K,V,C,D> partDataBuilder)<br>Creates a new instance of a distributed dataset using the specified partCtxBuilder and partDataBuilder. |
| static <K,V,C extends Serializable,D extends AutoCloseable> Dataset<C,D> | create(Ignite ignite, IgniteCache<K,V> upstreamCache, LearningEnvironmentBuilder envBuilder, PartitionContextBuilder<K,V,C> partCtxBuilder, PartitionDataBuilder<K,V,C,D> partDataBuilder, LearningEnvironment environment)<br>Creates a new instance of a distributed dataset using the specified partCtxBuilder and partDataBuilder. |
| static <K,V,C extends Serializable,D extends AutoCloseable> Dataset<C,D> | create(Ignite ignite, IgniteCache<K,V> upstreamCache, PartitionContextBuilder<K,V,C> partCtxBuilder, PartitionDataBuilder<K,V,C,D> partDataBuilder)<br>Creates a new instance of a distributed dataset using the specified partCtxBuilder and partDataBuilder. |
| static <K,V,C extends Serializable,D extends AutoCloseable> Dataset<C,D> | create(Map<K,V> upstreamMap, LearningEnvironmentBuilder envBuilder, int partitions, PartitionContextBuilder<K,V,C> partCtxBuilder, PartitionDataBuilder<K,V,C,D> partDataBuilder, LearningEnvironment environment)<br>Creates a new instance of a local dataset using the specified partCtxBuilder and partDataBuilder. |
| static <K,V,C extends Serializable,CO extends Serializable> SimpleDataset<C> | createSimpleDataset(DatasetBuilder<K,V> datasetBuilder, LearningEnvironmentBuilder envBuilder, PartitionContextBuilder<K,V,C> partCtxBuilder, Preprocessor<K,V> featureExtractor)<br>Creates a new instance of a distributed SimpleDataset using the specified partCtxBuilder and featureExtractor. |
| static <K,V,CO extends Serializable> SimpleDataset<EmptyContext> | createSimpleDataset(DatasetBuilder<K,V> datasetBuilder, LearningEnvironmentBuilder envBuilder, Preprocessor<K,V> featureExtractor)<br>Creates a new instance of a distributed SimpleDataset using the specified featureExtractor. |
| static <K,V,C extends Serializable,CO extends Serializable> SimpleDataset<C> | createSimpleDataset(Ignite ignite, IgniteCache<K,V> upstreamCache, LearningEnvironmentBuilder envBuilder, PartitionContextBuilder<K,V,C> partCtxBuilder, Preprocessor<K,V> featureExtractor)<br>Creates a new instance of a distributed SimpleDataset using the specified partCtxBuilder and featureExtractor. |
| static <K,V,CO extends Serializable> SimpleDataset<EmptyContext> | createSimpleDataset(Ignite ignite, IgniteCache<K,V> upstreamCache, LearningEnvironmentBuilder envBuilder, Preprocessor<K,V> featureExtractor)<br>Creates a new instance of a distributed SimpleDataset using the specified featureExtractor. |
| static <K,V,CO extends Serializable> SimpleDataset<EmptyContext> | createSimpleDataset(Ignite ignite, IgniteCache<K,V> upstreamCache, Preprocessor<K,V> featureExtractor)<br>Creates a new instance of a distributed SimpleDataset using the specified featureExtractor. |
| static <K,V,C extends Serializable,CO extends Serializable> SimpleDataset<C> | createSimpleDataset(Map<K,V> upstreamMap, int partitions, LearningEnvironmentBuilder envBuilder, PartitionContextBuilder<K,V,C> partCtxBuilder, Preprocessor<K,V> featureExtractor)<br>Creates a new instance of a local SimpleDataset using the specified partCtxBuilder and featureExtractor. |
| static <K,V,CO extends Serializable> SimpleDataset<EmptyContext> | createSimpleDataset(Map<K,V> upstreamMap, int partitions, LearningEnvironmentBuilder envBuilder, Preprocessor<K,V> featureExtractor)<br>Creates a new instance of a local SimpleDataset using the specified featureExtractor. |
| static <K,V,C extends Serializable,CO extends Serializable> SimpleLabeledDataset<C> | createSimpleLabeledDataset(DatasetBuilder<K,V> datasetBuilder, LearningEnvironmentBuilder envBuilder, PartitionContextBuilder<K,V,C> partCtxBuilder, Preprocessor<K,V> vectorizer)<br>Creates a new instance of a distributed SimpleLabeledDataset using the specified partCtxBuilder and vectorizer. |
| static <K,V,CO extends Serializable> SimpleLabeledDataset<EmptyContext> | createSimpleLabeledDataset(DatasetBuilder<K,V> datasetBuilder, LearningEnvironmentBuilder envBuilder, Preprocessor<K,V> vectorizer)<br>Creates a new instance of a distributed SimpleLabeledDataset using the specified vectorizer. |
| static <K,V,C extends Serializable,CO extends Serializable> SimpleLabeledDataset<C> | createSimpleLabeledDataset(Ignite ignite, IgniteCache<K,V> upstreamCache, LearningEnvironmentBuilder envBuilder, PartitionContextBuilder<K,V,C> partCtxBuilder, Preprocessor<K,V> vectorizer)<br>Creates a new instance of a distributed SimpleLabeledDataset using the specified partCtxBuilder and vectorizer. |
| static <K,V,CO extends Serializable> SimpleLabeledDataset<EmptyContext> | createSimpleLabeledDataset(Ignite ignite, LearningEnvironmentBuilder envBuilder, IgniteCache<K,V> upstreamCache, Preprocessor<K,V> vectorizer)<br>Creates a new instance of a distributed SimpleLabeledDataset using the specified vectorizer. |
| static <K,V,C extends Serializable,CO extends Serializable> SimpleLabeledDataset<C> | createSimpleLabeledDataset(Map<K,V> upstreamMap, int partitions, LearningEnvironmentBuilder envBuilder, PartitionContextBuilder<K,V,C> partCtxBuilder, Preprocessor<K,V> vectorizer)<br>Creates a new instance of a local SimpleLabeledDataset using the specified partCtxBuilder and vectorizer. |
| static <K,V,CO extends Serializable> SimpleLabeledDataset<EmptyContext> | createSimpleLabeledDataset(Map<K,V> upstreamMap, LearningEnvironmentBuilder envBuilder, int partitions, Preprocessor<K,V> vectorizer)<br>Creates a new instance of a local SimpleLabeledDataset using the specified vectorizer. |
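
The overloads in the method summary differ mainly in which pieces are fixed for the caller: some accept a partition context builder, others fix the context type to EmptyContext. One common way to organize such overloads is to have the convenience variant delegate to the generic one. The sketch below shows that pattern with plain Java types only. It is not the Ignite implementation, and `OverloadSketch`, `NoContext`, `SimpleSet` and `createSimple` are hypothetical names that only play the roles of the factory, EmptyContext, SimpleDataset and createSimpleDataset.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class OverloadSketch {
    /** Stand-in for a context that carries no information (hypothetical). */
    static final class NoContext implements Serializable { }

    /** Stand-in for a simple dataset: a context plus extracted feature rows (hypothetical). */
    static final class SimpleSet<C> {
        final C context;
        final List<double[]> rows;

        SimpleSet(C context, List<double[]> rows) {
            this.context = context;
            this.rows = rows;
        }
    }

    /** Generic variant: the caller chooses how the context is built. */
    static <V, C extends Serializable> SimpleSet<C> createSimple(
            List<V> upstream,
            Function<List<V>, C> ctxBuilder,
            Function<V, double[]> featureExtractor) {
        List<double[]> rows = new ArrayList<>();
        for (V v : upstream)
            rows.add(featureExtractor.apply(v));
        return new SimpleSet<>(ctxBuilder.apply(upstream), rows);
    }

    /** Convenience variant: the context type is fixed, the rest is delegated. */
    static <V> SimpleSet<NoContext> createSimple(
            List<V> upstream,
            Function<V, double[]> featureExtractor) {
        return OverloadSketch.<V, NoContext>createSimple(
            upstream, up -> new NoContext(), featureExtractor);
    }
}
```

With this layout, a caller that needs no context passes only the upstream and the feature extractor, while a caller that needs one supplies its own builder to the generic variant.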
public static <K,V,C extends Serializable,D extends AutoCloseable> Dataset<C,D> create(DatasetBuilder<K,V> datasetBuilder, LearningEnvironmentBuilder envBuilder, PartitionContextBuilder<K,V,C> partCtxBuilder, PartitionDataBuilder<K,V,C,D> partDataBuilder, LearningEnvironment environment)
Creates a new instance of a distributed dataset using the specified partCtxBuilder and partDataBuilder. This is the generic method that allows creating any Ignite Cache based dataset with any desired partition context and data.
Type Parameters:
K - Type of a key in upstream data.
V - Type of a value in upstream data.
C - Type of a partition context.
D - Type of partition data.
Parameters:
envBuilder - Learning environment builder.
datasetBuilder - Dataset builder.
partCtxBuilder - Partition context builder.
partDataBuilder - Partition data builder.
environment - Local learning environment.

public static <K,V,C extends Serializable,D extends AutoCloseable> Dataset<C,D> create(DatasetBuilder<K,V> datasetBuilder, PartitionContextBuilder<K,V,C> partCtxBuilder, PartitionDataBuilder<K,V,C,D> partDataBuilder)
Creates a new instance of a distributed dataset using the specified partCtxBuilder and partDataBuilder. This is the generic method that allows creating any Ignite Cache based dataset with any desired partition context and data.
Type Parameters:
K - Type of a key in upstream data.
V - Type of a value in upstream data.
C - Type of a partition context.
D - Type of partition data.
Parameters:
datasetBuilder - Dataset builder.
partCtxBuilder - Partition context builder.
partDataBuilder - Partition data builder.

public static <K,V,C extends Serializable,D extends AutoCloseable> Dataset<C,D> create(Ignite ignite, IgniteCache<K,V> upstreamCache, LearningEnvironmentBuilder envBuilder, PartitionContextBuilder<K,V,C> partCtxBuilder, PartitionDataBuilder<K,V,C,D> partDataBuilder, LearningEnvironment environment)
Creates a new instance of a distributed dataset using the specified partCtxBuilder and partDataBuilder. This is the generic method that allows creating any Ignite Cache based dataset with any desired partition context and data.
Type Parameters:
K - Type of a key in upstream data.
V - Type of a value in upstream data.
C - Type of a partition context.
D - Type of partition data.
Parameters:
ignite - Ignite instance.
upstreamCache - Ignite Cache with upstream data.
envBuilder - Learning environment builder.
partCtxBuilder - Partition context builder.
partDataBuilder - Partition data builder.
environment - Local learning environment.

public static <K,V,C extends Serializable,D extends AutoCloseable> Dataset<C,D> create(Ignite ignite, IgniteCache<K,V> upstreamCache, PartitionContextBuilder<K,V,C> partCtxBuilder, PartitionDataBuilder<K,V,C,D> partDataBuilder)
Creates a new instance of a distributed dataset using the specified partCtxBuilder and partDataBuilder. This is the generic method that allows creating any Ignite Cache based dataset with any desired partition context and data.
Type Parameters:
K - Type of a key in upstream data.
V - Type of a value in upstream data.
C - Type of a partition context.
D - Type of partition data.
Parameters:
ignite - Ignite instance.
upstreamCache - Ignite Cache with upstream data.
partCtxBuilder - Partition context builder.
partDataBuilder - Partition data builder.

public static <K,V,C extends Serializable,CO extends Serializable> SimpleDataset<C> createSimpleDataset(DatasetBuilder<K,V> datasetBuilder, LearningEnvironmentBuilder envBuilder, PartitionContextBuilder<K,V,C> partCtxBuilder, Preprocessor<K,V> featureExtractor)
Creates a new instance of a distributed SimpleDataset using the specified partCtxBuilder and featureExtractor. This method fixes the partition data type to SimpleDatasetData but allows any desired type of partition context.
Type Parameters:
K - Type of a key in upstream data.
V - Type of a value in upstream data.
C - Type of a partition context.
Parameters:
datasetBuilder - Dataset builder.
envBuilder - Learning environment builder.
partCtxBuilder - Partition context builder.
featureExtractor - Feature extractor used to extract features and build SimpleDatasetData.

public static <K,V,C extends Serializable,CO extends Serializable> SimpleDataset<C> createSimpleDataset(Ignite ignite, IgniteCache<K,V> upstreamCache, LearningEnvironmentBuilder envBuilder, PartitionContextBuilder<K,V,C> partCtxBuilder, Preprocessor<K,V> featureExtractor)
Creates a new instance of a distributed SimpleDataset using the specified partCtxBuilder and featureExtractor. This method fixes the partition data type to SimpleDatasetData but allows any desired type of partition context.
Type Parameters:
K - Type of a key in upstream data.
V - Type of a value in upstream data.
C - Type of a partition context.
Parameters:
ignite - Ignite instance.
upstreamCache - Ignite Cache with upstream data.
envBuilder - Learning environment builder.
partCtxBuilder - Partition context builder.
featureExtractor - Feature extractor used to extract features and build SimpleDatasetData.

public static <K,V,C extends Serializable,CO extends Serializable> SimpleLabeledDataset<C> createSimpleLabeledDataset(DatasetBuilder<K,V> datasetBuilder, LearningEnvironmentBuilder envBuilder, PartitionContextBuilder<K,V,C> partCtxBuilder, Preprocessor<K,V> vectorizer)
Creates a new instance of a distributed SimpleLabeledDataset using the specified partCtxBuilder and vectorizer. This method fixes the partition data type to SimpleLabeledDatasetData but allows any desired type of partition context.
Type Parameters:
K - Type of a key in upstream data.
V - Type of a value in upstream data.
C - Type of a partition context.
Parameters:
datasetBuilder - Dataset builder.
envBuilder - Learning environment builder.
partCtxBuilder - Partition context builder.
vectorizer - Upstream vectorizer used to extract features and labels and build SimpleLabeledDatasetData.

public static <K,V,C extends Serializable,CO extends Serializable> SimpleLabeledDataset<C> createSimpleLabeledDataset(Ignite ignite, IgniteCache<K,V> upstreamCache, LearningEnvironmentBuilder envBuilder, PartitionContextBuilder<K,V,C> partCtxBuilder, Preprocessor<K,V> vectorizer)
Creates a new instance of a distributed SimpleLabeledDataset using the specified partCtxBuilder and vectorizer. This method fixes the partition data type to SimpleLabeledDatasetData but allows any desired type of partition context.
Type Parameters:
K - Type of a key in upstream data.
V - Type of a value in upstream data.
C - Type of a partition context.
Parameters:
ignite - Ignite instance.
upstreamCache - Ignite Cache with upstream data.
envBuilder - Learning environment builder.
partCtxBuilder - Partition context builder.
vectorizer - Upstream vectorizer used to extract features and labels and build SimpleLabeledDatasetData.

public static <K,V,CO extends Serializable> SimpleDataset<EmptyContext> createSimpleDataset(DatasetBuilder<K,V> datasetBuilder, LearningEnvironmentBuilder envBuilder, Preprocessor<K,V> featureExtractor)
Creates a new instance of a distributed SimpleDataset using the specified featureExtractor. This method fixes the partition context type to EmptyContext and the partition data type to SimpleDatasetData.
Type Parameters:
K - Type of a key in upstream data.
V - Type of a value in upstream data.
Parameters:
datasetBuilder - Dataset builder.
envBuilder - Learning environment builder.
featureExtractor - Feature extractor used to extract features and build SimpleDatasetData.

public static <K,V,CO extends Serializable> SimpleDataset<EmptyContext> createSimpleDataset(Ignite ignite, IgniteCache<K,V> upstreamCache, LearningEnvironmentBuilder envBuilder, Preprocessor<K,V> featureExtractor)
Creates a new instance of a distributed SimpleDataset using the specified featureExtractor. This method fixes the partition context type to EmptyContext and the partition data type to SimpleDatasetData.
Type Parameters:
K - Type of a key in upstream data.
V - Type of a value in upstream data.
Parameters:
ignite - Ignite instance.
upstreamCache - Ignite Cache with upstream data.
envBuilder - Learning environment builder.
featureExtractor - Feature extractor used to extract features and build SimpleDatasetData.

public static <K,V,CO extends Serializable> SimpleDataset<EmptyContext> createSimpleDataset(Ignite ignite, IgniteCache<K,V> upstreamCache, Preprocessor<K,V> featureExtractor)
Creates a new instance of a distributed SimpleDataset using the specified featureExtractor. This method fixes the partition context type to EmptyContext and the partition data type to SimpleDatasetData.
Type Parameters:
K - Type of a key in upstream data.
V - Type of a value in upstream data.
Parameters:
ignite - Ignite instance.
upstreamCache - Ignite Cache with upstream data.
featureExtractor - Feature extractor used to extract features and build SimpleDatasetData.

public static <K,V,CO extends Serializable> SimpleLabeledDataset<EmptyContext> createSimpleLabeledDataset(DatasetBuilder<K,V> datasetBuilder, LearningEnvironmentBuilder envBuilder, Preprocessor<K,V> vectorizer)
Creates a new instance of a distributed SimpleLabeledDataset using the specified vectorizer. This method fixes the partition context type to EmptyContext and the partition data type to SimpleLabeledDatasetData.
Type Parameters:
K - Type of a key in upstream data.
V - Type of a value in upstream data.
Parameters:
datasetBuilder - Dataset builder.
envBuilder - Learning environment builder.
vectorizer - Upstream vectorizer used to extract features and labels and build SimpleLabeledDatasetData.

public static <K,V,CO extends Serializable> SimpleLabeledDataset<EmptyContext> createSimpleLabeledDataset(Ignite ignite, LearningEnvironmentBuilder envBuilder, IgniteCache<K,V> upstreamCache, Preprocessor<K,V> vectorizer)
Creates a new instance of a distributed SimpleLabeledDataset using the specified vectorizer. This method fixes the partition context type to EmptyContext and the partition data type to SimpleLabeledDatasetData.
Type Parameters:
K - Type of a key in upstream data.
V - Type of a value in upstream data.
Parameters:
ignite - Ignite instance.
upstreamCache - Ignite Cache with upstream data.
envBuilder - Learning environment builder.
vectorizer - Upstream vectorizer used to extract features and labels and build SimpleLabeledDatasetData.

public static <K,V,C extends Serializable,D extends AutoCloseable> Dataset<C,D> create(Map<K,V> upstreamMap, LearningEnvironmentBuilder envBuilder, int partitions, PartitionContextBuilder<K,V,C> partCtxBuilder, PartitionDataBuilder<K,V,C,D> partDataBuilder, LearningEnvironment environment)
Creates a new instance of a local dataset using the specified partCtxBuilder and partDataBuilder. This is the generic method that allows creating any local dataset, backed by the upstream Map, with any desired partition context and data.
Type Parameters:
K - Type of a key in upstream data.
V - Type of a value in upstream data.
C - Type of a partition context.
D - Type of partition data.
Parameters:
upstreamMap - Map with upstream data.
partitions - Number of partitions the upstream Map will be divided into.
partCtxBuilder - Partition context builder.
envBuilder - Learning environment builder.
partDataBuilder - Partition data builder.
environment - Local learning environment.

public static <K,V,C extends Serializable,CO extends Serializable> SimpleDataset<C> createSimpleDataset(Map<K,V> upstreamMap, int partitions, LearningEnvironmentBuilder envBuilder, PartitionContextBuilder<K,V,C> partCtxBuilder, Preprocessor<K,V> featureExtractor)
Creates a new instance of a local SimpleDataset using the specified partCtxBuilder and featureExtractor. This method fixes the partition data type to SimpleDatasetData but allows any desired type of partition context.
Type Parameters:
K - Type of a key in upstream data.
V - Type of a value in upstream data.
C - Type of a partition context.
Parameters:
upstreamMap - Map with upstream data.
partitions - Number of partitions the upstream Map will be divided into.
envBuilder - Learning environment builder.
partCtxBuilder - Partition context builder.
featureExtractor - Feature extractor used to extract features and build SimpleDatasetData.

public static <K,V,C extends Serializable,CO extends Serializable> SimpleLabeledDataset<C> createSimpleLabeledDataset(Map<K,V> upstreamMap, int partitions, LearningEnvironmentBuilder envBuilder, PartitionContextBuilder<K,V,C> partCtxBuilder, Preprocessor<K,V> vectorizer)
Creates a new instance of a local SimpleLabeledDataset using the specified partCtxBuilder and vectorizer. This method fixes the partition data type to SimpleLabeledDatasetData but allows any desired type of partition context.
Type Parameters:
K - Type of a key in upstream data.
V - Type of a value in upstream data.
C - Type of a partition context.
Parameters:
upstreamMap - Map with upstream data.
partitions - Number of partitions the upstream Map will be divided into.
envBuilder - Learning environment builder.
partCtxBuilder - Partition context builder.
vectorizer - Upstream vectorizer used to extract features and labels and build SimpleLabeledDatasetData.

public static <K,V,CO extends Serializable> SimpleDataset<EmptyContext> createSimpleDataset(Map<K,V> upstreamMap, int partitions, LearningEnvironmentBuilder envBuilder, Preprocessor<K,V> featureExtractor)
Creates a new instance of a local SimpleDataset using the specified featureExtractor. This method fixes the partition context type to EmptyContext and the partition data type to SimpleDatasetData.
Type Parameters:
K - Type of a key in upstream data.
V - Type of a value in upstream data.
Parameters:
upstreamMap - Map with upstream data.
partitions - Number of partitions the upstream Map will be divided into.
envBuilder - Learning environment builder.
featureExtractor - Feature extractor used to extract features and build SimpleDatasetData.

public static <K,V,CO extends Serializable> SimpleLabeledDataset<EmptyContext> createSimpleLabeledDataset(Map<K,V> upstreamMap, LearningEnvironmentBuilder envBuilder, int partitions, Preprocessor<K,V> vectorizer)
Creates a new instance of a local SimpleLabeledDataset using the specified vectorizer. This method fixes the partition context type to EmptyContext and the partition data type to SimpleLabeledDatasetData.
Type Parameters:
K - Type of a key in upstream data.
V - Type of a value in upstream data.
Parameters:
upstreamMap - Map with upstream data.
partitions - Number of partitions the upstream Map will be divided into.
envBuilder - Learning environment builder.
vectorizer - Upstream vectorizer used to extract features and labels and build SimpleLabeledDatasetData.
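
The Map-based methods take a `partitions` argument that sets how many partitions the upstream Map is divided into. This page does not say how the entries are assigned to partitions, so the following is only a sketch under an assumed round-robin strategy. `MapPartitioningSketch` and `split` are hypothetical names and are not part of the Ignite API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MapPartitioningSketch {
    /**
     * Splits the upstream map into the requested number of partitions.
     * Entries are assigned round-robin in iteration order (an assumption
     * made for this illustration), so partition sizes differ by at most one.
     */
    static <K, V> List<Map<K, V>> split(Map<K, V> upstream, int partitions) {
        if (partitions <= 0)
            throw new IllegalArgumentException("partitions must be positive");

        List<Map<K, V>> parts = new ArrayList<>();
        for (int i = 0; i < partitions; i++)
            parts.add(new LinkedHashMap<>());

        int idx = 0;
        for (Map.Entry<K, V> e : upstream.entrySet()) {
            parts.get(idx % partitions).put(e.getKey(), e.getValue());
            idx++;
        }
        return parts;
    }
}
```

Each resulting map would then play the role of one partition's upstream, from which a partition context and partition data can be built.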
GridGain In-Memory Computing Platform : ver. 8.9.26 Release Date : October 16 2025