Distribution Zones
What is a Distribution Zone?
Distribution zones in GridGain are entities that combine sets of tables and define:
-
How these tables are distributed across the cluster, how many copies of data are made, how the data is partitioned, how partitions are assigned to nodes.
-
On which cluster nodes these tables will be stored.
-
How the cluster reacts to nodes entering or leaving the cluster, e.g. whether the tables will automatically start using a new node when the cluster is scaled up.
Distribution zones are not equivalent to the concept of availability zone commonly used in cloud computing.
Availability zone is a set of infrastructure resources with independent hardware, networking, power, and is often physically separated from other availability zones.
GridGain cluster often spans across multiple availability zones, and distribution zones also typically span across multiple availability zones. That way, tables can continue to be available even if one of the availability zones goes down.
Default Zone
GridGain 9 create a default
distribution zone on startup. This distribution zone stores data from tables when they are not configured to use a different zone, or when a different distribution zone is not available. This distribution zone has 25 partitions, 1 partition replica and does not adjust itself to new nodes entering or exiting the cluster. For production purposes, we recommend creating a new distribution zone adjusted for your purposes.
Creating and Using Zones
Distribution zones in GridGain 9 are created by using the SQL CREATE ZONE
command. When creating a zone, you must specify the Storage Profile to use. The storage profile determines what storage engine will be used, and storage properties.
The example below creates a distribution zone with the default storage profile:
CREATE ZONE IF NOT EXISTS exampleZone WITH STORAGE_PROFILES='default'
Configuring Data Replication
You can control the number of partitions (how many pieces the data is split into) and replicas (how many copies of data are stored) by using the PARTITIONS
and REPLICAS
options.
If not specified, the distribution zone creates 25 partitions, and does not create copies of data.
In the example below, the tables will be split into 50 partitions, and each partition will have 3 copies of itself stored on the cluster:
CREATE ZONE IF NOT EXISTS exampleZone WITH STORAGE_PROFILES='default', PARTITIONS=20, REPLICAS=3
Node Filtering
Distribution zones can get node attributes, that can be specified in node configuration, and dynamically distribute data only to nodes that have the specified attributes. This can be used, for example, to only process data from the application on nodes with SSD drives. If no node matches the filter, the data will be stored on all nodes instead. Distribution zone filter uses JSONPath rules.
The example below creates a distribution zone that only stores data on nodes that have the SSD attribute:
CREATE ZONE IF NOT EXISTS exampleZone WITH STORAGE_PROFILES='default',DATA_NODES_FILTER='SSD'
Cluster Scaling
The number of active nodes in the cluster can dynamically change during its operation, as more nodes are added, or nodes are taken down for maintenance. GridGain will automatically handle data redistribution, but often it is a good idea to provide some buffer time for other tasks to finish first. To do this, you can specify DATA_NODES_AUTO_ADJUST_SCALE_UP
and DATA_NODES_AUTO_ADJUST_SCALE_DOWN
parameters to specify the delay in seconds between nodes entering or leaving the cluster and the start of data zone adjustment.
By default, distribution zones do not adjust to cluster changes.
CREATE ZONE IF NOT EXISTS exampleZone WITH STORAGE_PROFILES='default',DATA_NODES_AUTO_ADJUST_SCALE_UP=100,DATA_NODES_AUTO_ADJUST_SCALE_DOWN=20.
Checking Distribution Zone Properties
Distribution zone properties can be viewed through the system.zones
system view. You can use the following SQL command to get it:
SELECT * from system.zones
The command lists information about all distribution zones on the cluster.
Adjusting Distribution Zones
To change distribution zone parameters, use the ALTER ZONE
command. You can use the same parameters as when creating the zone. For example:
ALTER ZONE IF EXISTS exampleZone SET REPLICAS=5
Defining Storage Profiles
When creating a distribution zone, you can define a set of storage profiles that can be used by tables in this zone. You cannot alter storage profiles after the distribution zone was created. To create a Distribution Zone that will use one or multiple Storage Profiles, use the following SQL command:
CREATE ZONE exampleZone WITH PARTITIONS=2, REPLICAS=3, STORAGE_PROFILES='profile1, profile3'
In this case, the table created in this distribution zones can only use profile1
or profile3
.
Replicated Zones
To create a replicated zone (a distribution zone where all data is stored on all nodes), specify the number of replicas to be equal to the number of nodes in your cluster. GridGain will automatically distribute replicas to different nodes.
CREATE ZONE IF NOT EXISTS exampleZone WITH STORAGE_PROFILES='default', REPLICAS=3
Example Zone Usage
In this example, we create a distribution zone and then create 2 tables that will be colocated on the same zone.
CREATE ZONE IF NOT EXISTS EXAMPLEZONE WITH STORAGE_PROFILES='default', PARTITIONS=20, REPLICAS=3
CREATE TABLE IF NOT EXISTS Person (
id int primary key,
city_id int,
name varchar,
age int,
company varchar
) WITH PRIMARY_ZONE='EXAMPLEZONE'
CREATE TABLE IF NOT EXISTS Account (
id int primary key,
name varchar,
amount int
) WITH PRIMARY_ZONE='EXAMPLEZONE'
© 2024 GridGain Systems, Inc. All Rights Reserved. Privacy Policy | Legal Notices. GridGain® is a registered trademark of GridGain Systems, Inc.
Apache, Apache Ignite, the Apache feather and the Apache Ignite logo are either registered trademarks or trademarks of The Apache Software Foundation.