Secondary Storage for Analytics
Overview
GridGain secondary storage allows you to create hybrid transactional/analytical processing (HTAP) systems by configuring tables to store data in both primary and secondary storage engines. This approach enables you to optimize your database for both transactional and analytical workloads simultaneously. The primary storage engine (usually aipersist
) handles write operations and transactional queries, while the secondary storage (columnar
) is optimized for analytical queries.
This guide will show you how to configure and use secondary storage for analytical workloads.
Prerequisites
Before you begin, ensure you have:
-
A license with secondary storage add-on;
-
Basic understanding of distribution zones;
-
Basic understanding of storage profiles.
Configuring Secondary Storage
Step 1: Create a Columnar Storage Profile
First, create a storage profile that uses the columnar
storage engine:
node config update "ignite.storage.profiles:{columnar{engine:columnar}}"
After updating the configuration, restart the node for the changes to take effect.
Step 2: Create a Distribution Zone with Multiple Storage Profiles
After restarting the node, enter the sql editing mode of the CLI tool with the sql
command.
Then, create a distribution zone that supports both the default storage profile (for primary storage) and the columnar storage profile (for secondary storage):
CREATE ZONE HTAP WITH STORAGE_PROFILES='default, columnar';
This distribution zone will now be able to store data using either the default
or columnar
storage profiles.
Step 3: Create Tables with Primary and Secondary Storage
Now you can create tables that use both storage profiles:
CREATE TABLE Person (
id INT PRIMARY KEY,
city_id INT,
name VARCHAR,
age INT,
company VARCHAR
)
PRIMARY ZONE HTAP PRIMARY STORAGE PROFILE 'default'
SECONDARY ZONE HTAP SECONDARY STORAGE PROFILE 'columnar';
In this example:
-
The
PRIMARY ZONE
andPRIMARY STORAGE PROFILE
clauses define where and how the primary data will be stored -
The
SECONDARY ZONE
andSECONDARY STORAGE PROFILE
clauses define where and how the secondary data will be stored
Using Secondary Storage
Writing Data
Data is always written to the primary storage first, then automatically propagated to the secondary storage. You cannot write directly to the secondary storage.
Example of inserting data:
-- Load sample data
INSERT INTO Person (id, city_id, name, age, company) VALUES
(1, 101, 'Alice Smith', 29, 'Acme Corp'),
(2, 102, 'Bob Johnson', 35, 'TechNova'),
(3, 103, 'Charlie Lee', 41, 'InnovaWorks'),
(4, 104, 'Diana Chen', 33, 'Quantum Solutions'),
(5, 105, 'Ethan Patel', 27, 'NextGen Inc'),
(6, 106, 'Fiona Garcia', 38, 'CloudCore'),
(7, 107, 'George Adams', 30, 'EcoTech'),
(8, 108, 'Hannah Kim', 26, 'NeoWare'),
(9, 109, 'Ivan Kuznetsov', 44, 'AltLogic'),
(10, 110, 'Jasmine Nguyen', 31, 'BrightEdge');
Reading from Primary Storage
By default, all queries read from the primary storage:
SELECT name, age FROM Person WHERE company = 'TechNova';
Reading from Secondary Storage
To read from the secondary storage, use the use_secondary_storage
query hint:
SELECT /*+ use_secondary_storage */ name, age
FROM Person
WHERE company = 'TechNova';
Understanding Query Plans
You can use the EXPLAIN PLAN
statement to see how queries will be executed against either the primary or secondary storage:
-- View plan for primary storage query
EXPLAIN PLAN FOR
SELECT name, age FROM Person WHERE company = 'TechNova';
-- View plan for secondary storage query
EXPLAIN PLAN FOR
SELECT /*+ use_secondary_storage */ name, age
FROM Person
WHERE company = 'TechNova';
You can see the useSecondaryStorage: true
explanation when using secondary storage, for example:
╔═════════════════════════════════════════════╗ ║ PLAN ║ ╠═════════════════════════════════════════════╣ ║ Exchange ║ ║ distribution: single ║ ║ est. row count: 1 ║ ║ TableScan ║ ║ table: [PUBLIC, PERSON] ║ ║ filters: =(COMPANY, _UTF-8'TechNova') ║ ║ fields: [$f0, $f1] ║ ║ projects: [NAME, AGE] ║ ║ useSecondaryStorage: true ║ ║ est. row count: 1 ║ ╚═════════════════════════════════════════════╝
Mixed Workloads
You can mix data from primary and secondary storage in a single query by applying the use_secondary_storage
hint to specific tables.
First, let’s create an extra table that will not be using secondary storage and add some data to it:
-- Create sample table CREATE TABLE Orders ( CustomerId int primary key, Company varchar ); -- Load sample data INSERT INTO Orders (CustomerId, Company) VALUES (1, 'Acme Corp'), (2, 'TechNova');
Since no distribution zone or storage profile is specified, this table will be created in the default distribution zone, and will not be replicated to secondary storage.
SELECT Person.name, Orders.CustomerId
FROM Person /*+ use_secondary_storage */
JOIN Orders
ON Person.id = Orders.CustomerId
WHERE Person.company = 'TechNova';
In this example, data from the Person
table is read from secondary storage, while data from the Orders
table is read from primary storage.
Considerations
Data Consistency
Be aware that secondary storage is updated asynchronously after primary storage. This means that secondary storage may be slightly behind primary storage (typically under a second). For queries requiring absolute consistency, use primary storage.
Workload Separation
-
Use primary storage for transactional workloads (OLTP) that require frequent writes and point queries;
-
Use secondary storage for analytical workloads (OLAP) that involve scanning large amounts of data.
Query Optimization
-
Analytical queries that scan large portions of tables will typically perform better on columnar secondary storage;
-
Point lookups by primary key often perform better on primary storage;
-
Use
EXPLAIN PLAN
to understand how your queries will be executed.
Resource Planning
Consider allocating different nodes or resource pools for transactional and analytical workloads to prevent analytical queries from affecting the performance of transactional operations.
© 2025 GridGain Systems, Inc. All Rights Reserved. Privacy Policy | Legal Notices. GridGain® is a registered trademark of GridGain Systems, Inc.
Apache, Apache Ignite, the Apache feather and the Apache Ignite logo are either registered trademarks or trademarks of The Apache Software Foundation.