
Operational Commands

COPY INTO

Description

Imports data from an external source into a table, or exports data from a table to a file. The table that the data is imported into must already exist.

COPY FROM source
INTO target
FORMAT formatType
[ formatTypeOptions ]
[ PROPERTIES ( 'propName'='propValue' [,...]) ]

Parameters

  • source - the full path to the file to import the data from (if the target is a table), or the name of the table and the list of columns to export the data from (if the target is a file). If the source is a table, you can further narrow down the selection by using a query.

  • target - the table and its columns when importing data; the path to the file when exporting data.

  • formatType - the format of the file to work with:

    • CSV

    • PARQUET

    • ICEBERG

  • PROPERTIES - case-sensitive configuration properties:

    • s3.client-region - when using S3 storage, the region to store the data in.

    • s3.access-key-id - when using S3 storage, the AWS access key.

    • s3.secret-access-key - when using S3 storage, the AWS secret key.

    • For Iceberg format:

      • The catalog properties are supported.

      • The table-identifier property specifies the Apache Iceberg TableIdentifier name. The name can be dot-separated, for example, db_name.table_name or table.

      • The Apache Iceberg warehouse path can be defined explicitly with the 'warehouse'='path' property, or implicitly as the source/target in COPY FROM source INTO target. If both are defined, the explicit property takes precedence (see the Iceberg examples below).

  • formatTypeOptions - parameters related to the selected file format:

    • DELIMITER - the field delimiter for CSV files. The default delimiter is ','. The delimiter syntax is 'char'. Any alphanumeric character can be a delimiter.

    • PATTERN - the regular expression that matches the files to import when reading partitioned Parquet tables. The expression must be enclosed in single quotes. For example, '.*' imports all files.

Examples

Import data from a CSV file into Table1:

/* Import data from CSV */
COPY FROM '/path/to/dir/data.csv'
INTO Table1 (name, age)
FORMAT CSV

Export data from Table1 to a CSV file by using a query:

/* Export data to CSV */
COPY FROM (SELECT name, age FROM Table1)
INTO '/path/to/dir/data.csv'
FORMAT CSV

Export data from Table1 to a CSV file by specifying the columns:

/* Export data to CSV */
COPY FROM Table1 (name, age)
INTO '/path/to/dir/data.csv'
FORMAT CSV

Import a CSV file from AWS S3 into Table1:

/* Import CSV file from s3 */
COPY FROM 's3://mybucket/data.csv'
INTO Table1
FORMAT CSV
DELIMITER '|'
PROPERTIES('s3.access-key-id' = 'keyid', 's3.secret-access-key' = 'secretkey')

A simple example of exporting data to Iceberg. When working with the local file system, you can use HadoopCatalog, which does not need to connect to a Hive MetaStore:

COPY FROM Table1 (id,name,height)
INTO '/tmp/person.i/'
FORMAT ICEBERG
PROPERTIES ('table-identifier'='person', 'catalog-impl'='org.apache.iceberg.hadoop.HadoopCatalog')
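
As noted in the Parameters section, the Iceberg warehouse path can also be set explicitly. In the sketch below both the target path and the 'warehouse' property are defined, so the explicit property takes precedence (the paths here are illustrative):

COPY FROM Table1 (id,name,height)
INTO '/tmp/ignored-target/'
FORMAT ICEBERG
PROPERTIES ('table-identifier'='person', 'catalog-impl'='org.apache.iceberg.hadoop.HadoopCatalog', 'warehouse'='/tmp/person.i/')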

Export data into Iceberg on AWS S3:

COPY FROM person (id,name,height)
INTO 's3://iceberg-warehouse/glue-catalog'
FORMAT ICEBERG
PROPERTIES(
    'table-identifier'='iceberg_db_1.person',
    'io-impl'='org.apache.iceberg.aws.s3.S3FileIO',
    'catalog-impl'='org.apache.iceberg.aws.glue.GlueCatalog',
    's3.client-region'='eu-central-1',
    's3.access-key-id'='YOUR_KEY',
    's3.secret-access-key'='YOUR_SECRET'
    )

Import data from a partitioned Parquet table:

COPY FROM '/tmp/partitioned_table_dir'
INTO person (id, name, height)
FORMAT PARQUET
PATTERN '.*'

Where the Parquet table looks like this:

partitioned_table_dir/
├─ name=John/
│  ├─ 000000_0.parquet
├─ name=Doe/
│  ├─ 000000_0.parquet

SET STREAMING

Description

Streams data in bulk to an SQL table in your GridGain cluster. When streaming is enabled, the JDBC/ODBC driver packs your commands in batches and sends them to the server (GridGain cluster). On the server side, the batch is converted into a stream of cache update commands, which are distributed asynchronously between the server nodes. Performing this asynchronously increases peak throughput because at any given time all cluster nodes are busy with data loading.

SET STREAMING [OFF|ON];

Usage

To stream data into your GridGain cluster, prepare a file with the SET STREAMING ON command followed by the INSERT commands for the data to be loaded. The table you stream into must already exist. For example:

SET STREAMING ON;

INSERT INTO City(ID, Name, CountryCode, District, Population) VALUES (1,'Kabul','AFG','Kabol',1780000);
INSERT INTO City(ID, Name, CountryCode, District, Population) VALUES (2,'Qandahar','AFG','Qandahar',237500);
INSERT INTO City(ID, Name, CountryCode, District, Population) VALUES (3,'Herat','AFG','Herat',186800);
INSERT INTO City(ID, Name, CountryCode, District, Population) VALUES (4,'Mazar-e-Sharif','AFG','Balkh',127800);
INSERT INTO City(ID, Name, CountryCode, District, Population) VALUES (5,'Amsterdam','NLD','Noord-Holland',731200);
-- More INSERT commands --

If the table does not exist yet, create it in the same file before turning streaming on (only INSERT commands are allowed while streaming is on):

CREATE TABLE City (
  ID INT(11),
  Name CHAR(35),
  CountryCode CHAR(3),
  District CHAR(20),
  Population INT(11),
  PRIMARY KEY (ID, CountryCode)
) WITH "template=partitioned, backups=1, affinityKey=CountryCode, CACHE_NAME=City, KEY_TYPE=demo.model.CityKey, VALUE_TYPE=demo.model.City";

SET STREAMING ON;

INSERT INTO City(ID, Name, CountryCode, District, Population) VALUES (1,'Kabul','AFG','Kabol',1780000);
INSERT INTO City(ID, Name, CountryCode, District, Population) VALUES (2,'Qandahar','AFG','Qandahar',237500);
INSERT INTO City(ID, Name, CountryCode, District, Population) VALUES (3,'Herat','AFG','Herat',186800);
INSERT INTO City(ID, Name, CountryCode, District, Population) VALUES (4,'Mazar-e-Sharif','AFG','Balkh',127800);
INSERT INTO City(ID, Name, CountryCode, District, Population) VALUES (5,'Amsterdam','NLD','Noord-Holland',731200);
-- More INSERT commands --
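
To switch the connection back to regular SQL processing at the end of the file, turn streaming off. At this point, as when the connection is closed (see the example below), the streamed data is expected to be fully flushed to the cluster:

SET STREAMING OFF;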

Known Limitations

While streaming mode allows you to load data much faster than other data loading techniques, it has some limitations:

  • Only INSERT commands are allowed; any attempt to execute SELECT or any other DML or DDL command will cause an exception.

  • Due to streaming mode’s asynchronous nature, you cannot know update counts for every statement executed; all JDBC/ODBC commands returning update counts will return 0.

Example

Use the world.sql file that is shipped with the latest GridGain CE distribution. It can be found in the {gridgain_dir}/examples/sql/ directory. You can use the run command from SQLLine, as shown below:

!run {gridgain_dir}/examples/sql/world.sql

After executing the above command and closing the JDBC connection, all data will be loaded into the cluster and ready to be queried.
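
Once streaming ends, you can verify the load with an ordinary query; for example, using the City table from the listings above:

SELECT Name, Population FROM City ORDER BY Population DESC;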


KILL QUERY

Description

Cancels a running query. When a query is canceled with the KILL command, all parts of the query running on all other nodes are canceled too.

KILL QUERY [ASYNC] '<query_id>'

Parameters

  • ASYNC - an optional keyword; when specified, the command returns control immediately, without waiting for the cancellation to finish

  • query_id - the ID of the query to cancel; retrieved via the SQL QUERIES view

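Example

Cancel a query by its ID, as retrieved from the SQL QUERIES view (the ID below is illustrative):

KILL QUERY '6fa749ee-7cf8-4635-be10-36a1c75267a7_54321'

Cancel the same query without waiting for the cancellation to finish:

KILL QUERY ASYNC '6fa749ee-7cf8-4635-be10-36a1c75267a7_54321'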

COPY

Copy data from a CSV file into a SQL table.

COPY FROM '/path/to/local/file.csv'
INTO tableName (columnName, columnName, ...) FORMAT CSV [CHARSET '<charset-name>']

Parameters

  • '/path/to/local/file.csv' - the actual path to the CSV file

  • tableName - the name of the table to copy the data to

  • columnName - the name of a column corresponding to a column in the CSV file

Example

COPY FROM '/path/to/local/file.csv' INTO city (
  ID, Name, CountryCode, District, Population) FORMAT CSV

In the above command, substitute /path/to/local/file.csv with the actual path to your CSV file. For instance, you can use the city.csv file that is shipped with the latest GridGain CE distribution. You can find it in the {gridgain_dir}/examples/src/main/resources/sql/ directory.
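
If the file is in a non-default encoding, add the optional CHARSET clause shown in the syntax above; a sketch (the charset name here is illustrative):

COPY FROM '/path/to/local/file.csv' INTO city (
  ID, Name, CountryCode, District, Population) FORMAT CSV CHARSET 'UTF-8'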