GridGain Software Documentation

Python Thin Client

Prerequisites

Python 3.4 or above.

Installation

The Python thin client module ships with the GridGain distribution package in the {GRIDGAIN_HOME}/platforms/python/pygridgain directory. The module is called "pygridgain". To install it on your local machine, run the following commands:

$ cd {GRIDGAIN_HOME}/platforms/python
$ pip3 install -e .
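
After the installation you may want to confirm that the module is visible to the interpreter you intend to use. The following is a minimal sketch that relies only on the standard library; the helper name is_module_installed is hypothetical and is not part of pygridgain.

```python
import importlib.util


def is_module_installed(name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(name) is not None


if is_module_installed('pygridgain'):
    print('pygridgain is available')
else:
    print('pygridgain is not installed in this environment')
```

If the check fails, verify that pip3 and python3 refer to the same interpreter.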

The distribution package contains runnable examples that demonstrate basic usage scenarios of the Python thin client. The examples are located in the {GRIDGAIN_HOME}/platforms/python/examples directory.

Connecting to the Cluster

The following code snippet shows how to connect to a cluster from the Python thin client:

from pygridgain import Client

# Open a connection
client = Client()
client.connect('127.0.0.1', 10800)
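
In longer-running programs it is usually worth making sure the connection is released even when the work in between fails. The sketch below wraps connect and close in a context manager. It assumes that the client object exposes a close() method; the connected helper and the FakeClient stand-in are hypothetical and exist only so that the sketch runs without a cluster.

```python
from contextlib import contextmanager


@contextmanager
def connected(client, host, port):
    """Connect the client, yield it, and always close it afterwards."""
    client.connect(host, port)
    try:
        yield client
    finally:
        client.close()  # assumption: the client object provides close()


class FakeClient:
    """Stand-in that records calls; replace it with a real Client()."""

    def __init__(self):
        self.events = []

    def connect(self, host, port):
        self.events.append(('connect', host, port))

    def close(self):
        self.events.append(('close',))


fake = FakeClient()
with connected(fake, '127.0.0.1', 10800) as c:
    c.events.append(('work',))
print(fake.events)
```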

Client Failover

You can configure the client to automatically fail over to another node if the connection to the current node fails or times out.

When a connection fails, the Client object propagates the original exception (OSError or SocketError), keeps its constructor parameters intact, and tries to reconnect transparently. If the client fails to reconnect, it raises a ReconnectError exception.

In the following example, the client is given the addresses of three cluster nodes.

from pygridgain import Client
from pygridgain.datatypes.cache_config import CacheMode
from pygridgain.datatypes.prop_codes import *
from pygridgain.exceptions import SocketError

nodes = [
    ('127.0.0.1', 10800),
    ('217.29.2.1', 10800),
    ('200.10.33.1', 10800),
]

client = Client(timeout=40.0)
client.connect(nodes)
print('Connected to {}'.format(client))

my_cache = client.get_or_create_cache({
    PROP_NAME: 'my_cache',
    PROP_CACHE_MODE: CacheMode.REPLICATED,
})
my_cache.put('test_key', 0)

# abstract main loop
while True:
    try:
        # do the work
        test_value = my_cache.get('test_key')
        my_cache.put('test_key', test_value + 1)
    except (OSError, SocketError) as e:
        # recover from error (repeat last command, check data
        # consistency or just continue − depends on the task)
        print('Error: {}'.format(e))
        print('Last value: {}'.format(my_cache.get('test_key')))
        print('Reconnected to {}'.format(client))
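
The recovery branch of the loop above can be factored into a small helper that repeats an operation a bounded number of times. This is a sketch only: call_with_retry and its parameters are hypothetical names, and the flaky_get function stands in for a cache call that fails twice before succeeding. With a real client you would probably pass errors=(OSError, SocketError).

```python
import time


def call_with_retry(operation, attempts=3, delay=0.0, errors=(OSError,)):
    """Call operation(); on a listed error, wait and try again.

    attempts must be at least 1. The last error is re-raised when
    every attempt fails.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except errors as e:
            last_error = e
            if attempt < attempts:
                time.sleep(delay)
    raise last_error


calls = {'n': 0}


def flaky_get():
    """Stand-in for my_cache.get(...): fails on the first two calls."""
    calls['n'] += 1
    if calls['n'] < 3:
        raise OSError('connection lost')
    return 42


result = call_with_retry(flaky_get, attempts=5)
print(result, calls['n'])
```

Whether repeating the last command is safe depends on the task; a non-idempotent update such as the increment in the loop above may need a consistency check first.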

Creating a Cache

You can get an instance of a cache using one of the following methods:

  • get_cache(settings) — creates a local Cache object with the given name or set of parameters. The cache must exist in the cluster; otherwise, an exception will be thrown when you attempt to perform operations on that cache.

  • create_cache(settings) — creates a cache with the given name or set of parameters.

  • get_or_create_cache(settings) — returns an existing cache or creates it if the cache does not exist.

Each method accepts a cache name or a dictionary of properties that represents a cache configuration.

from pygridgain import Client

# Open a connection
client = Client()
client.connect('127.0.0.1', 10800)

# Create a cache
my_cache = client.create_cache('myCache')

Here is an example of creating a cache with a set of properties:

from collections import OrderedDict

from pygridgain import Client, GenericObjectMeta
from pygridgain.datatypes import *
from pygridgain.datatypes.prop_codes import *

# Open a connection
client = Client()
client.connect('127.0.0.1', 10800)

cache_config = {
    PROP_NAME: 'my_cache',
    PROP_BACKUPS_NUMBER: 2,
    PROP_CACHE_KEY_CONFIGURATION: [
        {
            'type_name': 'PersonKey',
            'affinity_key_field_name': 'companyId'
        }
    ]
}

my_cache = client.create_cache(cache_config)


class PersonKey(metaclass=GenericObjectMeta, type_name='PersonKey', schema=OrderedDict([
    ('personId', IntObject),
    ('companyId', IntObject),
])):
    pass


personKey = PersonKey(personId=1, companyId=1)
my_cache.put(personKey, 'test')

print(my_cache.get(personKey))

See the next section for the list of supported cache properties.

Cache Configuration

The property keys that you can specify are defined in the prop_codes module.

| Property name | Type | Description |
|---|---|---|
| PROP_NAME | str | Cache name. This is the only required property. |
| PROP_CACHE_MODE | int | Cache mode: REPLICATED=1, PARTITIONED=2 |
| PROP_CACHE_ATOMICITY_MODE | int | Cache atomicity mode: TRANSACTIONAL=0, ATOMIC=1 |
| PROP_BACKUPS_NUMBER | int | Number of backup partitions. |
| PROP_WRITE_SYNCHRONIZATION_MODE | int | Write synchronization mode: FULL_SYNC=0, FULL_ASYNC=1, PRIMARY_SYNC=2 |
| PROP_COPY_ON_READ | bool | The copy on read flag. The default value is true. |
| PROP_READ_FROM_BACKUP | bool | The flag indicating whether entries are read from the local backup partitions, when available, or always requested from the primary partitions. The default value is true. |
| PROP_DATA_REGION_NAME | str | Data region name. |
| PROP_IS_ONHEAP_CACHE_ENABLED | bool | Enable on-heap caching for the cache. |
| PROP_QUERY_ENTITIES | list | A list of query entities. See the Query Entities section below for details. |
| PROP_QUERY_PARALLELISM | int | Query parallelism |
| PROP_QUERY_DETAIL_METRIC_SIZE | int | Query detail metric size |
| PROP_SQL_SCHEMA | str | SQL schema |
| PROP_SQL_INDEX_INLINE_MAX_SIZE | int | SQL index inline maximum size |
| PROP_SQL_ESCAPE_ALL | bool | Turns on SQL escapes |
| PROP_MAX_QUERY_ITERATORS | int | Maximum number of query iterators |
| PROP_REBALANCE_MODE | int | Rebalancing mode: SYNC=0, ASYNC=1, NONE=2 |
| PROP_REBALANCE_DELAY | int | Rebalancing delay (ms) |
| PROP_REBALANCE_TIMEOUT | int | Rebalancing timeout (ms) |
| PROP_REBALANCE_BATCH_SIZE | int | Rebalancing batch size |
| PROP_REBALANCE_BATCHES_PREFETCH_COUNT | int | Rebalancing prefetch count |
| PROP_REBALANCE_ORDER | int | Rebalancing order |
| PROP_REBALANCE_THROTTLE | int | Rebalancing throttle interval (ms) |
| PROP_GROUP_NAME | str | Group name |
| PROP_CACHE_KEY_CONFIGURATION | list | Cache key configuration (see Cache key) |
| PROP_DEFAULT_LOCK_TIMEOUT | int | Default lock timeout (ms) |
| PROP_MAX_CONCURRENT_ASYNC_OPERATIONS | int | Maximum number of concurrent asynchronous operations |
| PROP_PARTITION_LOSS_POLICY | int | Partition loss policy: READ_ONLY_SAFE=0, READ_ONLY_ALL=1, READ_WRITE_SAFE=2, READ_WRITE_ALL=3, IGNORE=4 |
| PROP_EAGER_TTL | bool | Eager TTL |
| PROP_STATISTICS_ENABLED | bool | The flag that enables statistics. |
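
Several of the properties above take integer codes. When reading a configuration back or logging it, it can help to translate the codes into names. The sketch below mirrors the values listed above in standard-library enums. The class names and the describe helper are hypothetical; the library provides its own constants (for example, CacheMode in pygridgain.datatypes.cache_config), which you would normally use when building a configuration.

```python
from enum import IntEnum


class CacheModeCode(IntEnum):
    REPLICATED = 1
    PARTITIONED = 2


class WriteSyncModeCode(IntEnum):
    FULL_SYNC = 0
    FULL_ASYNC = 1
    PRIMARY_SYNC = 2


class RebalanceModeCode(IntEnum):
    SYNC = 0
    ASYNC = 1
    NONE = 2


class PartitionLossPolicyCode(IntEnum):
    READ_ONLY_SAFE = 0
    READ_ONLY_ALL = 1
    READ_WRITE_SAFE = 2
    READ_WRITE_ALL = 3
    IGNORE = 4


def describe(code_enum, value):
    """Return the symbolic name for an integer code (ValueError if unknown)."""
    return code_enum(value).name


print(describe(CacheModeCode, 2))
print(describe(PartitionLossPolicyCode, 4))
```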

Query Entities

Query entities are objects that describe queryable fields, i.e. the fields of the cache objects that can be queried using SQL queries.

  • table_name: SQL table name.

  • key_field_name: name of the key field.

  • key_type_name: name of the key type (Java type or complex object).

  • value_field_name: name of the value field.

  • value_type_name: name of the value type.

  • field_name_aliases: a list of 0 or more dicts of aliases (see Field Name Aliases).

  • query_fields: a list of 0 or more query field names (see Query Fields).

  • query_indexes: a list of 0 or more query indexes (see Query Indexes).

Field Name Aliases

Field name aliases are used to give a convenient name for the full property name (object.name → objectName).

  • field_name: field name.

  • alias: alias (str).

Query Fields

Query fields define the fields that are queryable.

  • name: field name.

  • type_name: name of Java type or complex object.

  • is_key_field: (optional) boolean value, False by default.

  • is_notnull_constraint_field: boolean value.

  • default_value: (optional) anything that can be converted to type_name type. None (Null) by default.

  • precision: (optional) decimal precision: total number of digits in decimal value. Defaults to -1 (use cluster default). Ignored for non-decimal SQL types (other than java.math.BigDecimal).

  • scale: (optional) decimal scale: number of digits after the decimal point. Defaults to -1 (use cluster default). Ignored for non-decimal SQL types.

Query Indexes

Query indexes define the fields that will be indexed.

  • index_name: index name.

  • index_type: index type code as an integer value in unsigned byte range.

  • inline_size: integer value.

  • fields: a list of 0 or more indexed fields (see Fields).

Fields

  • name: field name.

  • is_descending: (optional) boolean value; False by default.
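
To show how the pieces described above nest, here is a sketch of a single query entity written as a plain Python dictionary. The table, type, and index names are invented for illustration, and the index_type and inline_size values are placeholders whose meaning should be checked against the cluster documentation. The missing_keys helper is hypothetical and only checks that the top-level keys listed in the Query Entities section are present.

```python
ENTITY_KEYS = (
    'table_name', 'key_field_name', 'key_type_name', 'value_field_name',
    'value_type_name', 'field_name_aliases', 'query_fields', 'query_indexes',
)

query_entity = {
    'table_name': 'PERSON',                # illustrative name
    'key_field_name': 'ID',
    'key_type_name': 'java.lang.Integer',
    'value_field_name': None,
    'value_type_name': 'Person',           # illustrative complex object name
    'field_name_aliases': [
        {'field_name': 'ID', 'alias': 'personId'},
    ],
    'query_fields': [
        {
            'name': 'ID',
            'type_name': 'java.lang.Integer',
            'is_key_field': True,
            'is_notnull_constraint_field': True,
        },
        {
            'name': 'NAME',
            'type_name': 'java.lang.String',
            'is_notnull_constraint_field': False,
        },
    ],
    'query_indexes': [
        {
            'index_name': 'IDX_PERSON_NAME',
            'index_type': 0,    # placeholder type code
            'inline_size': 10,  # placeholder value
            'fields': [{'name': 'NAME', 'is_descending': False}],
        },
    ],
}


def missing_keys(entity):
    """Return the documented top-level keys that the entity lacks."""
    return [key for key in ENTITY_KEYS if key not in entity]


print(missing_keys(query_entity))
```

A list of such dictionaries would be passed as the value of PROP_QUERY_ENTITIES in the cache configuration.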

Cache key

  • type_name: name of the complex object.

  • affinity_key_field_name: name of the affinity key field.

Using Key-Value API

The pygridgain.cache.Cache class provides methods for working with cache entries by using key-value operations, such as put, get, put all, get all, replace and others. The following example shows how to do that:

from pygridgain import Client

client = Client()
client.connect('127.0.0.1', 10800)

# Create cache
my_cache = client.create_cache('my cache')

# Put value in cache
my_cache.put('my key', 42)

# Get value from cache
result = my_cache.get('my key')
print(result)  # 42

result = my_cache.get('non-existent key')
print(result)  # None

# Get multiple values from cache
result = my_cache.get_all([
    'my key',
    'non-existent key',
    'other-key',
])
print(result)  # {'my key': 42}

Using type hints

The pygridgain methods that deal with a single value or key have an additional optional parameter, either value_hint or key_hint, that accepts a parser/constructor class. Nearly any structure element (inside dict or list) can be replaced with a 2-tuple (the element, type hint).

from pygridgain import Client
from pygridgain.datatypes import CharObject, ShortObject

client = Client()
client.connect('127.0.0.1', 10800)

my_cache = client.get_or_create_cache('my cache')

my_cache.put('my key', 42)
# value ‘42’ takes 9 bytes of memory as a LongObject

my_cache.put('my key', 42, value_hint=ShortObject)
# value ‘42’ takes only 3 bytes as a ShortObject

my_cache.put('a', 1)
# ‘a’ is a key of type String

my_cache.put('a', 2, key_hint=CharObject)
# another key ‘a’ of type CharObject was created

value = my_cache.get('a')
print(value)  # 1

value = my_cache.get('a', key_hint=CharObject)
print(value)  # 2

# now let us delete both keys at once
my_cache.remove_keys([
    'a',  # a default type key
    ('a', CharObject),  # a key of type CharObject
])

Asynchronous Execution

Scan Queries

The scan() method of the cache object can be used to get all objects from the cache. It returns a generator that yields (key, value) tuples. You can iterate through the generated pairs as follows:

from pygridgain import Client

client = Client()
client.connect('127.0.0.1', 10800)

my_cache = client.create_cache('myCache')

my_cache.put_all({'key_{}'.format(v): v for v in range(20)})
# {
#     'key_0': 0,
#     'key_1': 1,
#     'key_2': 2,
#     ... 20 elements in total...
#     'key_18': 18,
#     'key_19': 19
# }

result = my_cache.scan()

for k, v in result:
    print(k, v)
# key_17 17
# key_10 10
# key_6 6
# ... 20 elements in total...
# key_16 16
# key_12 12

Alternatively, you can convert the generator to a dictionary in one go:

result = my_cache.scan()
print(dict(result))
# {
#     'key_17': 17,
#     'key_10': 10,
#     'key_6': 6,
#     ... 20 elements in total...
#     'key_16': 16,
#     'key_12': 12
# }
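
Because the result is a generator, you do not have to materialize every entry. The sketch below takes only the first few pairs with itertools.islice. To keep it runnable without a cluster, fake_scan is a hypothetical stand-in that yields the same kind of (key, value) tuples; with a real cache you would pass my_cache.scan() instead.

```python
from itertools import islice


def fake_scan():
    """Stand-in for my_cache.scan(): yields (key, value) tuples."""
    for i in range(20):
        yield 'key_{}'.format(i), i


# Consume only the first five pairs; the rest are never generated here.
first_five = list(islice(fake_scan(), 5))
print(first_five)
```

Note that a real scan returns entries in no particular order, so "the first five" is an arbitrary sample rather than the five smallest keys.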

Executing SQL Statements

The Python thin client supports all SQL commands that are supported by GridGain. The commands are executed via the sql() method of the client object. The sql() method returns a generator that yields the resulting rows.

Refer to the SQL Reference section for the list of supported commands.

from pygridgain import Client

client = Client()
client.connect('127.0.0.1', 10800)

CITY_CREATE_TABLE_QUERY = '''CREATE TABLE City (
    ID INT(11),
    Name CHAR(35),
    CountryCode CHAR(3),
    District CHAR(20),
    Population INT(11),
    PRIMARY KEY (ID, CountryCode)
) WITH "affinityKey=CountryCode"'''

client.sql(CITY_CREATE_TABLE_QUERY)

CITY_CREATE_INDEX = '''CREATE INDEX idx_country_code ON city (CountryCode)'''

client.sql(CITY_CREATE_INDEX)

CITY_INSERT_QUERY = '''INSERT INTO City(
    ID, Name, CountryCode, District, Population
) VALUES (?, ?, ?, ?, ?)'''

CITY_DATA = [
    [3793, 'New York', 'USA', 'New York', 8008278],
    [3794, 'Los Angeles', 'USA', 'California', 3694820],
    [3795, 'Chicago', 'USA', 'Illinois', 2896016],
    [3796, 'Houston', 'USA', 'Texas', 1953631],
    [3797, 'Philadelphia', 'USA', 'Pennsylvania', 1517550],
    [3798, 'Phoenix', 'USA', 'Arizona', 1321045],
    [3799, 'San Diego', 'USA', 'California', 1223400],
    [3800, 'Dallas', 'USA', 'Texas', 1188580],
]

for row in CITY_DATA:
    client.sql(CITY_INSERT_QUERY, query_args=row)

CITY_SELECT_QUERY = "SELECT * FROM City"

cities = client.sql(CITY_SELECT_QUERY)
for city in cities:
    print(*city)

The sql() method supports the following parameters:

  • query_str

  • page_size

  • query_args

  • schema

  • statement_type

  • distributed_joins

  • local

  • replicated_only

  • enforce_join_order

  • collocated

  • lazy

  • include_field_names

  • max_rows

  • timeout


Note that if you set the include_field_names argument to True, the sql() method yields a list of column names first, before any rows. You can retrieve the field names with Python's built-in next() function.

field_names = next(client.sql(CITY_SELECT_QUERY, include_field_names=True))
print(field_names)
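
Building on the behavior described above, the column names can be combined with each row to produce dictionaries. This is a sketch, not part of the client API: rows_as_dicts is a hypothetical helper, and fake_sql_result stands in for client.sql(..., include_field_names=True) so that the code runs without a cluster. The column names in the stand-in are illustrative.

```python
def rows_as_dicts(cursor):
    """Yield one dict per row, using the first yielded item as column names."""
    field_names = next(cursor, None)
    if field_names is None:
        return  # nothing was yielded at all
    for row in cursor:
        yield dict(zip(field_names, row))


def fake_sql_result():
    """Stand-in generator: column names first, then the rows."""
    yield ['ID', 'NAME', 'COUNTRYCODE']
    yield [3793, 'New York', 'USA']
    yield [3794, 'Los Angeles', 'USA']


cities = list(rows_as_dicts(fake_sql_result()))
for city in cities:
    print(city['NAME'], city['COUNTRYCODE'])
```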

Security

SSL/TLS

To use encrypted communication between the thin client and the cluster, you have to enable it in both the cluster configuration and the client configuration. Refer to the Enabling SSL/TLS for Thin Clients section for instructions on configuring the cluster.

Below is an example configuration for enabling SSL in the thin client:

from pygridgain import Client
import ssl

client = Client(
    use_ssl=True,
    ssl_cert_reqs=ssl.CERT_REQUIRED,
    ssl_keyfile='/path/to/key/file',
    ssl_certfile='/path/to/client/cert',
    ssl_ca_certfile='/path/to/trusted/cert/or/chain',
)

client.connect('localhost', 10800)

Supported parameters:

| Parameter | Description |
|---|---|
| use_ssl | Set to True to enable SSL/TLS on the client. |
| ssl_keyfile | Path to the file containing the SSL key. |
| ssl_certfile | Path to the file containing the SSL certificate. |
| ssl_ca_certfile | The path to the file with trusted certificates. |
| ssl_cert_reqs | ssl.CERT_NONE: remote certificate is ignored (default). ssl.CERT_OPTIONAL: remote certificate will be validated, if provided. ssl.CERT_REQUIRED: valid remote certificate is required. |
| ssl_version | |
| ssl_ciphers | |
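
A mistyped certificate path typically surfaces only when the connection is attempted. The sketch below collects the SSL parameters described above into a dictionary and checks up front that each file exists. The build_ssl_params helper is hypothetical; only the keyword names are taken from the parameter list above.

```python
import os
import ssl


def build_ssl_params(keyfile, certfile, ca_certfile,
                     cert_reqs=ssl.CERT_REQUIRED):
    """Return Client keyword arguments after checking that the files exist."""
    for path in (keyfile, certfile, ca_certfile):
        if not os.path.isfile(path):
            raise FileNotFoundError(path)
    return {
        'use_ssl': True,
        'ssl_cert_reqs': cert_reqs,
        'ssl_keyfile': keyfile,
        'ssl_certfile': certfile,
        'ssl_ca_certfile': ca_certfile,
    }


# Intended usage (requires pygridgain and real certificate files):
# client = Client(**build_ssl_params('/path/to/key/file',
#                                    '/path/to/client/cert',
#                                    '/path/to/trusted/cert/or/chain'))
```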

Authentication

Configure authentication on the cluster side and provide a valid user name and password in the client configuration.

from pygridgain import Client
import ssl


client = Client(
    ssl_cert_reqs=ssl.CERT_REQUIRED,
    ssl_keyfile='/path/to/key/file',
    ssl_certfile='/path/to/client/cert',
    ssl_ca_certfile='/path/to/trusted/cert/or/chain',
    username='ignite',
    password='ignite',
)

client.connect('localhost', 10800)

Note that supplying credentials automatically turns SSL on, because sending credentials over an insecure channel is strongly discouraged. If you still want to use authentication without securing the connection, disable SSL explicitly when creating the client object:

client = Client(username='ignite', password='ignite', use_ssl=False)
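
The examples above hard-code the user name and password for brevity. One common alternative is to read them from the environment. The sketch below assumes two environment variable names, GRIDGAIN_USER and GRIDGAIN_PASSWORD, which are made up for this example; the credentials_from_env helper is likewise hypothetical.

```python
import os


def credentials_from_env(environ=None,
                         user_var='GRIDGAIN_USER',
                         password_var='GRIDGAIN_PASSWORD'):
    """Return Client keyword arguments read from environment variables.

    Raises KeyError when either variable is missing.
    """
    if environ is None:
        environ = os.environ
    return {
        'username': environ[user_var],
        'password': environ[password_var],
    }


# Demonstration with an explicit mapping instead of the real environment.
demo_env = {'GRIDGAIN_USER': 'ignite', 'GRIDGAIN_PASSWORD': 'ignite'}
print(sorted(credentials_from_env(demo_env)))

# Intended usage (requires pygridgain):
# client = Client(**credentials_from_env())
```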

Authorization

Thin client authorization can be configured in the cluster. Refer to the Authorization page for details.