Exploring Four Unique Key Generation Strategies in GridGain and Apache Ignite

Some databases have a feature where a column can be “auto-incremented.” In this piece, we’ll explain why GridGain and Apache Ignite do not, and what you can do to get equivalent functionality.

 

The naming varies, but the concept is straightforward: the system automatically generates a key for you if there is no unique business key for a table. Typically, this would be a numeric column, and the database would increase the value by one every time you insert a new record. MySQL and MariaDB have specific SQL syntax for “AUTO_INCREMENT”. SQL Server has a column property called IDENTITY. Oracle used to make you use a sequence and a trigger, but it has made it a bit easier in recent years.

 

GridGain does not have an equivalent syntax. If we see this requirement frequently, why not? The short story is that having a single incrementing value is challenging to implement efficiently in a distributed system. Having to maintain state means that you need to have a single, global counter (or replicate it to all nodes). And anything “global” means that it’s not scalable, either because you have a single source of truth or because you need to surround any “next value” operation with expensive locks.

 

The good news is that several different ways to build an automated key exist. Which is the best for your use case depends on your requirements.

 

Sequences
 

GridGain does have sequences that you can use as your key.

 

// Arguments: name, initial value, create if it does not exist yet
var counter1 = ignite.atomicSequence("SEQ", 0, true);
var next1 = counter1.getAndIncrement();

var counter2 = ignite.atomicLong("SEQ2", 0, true);
var next2 = counter2.getAndIncrement();

 

They are called an atomic sequence and an atomic long (both are described under atomic types in the documentation). As you can see, they look very similar, but there is a significant difference.

 

The IgniteAtomicLong operates pretty much as I described above: each time you request the next value, it reaches out to a singleton on another node, increments the value, and returns it to the originating node. On the plus side, it means that when you call it twice, the second value is guaranteed to be larger than the first. On the negative side, if you have multiple writers, they will end up waiting for each other. Ironically, writing the data to the cluster can be performed entirely in parallel, but generating the next sequence number requires a singleton!

 

If the keys need to be unique but not necessarily continually increasing, you can use the IgniteAtomicSequence. When you get a reference to the sequence, you also get allocated a block of numbers (by default, 1000). In this way, your client only needs to talk to the rest of the cluster when it has exhausted that range. Of course, this means that if you have two clients writing to the cluster, the first will start from zero, and the second will start from a thousand. The numbers will be unique but not consecutive.
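To make the block allocation idea concrete, here is a minimal, in-process sketch. It is not GridGain's implementation: the SharedCounter and LocalSequence classes are hypothetical stand-ins for the cluster-wide counter and a client's local handle, and the only "remote" call is reserveBlock().

```java
import java.util.concurrent.atomic.AtomicLong;

/** Simplified model of block reservation; all names here are hypothetical. */
final class BlockSequenceSketch {

    /** Stands in for the single, cluster-wide counter. */
    static final class SharedCounter {
        private final AtomicLong nextBlockStart = new AtomicLong(0);
        private final int reserveSize;

        SharedCounter(int reserveSize) {
            this.reserveSize = reserveSize;
        }

        /** The only "remote" call: reserves the range [start, start + reserveSize). */
        long reserveBlock() {
            return nextBlockStart.getAndAdd(reserveSize);
        }

        int reserveSize() {
            return reserveSize;
        }
    }

    /** Stands in for one client's local view of the sequence. */
    static final class LocalSequence {
        private final SharedCounter shared;
        private long next;
        private long limit;

        LocalSequence(SharedCounter shared) {
            this.shared = shared;
            reserve(); // a block is allocated as soon as the handle is created
        }

        private void reserve() {
            next = shared.reserveBlock();
            limit = next + shared.reserveSize();
        }

        synchronized long getAndIncrement() {
            if (next == limit) {
                reserve(); // block exhausted: go back to the shared counter
            }
            return next++;
        }
    }

    public static void main(String[] args) {
        SharedCounter cluster = new SharedCounter(1000);
        LocalSequence clientA = new LocalSequence(cluster);
        LocalSequence clientB = new LocalSequence(cluster);

        System.out.println(clientA.getAndIncrement()); // 0
        System.out.println(clientB.getAndIncrement()); // 1000
        System.out.println(clientA.getAndIncrement()); // 1
    }
}
```

In this model each client touches the shared counter once per thousand keys rather than once per key, which is where the scalability comes from. The price is visible in the output: keys from different clients interleave out of order.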

 

UUID

 

If your key only needs to be unique, there’s a better way of creating a key: the UUID. A UUID can be made in an entirely distributed manner without talking to any other node in the cluster. GridGain even “understands” UUIDs, meaning that they’re stored efficiently and can be seamlessly converted to a platform’s native format. For example, I can write into my cluster using a Java UUID and then read the same value as a GUID from a .NET client.

 

try (var ds = ignite.<UUID, Person>dataStreamer("PERSON")) {
    for (var i = 0; i < 1000; i++) {
        // The key is generated locally; no call to the cluster is needed
        ds.addData(UUID.randomUUID(),
                Person.builder()
                        .name("Person " + i)
                        .height(160 + rnd.nextInt(40))
                        .build());
    }
}

 

Using a UUID requires no special GridGain magic, just the standard java.util.UUID class.
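As a small illustration of why a UUID travels well between platforms, here is a sketch using only java.util.UUID. It shows that a random UUID is generated locally and is just 128 bits, which can be split into two longs and reassembled without loss. The UuidKeySketch class and its method names are made up for this example, and it says nothing about how GridGain actually stores the value internally.

```java
import java.util.UUID;

/** Sketch only: the class and method names are hypothetical. */
final class UuidKeySketch {

    /** Generates a random (version 4) UUID without contacting any other node. */
    static UUID newKey() {
        return UUID.randomUUID();
    }

    /** Rebuilds a UUID from its two 64-bit halves. */
    static UUID fromLongs(long mostSigBits, long leastSigBits) {
        return new UUID(mostSigBits, leastSigBits);
    }

    public static void main(String[] args) {
        UUID key = newKey();
        UUID copy = fromLongs(key.getMostSignificantBits(), key.getLeastSignificantBits());

        System.out.println(key.version());    // 4
        System.out.println(copy.equals(key)); // true
    }
}
```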

 

TSID

 

But what if you need keys that maintain some kind of ordering, you have multiple clients writing to the cluster, and you still want key generation to be distributed?

 

There is an answer to this. There’s a specification for a Universally Unique Lexicographically Sortable Identifier, or ULID (https://github.com/ulid/spec), and multiple implementations of it and of related formats exist. For the sample code below, I picked tsid-creator (https://github.com/f4b6a3/tsid-creator), which generates a closely related format called a Time-Sorted Unique Identifier, or TSID.

 

try (var ds = ignite.<String, Person>dataStreamer("PERSON")) {
    // The argument is the node id; give each client a different one
    var tsidGen = new TsidFactory(1);
    for (var i = 0; i < 1000; i++) {
        ds.addData(tsidGen.create().toString(),
                Person.builder()
                        .name("Person " + i)
                        .height(160 + rnd.nextInt(40))
                        .build());
    }
}

 

Code-wise, it’s basically identical to the UUID example. The differences are that a TSID can be saved as a long or a string; that by specifying a node id (the “1” in the constructor), there can’t be clashes between clients, provided each client uses a different id; and that it’s possible to work backward from a key to get its timestamp.
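To show how a node id prevents clashes and how a timestamp can be recovered, here is a self-contained sketch of a TSID-style 64-bit layout. It does not use tsid-creator. The 42-bit time, 10-bit node, and 12-bit counter split, and the 2020-01-01 epoch, are assumptions based on my understanding of that library's defaults, so check its documentation before relying on them. All names are hypothetical.

```java
import java.time.Instant;

/** Sketch of a time-sorted 64-bit id; the layout and names are assumptions. */
final class TsidLayoutSketch {
    // Assumed custom epoch: ids count milliseconds from this instant
    static final long CUSTOM_EPOCH_MILLIS =
            Instant.parse("2020-01-01T00:00:00Z").toEpochMilli();
    static final int NODE_BITS = 10;    // up to 1024 distinct writers
    static final int COUNTER_BITS = 12; // up to 4096 ids per node per millisecond

    /** Packs time, node id and counter into one long, with time in the high bits. */
    static long compose(long epochMillis, int node, int counter) {
        long time = epochMillis - CUSTOM_EPOCH_MILLIS;
        return (time << (NODE_BITS + COUNTER_BITS))
                | ((long) (node & 0x3FF) << COUNTER_BITS)
                | (counter & 0xFFF);
    }

    /** Works backward from an id to the wall-clock time it was created. */
    static long timestampOf(long id) {
        return (id >>> (NODE_BITS + COUNTER_BITS)) + CUSTOM_EPOCH_MILLIS;
    }

    /** Extracts the node id that generated the key. */
    static int nodeOf(long id) {
        return (int) ((id >>> COUNTER_BITS) & 0x3FF);
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long fromNode1 = compose(now, 1, 0);
        long fromNode2 = compose(now, 2, 0);

        System.out.println(fromNode1 != fromNode2);                 // true
        System.out.println(timestampOf(fromNode1) == now);          // true
        System.out.println(compose(now + 1, 1, 0) > fromNode2);     // true
    }
}
```

Because the time occupies the most significant bits, sorting the keys numerically sorts them by creation time to the millisecond. Within the same millisecond, keys from different nodes are ordered by node id rather than by arrival.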

 

Sortable, unique, and distributed. What’s not to like?

 

Conclusion

 

In this piece, we’ve seen four ways to create unique keys you can use in your application. Each has different strengths and weaknesses, and all are valid approaches that we’ve seen with various clients. Let us know if you have found any other promising approaches.