Distributed data structures: Part 1 (overview)

November 21, 2017

Long ago, when Trees computers were large, and processors were single-core, all applications were started in one thread and did not experience synchronization difficulties. Modern applications, however, tend to use all available resources, in particular, all available CPUs.

Unfortunately, it is not possible to use standard data structures for multithreaded processing, so Java 5 introduces thread-safe data structures, i.e. functioning correctly when used from multiple threads at the same time, and they are located in the package java.util.concurrent.

However, despite all the technological strength inherent in the java.util.concurrent package, processing of information by thread-safe collections is possible only within the limits of one computer, and this gives rise to the problem of scalability.

And what if it is necessary, in real time, to process information about 100 million customers,
when the data center occupies 100TB, and every second needs to commit 100+ thousand operations? It is hardly possible, even on the coolest modern hardware, and if possible - just imagine its price!

It is much cheaper to achieve the same processing power by combining many ordinary computers into a cluster.

It remains only a question of intercomputer interaction by familiar means, similar to the API with thread-safe collections from the package java.util.concurrent and giving the same guarantees, but not on one computer, but on the whole cluster.

Such opportunities and guarantees can give distributed data structures.

Let's consider some of the distributed data structures, allowing, without special complications, to make a distributed thread from a multithreaded algorithm.

AtomicReference and AtomicLong

IgniteAtomicReference provides compare-and-set semantics.

Suppose there are 2 machines available in our network.

Run Apache Ignite instance on both of them doing the following:

//Running an Apache Ignite instance with a default configuration.
Ignite ignite = Ignition.ignite();

// Creating or accessing an already created IgniteAtomicReference.
IgniteAtomicReference<String> ref = ignite.atomicReference("refName", "someVal", true);

Now try to change a value held by the atomic reference on both machines:

// Change the value if the current version is 'someVal'.
boolean res = ref.compareAndSet("someVal", "someNewVal"); 

// The first Ignite node will be able to change the value while the second
// will fail because of the already update value.

Restore the changed value:

ref.compareAndSet("someNewVal", "someVal"); // Will be executed succesfully.

IgniteAtomicLong expands the semantics of IgniteAtomicReference by adding atomic increment / decrement operations:

// Creating or accessing an already created IgniteAtomicLong
final IgniteAtomicLong atomicLong = ignite.atomicLong("atomicName", 0, true);

// Incrementing and printing out the current value.
System.out.println("Incremented value: " + atomicLong.incrementAndGet());

Detailed documentation: https://apacheignite.readme.io/docs/atomic-types

Examples on github:

AtomicSequence

IgniteAtomicSequence allows you to obtain a unique identifier, with uniqueness guaranteed within the entire cluster.

IgniteAtomicSequence works faster than IgniteAtomicLong , because Instead of being synchronized globally on the receipt of each identifier, it immediately receives a range of values and then issues identifiers from this range.

// Creating or accessing an already created IgniteAtomicSequence.
final IgniteAtomicSequence seq = ignite.atomicSequence("seqName", 0, true);

// Getting 20 unique identifiers.
for (int i = 0; i < 20; i++) {
  long currentValue = seq.get();
  long newValue = seq.incrementAndGet();  
  ...
}

Detailed documentation: https://apacheignite.readme.io/docs/id-generator
Example on github - IgniteAtomicSequenceExample

CountDownLatch

IgniteCountDownLatch allows you to synchronize threads on different computers within a single cluster.

Run the following code on 10 machines of the same cluster:

// Creating or accessing an already created IgniteCountDownLatch
// setting the counter initial value to 10.
IgniteCountDownLatch latch = ignite.countDownLatch("latchName", 10, false, true);

// Decrementing the counter.
latch.countDown();

// Waiting while the countDown is called 10 times.
latch.await();

As a result, all latch.await () will be unblocked only after ten calls of latch.countDown () are completed.

Detailed documentation: https://apacheignite.readme.io/docs/countdownlatch
Example on github - IgniteCountDownLatchExample

Semaphore

IgniteSemaphore allows you to limit the number of simultaneous actions within a single cluster.

// Creating or accessing an already created IgniteSemaphore
// setting the initial value to 20
IgniteSemaphore semaphore = ignite.semaphore("semName", 20,  true,  true);

// Acquiring the semaphore.
semaphore.acquire();

try {
    // The semaphore is locked. It's safe to execute this section.
}
finally {
    // Releasing the semaphore.
    semaphore.release();
}

It is guaranteed that, simultaneously, no more than 20 threads, within the same cluster, will execute the code inside the try-finally section .

Detailed documentation: https://apacheignite.readme.io/docs/distributed-semaphore
Example on github - IgniteSemaphoreExample

BlockingQueue

IgniteQueue provides the same capabilities as BlockingQueue , but within a whole cluster.

// Creating or accessing an already created IgniteQueue.
IgniteQueue<String> queue = ignite.queue("queueName", 0, colCfg);

Try to get the item from the queue

// Taking the first queued element.
queue.take();

Once the queue is empty the code will be blocked on queue.take () until a new element is queued:

// Queueing up a new object.
queue.put("data");

Detailed documentation: https://apacheignite.readme.io/docs/queue-and-set
Example on github - IgniteQueueExample

That was a high-level overview of existing capabilities. In the next blog post, you will learn how some data structures reviewed above are implemented.