In last week's post, "Distributed data structures: Part 1 (overview),"I talked about why you need distributed data structures (hereinafter - RSD) and disassembled several options offered by the distributed cache Apache Ignite. Today I want to talk about the details of the implementation of specific RSD, as well as a small educational program on distributed caches. To begin with, at least in the case of Apache Ignite, RDBs are not implemented from scratch, but are a superstructure over a distributed cache.
To support the operation of the RSD, two caches are used: one is Replicated and one is Partitioned.
Replicated-cache - in this case it is a system cache, (
ignite-sys-cache) responsible, including, for storing information about RSD registered in the system.
ignite-atomics-sys-cache) stores the data necessary for the operation of the RSD, and their state.
So, most of the RNC is created as follows:
- The transaction starts.
ignite-sys-cache, by the key
DATA_STRUCTURES_KEY, it is taken
Map<Имя_РСД, DataStructureInfo>(it is created if necessary), and a new element with a description, for example, is added to it
ignite-atomics-sys-cache, by the key from the added previously
DataStructureInfoadded the element responsible for the state of the RSD.
- The transaction commits.
At the first request for creating an RSD, a new instance is created, and subsequent requests are received previously created.
The third step of initialization for both types is to add to
ignite-atomics-sys-cacheobjects of type
Both classes contain one single field
Accordingly, any change
... this is the start
EntryProcessorwith the following method code
IgniteAtomicLongis a de facto extension
IgniteAtomicReference, so its method is
compareAndSetimplemented in a similar way.
incrementAndGetdoes not have checks on the expected value, but simply adds one.
When you create each instance
... it is allocated a pool of identifiers.
Accordingly, the call ...
... just increments the local counter until it reaches the upper limit of the value pool.
When the boundary is reached, a new identifier pool is allocated, similar to the way it happens when a new instance is created
... is implemented as follows:
Waiting for decrementing the counter to 0 ...
... is implemented through the mechanism of Continuous Queries , that is, each change
GridCacheCountDownLatchValuein the cache, all instances are
IgniteCountDownLatchnotified of these changes.
IgniteCountDownLatchhas a local:
Each notification decrements
internalLatchto the current value. Therefore, it
latch.await()is very simple:
... occurs as follows:
Returning permission ...
... occurs in a similar manner, except that the new value is greater than the current value.
Unlike other RSD,
IgniteQueuedoes not use
ignite-atomics-sys-cache. The cache used is described via the parameter
Depending on the specified Atomicity Mode (TRANSACTIONAL, ATOMIC), you can get different options
In both cases, the state is
To add an element, use
... which, in fact, simply moves the pointer to the tail of the queue.
... a new element is added to the queue:
Deleting an element is similar, but the pointer changes not to
tail, but to
... and the item is deleted.
The difference between
GridAtomicCacheQueueImplwhen adding an element, first incrementally incrementally
hdr.tail(), and then adds the element to the cache by the resulting index.
GridTransactionalCacheQueueImplmakes both actions within the same transaction.
As a result, it
GridAtomicCacheQueueImplworks faster, but the data consistency problem may arise: if information about the size of the queue and the data itself is not stored at the same time, then they can not be deducted simultaneously.
It is quite possible that inside the method
pollit is seen that the queue contains new elements, but the elements themselves are not yet there. This is extremely rare, but still possible.
This problem is solved by the timeout value waiting.
I would like to note once again that a distributed cache is, in essence, ConcurrentHashMap within a set of computers that are clustered together.
Distributed caches can be used to implement many important, complex but reliable systems.
A particular case of implementation is distributed data structures, but in general they are used to store and process huge amounts of data in real time, with the possibility of increasing the volumes or speed of processing by simply adding new nodes.