Almost any In-Memory Data Grid (IMDG) solution can be used as-is, without an underlying persistent storage layer. In my experience, there are various use cases and real production scenarios in which the entire data set lives in an IMDG and is never synced to disk.
However, in a variety of deployments, companies still prefer to keep data both in memory and on disk. They do this to ensure that the data will not be completely lost if the whole IMDG cluster goes down or needs to be restarted.
Figure 1 below depicts one of the most popular deployments of a cluster that uses the GridGain In-Memory Data Grid feature. An application or service communicates directly with the cluster and stores data there, executing distributed transactions and SQL queries, sending computations, and streaming real-time data sets. The application never updates the database or reads data from it directly.
It is the responsibility of the GridGain engine to keep the database up-to-date by syncing every update that happens in RAM to disk. It is also GridGain's responsibility to load a key's value from the database into RAM when that value is missing from the cluster. Thanks to the GridGain Cache Store mechanism, this functionality works out of the box, making integration with popular databases like Oracle, MySQL or PostgreSQL trivial.
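To make the two responsibilities described above concrete, here is a minimal Python sketch of the read-through and write-through pattern. It is not the GridGain Cache Store API: the `ReadWriteThroughCache` class and the plain dict standing in for the database are hypothetical names used only for illustration.

```python
class ReadWriteThroughCache:
    """Toy cache illustrating read-through and write-through (not GridGain's API)."""

    def __init__(self, database):
        self._db = database   # hypothetical stand-in for Oracle, MySQL or PostgreSQL
        self._ram = {}        # in-memory copy held by the "cluster"

    def get(self, key):
        # Read-through: on a miss in RAM, load the value from the database.
        if key not in self._ram:
            if key not in self._db:
                return None
            self._ram[key] = self._db[key]
        return self._ram[key]

    def put(self, key, value):
        # Write-through: update the database first, then RAM, so the
        # database never lags behind what readers can see in memory.
        self._db[key] = value
        self._ram[key] = value


database = {"account:1": 100}
cache = ReadWriteThroughCache(database)

first_read = cache.get("account:1")   # miss in RAM, loaded from the database
cache.put("account:2", 250)           # written to both the database and RAM
```

Note that the application only ever calls `get` and `put` on the cache. Nothing in this sketch notices a change made to `database` behind the cache's back, which is exactly the gap the rest of this post is about.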
Database Modification by Many Sources
The configuration shown above would be perfect if this type of deployment were applicable to all business cases. However, how do you deal with situations where the database is updated both by the GridGain cluster and by other applications that interact with the database directly, as shown in Figure 2 below? How can you guarantee that stale data won't linger on the GridGain cluster, and that the cluster is refreshed as soon as the database receives updates from the other applications?
A straightforward workaround would be to connect App 2 directly to the GridGain cluster rather than to the database. However, this might be impossible in some organizations for historical reasons. In other cases, applications might be migrated to an in-memory data grid solution step by step, which means that some applications already work over GridGain while the remainder are still integrated directly with the database.
Another feasible solution is to implement custom logic on the GridGain cluster side. That logic could check the database periodically, fetch all the updates that have happened recently, and store them in RAM on the cluster side. While this is a realistic solution, it requires a lot of engineering effort, because you have to make sure that only the deltas of the updates are retrieved from the database and that those deltas are applied to the GridGain cluster in a transactional fashion.
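The polling approach can be sketched roughly as follows. This is only an illustration under stated assumptions: the database is modeled as a hypothetical `VersionedTable` in which every row carries a monotonically increasing version number, the cluster is a plain dict, and `DeltaPoller` is an invented name rather than anything GridGain ships.

```python
class VersionedTable:
    """Hypothetical table in which each row carries an increasing version number."""

    def __init__(self):
        self._rows = {}      # key -> (value, version)
        self._version = 0

    def write(self, key, value):
        self._version += 1
        self._rows[key] = (value, self._version)

    def changes_since(self, version):
        # Return only the rows modified after the given version (the delta),
        # ordered by version. Versions are unique, so sorting is unambiguous.
        return sorted(
            (ver, key, value)
            for key, (value, ver) in self._rows.items()
            if ver > version
        )


class DeltaPoller:
    """Pulls deltas from the table and applies each batch to the cache."""

    def __init__(self, table, cache):
        self._table = table
        self._cache = cache
        self._last_seen = 0   # watermark: highest version already applied

    def poll_once(self):
        delta = self._table.changes_since(self._last_seen)
        if not delta:
            return 0
        # Stage the whole batch and apply it in a single update. A real
        # cluster would need a distributed transaction at this point.
        staged = {key: value for _, key, value in delta}
        self._cache.update(staged)
        self._last_seen = delta[-1][0]
        return len(delta)


table = VersionedTable()
cache = {}
poller = DeltaPoller(table, cache)

table.write("a", 1)
table.write("b", 2)
applied_first = poller.poll_once()    # both rows are new
table.write("a", 3)
applied_second = poller.poll_once()   # only the changed row is fetched
applied_third = poller.poll_once()    # nothing has changed
```

Even this toy version hints at the engineering effort involved: the database needs a reliable version or timestamp column, rows deleted from the table are never seen by `changes_since`, and the cluster stays stale until the next poll.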
What other options are available? Is there an alternative that allows building architectures like the one shown in Figure 2 while guaranteeing that the GridGain cluster is updated whenever the database is modified by another source? The answer is yes! Let's look at how Oracle GoldenGate can help us out.
Syncing The GridGain In-Memory Computing Cluster and Database
With Oracle GoldenGate, Figure 2 is transformed into Figure 3 below.
As Oracle says, "Oracle GoldenGate is a comprehensive software package for real-time data integration and replication in heterogeneous IT environments. The product set enables high availability solutions, real-time data integration, transactional change data capture, data replication, transformations, and verification between operational and analytical enterprise systems."
Undoubtedly, Oracle's statement is true, and GridGain provides an integration with this software package so that deployments like the one in Figure 3 can be easily brought to life.
The GridGain Enterprise Edition Oracle GoldenGate Integration provides a solution for real-time integration and replication between GoldenGate-compatible databases and a GridGain cluster. Once replication through the GoldenGate integration is configured, GridGain automatically receives updates from the configured GoldenGate source database and stores them on the GridGain side, as depicted in Figure 3 above.
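Conceptually, the receiving side of such a change-data-capture pipeline replays each captured database change against the in-memory copy. The sketch below illustrates only that idea. It does not reflect GoldenGate's actual record format or GridGain's integration code: `ChangeEvent`, `apply_change` and the dict used as a cache are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical captured change; not GoldenGate's real record format."""
    op: str                      # "insert", "update" or "delete"
    key: str
    value: Optional[Any] = None  # unused for deletes


def apply_change(cache, event):
    # Replay one captured database change against the in-memory copy.
    if event.op in ("insert", "update"):
        cache[event.key] = event.value
    elif event.op == "delete":
        cache.pop(event.key, None)
    else:
        raise ValueError(f"unknown operation: {event.op}")


cache = {"order:1": "NEW"}

# Changes made directly in the database by another application, in commit order.
captured = [
    ChangeEvent("update", "order:1", "SHIPPED"),
    ChangeEvent("insert", "order:2", "NEW"),
    ChangeEvent("delete", "order:1"),
]

for event in captured:
    apply_change(cache, event)
```

Unlike periodic polling, a stream of change events carries deletes explicitly and arrives in commit order, so the cache does not have to work out for itself what changed.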
Moreover, Oracle GoldenGate doesn't lock you in to Oracle Database. You are free to use any of the databases supported by this software package, including MySQL, Microsoft SQL Server, IBM DB2 and more.
I'm not going to cover the technical details of setting up a GridGain cluster with Oracle GoldenGate in this blog post. That is already done well in this technical documentation section. So go ahead and try it out, and you will see how the integration can greatly simplify your life.