In-Memory Database vs. In-Memory Data Grid

This article explains the difference between two major categories of in-memory computing: In-Memory Databases and In-Memory Data Grids.

Technical Differences

Most In-Memory Databases are RDBMSs that store data “in memory” in addition to on disk. Beyond that, they behave much like a traditional RDBMS. They provide good SQL support with only a modest list of unsupported SQL features, ship with ODBC/JDBC drivers, and can often be used in place of existing RDBMSs without significant changes.
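The "drop-in replacement" point can be illustrated with a small sketch. It uses SQLite's `:memory:` mode from Python's standard `sqlite3` module purely as a stand-in. SQLite is not one of the products discussed here, ODBC/JDBC is not involved, and a `:memory:` SQLite database does not persist to disk. The `orders` table and the `top_customers` and `load_sample` functions are hypothetical. The point is that the application's SQL stays the same, and only the connection target changes.

```python
import sqlite3


def top_customers(conn, limit):
    """Application code: plain SQL via the DB-API, unaware of where the data lives."""
    cur = conn.execute(
        "SELECT name, SUM(amount) AS total FROM orders "
        "GROUP BY name ORDER BY total DESC LIMIT ?",
        (limit,),
    )
    return cur.fetchall()


def load_sample(conn):
    """Create and fill a hypothetical orders table."""
    conn.execute("CREATE TABLE orders (name TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("alice", 30.0), ("bob", 10.0), ("alice", 5.0), ("carol", 20.0)],
    )
    conn.commit()


# A disk-backed database would be opened with, e.g., sqlite3.connect("orders.db").
# For the in-memory variant, only the connection target changes.
conn = sqlite3.connect(":memory:")
load_sample(conn)
print(top_customers(conn, 2))  # [('alice', 35.0), ('carol', 20.0)]
```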

In-Memory Data Grids often lack full ANSI SQL support. Instead, they provide MPP-based (Massively Parallel Processing) capabilities: data is spread across a large cluster of commodity servers and processed in parallel. The main access patterns are key/value access, MapReduce, various forms of HPC-like processing, and distributed SQL querying and indexing.
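The following sketch shows how data spread across a cluster can still support fast key/value access. It is a single-process toy that assumes simple hash partitioning. `ToyPartitionedGrid` is hypothetical, each "node" is just a Python dict, and real data grids use their own partitioning schemes and network protocols.

```python
import zlib


class ToyPartitionedGrid:
    """Hypothetical toy data grid: each "node" is a dict that owns a slice of the keys."""

    def __init__(self, num_nodes):
        self.nodes = [{} for _ in range(num_nodes)]

    def owner(self, key):
        # crc32 is deterministic across runs, so a key always maps to the same node.
        return zlib.crc32(str(key).encode("utf-8")) % len(self.nodes)

    def put(self, key, value):
        self.nodes[self.owner(key)][key] = value

    def get(self, key):
        # Key/value access touches only the single node that owns the key.
        return self.nodes[self.owner(key)].get(key)


grid = ToyPartitionedGrid(num_nodes=4)
for i in range(100):
    grid.put(f"account-{i}", i)

print(grid.get("account-42"))              # 42
print([len(node) for node in grid.nodes])  # how the 100 keys spread over 4 nodes
```

Because any node can compute a key's owner locally, adding nodes adds capacity without adding a central lookup step.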

It is important to note that, in terms of SQL support, there is significant crossover between In-Memory Data Grids and In-Memory Databases. For example, the GridGain in-memory data grid provides ANSI-99 SQL support, which sets it apart from most other in-memory data grids.

Speed Only vs. Speed + Scalability

One of the crucial differences between In-Memory Data Grids and In-Memory Databases is the ability to scale to hundreds or thousands of servers. In-Memory Data Grids have an inherent capacity for this kind of scale thanks to their MPP architecture. In-Memory Databases are fundamentally limited, because SQL joins, in general, cannot be performed efficiently in a distributed context.

It is one of the dirty secrets of In-Memory Databases: one of their most useful features, SQL joins, is also their Achilles' heel when it comes to scalability. This is the fundamental reason why most existing SQL databases (disk- or memory-based) are built on vertically scalable SMP (Symmetric Multiprocessing) architectures. In-Memory Data Grids, by contrast, use the much more horizontally scalable MPP approach.
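Distributed joins are expensive mainly because of data movement. The sketch below counts how many rows would have to cross the network to join orders with customers. It assumes hash partitioning over four nodes and uses invented data; the `owner` and `rows_shipped_for_join` helpers are hypothetical. When orders are partitioned by `order_id`, many rows land on a different node than their customer and must be shipped. Partitioning orders on the join key (`customer_id`) avoids the movement, but only for joins on that particular key.

```python
import zlib

NUM_NODES = 4


def owner(key):
    """Node that owns a key under simple hash partitioning (hypothetical scheme)."""
    return zlib.crc32(str(key).encode("utf-8")) % NUM_NODES


# Hypothetical data: 1,000 orders spread over 50 customers, as (order_id, customer_id).
orders = [(order_id, order_id % 50) for order_id in range(1000)]


def rows_shipped_for_join(partition_key):
    """Count order rows that must move to meet their customer row for the join.

    partition_key selects how orders are partitioned: "order_id" or "customer_id".
    Customers are always partitioned by customer_id.
    """
    shipped = 0
    for order_id, customer_id in orders:
        order_node = owner(order_id if partition_key == "order_id" else customer_id)
        if order_node != owner(customer_id):
            shipped += 1
    return shipped


print(rows_shipped_for_join("order_id"))     # > 0: rows must be shuffled between nodes
print(rows_shipped_for_join("customer_id"))  # 0: join keys are co-located
```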

It’s important to note that both In-Memory Data Grids and In-Memory Databases can achieve similar speed in a local non-distributed context. In the end, they both do all processing in memory.

But only In-Memory Data Grids can natively scale out to hundreds or thousands of nodes, so throughput keeps growing as servers are added.

Replace Database vs. Change Application

Apart from scalability, there is another difference that matters in use cases where an In-Memory Data Grid or an In-Memory Database is tasked with speeding up an existing system or application.

An In-Memory Data Grid always works with an existing database, providing a layer of massively distributed in-memory storage and processing between the database and the application. The application then relies on this in-memory computing layer for fast data access and processing. When necessary, most In-Memory Data Grids can transparently read through from, and write through to, the underlying database, and they are generally well integrated with all the popular databases.
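The read-through and write-through behavior can be sketched in a few lines. This is a minimal single-process model. `ReadWriteThroughCache` is hypothetical, and the "database" is a plain dict standing in for the existing RDBMS rather than a real driver. A cache miss is loaded from the database and kept in memory (read-through). Every update goes to the database synchronously before the in-memory copy is updated (write-through).

```python
class ReadWriteThroughCache:
    """Minimal read-through / write-through layer in front of a backing store.

    backing_store is any dict-like object; here it stands in for the existing database.
    """

    def __init__(self, backing_store):
        self.store = backing_store
        self.memory = {}
        self.store_reads = 0  # how often we had to go to the "database"

    def get(self, key):
        if key in self.memory:      # hit: served from memory
            return self.memory[key]
        self.store_reads += 1       # read-through: a miss is loaded from the database
        value = self.store.get(key)
        if value is not None:
            self.memory[key] = value
        return value

    def put(self, key, value):
        self.store[key] = value     # write-through: the database is updated first
        self.memory[key] = value


database = {"user:1": "Alice"}
cache = ReadWriteThroughCache(database)
cache.get("user:1")         # miss, loaded from the database
cache.get("user:1")         # hit, served from memory
cache.put("user:2", "Bob")
print(cache.store_reads)    # 1
print(database["user:2"])   # Bob
```

Writing to the backing store before updating memory keeps the database authoritative. The price is that every write pays the database's latency. Write-behind variants relax this by batching writes asynchronously.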

In exchange, developers need to make some changes to the application to take advantage of these new capabilities. The application may no longer “talk” only SQL. It also needs to use MPP, MapReduce, or other data processing techniques. An exception is the GridGain in-memory data grid, which supports SQL through an ODBC/JDBC API. This lets users keep using SQL to talk to the data grid, with minimal changes to their application.
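To illustrate what "learning MapReduce" means for application code, the sketch below replaces what would be a single SQL `GROUP BY`. A map function runs against each partition and returns a small partial aggregate, and a reduce step merges those partials. The trade data and function names are hypothetical. Threads stand in for cluster nodes; a real data grid ships the computation to remote nodes through its own compute API, which is not shown here.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions: each inner list is the slice of trade records held by one node.
partitions = [
    [("AAPL", 100), ("MSFT", 50)],
    [("AAPL", 25), ("GOOG", 10)],
    [("MSFT", 75), ("GOOG", 40), ("AAPL", 5)],
]


def map_partition(records):
    """Map step: runs next to the data and returns only a small partial aggregate."""
    totals = Counter()
    for symbol, quantity in records:
        totals[symbol] += quantity
    return totals


def reduce_partials(partials):
    """Reduce step: merge per-node partial aggregates into the final result."""
    result = Counter()
    for partial in partials:
        result.update(partial)  # Counter.update adds counts rather than replacing them
    return result


# Run the map step on every partition in parallel (threads simulate nodes).
with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    partials = list(pool.map(map_partition, partitions))

volume = reduce_partials(partials)
print(dict(volume))  # {'AAPL': 130, 'MSFT': 125, 'GOOG': 50}
```

Only the compact partial results travel to the reducer, not the raw records. That is what lets this pattern scale with the number of partitions.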

In-Memory Databases present almost the mirror image. They often require replacing your existing database, unless you use one of those in-memory “options” to temporarily boost your database's performance. However, they demand far fewer changes to the application itself, because it continues to rely on SQL (albeit a modified dialect of it).

In the end, both approaches have advantages and disadvantages, and the choice between them often depends as much on organizational policies and politics as on technical merits.

Conclusion

If you need to add speed and scalability to an existing application, the choice clearly favors an In-Memory Data Grid. You get the best of both worlds. You keep working with the existing databases in your organization where necessary, and you gain the performance and scalability benefits of an In-Memory Data Grid, with the two tightly integrated.

If you are, however, creating a new system or modernizing your existing enterprise system or application, the choice comes down to this:

You will want to use an In-Memory Database if the following applies to you:

  • You can replace or upgrade your existing disk-based RDBMS
  • You cannot make changes to your applications
  • You care about speed, but don’t care as much about scalability

In other words, you boost your application’s speed by replacing or upgrading your RDBMS without significantly touching the application itself.

On the other hand, you will want to use an In-Memory Data Grid if the following applies to you:

  • You cannot replace your existing disk-based RDBMS
  • You can make changes to (the data access subsystem of) your application
  • You care about speed and especially about scalability and don’t want to trade one for the other

In other words, with an In-Memory Data Grid you can boost your application’s speed and provide massive scalability by tweaking the application, without ripping out and replacing your existing database.

The comparison can be summarized in the following table:

                       In-Memory Data Grid    In-Memory Database
Existing Application   Possibly Changed       Unchanged
Existing RDBMS         Unchanged              Replaced
Speed                  Yes                    Yes
Maximum Scalability    Yes                    No