Blog

Today GridGain™ Systems (GridGain.com), provider of the leading open source In-Memory Computing (IMC) Platform, announced that since its open source announcement on March 3, customer interest in its products has significantly accelerated and product downloads have increased by more than 900 percent in the last four months.

The financial services industry has some of the highest computing demands of any business sector. While speed, scalability and real-time analytics are important considerations in any industry, in financial services speed translates directly into revenue, and latency into loss.

Today GridGain™ Systems (GridGain.com), provider of the leading open source In-Memory Computing (IMC) Platform, announced that it has been chosen by AlwaysOn as a Global 250 Top Private Company winner.

In-Memory Computing Platform (IMCP) company GridGain Systems has enhanced its In-Memory Accelerator for Hadoop. The technology provides speed and scale with Map/Reduce applications without requiring any code change to native MapReduce, HDFS, and YARN environments.

Today at the Spark Summit 2014, GridGain™ Systems (GridGain.com), provider of the leading open source In-Memory Computing (IMC) Platform, announced that it has enhanced both the performance and ease of deployment of the latest version of the GridGain In-Memory Accelerator for Hadoop®, expanding the benefits of in-memory computing to the Hadoop world.

There’s general agreement that the rise of in-memory databases has created a massive, multibillion-dollar opportunity for the channel. Beyond that, there’s next to no agreement about how that opportunity will actually manifest itself.

Today GridGain™ Systems (GridGain.com), provider of the leading open source In-Memory Computing (IMC) Platform, announced that it has enhanced the security features on the latest version of its Enterprise Edition product to enable full visibility into all data access points across its platform, and now offers security audit, advanced authentication and fine grained authorization.

If you prefer a video demo with coding examples, visit the original blog post at gridgain.blogspot.com.

Distributed In-Memory Caching generally allows you to replicate or partition your data in memory across your cluster. Memory provides much faster access to the data, and by utilizing multiple cluster nodes the performance and scalability of the application increase significantly.
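To make the replicate-versus-partition distinction concrete, here is a minimal sketch in Python. It is not GridGain's API: the cluster is just a dictionary of per-node dictionaries, the node names are made up, and the hash-modulo placement is an assumption chosen for brevity (real data grids typically use more robust placement schemes and keep backup copies).

```python
import hashlib


def owner_node(key, node_names):
    """Pick the single node that owns a key (simple hash-modulo placement)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(node_names)
    return node_names[index]


def put_partitioned(cluster, key, value):
    """Partitioned mode: each key is stored on exactly one node."""
    node = owner_node(key, sorted(cluster))
    cluster[node][key] = value


def get_partitioned(cluster, key):
    """Look the key up on the node that owns it."""
    node = owner_node(key, sorted(cluster))
    return cluster[node].get(key)


def put_replicated(cluster, key, value):
    """Replicated mode: every node keeps a full copy of the data."""
    for store in cluster.values():
        store[key] = value


# Hypothetical three-node cluster, each node modeled as a plain dict.
cluster = {"node-a": {}, "node-b": {}, "node-c": {}}
put_partitioned(cluster, "user:42", "Alice")
put_replicated(cluster, "config:theme", "dark")
```

Partitioning spreads memory use and write load across nodes, while replication makes every read local at the cost of storing each entry on every node.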

The majority of products that do distributed caching call themselves In-Memory Data Grids. On top of simply providing hash-table-like access to your data, a data grid product should provide some combination of the following features:

  • Clustering
  • Distributed Messaging
  • Distributed Event Notifications
  • Distributed ACID Transactions
  • Distributed Locks
  • Distributed Data Queries, possibly using SQL
  • Distributed Data Structures, like Maps, Queues, Sets, etc.
  • Clustered Web Sessions
  • OR-Mapping Integration, including Hibernate
  • Persistent Database Support, like Oracle, MySQL, etc.

Of course, the devil is in the details. For example, given the distributed nature of a cluster, anything can fail at any point. So a good question to ask is how failures are handled, especially failures that happen during commit. If a failure during commit can leave the cluster in a semi-committed state, that is definitely a problem.
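One common way to avoid a semi-committed state is to split the commit into a prepare phase and a commit phase. The sketch below is only an illustration of that idea, not how any particular product implements transactions: the Node class, its fail_on_prepare flag and the transactional_put function are all hypothetical, and a real system also has to recover from failures that occur after the prepare phase has succeeded.

```python
class Node:
    """Hypothetical cluster node that stages updates before committing them."""

    def __init__(self, name, fail_on_prepare=False):
        self.name = name
        self.fail_on_prepare = fail_on_prepare
        self.data = {}     # committed values
        self.staged = {}   # values prepared but not yet committed

    def prepare(self, updates):
        if self.fail_on_prepare:
            raise RuntimeError(self.name + " failed during prepare")
        self.staged = dict(updates)

    def commit(self):
        self.data.update(self.staged)
        self.staged = {}

    def rollback(self):
        self.staged = {}


def transactional_put(nodes, updates):
    """Apply updates on all nodes or on none. Returns True if committed."""
    try:
        # Phase 1: every node must accept the updates before anything is visible.
        for node in nodes:
            node.prepare(updates)
    except RuntimeError:
        # Any failure discards the staged values everywhere.
        for node in nodes:
            node.rollback()
        return False
    # Phase 2: all nodes prepared successfully, so make the updates visible.
    for node in nodes:
        node.commit()
    return True
```

When evaluating a product, the interesting question is what happens in the situations this sketch leaves out, such as a node crashing between the two phases.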

Another example is queries. Are predicate queries supported? Can you run SQL queries and, in particular, are SQL joins handled? How are aggregate functions handled?

Simplicity of APIs is very important as well. The ConcurrentMap API has become a de facto standard for accessing data stored in distributed caches, but not all products support it. It is also worth checking whether other standard data structures are supported. For example, GridGain supports Map, Set, BlockingQueue, AtomicLong, AtomicSequence, and CountDownLatch, all in distributed fashion.
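What makes a ConcurrentMap-style API attractive is its atomic conditional operations: in Java's ConcurrentMap these are putIfAbsent, replace(key, oldValue, newValue) and remove(key, value). The following single-process Python sketch mimics those semantics with a lock, purely to show what callers rely on. The class name and method names are made up for this illustration, and a distributed cache would have to provide the same guarantees across nodes rather than across threads.

```python
import threading


class ConcurrentMapSketch:
    """Local stand-in showing ConcurrentMap-style conditional operations."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def put_if_absent(self, key, value):
        """Store value only if key is missing. Returns the existing value, or None if stored."""
        with self._lock:
            if key in self._data:
                return self._data[key]
            self._data[key] = value
            return None

    def replace(self, key, expected, new_value):
        """Swap in new_value only if the current value equals expected."""
        with self._lock:
            if key in self._data and self._data[key] == expected:
                self._data[key] = new_value
                return True
            return False

    def remove(self, key, expected):
        """Remove the entry only if the current value equals expected."""
        with self._lock:
            if key in self._data and self._data[key] == expected:
                del self._data[key]
                return True
            return False
```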

Last but not least, always check performance. Load up the cluster and measure the throughput and latencies, the network load on each server, and so on. A good benchmarking tool for testing distributed systems is the open source Yardstick Framework, available on GitHub.
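As a rough idea of what such a measurement looks like, here is a tiny Python harness that reports throughput and latency for any operation you pass in. It has nothing to do with Yardstick's actual API: the benchmark function and its result keys are invented for this sketch, and a local dictionary stands in for the cache client you would really be testing.

```python
import statistics
import time


def benchmark(operation, iterations=10000):
    """Run operation(i) repeatedly and report throughput and latency figures."""
    latencies = []
    started = time.perf_counter()
    for i in range(iterations):
        t0 = time.perf_counter()
        operation(i)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - started
    latencies.sort()
    return {
        "throughput_ops_per_sec": iterations / elapsed,
        "mean_latency_sec": statistics.fmean(latencies),
        # Value at or just below the 99th percentile of the sorted latencies.
        "p99_latency_sec": latencies[int(0.99 * (len(latencies) - 1))],
    }


# Stand-in workload: writes to a local dict instead of a real cluster.
local_cache = {}
result = benchmark(lambda i: local_cache.__setitem__(i, i), iterations=1000)
```

Looking at tail latency (here the 99th percentile) alongside the mean matters because a distributed system can show a healthy average while a small share of requests is very slow.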

For a video demo and coding examples, visit the original blog post at gridgain.blogspot.com.

A few months ago, I spoke at a conference where I explained the difference between caching and an in-memory data grid. Having since realized that many people also want to better understand the difference between the two major categories of in-memory computing, In-Memory Databases and In-Memory Data Grids, I am sharing a succinct version of my thinking on this topic, thanks to a recent analyst call that helped put everything in place :)

TL;DR

Skip to the conclusion to get the bottom line.

Nomenclature

Let’s clarify the naming and buzzwords first. In-Memory Database (IMDB) is a well-established category name and it is typically used unambiguously.

It is important to note that there is a new crop of traditional databases with serious In-Memory “options”. These include MS SQL 2014, Oracle’s Exalytics and Exadata, and IBM DB2 with BLU offerings. The line between these and the new pure In-Memory Databases is blurry, and for simplicity I’ll continue to call them In-Memory Databases.

In-Memory Data Grids (IMDGs) are sometimes (but not very frequently) called In-Memory NoSQL/NewSQL Databases. Although the latter can be more accurate in some cases, I am going to use the term In-Memory Data Grid in this article, as it tends to be the more widely used one.

Note that there are also In-Memory Compute Grids and In-Memory Computing Platforms that include or augment many of the features of In-Memory Data Grids and In-Memory Databases.

Confusing, eh? It is. For consistency, going forward we’ll use just these terms for the two main categories:

  • In-Memory Database
  • In-Memory Data Grid

Tiered Storage

It is also important to nail down what we mean by “In-Memory”. Surprisingly, there’s a lot of confusion here as well, as some vendors refer to SSDs, Flash-on-PCI, Memory Channel Storage, and, of course, DRAM as “In-Memory”.

In reality, most vendors support a Tiered Storage Model where some portion of the data is stored in DRAM (the fastest storage but with limited capacity) and the rest overflows to a variety of flash or disk devices (slower but with more capacity), so it is rarely a DRAM-only or Flash-only product. However, it’s important to note that most products in both categories are biased in their architecture towards either mostly DRAM or mostly flash/disk storage.

The bottom line is that products vary greatly in what they mean by “In-Memory”, but in the end they all have a significant “In-Memory” component.
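A tiered storage model can be pictured with a small sketch like the one below. It is a toy, assuming a least-recently-used overflow policy and using a plain dictionary as a stand-in for flash or disk. The TieredStore class and its parameters are hypothetical, and actual products differ widely in how they decide what stays in DRAM.

```python
from collections import OrderedDict


class TieredStore:
    """Two-tier store: a bounded fast tier that overflows into a slower tier."""

    def __init__(self, dram_capacity):
        self.dram_capacity = dram_capacity
        self.dram = OrderedDict()   # fast tier, kept in least-recently-used order
        self.overflow = {}          # stand-in for flash or disk

    def put(self, key, value):
        self.overflow.pop(key, None)       # the key must live in one tier only
        self.dram[key] = value
        self.dram.move_to_end(key)         # mark as most recently used
        while len(self.dram) > self.dram_capacity:
            old_key, old_value = self.dram.popitem(last=False)
            self.overflow[old_key] = old_value   # evict the coldest entry

    def get(self, key):
        if key in self.dram:
            self.dram.move_to_end(key)
            return self.dram[key]
        if key in self.overflow:
            value = self.overflow.pop(key)
            self.put(key, value)           # promote back into the fast tier
            return value
        return None


store = TieredStore(dram_capacity=2)
for k, v in [("a", 1), ("b", 2), ("c", 3)]:
    store.put(k, v)
```

After the three puts above, the oldest entry has been pushed to the slower tier, and reading it again promotes it back into DRAM at the expense of another entry.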

Technical Differences

It’s easy to start with technical differences between the two categories.

Most In-Memory Databases are your father’s RDBMS that store data “in memory” instead of on disk. That’s practically all there is to it. They provide good SQL support with only a modest list of unsupported SQL features, ship with ODBC/JDBC drivers, and can often be used in place of an existing RDBMS without significant changes.

In-Memory Data Grids typically lack full ANSI SQL support but instead provide MPP-based (Massively Parallel Processing) capabilities where data is spread across a large cluster of commodity servers and processed in an explicitly parallel fashion. The main access patterns are key/value access, MapReduce, various forms of HPC-like processing, and limited distributed SQL querying and indexing capabilities.
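The MapReduce-style access pattern can be sketched in a few lines of Python. In this illustration each partition is a dictionary standing in for the data held by one server, threads stand in for the servers themselves, and the function names are invented; a real grid would ship the map step to the nodes that own the data instead of running everything in one process.

```python
from concurrent.futures import ThreadPoolExecutor


def map_partition(partition):
    """Map step: each node computes a partial result over its local data only."""
    return sum(partition.values()), len(partition)


def distributed_average(partitions):
    """Run the map step on all partitions in parallel, then reduce the partials."""
    workers = max(1, len(partitions))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_partition, partitions))
    total = sum(t for t, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else 0.0


# Hypothetical data set spread over three nodes.
partitions = [{"a": 10, "b": 20}, {"c": 30}, {"d": 40, "e": 50}]
average = distributed_average(partitions)
```

Only the small partial results travel back to the caller, which is why this style of processing keeps scaling as more servers are added.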

It is important to note that there is a significant crossover from In-Memory Data Grids to In-Memory Databases in terms of SQL support. GridGain, for example, provides pretty serious and constantly growing support for SQL including pluggable indexing, distributed joins optimization, custom SQL functions, etc.

Speed Only vs. Speed + Scalability

One of the crucial differences between In-Memory Data Grids and In-Memory Databases lies in the ability to scale to hundreds and thousands of servers. In-Memory Data Grids are inherently capable of such scale thanks to their MPP architecture, whereas In-Memory Databases generally cannot scale this way because SQL joins, in general, cannot be performed efficiently in a distributed context.

It’s one of the dirty secrets of In-Memory Databases: one of their most useful features, SQL joins, is also their Achilles heel when it comes to scalability. This is the fundamental reason why most existing SQL databases (disk or memory based) are built on a vertically scalable SMP (Symmetric Multiprocessing) architecture, unlike In-Memory Data Grids, which use the much more horizontally scalable MPP approach.
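A back-of-the-envelope way to see why joins are hard to distribute is to count how many rows find their join partner on a different server. The sketch below does that for a made-up orders-to-customers join, assuming integer IDs and modulo placement. All names and the data layout are hypothetical, and real query planners use far more elaborate strategies.

```python
def remote_lookups(orders, num_nodes, partition_orders_by):
    """Count orders whose matching customer row lives on a different node.

    Customers are assumed to be placed by customer_id modulo num_nodes;
    orders are placed by whichever field partition_orders_by names.
    """
    remote = 0
    for order in orders:
        order_node = order[partition_orders_by] % num_nodes
        customer_node = order["customer_id"] % num_nodes
        if order_node != customer_node:
            remote += 1   # the join for this row needs a network hop
    return remote


# Hypothetical data: eight orders, two per customer, on a four-node cluster.
orders = [{"order_id": i, "customer_id": i // 2} for i in range(8)]
scattered = remote_lookups(orders, 4, partition_orders_by="order_id")
colocated = remote_lookups(orders, 4, partition_orders_by="customer_id")
```

When orders are placed by the join key, every join is local; with any other placement most rows need network traffic, and a general-purpose SQL engine cannot assume the data was laid out for the particular join being executed.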

It’s important to note that both In-Memory Data Grids and In-Memory Databases can achieve similar speed in a local non-distributed context. In the end, they both do all processing in memory.

But only In-Memory Data Grids can natively scale to hundreds and thousands of nodes providing unprecedented scalability and unrivaled throughput.

Replace Database vs. Change Application

Apart from scalability, there is another difference that is important for use cases where In-Memory Data Grids or In-Memory Databases are tasked with speeding up existing systems or applications.

An In-Memory Data Grid always works with an existing database providing a layer of massively distributed in-memory storage and processing between the database and the application. Applications then rely on this layer for super-fast data access and processing. Most In-Memory Data Grids can seamlessly read-through and write-through from and to databases, when necessary, and generally are highly integrated with existing databases.
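Read-through and write-through are easy to picture with a small sketch. Here a plain dictionary plays the role of the existing database, and the ReadWriteThroughCache class with its db_reads counter is invented for the illustration. It shows only the basic flow, not the batching, expiry or failure handling a real data grid would add.

```python
class ReadWriteThroughCache:
    """In-memory layer that sits between the application and a database."""

    def __init__(self, database):
        self.database = database   # dict standing in for the existing RDBMS
        self.memory = {}           # the in-memory layer
        self.db_reads = 0          # how often we had to go to the database

    def get(self, key):
        if key not in self.memory:
            # Read-through: on a miss, load from the database and keep a copy.
            self.db_reads += 1
            value = self.database.get(key)
            if value is None:
                return None
            self.memory[key] = value
        return self.memory[key]

    def put(self, key, value):
        # Write-through: persist to the database, then update the memory copy.
        self.database[key] = value
        self.memory[key] = value


database = {"user:1": "Alice"}
cache = ReadWriteThroughCache(database)
cache.get("user:1")        # first read goes to the database
cache.get("user:1")        # second read is served from memory
cache.put("user:2", "Bob") # written to both the database and memory
```

The application talks only to the cache, while the database stays the system of record and remains consistent with what is held in memory.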

In exchange, developers need to make some changes to the application to take advantage of these new capabilities. The application no longer “talks” SQL only, but needs to learn how to use MPP, MapReduce or other techniques of data processing.

In-Memory Databases provide almost a mirror opposite picture: they often require replacing your existing database (unless you use one of those In-Memory “options” to temporarily boost your database performance) but demand significantly fewer changes to the application itself, as it will continue to rely on SQL (albeit a modified dialect of it).

In the end, both approaches have their advantages and disadvantages, and the choice often depends on organizational policies and politics as much as on technical merits.

Conclusion

The bottom line should be pretty clear by now.

If you are developing a green-field, brand new system or application, the choice is pretty clear in favor of In-Memory Data Grids. You get the best of both worlds: you get to work with the existing databases in your organization where necessary, and you enjoy the tremendous performance and scalability benefits of In-Memory Data Grids, which integrate tightly with those databases.

If you are, however, modernizing your existing enterprise system or application, the choice comes down to this:

You will want to use an In-Memory Database if the following applies to you:

  • You can replace or upgrade your existing disk-based RDBMS
  • You cannot make changes to your applications
  • You care about speed, but don’t care as much about scalability

In other words, you boost your application’s speed by replacing or upgrading the RDBMS without significantly touching the application itself.

On the other hand, you want to use an In-Memory Data Grid if the following applies to you:

  • You cannot replace your existing disk-based RDBMS
  • You can make changes to (the data access subsystem of) your application
  • You care about speed and especially about scalability, and don’t want to trade one for the other

In other words – with an In-Memory Data Grid you can boost your application’s speed and provide massive scale by tweaking the application, but without making changes to your existing database.

This can be summarized in the following table:

                        In-Memory Data Grid    In-Memory Database
Existing Application    Changed                Unchanged
Existing RDBMS          Unchanged              Changed or Replaced
Speed                   Yes                    Yes
Max. Scalability        Yes                    No

In-memory computing is comprised of two main categories: In-Memory Databases and In-Memory Data Grids. I’d like to delve into the differences between the two groups.
