For more than two decades, in-memory technology has reinvented itself multiple times, and the terminology has grown so tangled that it's hard to read about it without getting confused. Is it database caching or a data grid? Is it streaming or acceleration? And what about computing platforms and data fabrics?
Let's discuss why we have so many different terminologies for types of in-memory computing, where they came from, and how you can unify them under an in-memory computing platform. We'll take you through the evolution of this technology, starting at the beginning: with in-memory database caching.
- In-Memory Database Caching
- In-Memory Data Grid
- In-Memory Databases
- In-Memory Streaming
- In-Memory Accelerators
- In-Memory Computing Platforms
In-Memory Database Caching
In-memory computing was born in the late eighties and early nineties, and it was originally developed as database caching.
What is database caching? Imagine a typical enterprise application such as one for the insurance industry. An insurance agent sits in front of their computer entering information. They press a button, and a policy proposal for a US customer comes back to them. The agent's request goes from their browser (or their "terminal application," back in the day) to an application layer on their company's servers. The application layer is where the logic of the request is written, using a programming language. This application layer needs data to do its work, and that data is stored in a database located on another set of servers, potentially in a different company's data center. This is a "three-tier architecture," and even today it's common to most enterprise applications. You have a browser that you interact with, an application where the logic is written, and a database where data is stored.
As explained earlier, traditionally every time you needed a piece of data, you had to fetch it from a database. Back in the seventies and eighties, that worked fine - the amount of data was so small that the system could handle it. But as we moved into the nineties, things changed. We started to realize, "We must constantly fetch data from the database over the network to the application layer, discard it almost immediately after we're done processing it, and repeat the whole process over and over again for each request. This is clearly becoming a real efficiency bottleneck."
So, people thought, "Why don't we just keep the most frequently accessed data in the application layer so we don't have to go and fetch it every time?" That's what caching is. It's keeping your most frequently used data in the application layer so it's much closer to you than it would be in your database. And you keep that data in memory, because that's where the application is running.
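The idea above can be sketched in a few lines of code. This is a minimal, illustrative read-through cache, not any vendor's product: frequently used records are kept in the application layer's memory, and the database is consulted only on a cache miss. The names (`Cache`, `fetch_policy_from_db`) and the time-to-live policy are assumptions made for the example.

```python
import time

class Cache:
    """A minimal in-process cache: keeps recent values in memory with a TTL."""

    def __init__(self, ttl_seconds=60):
        self._store = {}          # key -> (value, expiry timestamp)
        self._ttl = ttl_seconds

    def get(self, key, loader):
        entry = self._store.get(key)
        if entry is not None:
            value, expires = entry
            if time.time() < expires:
                return value      # cache hit: no database round trip
        value = loader(key)       # cache miss: fetch from the database
        self._store[key] = (value, time.time() + self._ttl)
        return value

def fetch_policy_from_db(policy_id):
    # Stand-in for a real database query over the network.
    return {"policy_id": policy_id, "premium": 100.0}

cache = Cache(ttl_seconds=300)
policy = cache.get("P-1001", fetch_policy_from_db)  # first call: loads from the "database"
policy = cache.get("P-1001", fetch_policy_from_db)  # second call: served from memory
```

Real caches layer eviction, synchronization, and fault tolerance on top of this pattern, which is exactly the complexity the next sections describe.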
That's how the very first application of in-memory computing came about, more than two decades ago. And even though we've grown to use other forms of caching, database caching is still one of the dominant use cases for in-memory computing. From a technology perspective, the practice of keeping frequently accessed data closer to you is used everywhere in computer science, not only in databases.
We even do this ourselves. We keep things closer when we know we need to access them more quickly and more often. Database caching is like keeping your most frequently used groceries with you in the kitchen and driving to the grocery store only for items you use infrequently.
The first cache systems were very primitive. You'd ask an engineer to set one up, and he'd say, "I can do it in a week." But then caches started to grow in complexity. People started asking, "What if I have multiple servers in my application layer? How do I keep the data fully in sync for transactions?" or "What if I keep my data in memory and my server crashes? Do I lose the data I was keeping in cache? How do I protect it?" And so on. By the early 2000s, the complexity of database caching had grown so much that we needed a new way to keep everything organized. The solution that emerged was called an in-memory data grid.
In-Memory Data Grid
The transition from database caching to in-memory data grids was a key step in the evolution of in-memory computing. Database caching alone worked as long as you were only using it for a few specific applications. But once you started using it in a broader way, people began asking for a lot of new features: transactions, reliability, distribution, splitting the load among servers in a cluster, and so on. Data grids came on the scene as an answer to these complex, technical questions.
The term "data grid" denotes a cluster of servers that can hold a large set of data across those multiple servers. Fundamentally, it works just like database caching. The difference is that it allows you to keep data in memory across multiple servers in a very sophisticated way, so it can scale much better and handle tremendous complexity.
In-Memory Databases
In the late 2000s, another player came on the scene: in-memory databases. In-memory databases didn't bring much of a technological advantage to the table. They were more of a reactionary development in the industry. Even though data grids were incredible from a technology standpoint, they weren't easy to use. Users didn't know how to deal with them. They were comfortable and familiar with what they had used in the past: SQL-based databases.
So a few of the more traditionally minded folks in the industry thought, "Okay, we've been doing business with disks for the past twenty-five years, and we're outdated. This new in-memory technology has become so cheap and available that there has to be a way to use it. Why don't we make a faster version of databases by using memory as the primary medium of storage instead of disks?"
This is how all the startup companies in the in-memory database space came about. It's also why giants like Oracle, IBM, and Microsoft were able to develop an in-memory option for their database products.
But in-memory databases aren't the cutting edge of this technology. They're faster than old disk-based databases, but they're slower than data grids. Their main appeal is that users are more comfortable working with a database system because it's familiar.
You can think of this in terms of a chef. He's facing an explosion of demand and he knows he needs to change the way he works in order to deal with it. One option is to set up a very complex infrastructure of fridges and keep them full with the necessary items. But he's not used to that. He's used to having one fridge and filling it up with new items from the grocery store (a database) when he needs to.
Then a grocery store chain comes to him and says, "You don't have to put in this complicated refrigerator system. We can sell you a local store, just like you're used to. We can install it right on your block. It won't be as fast as the refrigerator system so you'll still have to walk a little bit. But you won't have to get in your car and drive 15 minutes to the store and back every time you need an ingredient, so it will still be a lot better than what you had before. Most importantly, you already know how to use it. You've been doing it for years!"
And that approach works because the chef says, "That's a good idea. I don't want to deal with this new infrastructure of refrigerators. I've been using the grocery store my whole life. I don't want to learn anything new. If you guys can build a store on my block, then why not? True, the price gives me sticker shock, but I'll take it."
In-memory databases aren't as fast or as convenient as other types of in-memory technology. But their learning curve is almost zero. For some people, that makes all the difference.
So far, we've seen three main uses for in-memory computing: database caching, in-memory data grids, and in-memory databases. Along the way, these evolved in a variety of different directions. One of those directions is in-memory streaming.
In-Memory Streaming
In-memory streaming is also called in-memory complex event processing in some circles. The whole streaming process is a new way to use in-memory computing. Streaming is characterized by a huge, endless flood of information. It came to prominence in the early 2010s, with the explosive growth of the Internet of Things, wearable devices, sensor data, and machine-to-machine data generation.
For example, imagine you're using a database and someone purchases something from you online. You take that request and process it, no big deal. But it's another story when you have tens or hundreds of thousands of automatic devices (sensors) constantly sending data to you at the same time. You can't process them one by one. There's physically no way to do that. So you have to take a very different approach. You have to handle it in memory.
In-memory streaming is still a form of data processing. But instead of processing stationary data from a database, in-memory streaming processes data that's in motion. This is the type of data payload where you have an extremely large amount of data pouring into your system each second, endlessly. And that kind of demand takes in-memory technology to a whole new level.
We can connect this back to our chef. He's still cooking the same food but now his operations model is different. Even with a new infrastructure of refrigerators installed, he can't prepare his dishes one by one anymore. There's too much demand. He has to find a new, faster, and more agile way of making the food. So he starts preparing large batches of food in advance and he stores and organizes those dishes in the freezer. Then, when the requests pour in, he can fill the orders quickly because a lot of pre-packaged, pre-stored, and pre-sorted food is in stock.
Streaming works in a similar way. You're not dealing with each pickle and tomato individually anymore. That's not physically possible. Instead, you group things together in order to deal with the incoming flow. For example, let's say you have a sliding window of the last five minutes of events. You concentrate your processing power on that window of events, constantly aggregating, pre-processing, and pre-sorting its data. The in-memory aspect of that streaming processing is what makes it possible.
There is literally no end to in-memory streaming. More and more devices are added to the global network every day. Watches and cars are already connected to the Internet. Not only that, but devices will become more complicated as technology advances. That's why streaming will never end. The demand will only increase as civilization continues to evolve.
In-memory technology is very advanced but in-memory streaming is still fighting to keep up with demand. One of the largest trucking companies in the world spoke with us about installing dozens of sensors in each of their trucks globally. The sensors would send information about the wear and tear on each part of the engine in each of the trucks. If they can tell when a part is about to break, they can let that truck driver know before it breaks down and tell them how to get to the nearest service station. But they ran the calculations and found that the amount of data streaming in would easily consume all the processing power they have. They didn't have the processing power to sustain that kind of load and they were searching for hardware and software solutions to help them launch the project. That is just one example of how, even today, streaming can be a real challenge.
In-Memory Accelerators
One of the latest and most logical evolutions of these various types of in-memory products is in-memory accelerators. We explained how in-memory technology started with database caching, then matured into data grids, in-memory databases, and in-memory streaming.
In the early 2010s, a new trend started to appear. People began to say, "In-memory computing is pretty cool and obviously very important but it's so complex. Can there be some kind of 'plug and play' acceleration that makes it easier for us to use in-memory technology with our systems and products?" That line of thinking comes from a basic assumption: as in-memory computing becomes more mainstream, it needs to be simplified. We can no longer rely on top-notch engineers and PhDs to use this technology. It needs to be a hands-on experience for the average user. That's where in-memory accelerators come into the picture.
In-memory accelerators give people the "plug and play" capability they're looking for with in-memory technology. They combine the performance and scalability of in-memory computing with a dramatically simplified method of usage. You install something, and suddenly everything just gets faster. This is what the larger market has been awaiting.
Look at it through the eyes of our chef, or better, through the eyes of the company trying to sell him on a new infrastructure of refrigerators. The company that makes the refrigerator systems suddenly realizes, "We're losing customers to grocery stores just because the grocery store is promising chefs an easy transition. Our business model is all wrong. We can't ask chefs to learn and develop a completely new process on how they operate and cook their food. We need to give them an option where they can just sign up and push a button and suddenly they're cooking a lot faster."
This is what in-memory computing vendors are developing today. They're creating the technology that can inject the high performance of in-memory computing into a system without asking the user to significantly redevelop that system, all while providing the end user with an easy-to-use and familiar paradigm for processing data. The chef still gets his complex refrigerator infrastructure but it looks like a grocery store right in his kitchen, just like he used to have. And everybody's happy.
In-Memory Computing Platforms
If you are seriously looking into in-memory computing, you should look at the ultimate form of this technology: in-memory computing platforms.
How do platforms fit into the picture?
Here's an example. Large companies have dozens and dozens of projects. Some of them use database caching. Some of them use data grids. Some of them use streaming, and some of them use accelerator products. Major data centers don't want to deal with each of those systems individually. They want a single platform that includes and supports them all. They want a strategic approach to in-memory computing, and that solution is an in-memory computing platform.
Using an in-memory computing platform is natural and logical and can reduce development time and costs. By 2020, most applications will be supported by in-memory computing. With something that large revolutionizing the landscape, you need to take a strategic view. People don't want to buy individual point solutions. They want a comprehensive strategic platform for this technology.
To return to the chef, an in-memory computing platform is like the fridge vendor saying, "You can build this new system however you want. You want a grocery store outside your kitchen, you can have it. You want a grocery store in your kitchen, you can have that, too. And if you want some parts of the store in your kitchen and some parts outside of it, you can even have it both ways. You can build your new kitchen any way you like and you'll still get the same benefits of speed and convenience with a seamless experience. You don't even have to deal with dozens of vendors to handle the different pieces. We can give you a simple tool to manage it all."
One real-life example of a company that benefits from an in-memory computing platform is one of the largest banks in the world. This bank has one of the largest installations of in-memory computing ever built. In 2014, they used in-memory technology for eighty different applications within the bank. They started with a data grid and then strategically decided to move away from that and towards an in-memory computing platform because it made sense for them as a global company. They didn't want to worry about coordinating ten different vendors and training employees on multiple product technologies. They wanted an in-memory computing platform because it included support for all the different uses of the technology they have today and will have in the future.
The old system of picking a specific vendor for a specific use case was prudent at the turn of the decade, around 2010. This major bank client only had five or ten applications to deal with then. But as the number of applications grew with enormous demand, they realized they couldn't play the vendor-by-vendor game anymore. An in-memory computing platform was the comprehensive solution to their problem. It solved that problem for other companies of similar size as well.
Platforms are the latest advancement in the industry today. Any analyst will tell you so. They see their emergence as a comprehensive solution to the piecemeal approach in-memory computing has taken in the past.
In-memory computing platforms are the preferred approach for most companies, moving forward. The user of the technology gets a single product with a single learning curve, a single configuration, and unified management and monitoring. This kind of streamlined technology solution allows users to harness the speed of memory in ways never before possible and the first companies to get there are outsmarting their competition in the process.