The Benefits of a Converged Data Platform: Comparing Apache® Ignite™ and Hazelcast
Apache® Ignite™ is an open source solution for workloads that can benefit from a converged data platform. Apache Ignite is the world’s only in-memory computing data fabric. It delivers a broad set of features including an ANSI SQL-99 compliant data grid, a streaming grid, a compute grid, and a service grid. Ignite allows users to massively scale out applications and dramatically increase responsiveness without replacing their existing databases. Although Apache Ignite is a broad solution which addresses many important workloads, it also outperforms point solutions for those same workloads. One case in point is Apache Ignite vs. Hazelcast.
In this one hour webinar, Dmitriy Setrakyan, Chairman of the Apache Ignite Project Management Committee, will demonstrate how Apache Ignite consistently outperforms Hazelcast on various atomic and transactional cache operations and SQL-based queries, as measured on AWS EC2 using Yardstick benchmark configurations. He will also discuss features for which only Apache Ignite provides benchmarks, because Hazelcast does not provide the functionality.
Founder & CPO, GridGain Systems
Dmitriy Setrakyan:
Hello, everyone. My name is Dmitriy Setrakyan. I run product engineering at GridGain and I am also one of the founders of GridGain. I am also the chair of the Project Management Committee of Apache Ignite, so I’m very active in the open source community and specifically in the Apache Ignite community. And we welcome everyone to join this community.
It’s one of the fastest-growing and probably most active projects in the Apache Software Foundation. First, I’m going to give a brief Ignite overview and talk about the components available in Ignite. Then I’ll give a brief data grid overview as well. Ignite actually has a collection of components, and the data grid is just one of them, but given that we are comparing features against another data grid vendor, I thought I’d go over the data grid features once again. Then we’ll talk about Hazelcast replication strategies versus Ignite replication strategies, some differences in off-heap memory, distributed queries, and distributed transactions, take a look at the Ignite roadmap, and then talk about some benchmarks that are available on our website and that you can always take a look at.
The benchmarks are also available in the cloud and free for everyone to run and execute. So let’s start by actually looking at what Ignite is. Ignite is an in-memory data fabric. Essentially, an in-memory data fabric is a collection of components, and all of those components reside in memory.
All of them are available to solve the performance and scalability problems of your application, but all of them are independent. All of them have their own APIs and you never have to use all of them at once. Essentially, you will probably use one or two, maybe three, and as your use case grows, you can pick up and start using other functionality as well. The data grid is probably one of the biggest components we provide, and it handles the distribution and caching of data in memory. In Ignite, it also provides querying capabilities and transactional capabilities on that data.
So Ignite is fully consistent and you will never get a dirty read from any of the nodes within an Ignite cluster at any time, regardless of failures. The compute grid is another component available in the data fabric, and it has a lot of functionality to let you compute on single or multiple keys within the data grid. Actually, it’s not linked to the data grid in any way, so you can use the compute grid without any data at all. If you have a compute-only use case, you can go ahead and utilize Ignite’s computational capabilities and start distributing lambdas and closures across the cluster. We have pretty cool deployment functionality, so you don’t even have to worry about deployment when using the compute grid. The code gets automatically deployed within the cluster as well.
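As a quick illustration, here’s a minimal sketch of distributing a closure with the compute grid; the printed message is illustrative, not from the slides:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;

public class ComputeExample {
    public static void main(String[] args) {
        // Start (or join) an Ignite node with default configuration.
        try (Ignite ignite = Ignition.start()) {
            // Broadcast a closure to every node in the cluster; the class
            // is peer-deployed automatically, no manual deployment needed.
            ignite.compute().broadcast(() -> System.out.println("Hello from a cluster node!"));
        }
    }
}
```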
So those two are probably the most popular components on the transactional side. We also provide a service grid and streaming capabilities. There’s a big portion of Ignite dedicated to the analytical use case. Ignite comes with its own in-memory file system, which is actually built on top of the data grid and is Hadoop compliant. So you can plug it into any Hadoop deployment in a plug-and-play fashion, without changing code, and now Hadoop can run on top of an in-memory file system.
Ignite also provides a pretty cool integration with Spark. Ignite allows you to share state in memory, and that’s the capability it adds to Spark use cases, since Spark does not have any data sharing capabilities of its own. Ignite also does a very good job at accelerating Spark SQL: if you run Spark SQL through Ignite, it will run a lot faster simply because Ignite provides indexing capabilities. We also have other components like messaging, events, and data structures, but today we will mainly be talking about the data grid. Before I move forward and start talking about differences with Hazelcast and various features of a data grid, I just want to mention that Apache Ignite has a very active community, with very active dev and user lists.
We always welcome contributions. There’s a lot of interesting work available within the project, so if you’re interested in writing distributed code, learning how to work with cutting-edge technology, implementing your own locks in a distributed fashion, implementing your own transactional logic in a distributed fashion, et cetera, it’s very complex but very interesting code. Please do join us. We have information on the website on how to contribute to Apache Ignite, with a whole range of tickets listed there, from very simple ones to very complex ones, depending on your appetite and interest to learn.
All right, so let’s take a look at the data grid. What is a data grid? Probably the most simplistic description of a data grid is that it is a distributed in-memory key-value store. Some data grids provide JCache compliance. Both Ignite and Hazelcast are actually JCache (JSR 107) compliant, and what JCache provides is the ability to work with data using something similar to the ConcurrentMap API in Java. It’s not exactly the ConcurrentMap API, but it’s something very similar to it.
So you have the ability to perform many concurrent operations on the data. You have puts, putIfAbsents, removes, et cetera. You also have pluggable persistence: you can write through to your underlying database, or read through from that underlying database, automatically. JCache also allows you to collocate your processing with the data. So you can run an entry processor, as they call it, which is essentially just a piece of logic associated with processing a certain key/value pair stored in cache, and it will be processed exactly on the node where the data resides.
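Here’s a minimal sketch of what those operations look like through Ignite’s JCache-compliant API; the cache name and values are illustrative:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheEntryProcessor;

public class DataGridExample {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            IgniteCache<Integer, Integer> cache = ignite.getOrCreateCache("myCache");

            // ConcurrentMap-like atomic operations.
            cache.put(1, 100);
            cache.putIfAbsent(1, 200); // No-op: key 1 is already present.
            cache.remove(2);

            // Entry processor: this logic executes on the node that owns
            // key 1, so the value is never moved across the network.
            CacheEntryProcessor<Integer, Integer, Integer> increment = (entry, procArgs) -> {
                entry.setValue(entry.getValue() + 1);
                return entry.getValue();
            };

            System.out.println("Updated value: " + cache.invoke(1, increment)); // Prints 101.
        }
    }
}
```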
So again, a data grid is a distributed key-value store. However, the distribution models differ across vendors. Most vendors provide a partitioned model, and both Ignite and Hazelcast support the partitioned model to distribute data in memory. In this model, you have a concept of data ownership: every node owns its own piece of data, and collectively, across all the nodes, you’re caching the whole data set. So in this model, the more nodes you have in a cluster, the more data you can cache, because the more memory you have. That’s probably one of the most common ways to distribute data. I’ll talk about others on the following slides.
On top of the JCache capability, most data grids add some querying capabilities and some transactional capabilities, and both of those of course have to work in a fault-tolerant and scalable fashion. As I mentioned, at no point in Ignite will you get a dirty read. Ignite is always fully consistent, is fully ACID compliant from a transactional standpoint, and has virtually unlimited linear scalability. I’ve seen clusters of 1,000-plus nodes running on Apache Ignite and providing close to linear scale.
So let’s take a look at some differences between Hazelcast and Ignite. Both products have data grid capability and there are a lot of similarities between the two. Both support the JCache standard. Both support partitioned and replicated modes. Both support transactions, data structures, and querying capabilities, so the differences may not seem apparent when you just do a cursory overview of the products, but there are a lot of differences, and some of them I’ll cover today. As they say, the devil is in the details here, and we’ll go over the details of these features.
So let’s start with replication strategies. As I mentioned, the most common strategy in a data grid is partitioned data: with partitioned data, you have a concept of data ownership, every node owns its own piece of data, and the data is partitioned across the cluster. There’s also another replication mode, the full replication strategy. Here the data is copied to every node in the cluster, so every node holds the same data set, and this mode is actually not very performant for updates.
Updates are very costly because you have to replicate every update to every node in the cluster, but this mode has its own use. If you’re familiar with star schemas, and with facts and dimensions, then you probably know that facts are usually a fast-growing data set, which is usually best suited for the partitioned distribution model. The data set is growing very fast, the data set is large, so we just partition it across the data grid. But there are also dimensions in a star schema, and dimensions are the data sets that are not updated often, that may store reference data, lookup data, or configuration data, but that are queried a lot.
This type of data is often better suited for replicated caches: updates are very infrequent, but reads are very frequent. In Ignite, you have the same API and the same transactional capabilities across both of those cache types, partitioned and replicated. And you can actually transact across those two different types: you can have a transaction that spans a partitioned and a replicated cache, and the same transaction can do updates to both of those caches. In fact, a transaction can span multiple caches in Ignite.
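A minimal sketch of a transaction spanning a partitioned and a replicated cache; the cache names and values here are illustrative:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheAtomicityMode;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.transactions.Transaction;

public class CrossCacheTxExample {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            CacheConfiguration<Integer, Double> factsCfg = new CacheConfiguration<>("facts");
            factsCfg.setCacheMode(CacheMode.PARTITIONED);
            factsCfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);

            CacheConfiguration<Integer, String> dimsCfg = new CacheConfiguration<>("dimensions");
            dimsCfg.setCacheMode(CacheMode.REPLICATED);
            dimsCfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);

            IgniteCache<Integer, Double> facts = ignite.getOrCreateCache(factsCfg);
            IgniteCache<Integer, String> dims = ignite.getOrCreateCache(dimsCfg);

            // One ACID transaction updating both caches atomically.
            try (Transaction tx = ignite.transactions().txStart()) {
                facts.put(1, 42.0);
                dims.put(1, "Reference data");
                tx.commit();
            } // Rolled back automatically if commit() was never reached.
        }
    }
}
```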
You can also have queries and joins between these caches, and it’s actually very cool, because now you can join partitioned caches with replicated caches in a distributed environment, and those queries work lightning fast, especially given that Ignite provides in-memory indexes for its queries. Distributed joins in this case always work. Ignite also provides the ability to join partitioned data sets with other partitioned data sets, but in that case we recommend that you collocate the data, collocate the partitions together, for better query performance. Otherwise, you’ll get into data shuffling issues, which may actually slow you down.
On the Hazelcast side, the difference here would be that Hazelcast provides strong consistency only for partitioned caches. For replicated caches, it provides weak consistency; if you read through the documentation, I don’t think there are any consistency guarantees there. So replicated caches are not consistent on the Hazelcast side.
This diagram shows, in the context of Ignite, how replicated caches differ from partitioned caches. The diagram on the right demonstrates how partitioned data gets distributed within the cache. We have a small data set here, just for the purpose of this example, with four keys, A, B, C, D, and all the keys are equally distributed across four JVMs.
The backups for these keys are also equally distributed across the JVMs, but on different JVMs than their primaries. Note that there’s no notion of a backup server: we have a primary set and a backup set. In this case, we have one backup copy, but you can have as many as you like; it’s a configuration parameter.
In this example we just have one, and the backups are evenly distributed. So why did we choose this approach over a dedicated backup server? The reason is that if you have a crash within the system, some backup copies have to take over primary responsibility, and if they are equally distributed within the cluster, then the load on the cluster remains equally distributed across multiple nodes. If you look at the access pattern here, there are two ways you can access the data: locally, from the same server that caches the data, or remotely, from a remote client. In either case, if the data is available locally, it will be immediately returned to the user. If it’s not, it will be fetched from the server that owns that data.
On the client side, we also have the notion of a near cache. Near caches in Ignite are also fully ACID compliant and always consistent. So if an update happens on any of the servers, the data in the near caches will be either updated or invalidated, depending on your configuration. Again, if you’re accessing data from a client and the data is available in the near cache, it will be returned immediately. Otherwise, it will be fetched from the primary owner, stored in the near cache, and then returned to the user.
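A minimal sketch of creating a near cache on a client node; the cache name is illustrative:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.NearCacheConfiguration;

public class NearCacheExample {
    public static void main(String[] args) {
        Ignition.setClientMode(true); // Join the cluster as a client node.

        try (Ignite client = Ignition.start()) {
            // Local near cache on top of the distributed "myCache" cache;
            // locally cached entries are updated or invalidated whenever
            // the servers change them, so reads stay consistent.
            IgniteCache<Integer, String> cache =
                client.getOrCreateNearCache("myCache", new NearCacheConfiguration<>());

            cache.get(1); // First read: fetched from the primary owner.
            cache.get(1); // Second read: served from the local near cache.
        }
    }
}
```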
If you look at the left side, it shows how replicated caches work. A replicated cache in Ignite is implemented differently than in other products; I’m not sure how Hazelcast implements it, but in Ignite, a replicated cache is essentially an edge case, or a special case, of a partitioned cache: a replicated cache is also a partitioned cache, but with as many backup copies as there are nodes in the cluster. By ensuring that every node has a backup copy, we ensure that every node has the complete data set, primary and all the backups. Everything else remains the same.
The difference here would be that whenever you’re accessing data, you would configure the cache to read from backups as well, so you don’t have to fetch data from remote nodes when it is available locally. So that’s actually the main difference between partitioned and replicated caches. In an ideal deployment, you would probably have a combination of the two: maybe 20 or 30 partitioned caches and maybe 10 replicated caches. And again, the beauty of Ignite is that it allows you to transact across these caches, and it also allows you to query and do distributed joins between them.
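As a sketch, here’s how the two modes differ at the configuration level; cache names and the backup count are illustrative:

```java
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.configuration.CacheConfiguration;

public class CacheModesConfig {
    public static CacheConfiguration<Integer, String> partitionedCfg() {
        CacheConfiguration<Integer, String> cfg = new CacheConfiguration<>("facts");
        cfg.setCacheMode(CacheMode.PARTITIONED);
        cfg.setBackups(1); // One backup copy, evenly distributed across nodes.
        return cfg;
    }

    public static CacheConfiguration<Integer, String> replicatedCfg() {
        CacheConfiguration<Integer, String> cfg = new CacheConfiguration<>("dimensions");
        // Internally a special case of a partitioned cache where every
        // node keeps a full copy, so reads never leave the local node.
        cfg.setCacheMode(CacheMode.REPLICATED);
        cfg.setReadFromBackup(true); // Serve reads from local backup copies.
        return cfg;
    }
}
```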
Another feature worth mentioning is off-heap memory. Both products, Apache Ignite and Hazelcast, provide off-heap memory. If you’ve ever worked with Java and faced garbage collection issues, you probably know that those are extremely difficult, often impossible, to debug if you’re working with a heap size larger than maybe 16 or 20 gigabytes. In my experience, even a small cluster, say ten nodes with 50 gigabytes of RAM each, maybe 500 gigabytes of RAM total, can easily produce pauses of maybe 5 minutes, where the cluster looks frozen, but in reality the JVMs are performing garbage collection. Of course, nobody can go to production with those kinds of freezes in their cluster.
To mitigate this problem, there’s a concept called off-heap memory, where we store data not on-heap but off-heap. We essentially put it outside of the JVM heap and manage it ourselves. Because the JVM doesn’t know about it, it cannot affect garbage collection, so there are no GC pauses. Both products provide this functionality. In Ignite, it’s called off-heap memory.
In Hazelcast, they call it high-density memory, and probably the main difference here is that in Hazelcast, it’s available only in the enterprise edition, while in Ignite it’s part of the open source Apache Ignite project. So feel free to check it out.
One important feature that you may want to pay attention to is the ability to store SQL indexes off-heap. Indexes often occupy about 20 to 30 percent of the data. So if you’re working with 100 gigabytes of data, you can easily have about 20 to 30 gigabytes of indexes, and if you load those indexes on-heap, you are back to square one: you are again running into large garbage collection pauses. So it’s very important that the product supports off-heap indexes, and the ability to do joins and full SQL queries using those indexes is also very important. So do pay attention to this functionality.
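A minimal sketch of enabling off-heap storage with the Ignite 1.x configuration API of the time; the memory limit is illustrative, and in this tiered mode both entries and SQL indexes live off-heap:

```java
import org.apache.ignite.cache.CacheMemoryMode;
import org.apache.ignite.configuration.CacheConfiguration;

public class OffHeapConfig {
    public static CacheConfiguration<Integer, String> offHeapCfg() {
        CacheConfiguration<Integer, String> cfg = new CacheConfiguration<>("offheapCache");

        // Store entries and SQL indexes outside the JVM heap, where they
        // are invisible to the garbage collector -- so no GC pauses.
        cfg.setMemoryMode(CacheMemoryMode.OFFHEAP_TIERED);
        cfg.setOffHeapMaxMemory(32L * 1024 * 1024 * 1024); // 32 GB off-heap cap.

        return cfg;
    }
}
```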
Another interesting feature, and probably one of my favorites, is distributed queries. I’ve been doing this for quite some time, and I’ve looked at many products providing different query languages and different query capabilities. I would say that as far as data grids and distributed caches go, Ignite probably has the richest query support. Ignite supports ANSI-99 compliant SQL, so pretty much the whole SQL dialect can be used within Apache Ignite.
You can use GROUP BY, ORDER BY, all sorts of aggregates, HAVING clauses, unions with inner selects, even custom functions. If you’d like to use a function that is not available within Apache Ignite by default, you can write it yourself and then call it from SQL. So it has very rich SQL support, but my favorite feature is probably the ability to perform distributed SQL joins across multiple caches. You treat caches as database tables and perform SQL joins based on some foreign-key relation that you define. Indexes are fully supported, so all the lookups and joins are very fast, because they happen in memory and from index lookups.
The ability to perform distributed joins is probably almost unique to Apache Ignite; from an open source standpoint, I’m not aware of many data grid projects that support this. And again, I want to mention that both indexes and SQL work with on-heap and off-heap data, so if you’re using off-heap memory, the indexes will also be off-heap.
Here’s an example of how an Ignite query might look. It uses standard SQL syntax. It’s probably one of the simpler queries we can run on top of Apache Ignite, but it shows that you can do averages, maxes, mins, some aggregations. You can do a join across three different tables, which in the case of Ignite are three different caches. We’re using a standard FROM clause, matching based on some foreign keys, and we have GROUP BY and ORDER BY clauses here. So it’s fairly rich SQL syntax; the ability to perform distributed SQL queries is probably one of the most popular features within Apache Ignite. On the Hazelcast side, I should mention that it also supports SQL, but only a few keywords, I think precisely AND, IN, LIKE, and BETWEEN, maybe one or two more. The main deficiency of SQL in Hazelcast, from my point of view, is the lack of distributed joins.
So if you wanted to perform a distributed join in Hazelcast, if you were to execute a query like the SELECT you see on the screen, you would have to do it manually. You would have to preselect keys from Person, then use them in a subselect on Department, and then another on Organization. So it’s a three-step distributed join that you would have to implement manually, which is probably not going to perform very well, and it’s very inconvenient.
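A hedged reconstruction of what such a query looks like through Ignite’s API; the Person/Department/Organization schema and field names are illustrative, not the exact slide contents:

```java
import java.util.List;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.query.SqlFieldsQuery;

public class DistributedJoinExample {
    // Assumes Person, Department, and Organization objects are cached,
    // with their queryable fields annotated with @QuerySqlField.
    public static void runQuery(IgniteCache<?, ?> personCache) {
        SqlFieldsQuery qry = new SqlFieldsQuery(
            "SELECT o.name, d.name, AVG(p.salary), MAX(p.salary) " +
            "FROM Person p, Department d, Organization o " +
            "WHERE p.deptId = d.id AND d.orgId = o.id " +
            "GROUP BY o.name, d.name " +
            "ORDER BY o.name, d.name");

        // The join executes in a distributed fashion across the cluster,
        // using in-memory indexes on the joined fields.
        for (List<?> row : personCache.query(qry).getAll())
            System.out.println(row);
    }
}
```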
And the last difference I want to mention is transactions. When working with transactions, there are different concurrency modes. Some of them are pretty standard and you’ve probably heard of them, like pessimistic mode, where data is locked on access. In Ignite, it’s actually called pessimistic mode; I think in Hazelcast, you have to explicitly call something like getForUpdate to make sure the data gets locked on access, but in a nutshell, both provide about the same semantics: pessimistic transactional concurrency. In optimistic mode, the same thing happens, but locks are acquired during the commit phase instead.
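A minimal sketch of choosing these modes explicitly with Ignite’s transaction API; the account keys and amounts are illustrative, and the accounts are assumed to be preloaded:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.transactions.Transaction;
import org.apache.ignite.transactions.TransactionConcurrency;
import org.apache.ignite.transactions.TransactionIsolation;

public class TxModesExample {
    public static void transfer(Ignite ignite, IgniteCache<Integer, Double> accounts) {
        // PESSIMISTIC: locks are acquired as each key is accessed.
        // OPTIMISTIC: locks are acquired only during the commit phase.
        try (Transaction tx = ignite.transactions().txStart(
                TransactionConcurrency.PESSIMISTIC, TransactionIsolation.REPEATABLE_READ)) {
            double a = accounts.get(1);               // Key 1 is locked here.
            accounts.put(1, a - 10.0);
            accounts.put(2, accounts.get(2) + 10.0); // Key 2 is locked here.
            tx.commit();
        }
    }
}
```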
However, both of these approaches suffer from one deficiency: in either case, it is possible to get into a deadlock scenario. The way you get into deadlocks, especially in a distributed system, though it really doesn’t matter, is when you do not acquire locks in the same order. You may get into a scenario where two threads are each waiting on a resource locked by the other, and then you end up in a deadlock. So you always have to acquire locks in the same order.
However, in large projects, that is often impossible. From my experience, I’ve seen at least several projects where even asking developers to acquire locks in the same order would be unfeasible. Just imagine a large bank with probably 20 different teams, all of them producing components and APIs that have to work together, and all of them operating within a single transaction. Asking those groups of developers to worry about the order in which they acquire and access data would probably be a futile effort: there are millions of lines of code in each component and data is accessed everywhere. It’s almost impossible to fix.
So in Ignite we came up with a cool concept that we call deadlock-free mode. Essentially, it’s an optimistic serializable mode in which we do not acquire any locks at all. Because we do not acquire locks, this mode turned out to be about 30 percent faster than the other transactional modes, pessimistic and optimistic. Essentially, what we do is order the transactions ourselves.
In the majority of cases, we will succeed, and all these deadlock-free transactions will complete successfully, but sometimes we are not able to resolve a conflict, and in that case, one of the transactions will fail. That happens whenever a key is updated concurrently from multiple processes and we cannot order these updates one after another in a proper, guaranteed order. Whenever that happens, we have to fail one of them: the user gets an optimistic exception and has to retry. Most updates happen with very little if any contention, but concurrent updates to the same keys have to be supported, and again, Ignite will make its best effort to have them complete successfully. Whenever it can’t, it will throw an optimistic exception and the user can retry.
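A minimal sketch of the deadlock-free mode with a retry loop; the key and counter logic are illustrative:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.transactions.Transaction;
import org.apache.ignite.transactions.TransactionConcurrency;
import org.apache.ignite.transactions.TransactionIsolation;
import org.apache.ignite.transactions.TransactionOptimisticException;

public class DeadlockFreeTxExample {
    public static void incrementWithRetry(Ignite ignite, IgniteCache<Integer, Long> cache) {
        while (true) {
            // OPTIMISTIC + SERIALIZABLE is the deadlock-free mode:
            // no locks are acquired, so deadlocks are impossible.
            try (Transaction tx = ignite.transactions().txStart(
                    TransactionConcurrency.OPTIMISTIC, TransactionIsolation.SERIALIZABLE)) {
                Long val = cache.get(1);
                cache.put(1, val == null ? 1L : val + 1);
                tx.commit();
                return;
            }
            catch (TransactionOptimisticException e) {
                // A concurrent update could not be ordered; just retry.
            }
        }
    }
}
```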
On the Hazelcast side, there is no optimistic serializable mode, so you always have to worry about optimistic and pessimistic transactions and about acquiring locks in the same order. That’s just one of the differences on the transactional side that you have to look out for. And briefly on the roadmap: this is not an extended roadmap; it goes only about as far as this year, and anything beyond this year is prone to change, but here’s our immediate focus.
Apache Ignite 1.6 is around the corner. Some of the cool features there are an improved memory footprint, taking up about 30 percent less memory than Ignite 1.5, with major performance improvements following from that. We have also added an ODBC driver. Ignite historically had a JDBC driver: because it supports ANSI-99 SQL, you can talk to Ignite using standard JDBC connectivity, connect to Ignite just like you would connect to any relational database, and start issuing queries. However, that worked only for environments that support JDBC, and there are plenty of environments that do not, especially environments not based on the JVM.
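A minimal sketch of querying Ignite over JDBC; the connection URL format, config path, and table are assumptions, so check the documentation for your Ignite version:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class JdbcExample {
    public static void main(String[] args) throws Exception {
        // Register the Ignite JDBC driver.
        Class.forName("org.apache.ignite.IgniteJdbcDriver");

        // The driver starts a client node using the given Ignite config.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:ignite:cfg://file:///path/to/ignite-config.xml");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT name FROM Person")) {
            while (rs.next())
                System.out.println(rs.getString(1));
        }
    }
}
```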
To support Ignite connectivity from those environments, we added ODBC connectivity. Essentially, the same functionality that is available through the SQL API is now available through the ODBC driver. We’ve already tested it with tools like Tableau and MicroStrategy, so you can use those visualization tools together with Apache Ignite to visualize, introspect, and analyze your data.
You can also now use Ignite from different non-JVM-based languages that support ODBC connectivity. Another cool feature that we’ve added to Ignite is deadlock detection. If you are not using our deadlock-free transactions, then, as I mentioned, your code could experience deadlocks simply because you may not be acquiring locks in the same order.
This problem is not specific to data grids, by the way; it applies to probably any kind of database or data storage. You can get deadlocks in an Oracle database or in a MySQL database, and it’s important that the system is able to detect those deadlocks and give you a way to fix them. So that’s exactly what we’ve added: a deadlock detection mechanism. Now, if you hit a deadlock, we can detect it, and we’ll provide you with the most useful thing about a deadlock, which is knowing which keys are deadlocked, so you can go back and fix the code.
Debugging this problem without that information probably takes a long time: you have to do heap dumps and thread dumps, you have to do a lot of analysis. So, to save you all this time, we provide the list of keys that are in a deadlocked state directly from the exception that is thrown in case of a deadlock. You can take that list, look at your code, and fix the incorrect access to make sure that the locks are acquired in the same order.
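A sketch of how that might surface in code, assuming the deadlock report arrives as a TransactionDeadlockException in the cause chain of a transaction timeout; the exception names are an assumption based on the Ignite 1.6-era API:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.transactions.Transaction;
import org.apache.ignite.transactions.TransactionDeadlockException;
import org.apache.ignite.transactions.TransactionTimeoutException;

public class DeadlockDetectionExample {
    public static void updateWithDiagnostics(Ignite ignite, IgniteCache<Integer, String> cache) {
        try (Transaction tx = ignite.transactions().txStart()) {
            cache.put(1, "A");
            cache.put(2, "B"); // Deadlocks if another tx locks 2 then 1.
            tx.commit();
        }
        catch (TransactionTimeoutException e) {
            // Assumption: the detected deadlock is attached as the cause,
            // and its message lists the keys involved in the deadlock.
            if (e.getCause() instanceof TransactionDeadlockException)
                System.err.println("Deadlocked keys: " + e.getCause().getMessage());
        }
    }
}
```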
If we look past 1.6, to probably 1.7 or even 2.0, which should happen this year, the main focus within Apache Ignite remains SQL functionality. As I mentioned, right now Ignite supports SQL querying very well. It’s ANSI-99 compliant, but essentially only the SQL that starts with the word SELECT or EXPLAIN is supported by Apache Ignite: EXPLAIN for execution planning and SELECT for querying the data. So we’re looking to add full SQL support.
We’re looking to add DDL, which will allow you to create tables and create indexes, so you can interact with data just like you would with a standard SQL database. We are also adding DML, data modification capabilities, so essentially you’ll be able to do inserts, updates, and deletes using SQL as well. Once we’ve done that, you will be able to interact with Ignite fully using SQL, without writing a single line of code. So it’s a distributed data grid available through a key/value API or a full SQL API, and if you want to use only SQL, you have that option.
You can connect through ODBC, plug it in just like you would any relational database, and if you already have some JDBC or ODBC code, it will work on top of Ignite. So it’s a pretty cool feature, and I expect we’ll have this capability within Ignite probably by late summer or early fall, so stay tuned.
So now let’s take a look at some benchmarking that we’ve done, benchmarking Ignite versus Hazelcast. When it comes to benchmarks, there are many commercial vendors; for example, there’s Oracle Coherence and there’s GigaSpaces. But we cannot benchmark against commercial vendors, because their licenses would not allow us to publish benchmarks of Ignite against them.
So the only way for us to compare performance is to benchmark against other open source projects like Hazelcast. Before we dive into the benchmarks, let’s take a look at what we have to watch out for, some gotchas in running benchmarks. Probably number one is a tuned configuration: one of the challenges is making sure that you are comparing apples to apples, so the configuration has to be properly tuned on both sides, in this case Ignite and Hazelcast, and the configurations should match. Another thing people running benchmarks often forget to do is to warm up the JVM. The JVM has a HotSpot compiler.
Every time you run your code, the JVM optimizes it better and better and compiles certain pieces for faster execution. So if you’re running benchmarks, it’s very important to give them about 30 seconds to warm up before you start measuring. Run the benchmark, but don’t start measuring for the first 20 to 30 seconds; if you’re printing out the throughput and latency numbers, you’ll see that they begin to stabilize, and after that you can start measuring. If you start measuring from the get-go, you are not measuring the code that you will be running in production; you’re measuring unoptimized code that the JVM hasn’t had a chance to compile yet.
Also, do not benchmark from one thread. Products like Ignite and Hazelcast are both built for multi-threaded processing, and you will not be able to load either one using a single thread. We sometimes get questions from people doing cache puts from within a for loop against a couple of deployed Ignite servers, asking why CPU load is hardly visible on any of those servers. And that is expected, because those servers are not feeling the load at all when you access them from only one thread. So do run multi-threaded benchmarks.
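A minimal sketch of a benchmark loop that follows both rules, warming up before measuring and driving the cache from many threads; the thread count, key range, and durations are illustrative:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import org.apache.ignite.IgniteCache;

public class BenchmarkSketch {
    public static void run(IgniteCache<Integer, Integer> cache) throws Exception {
        final int THREADS = 64;
        final AtomicBoolean measuring = new AtomicBoolean(false);
        final AtomicBoolean stopped = new AtomicBoolean(false);
        final AtomicLong ops = new AtomicLong();

        ExecutorService pool = Executors.newFixedThreadPool(THREADS);

        for (int i = 0; i < THREADS; i++) {
            pool.submit(() -> {
                ThreadLocalRandom rnd = ThreadLocalRandom.current();
                while (!stopped.get()) {
                    int key = rnd.nextInt(100_000);
                    cache.get(key);
                    cache.put(key, rnd.nextInt());
                    if (measuring.get())
                        ops.incrementAndGet(); // Count only after warm-up.
                }
            });
        }

        Thread.sleep(30_000); // Warm-up: let HotSpot optimize the hot path.
        measuring.set(true);
        Thread.sleep(60_000); // Measured interval.
        stopped.set(true);
        pool.shutdown();

        System.out.println("Throughput: " + ops.get() / 60 + " ops/sec");
    }
}
```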
The examples I’ll be showing are based on 64-thread benchmarks, so we are bombarding the servers from 64 threads. Always measure both latency and throughput. Always look at network utilization, and probably the most important part: do continuous monitoring of the state of your benchmark. We often see people run a benchmark, take the time from start to end, count the number of operations, divide one by the other, and there’s your throughput. But that’s not indicative of what happened inside the benchmark.
For example, if you do not know how your benchmark behaved throughout the run, you may have missed that the throughput dropped, maybe even to zero, at some point, because you only looked at the beginning and at the end. So do continuous monitoring. We actually use a tool called Yardstick, originally created by GridGain, which provides this capability. It generates graphs, so you can see how benchmarks behave throughout the execution, and any drops in throughput or latency will be visible. It also has pretty cool integration with AWS and Docker: you just deploy your Docker containers onto prepared images.
You point it at a GitHub repository with your benchmarks, so you can change your benchmark code on the fly or rerun it, and it will rebuild your code around the benchmark. A very cool feature: it allows you to easily run benchmarks in a distributed environment and easily change them. All the benchmarks I’ll be showing today are available on GitHub, in the GridGain open source repositories. Feel free to download them and check them out.
What we’ve been benchmarking: we’ve been running benchmarks on AWS, as I mentioned, with one client and four servers. On the client side, we bombard the servers from 64 different threads to make sure the load is adequate. We have the primary data set configured with one backup copy, and we configured the backups to be updated synchronously, so every time a primary is updated, the backup is updated as well before the operation completes. Again, it’s eight CPUs with 15 gigabytes of RAM on each server, and the benchmark results are available on the GridGain website, so you can always come take a look, or download them and run them yourself. And it’s Ignite 1.5 versus Hazelcast 3.6; both of those, to my knowledge, were the latest versions of either product.
We start with one of the first benchmarks we ran: basic put and get on atomic caches. The code for the benchmark is provided here; it’s essentially a basic get from a cache and then a put into the cache, a very basic operation. What I’ve noticed is that it’s best to keep the benchmark logic very simple. This allows you to benchmark the actual product instead of benchmarking the logic that you put around the benchmark. So the logic is kept simple on purpose, and for this benchmark we have one client and four servers. Both products are pretty fast.
We see 106,000 operations per second in Ignite versus 92,000 in Hazelcast. Ignite is 15 percent faster, but I would say both are pretty fast; both provide significant throughput for this type of operation, and low latencies. Now, when you move to transactional mode, the difference starts getting bigger in Ignite’s favor. If you look at this example, the code looks pretty similar to the previous one.
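A hedged reconstruction of what that benchmark body looks like; the key range and value logic are illustrative:

```java
import java.util.concurrent.ThreadLocalRandom;
import org.apache.ignite.IgniteCache;

public class AtomicPutGetBenchmark {
    // One benchmark iteration, executed concurrently from 64 threads.
    public static void iteration(IgniteCache<Integer, Integer> cache) {
        int key = ThreadLocalRandom.current().nextInt(100_000);

        Integer val = cache.get(key);              // Basic atomic get.
        cache.put(key, val == null ? 0 : val + 1); // Basic atomic put.
    }
}
```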
This is Ignite code; the Hazelcast version looks somewhat different, but the effect is the same. Essentially, you start your transaction. Note that we put it in a try block so it will be automatically rolled back if an exception happens. The rest of the code is the same: we do a get from the cache, then a put into the cache, and then we commit the transaction.
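A hedged reconstruction of that transactional iteration; the key range is illustrative:

```java
import java.util.concurrent.ThreadLocalRandom;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.transactions.Transaction;

public class TxPutGetBenchmark {
    // One transactional iteration: the get and put commit atomically.
    public static void iteration(Ignite ignite, IgniteCache<Integer, Integer> cache) {
        int key = ThreadLocalRandom.current().nextInt(100_000);

        // try-with-resources: if commit() is never reached, the
        // transaction is rolled back automatically on close().
        try (Transaction tx = ignite.transactions().txStart()) {
            Integer val = cache.get(key);
            cache.put(key, val == null ? 0 : val + 1);
            tx.commit();
        }
    }
}
```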
The get and put together happen as one transactional operation, so if we got a value from the cache within the transaction, we’re guaranteed that this value will remain the same throughout the course of the transaction. For this benchmark, again in the same test environment, we have 20,000 operations per second on Ignite and 16,000 on Hazelcast. Note that now we actually have to group operations together, so it will be slower than atomic mode, but still pretty fast. Ignite here is about 30 percent faster. An even more apparent difference comes with the deadlock-free mode.
Again, the deadlock-free mode is a fully consistent mode, but now it’s fast because it does not acquire locks. If you compare it to the closest transactional mode in Hazelcast, which is the pessimistic mode, you get about a 72 percent improvement in performance in Ignite. At 72 percent, I would say the difference is pretty significant; it’s almost two times. And what does a two-times difference mean? It means that you can deploy Ignite on half the number of servers and get about the same performance.
So with the deadlock-free transactional mode, the difference becomes pretty significant. Again, it’s my favorite transactional mode, so do check it out. The code and API are exactly the same, and also note, on the right side, that even the graph for the deadlock-free mode looks a little better, from a latency and operations standpoint, than Hazelcast’s pessimistic mode.
And the last benchmark I’m going to share with you is querying. Again, we compare Ignite SQL to Hazelcast SQL, and only for whatever operations are supported: we cannot compare distributed joins because they’re simply not supported in Hazelcast. For the operations that are supported, there are two types of SQL benchmarks that we run. One is query-only, so there are no updates and we’re only querying the data, and Ignite is actually about 30 percent faster. Just looking at the numbers here: in Ignite, you can perform 73 SQL queries per second, and those are distributed SQL queries on a four-node cluster. That number by itself is actually pretty impressive.
In analytical use cases, you rarely have to care about this type of throughput; all you care about is that a query completes in less than a second, so a human doesn’t notice the delay. But on the transactional side, when you’re updating data, maybe performing transactions on it, and querying it at the same time, this kind of throughput becomes very important. So the other SQL benchmark we run is SQL queries in parallel with updates. In this case, both products perform about the same, with Ignite being about eight percent faster: we get about 63,000 operations per second (SQL queries and updates together) versus 58,000 in Hazelcast.
So that actually concludes my presentation; thanks for listening. I will also mention that if you are in San Francisco on May 23rd and 24th, you should check out our In-Memory Computing Summit, which will be held at the Grand Hyatt San Francisco hotel. I should probably let Alisa talk about it, because she’s been the one promoting it. Alisa?
Alisa Baum:
So we have a special discount code for anyone on this webinar, and that’s GGWEB20. What that will give you is an additional 20 percent off the early bird rates, which end on the 22nd at 11:30 PM Pacific Time. So this special code combined with the early bird rates will give you really good savings. I will include this in the follow-up e-mail that will go out to everybody; I’m going to include the recording, the links to the presentation, as well as information about this discount. And I just wanted to say this is for new registrations only. Okay, with that said, please go ahead and put your questions in the questions window of the GoToWebinar control panel, and I’ll start reading them to Dmitriy. Okay, Dmitriy, your first question is: in the benchmarking of SQL with updates, is it good to use standard deviation as a measure of dispersion?
Dmitriy Setrakyan:
Standard deviation?
Alisa Baum:
Yes.
Dmitriy Setrakyan:
So the question is whether it’s good to use standard deviation for SQL benchmarks. I think it depends on your use case. Definitely, the less jitter you have, the better, so it’s always important to measure standard deviation on any of the benchmarks. We do measure it on all of our benchmarks, and if we start getting more jitter than we would like, we usually start fixing it right away. So the answer would be yes, but not just for SQL; for all other benchmarks as well.
Alisa Baum:
Okay. How does GridGain compare to Streamscape AKKA Actor Model?
Dmitriy Setrakyan:
Well, you’ve got me there [laughter]. The actor model that I’m aware of is Akka’s; maybe that was the question. Essentially, on the computational side, actors allow you to distribute logic and do event-based, reactive programming, and absolutely the same is possible with GridGain. As a matter of fact, we have an actor-like class in GridGain, or I’m sorry, in Ignite, that you can use to mimic actor-like behavior, but you also have simple distributed lambda execution, or closure execution, and simple fork/join execution paradigms available in Ignite that are also pretty straightforward to use.
Alisa Baum:
Okay. The next question, can you comment on the somewhat controversial remarks from Hazelcast’s CEO about the Ignite benchmarks?
Dmitriy Setrakyan:
Yeah, that was actually a while back. We were very surprised by Mr. Luck, the CEO of Hazelcast, writing a blog accusing the Apache Ignite community of deliberately faking benchmark results. It was a very surprising blog, especially considering that we’re an open source project and essentially all the code is open. All the images are open: all the Amazon images, Docker images, everything, so anybody can just take the benchmarks and reproduce them.
So we were very surprised when we saw that blog. After digging into it, we saw that the benchmark that was run by Hazelcast was very different. Ours was on Amazon; theirs was on their own servers. Ours used the configuration we suggested.
They had a totally different configuration for their benchmark, and of course they got totally different results, which we cannot reproduce, confirm, or deny. We have no way of reproducing their results. And I believe that’s why that blog came out. On my side, I responded with a blog from GridGain the next day. Of course, when we saw their blog, we ran all of the benchmarks again and reconfirmed all the results, and our results were all correct. So we posted a blog saying that essentially we stand by our benchmarks and stand by our results; we have no idea what Mr. Luck was using or thinking there.
Alisa Baum:
Okay. The next question: what languages are supported? Scala, Python, Java?
Dmitriy Setrakyan:
So, what languages? That’s a very good question: how can you talk to Ignite? Essentially, the native platforms that are supported are .NET and C++, which have native, feels-like-home support within Ignite. The way we support .NET and C++ is not by providing a thin client. If you’re running, for example, .NET, you do not have to write a single line of Java code when deploying your data grid or running computations.
You can actually take a .NET closure, a .NET lambda, and run it on top of Apache Ignite. You can use .NET or C#-based persistence logic for write-through and read-through functionality to your database in Apache Ignite. So it’s a very deep and, I would say, very native integration between Ignite and Java, .NET, and C++. From a client standpoint, you can also use the memcached APIs, and you can use a simple REST-based API.
There is also ongoing work in the community on supporting the Redis API to talk to Ignite. We’re also adding [unintelligible] support; it’s almost done and may actually make it into 1.6. We will see. I should also say that because of the ODBC connectivity, and because of our SQL support, any language that supports ODBC, like Python, for example, can use the ODBC driver to connect to Ignite. So using our ODBC driver, pretty much any language out there can connect to Ignite.
Alisa Baum:
All right. Next question: how would Spark or Ignite SQL work when one of the nodes in the distributed node set goes down? Does the query execution hold off until the failover process is completed?
Dmitriy Setrakyan:
That’s a very good question. There are several ways you can provide redundancy within Ignite. If a node fails, do you lose the data? If you do not want to lose data, then you have to have some redundancy: you have to configure either backups or the ability to restore data from some other storage. If you, for example, configure backups, the SQL query will complete in a guaranteed fashion, and the result will be consistent.
So as long as the data is in memory, either in primary form or in backup form, we guarantee that the result will be fully consistent. In the case where you have a backup and a node crashes, the execution of the query for that portion of the data will be failed over evenly across the other nodes, and the query will still complete successfully.
Alisa Baum:
Okay. What are the plans to add a grid segmentation processor to detect split-brain syndrome?
Dmitriy Setrakyan:
That’s a good question. We’ve had many requests in the community about network segmentation, and Ignite actually has some interfaces for it, but the implementation of those interfaces is available only when you use the GridGain edition, which is based on top of Apache Ignite but adds certain features. We’ve been in talks with GridGain, and I personally don’t mind donating this functionality, so I think it’s very possible that at some point this quarter or next, some of the closed source enterprise features in GridGain will make it into Apache Ignite.
Alisa Baum:
Okay. Does Ignite SQL work the same way with Spark?
Dmitriy Setrakyan:
Very good question. So how do we help Spark? Why do Spark users come to Ignite to run SQL? There are two things Ignite provides for Spark. One is the ability to share data: Spark does not have any data sharing capabilities; what Spark provides is data processing capabilities.
So Spark gives you the ability to load data locally in memory, process it, crunch some computations on it, and spit out a result. But what if you want to share some intermediate state, or share the result without writing it to disk? We have many Spark users come to Ignite and use Ignite caches. Ignite also provides a concept called IgniteRDD, which integrates with Spark RDDs and is essentially a wrapper around an Ignite cache. If you run SQL on top of that RDD, it will use Ignite indexes and will be a lot faster than Spark SQL.
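A rough sketch of what that looks like through the Spark integration; the cache name, config path, and query are illustrative, and the exact API shape varies by Ignite/Spark version:

```java
import org.apache.ignite.spark.JavaIgniteContext;
import org.apache.ignite.spark.JavaIgniteRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;

public class IgniteRddExample {
    public static void query(JavaSparkContext sc) {
        // IgniteContext wraps the Spark context and starts Ignite on the workers.
        JavaIgniteContext<Integer, Integer> ic =
            new JavaIgniteContext<>(sc, "path/to/ignite-config.xml");

        // IgniteRDD is a live view over an Ignite cache, shared across jobs.
        JavaIgniteRDD<Integer, Integer> rdd = ic.fromCache("sharedCache");

        // SQL over the RDD uses Ignite's in-memory indexes instead of full scans.
        DataFrame df = rdd.sql("SELECT _key, _val FROM Integer WHERE _val > ?", 10);
        df.show();
    }
}
```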
For example, take the simple join I showed in my example, where I joined across three tables. Ignite will use indexes and will probably return in a couple of milliseconds. In Spark, it would probably take about 15 to 20 seconds, because Spark has to use full scans and shuffle data around. So the presence of indexes makes a huge difference. It takes a little longer to load the data in memory because you have to index it, but once you’ve loaded it, you can query it at will, very fast.
Alisa Baum:
Okay, and we have time for one more question and that is can you run the same benchmark on GCE?
Dmitriy Setrakyan:
So yes, I believe so. GCE is Google Compute Engine, and the question is whether we can run the benchmarks on Google Compute Engine. Yardstick natively integrates with Amazon in a very nice way; it actually writes the results out to an S3 bucket. I don’t know if Yardstick provides the same type of integration with GCE; I don’t think it does. But the code itself, yes, you can run it anywhere you like. You would probably have to do it a little more manually than on Amazon, because of Yardstick’s native Amazon integration.
Alisa Baum:
Okay. Well, that brings us almost to the top of the hour. If I did not get to your question, I will make sure to get those over to Dmitriy and just as a reminder, I’ll make sure that everyone gets a copy of the link to the recording as well as the presentation slides within 48 hours, and with that said, I’d like to thank Dmitriy for today’s presentation and our audience for participating, and we hope to see you again on a future webinar. Thank you so much, Dmitriy.
Dmitriy Setrakyan: All right. Thank you. Thanks, everyone. Bye-bye.