Apache Ignite™ Coding Examples – Part 2
This is the second session of a two-part series in which Dmitriy Setrakyan, PMC Chairman of Apache Ignite and co-founder and EVP of Engineering at GridGain, walks through several coding examples that demonstrate the ease with which Apache Ignite can be implemented in typical environments.
In this 50-minute webinar, designed specifically for software developers and software architects, Dmitriy provides a quick overview of the Apache Ignite™ In-Memory Data Fabric before demonstrating a number of coding examples that expand on those covered in Part 1 of the series. He will finish by answering any questions you may have.
Don't miss out on this opportunity to gain a solid foundation in Apache Ignite™.
Founder & CPO, GridGain Systems
Dane Christensen:
Good morning or afternoon, depending on where you’re tuning in from. I am Dane Christensen, the digital marketing manager at GridGain Systems. I want to thank you all for taking the time to join us today for the Apache Ignite Coding Examples webinar, Part 2, with Dimitry Setrakyan, who is one of the founders of GridGain Systems, as well as a member of the Podling Project Management Committee for the Apache Ignite incubating project.
Dimitry Setrakyan:
Yes. Thank you, Dane. My name is Dimitry Setrakyan. I am one of the founders of GridGain Systems. I am also on the PMC for the Apache Ignite project, and I am one of the committers there as well. So thank you all for coming and thank you all for listening. This is part two of the coding examples webinar for Apache Ignite. Last time we were able to show some cool examples using the data grid and compute grid functionality. I wanted to get to streaming, but I didn’t have a chance to. So guess what? Today we actually are going to get to streaming, because we are going to start with streaming examples. And we’re going to look at a pretty cool word count example. Everybody else does word count, so why should we be any different? We are going to show sliding windows for this word count, and how to use SQL queries while the data is being streamed into the system.
And the second example I am going to show – hopefully I’m going to have time – is a service grid example, which will be shown in the context of the streaming example: we will deploy our streamer as a service, as a cluster singleton service, and show the advantages of doing that as well.
So again, just briefly, I usually start with this jigsaw puzzle, which shows most of the components provided by Ignite. The main message here is that Ignite is not just a distributed key-value store in memory. It’s not just a distributed cache. Even though the data grid is probably one of our biggest components, and it provides transactions, cache queries, et cetera, it’s just one of the components we provide. And there are more. There is also the compute grid, which is all about executing MapReduce and fork-join types of computations on the cluster in a load-balanced and fault-tolerant fashion. There is the service grid, which we’ll be showing today; there’s streaming; there’s a file system, messaging, data structures, et cetera. So Ignite is actually a collection of independent components. But when you use them together, the advantage you get is that they are very well integrated with each other.
For example, if you send a computation to the cluster and your computation needs to work on the data, that computation will go exactly to the nodes where the data is.
The same goes for streaming. Streaming is very well integrated with the data grid, with the compute grid, et cetera. So all of these components are very well integrated with each other. You don’t have to use all of them; you usually start off with one or two and then add more as you get into more advanced use cases.
So let’s get started with our streaming example. Streaming is all about ingesting large amounts of data into Ignite distributed caches. And don’t think that streaming always requires a continuous stream of data; streaming also works very well with a finite data set.
For example, if you have lots of data sitting in your Oracle database, or maybe in an Amazon S3 bucket, and you just want to populate an Ignite cache with that data, you would use a streaming approach. You would use our IgniteDataStreamer API and start pumping data in that way.
However, the cool thing about streaming is that it also works with continuous streams of data. In that case the data you’re streaming into Ignite never ends; it just keeps on coming. And as it comes, Ignite has to process that data in parallel and provide capabilities to query that data as it streams in. All of that is available within Ignite.
So as the data is coming in, you have to be very sensitive to memory consumption, because the data never ends and you do not want to run out of memory. For that you would probably set up a sliding window to query into. A sliding window is a window that covers some period of time and allows you to query the data in that timeframe.
For example, you can create a sliding window over the last hour, over the last day, maybe over the last week. Maybe you want a sliding window that covers the last one million cache keys that you have stored from the stream. All of that is possible with the Ignite streamer. And we actually are going to be showing examples of streaming words into a sliding window and querying those words in parallel.
So one of the differences with Spark streaming is that Spark does what they call – if I can pronounce the word correctly – discretized streaming, which is essentially streaming data into Spark one RDD at a time. So it batches the data as RDDs, and you can process only one RDD at a time. In Ignite you actually stream data into the Ignite caches as it comes, without any grouping or batching of that data. As the data comes in, it goes straight into the Ignite cache. So it’s real streaming you get with Ignite. And because of that you can process thousands upon thousands of messages per second in Ignite. We will actually see it here, because we are going to print out how many words per second we can process.
So let’s go ahead and get started with the example. Before I do that, I want to show where you can download the latest version of Ignite. If you go to the Ignite website and you go to the download page, you will notice several download sections: one for third-party binaries, and one for official Apache Ignite releases. The difference is that the third-party binary – the GridGain Community Edition – is generally about two to three weeks ahead of the official Apache Ignite release. And that’s exactly the one we are going to be using here. We just released version 1.0.4, so for this demonstration I’m going to be using GridGain Community Fabric 1.0.4, which in two to three weeks will become Apache Ignite version 1.1.0. So let’s go ahead and get started.
Apache Ignite comes with a set of examples, which I already have imported here. It’s a Maven-based project, so you can simply import the POM file. And we are going to be looking into the streaming word count package under our examples. There are three classes here. Let me close everything out.
There are three classes here. One is CacheConfig, another is QueryWords, and another is StreamWords. They do exactly what they are called: the StreamWords class is responsible for streaming words into the Ignite cache, and QueryWords will periodically query that cache as the data is coming in.
So let’s actually take a look at the code that we have here. And it’s a standard example that comes with Ignite, but it’s a new example that was added. It has quite a few changes from the previous version of this example.
So first and foremost we start an Ignite node. We then create a cache, and we configure that cache with a sliding window. Let’s go ahead now and look at the configuration of the cache.
Again, essentially we call this cache “words”. We define the sliding window by setting an expiry policy of one second on the cache. And we also create indexes, which becomes important when we do SQL queries against this cache. I will talk about those as well.
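The cache configuration he describes might be sketched roughly as follows; the cache name and the one-second expiry follow the transcript, while the exact class layout is an assumption modeled on Ignite’s bundled examples:

```java
import javax.cache.expiry.CreatedExpiryPolicy;
import javax.cache.expiry.Duration;

import org.apache.ignite.cache.affinity.AffinityUuid;
import org.apache.ignite.configuration.CacheConfiguration;

public class CacheConfig {
    /** Configuration for the "words" cache with a one-second sliding window. */
    public static CacheConfiguration<AffinityUuid, String> wordCache() {
        CacheConfiguration<AffinityUuid, String> cfg = new CacheConfiguration<>("words");

        // Sliding window: every entry expires one second after it was created.
        cfg.setExpiryPolicyFactory(CreatedExpiryPolicy.factoryOf(Duration.ONE_SECOND));

        // Index the String values so SQL queries can run against them.
        cfg.setIndexedTypes(AffinityUuid.class, String.class);

        return cfg;
    }
}
```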
So let’s go back and look here. The caches are dynamic. What this line of code does is essentially say: if the cache does not exist, create it within the cluster; if it exists, go ahead and use the already existing cache. So we create a cache, and then we create a streamer for that cache. The only parameter you need to create a streamer is the cache name, and that is the name of the cache we just created. After that, this code is all about reading a wonderful book that I have stored here called Alice in Wonderland. I’m not going to go through the text of that book, but if anybody is interested I can send you a copy afterwards. We are going to be streaming the text from the book into the cache. We read this book line by line from the file system, and for every line we split it into multiple words using a space separator. And if the word is not empty – so we didn’t have two spaces in a row – then we add it to Ignite. So this is the only line of code that is required to add data into the Ignite streamer.
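Put together, the streaming loop he walks through looks roughly like this; the configuration path and the book’s file name are placeholders, and the class shape is an assumption based on the description above:

```java
import java.io.BufferedReader;
import java.io.FileReader;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.affinity.AffinityUuid;

public class StreamWords {
    public static void main(String[] args) throws Exception {
        Ignite ignite = Ignition.start("examples/config/example-ignite.xml");

        // Create the cache if it does not exist, or use the already existing one.
        IgniteCache<AffinityUuid, String> cache = ignite.getOrCreateCache(CacheConfig.wordCache());

        // The only parameter a data streamer needs is the cache name.
        try (IgniteDataStreamer<AffinityUuid, String> stmr = ignite.dataStreamer(cache.getName())) {
            while (true) { // keep re-reading the book to simulate an endless stream
                try (BufferedReader rdr = new BufferedReader(new FileReader("alice-in-wonderland.txt"))) {
                    for (String line = rdr.readLine(); line != null; line = rdr.readLine())
                        for (String word : line.split(" "))
                            if (!word.isEmpty())
                                // AffinityUuid keys are globally unique but collocated by word.
                                stmr.addData(new AffinityUuid(word), word);
                }
            }
        }
    }
}
```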
And the way we add it – we should actually focus on this line a little bit. We create a unique key for each word, and we put that word into the cache under that key. Ignite caches work with keys and values. Since we only have a word as a value, how do we create a key for it? And how do we make sure that all identical words end up on the same node for faster processing? Because we want to collocate identical words together so we can count them faster.
For that, Ignite has a cool abstraction called AffinityUuid, which will essentially generate a globally unique key for you. But because you pass the word into it, it will collocate that key with the parameter you pass in. In our case we pass the word as the parameter, so it will collocate the key with the word we’re storing in the cache. So that’s what we do to stream the data into the cache. Let’s go ahead and get started on that.
We are going to start a couple of nodes. This is how you start an Ignite node: you can start it from the command line, or, if you want to start it from code, all you need to do is call Ignition.start. You pass a configuration file if one is needed; otherwise Ignite will start with all defaults.
So let’s go ahead and run ExampleNodeStartup. Start a first instance. And I’m going to start it up again, so we’re going to create a cluster. All right, both nodes discovered each other; the printout shows that we have two nodes here. Now let’s execute the StreamWords process we just talked about. It’s going to start up, it’s going to join the cluster, and it’s going to stream words. Nothing is actually happening as far as printouts go, because all we’re doing is streaming data. How do we know? We look at our CPU utilization and we see that the cluster is very busy. My laptop is actually about to smoke – smoke is about to start coming out of my little laptop which I’m using for this presentation. But the data is being streamed.
So now let’s actually query the data that we’re streaming into the Ignite cluster. For that we use standard SQL. As the data is coming in, we are only keeping the last one second of it. We want to query into that data and get the ten most popular words. For that we define a couple of queries. In Ignite we use standard SQL: types become tables, fields become columns. In our case the type is String, so we use that as the table name. It’s a boxed type; it doesn’t have any fields. For types of this kind – boxed types – Ignite provides its own column name, _val.
So this is the nature of our query: we select _val and its count from String, we group by _val, and we order by count in descending order, because the bigger the count, the higher we want to show it. And we limit it to only ten.
And the last parameter tells Ignite that all the words are collocated, so Ignite can apply optimizations for the collocated mode.
On top of that, I’m actually going to execute another SQL query that selects some aggregates from the cache as well: the average count, the min count, and the max count. This is where the cool part of nested SQL comes in. First we select the count for each word, and then we select average, min, and max from that nested SQL. And we are going to execute both of these queries. So now we have defined those two SQL queries; this is how you execute them: we take our cache, call the query method, and pass our query into it. Results are returned in paginated form. In our case we know we’re only returning ten results, so we don’t need to paginate; we call getAll and it returns the whole thing.
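In code, the two queries and their execution look roughly like this; the second constructor argument is the collocation flag he mentions, and the surrounding class is an assumption modeled on the description:

```java
import java.util.List;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.affinity.AffinityUuid;
import org.apache.ignite.cache.query.SqlFieldsQuery;

public class QueryWords {
    public static void main(String[] args) throws Exception {
        Ignite ignite = Ignition.start("examples/config/example-ignite.xml");
        IgniteCache<AffinityUuid, String> cache = ignite.getOrCreateCache(CacheConfig.wordCache());

        // Ten most frequent words; '_val' is Ignite's predefined column for boxed value types.
        SqlFieldsQuery top10 = new SqlFieldsQuery(
            "select _val, count(_val) as cnt from String group by _val order by cnt desc limit 10", true);

        // Nested SQL: per-word counts inside, avg/min/max aggregates outside.
        SqlFieldsQuery stats = new SqlFieldsQuery(
            "select avg(cnt), min(cnt), max(cnt) from (select count(_val) as cnt from String group by _val)", true);

        while (true) {
            // Results come back paginated; getAll() fetches everything at once.
            List<List<?>> aggregates = cache.query(stats).getAll();
            List<List<?>> words = cache.query(top10).getAll();

            System.out.println("Aggregates: " + aggregates);
            System.out.println("Top 10 words: " + words);

            Thread.sleep(5000); // re-run the queries every five seconds
        }
    }
}
```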
And then we just nicely format it and print the results. So first we print the aggregates, and then we print the query results. And we execute these queries every five seconds.
So I’m going to go ahead and run this. It’s taking a little bit of time; my laptop is very overloaded. I am doing a webinar, I am running a very CPU-intensive streaming example, and I am also starting the query example. So our query node has joined the cluster, and here we start seeing query results coming in: some min, max, and average values, and the results are being printed out as they come in.
You may see some discrepancy between the maximum value in the aggregates and the count of the top word. That’s mostly because these two queries are executed separately from each other, and there is a lot happening in between the executions. So now our results are coming in, and we periodically get the ten most popular words stored in Ignite.
So, pretty cool. One thing I want to mention here is that the caches are fault tolerant. The words can be backed up and you can define as many backups as you need. However, what if the node that actually does the streaming crashes? What happens then?
So we’ll go ahead and kill it. The data streamer has been closed, and the query results become empty. There is nothing to query, because the sliding window of one second has expired for all the data, and we have no more data to query.
So that’s not very cool when you are working in production and you want your streamer to survive crashes too. How do we guarantee that within Ignite? That’s where Ignite’s cluster services come in.
Let’s go ahead and talk about cluster services a little bit, and then I’ll show how to create a cluster service as well. Ignite services are defined in a component called the service grid. And the picture shows very well the difference between cluster singleton and node singleton services. Essentially, the service grid is all about deploying various singletons on the cluster; that’s the main application of it. With a cluster singleton, you have one instance on the whole cluster, in a guaranteed fashion. A node singleton will make sure that every node has an instance. There is also a key-affinity singleton, which guarantees that the service is attached to a key. So in case of rebalancing, if a key travels to another node, the service will travel with that key to that node as well, so it will always be local to that key.
So singletons are definitely the main use case, but you don’t have to deploy singletons with the Ignite service grid. You could actually say: I want three instances of some service deployed. And Ignite will guarantee that you will always have three – and only three – instances of that service deployed. So that’s where guaranteed availability comes in.
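Deploying a fixed number of instances is essentially a one-liner; the service name and the `MyServiceImpl` class here are hypothetical placeholders:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;

public class DeployMultiple {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start();

        // Deploy exactly 3 instances of the service cluster-wide, at most 1 per node.
        // Ignite redeploys instances elsewhere if a hosting node leaves the cluster.
        ignite.services().deployMultiple("myService", new MyServiceImpl(), 3, 1);
    }
}
```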
So let’s take a look at the coding example, where we will define our streamer as a cluster singleton and make sure that it is deployed in a fault-tolerant fashion.
So I am going to go ahead and switch to my IntelliJ screen. And we’re going to keep this guy running; let it keep returning results without bothering us.
So let’s go ahead and start defining our service. First we start off with an API: we create a Java service API. Your service does not have to have an API, but we’re going to demonstrate how to create one. Let’s call it StreamerService. And the API will be – let’s actually calculate the number of words per second we’re processing: public int getWordsPerSecond(). So that’s the API for our service. Now let’s implement that service. To implement it, first I need to create a class: StreamerServiceImpl.
To implement a service in Ignite, all we need to do is implement the Service interface. And since we also defined our own API, we want to implement our StreamerService API in that class as well.
So let’s go ahead and implement the methods required by these interfaces. The first three methods are dictated by the Ignite Service interface, and the last one is from our StreamerService API. To satisfy the last method, we need a total count of words: private int words. And a start time – actually, we’re going to use a long: private long startTime.
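A sketch of the class at this point might look like the following; the lifecycle method bodies are stubs that the streaming logic copied in next would fill, and the exact names follow the transcript:

```java
import org.apache.ignite.services.Service;
import org.apache.ignite.services.ServiceContext;

/** Our own service API: one method, the words-per-second counter. */
interface StreamerService {
    int getWordsPerSecond();
}

/** Implements Ignite's Service lifecycle plus our StreamerService API. */
class StreamerServiceImpl implements Service, StreamerService {
    private int words;       // total number of words streamed so far
    private long startTime;  // service start time, in milliseconds

    @Override public void cancel(ServiceContext ctx) { /* cancellation logic */ }

    @Override public void init(ServiceContext ctx) { /* get or create the cache */ }

    @Override public void execute(ServiceContext ctx) { /* the streaming loop */ }

    @Override public int getWordsPerSecond() {
        long elapsedMs = Math.max(1, System.currentTimeMillis() - startTime);
        return (int)(words * 1000L / elapsedMs);
    }
}
```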
So now we have defined that. Let’s keep going and copy some of the code that we defined in our StreamWords class over here, so we can deploy it as a service. Let’s print out some notifications here. First we’ll print out if the service was cancelled – we want to see the name of the service. Then, to initialize, we want to create the same cache as we used in the streamer. So I’m going to go here and create this cache as a variable. And this is how we’re going to initialize it: exactly the same code as we used for the streamer.
By the way, we don’t have an instance of Ignite in our service. So how do we get it? We inject it. Ignite supports resource injection for all the artifacts provided by the fabric. So we’re going to declare an Ignite field and annotate it with @IgniteInstanceResource, and Ignite will automatically inject the instance of the main Ignite API into this class.
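Resource injection is a single annotation; Ignite populates the field before the service lifecycle methods run. The fragment below is a sketch of just this piece of the class:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.resources.IgniteInstanceResource;

class StreamerServiceImpl /* implements Service, StreamerService */ {
    // Ignite injects the local node's Ignite instance here at deployment time,
    // before init() is called, so the service can access caches and streamers.
    @IgniteInstanceResource
    private Ignite ignite;
}
```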
And since we initialized our cache, we want to print out that we initialized it: “Service is initialized.” And let’s put some execution logic into our service. Again, I am going to grab it from here and copy it into our service, because it’s the same logic we’re deploying; we’re just defining it as a service.
So the only changes I’m going to make – a couple of changes. One, we don’t want to go past the cancellation: if the service is cancelled, we want to stop. And I am also going to catch cache exceptions here. If the exception was not caused by an interrupt, I propagate it; if it was, I ignore it. That’s it. So we are ignoring interrupted exceptions – if we kill the service, we don’t care about them. If it’s a real cache exception, we want to throw it. Those are the only two changes.
And the last method we need to implement is getWordsPerSecond. To do that, we first need to start counting the words. That is done here: words++. So now we are counting the words. We also need to record the start time of our service, so let’s initialize the start time. And now let’s calculate words per second. What is words per second? It’s the total number of words divided by (current time minus start time). And one thing – we want to use a long, not an int.
And since this gives us the elapsed time in milliseconds, we want to multiply by a thousand to convert the rate to words per second. That’s it – that’s the implementation of our method. So now we’ve got our service defined. Let’s look at how we are going to deploy it. Before I move away from this screen, I want to go over briefly what has happened here.
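The arithmetic he describes, pulled out as a plain helper for clarity (the helper class name is made up; milliseconds in, words per second out):

```java
public class Rates {
    /**
     * Words per second: total words divided by elapsed time, where elapsed
     * time is (now - start) in milliseconds. Multiplying by 1000 converts
     * the rate from words-per-millisecond to words-per-second.
     */
    public static int wordsPerSecond(int words, long startTimeMs, long nowMs) {
        long elapsedMs = Math.max(1, nowMs - startTimeMs); // guard against division by zero
        return (int)(words * 1000L / elapsedMs);
    }
}
```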
So the Ignite Service API provides three methods that you are required to implement: cancel, init, and execute. Cancel is where you define the cancellation logic – we didn’t have any. Init is where you do any kind of service initialization; in our case, we get a handle on the cache, creating it if it hasn’t been created. And in execute we actually perform the logic of our service. In our case, that logic is essentially the same streaming logic we used in the streamer: it keeps streaming data into Ignite, and we count the locally processed words as well.
And then the last method is the one we added to the API. This is the one we will be using from the public API, getWordsPerSecond. And we’re just calculating the words per second.
So let’s go ahead and deploy this service. We are going to create another Java class called StreamerServiceExample. I’m going to copy the logic for starting Ignite again so I don’t make any demo mistakes. Now we start an instance of Ignite, and inside this instance we deploy the service. Before we do that, we want to specify that we don’t want this node to participate in anything, so we define it as a client.
Now let’s get an instance of the service API, IgniteServices. We want the services to work only on the server nodes and ignore the client node, so we pass a server cluster group into it: ignite.cluster().forServers(). This creates a cluster group of only the server nodes, and we pass it in. So services will only be deployed on the server nodes.
And the last step is to deploy the service on the grid. We call services.deployClusterSingleton, we give it a name – wordStreamerService – and we pass in our implementation, StreamerServiceImpl. And that’s it; we’re ready to go. Let’s go ahead and deploy the service. Note we still have these guys running: ExampleNodeStartup. And QueryWords is actually returning empty result sets, since the word streaming has stopped. So let’s go ahead and deploy the service.
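The deployment class, end to end, might look like this; the class and service names follow the transcript, and the configuration path is a placeholder:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteServices;
import org.apache.ignite.Ignition;

public class StreamerServiceExample {
    public static void main(String[] args) {
        // This node only drives the deployment; don't host caches or services on it.
        Ignition.setClientMode(true);

        Ignite ignite = Ignition.start("examples/config/example-ignite.xml");

        // Restrict the services facade to server nodes only.
        IgniteServices svcs = ignite.services(ignite.cluster().forServers());

        // Exactly one instance, somewhere on the cluster, redeployed on failure.
        svcs.deployClusterSingleton("wordStreamerService", new StreamerServiceImpl());
    }
}
```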
This is, by the way, all you need to do to deploy a service: just call this method. Alternatively, you can also deploy from XML if you want to do some configuration.
All right. We deployed the service and exited, and the streaming started. I believe I am not printing out anything from the execute method – yes, I am not. So we’ve got “service is initialized”. This indicates that the cluster singleton service has started on this node. And this node does not have it, because it’s a singleton – only one of the nodes should have it. We also have the query results: now we’re getting query results again, we’re getting the most popular words again, and the query continues to work just like it did before.
So the difference now is that our service is highly available – our streamer can survive crashes. Let’s go ahead and see what happens. The streamer is deployed on this node; we know this because of this printout. Let’s go ahead and kill this node. Service is cancelled, node is stopped. Boom – the service is initialized on the other node automatically. And then we’re set: queries should continue to work. It might stall somewhere – that always happens during the demo – but the results continue to come. It’s going to warm up, it’s going to take a little bit, and we’re going to start getting more words. By the way, I only have one service instance running.
So again, this ensures the availability of the service at all times. The service had essentially zero downtime. I am sorry for the error in my QueryWords example, but as we see, the results keep coming in and the data is still being queried. After we killed this node, the service was automatically deployed on the other node, so we get continuous query execution without interruption.
So now I am going to – [unintelligible] earlier, so I have a little more time. I am going to go back to my presentation, and if you have any questions, I am going to open it up for questions.
Dane Christensen:
Hey, Dimitry, yes, we do have some questions here for you from the group. Just give me a second to get over to those.
Dimitry Setrakyan:
Right away I’m going to kill my streamer because my laptop is really struggling.
Dane Christensen:
Okay, so here’s a question for you, Dimitry, while you’re waiting for that. Would you provide a simple example of inspecting the data grid using the Ignite Visor command line tool? Now, that might be something you have to demonstrate.
Dimitry Setrakyan:
That would be something for the next demo.
Dane Christensen:
Okay. The next demo today?
Dimitry Setrakyan:
No. There is another coming up in a couple of weeks where we will be showing benchmarking of data grids. And on that one we will be showing – I can show that as well.
Dane Christensen:
Okay, all right. Here’s another question for you. Does the AffinityUUID ensure that all instances of the same words will end up on the same cache node?
Dimitry Setrakyan:
So the question is whether AffinityUuid ensures that identical words end up on the same node. And the answer is yes. Essentially it is a globally unique key, but when you pass a word into it, you guarantee that all keys created with the same word will be cached on the same node. That allows us to count the words a lot faster than if they were distributed across multiple nodes, in which case the query would be less efficient.
Dane Christensen:
Okay. Let’s see, we have a couple other questions here. Here’s a data grid question. On a cluster with multiple nodes, how do you control which nodes a cache lands on without using specific host attributes?
Let’s say I wanted to create a cache on six of a hundred hosts, but I only know which host when my application launches. Is the cache name sufficient enough to create this kind of grouping? Data grid question, not compute grid.
So that was kind of a long one [Crosstalk]
Dimitry Setrakyan:
Yeah. So the question is how we control the topology on which we deploy the caches. And the answer is: the cache configuration has a node filter parameter. So you can pass in a node filter, or a topology filter, and the cache will only be deployed on the nodes that pass through that filter.
So if you want to deploy it on nodes in a certain subnet, or nodes that have a certain attribute, or that meet any other criteria, you can define a filter for it and specify it in the cache configuration, and the deployment will happen automatically.
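A node filter is set on the cache configuration; the sketch below assumes a hypothetical user attribute named "ROLE" that you would set when starting the nodes:

```java
import org.apache.ignite.cache.affinity.AffinityUuid;
import org.apache.ignite.configuration.CacheConfiguration;

public class FilteredCacheConfig {
    public static CacheConfiguration<AffinityUuid, String> filtered() {
        CacheConfiguration<AffinityUuid, String> cfg = new CacheConfiguration<>("words");

        // Deploy the cache only on nodes that carry the (hypothetical) "ROLE=data" attribute.
        // The filter is dynamic: nodes joining later are evaluated against it too.
        cfg.setNodeFilter(node -> "data".equals(node.attribute("ROLE")));

        return cfg;
    }
}
```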
Dane Christensen:
Okay, great. Let me know if you’re still wanting more questions. Are you still waiting on something here? [Crosstalk]
Dimitry Setrakyan:
No, there’s nothing I’m waiting for. I just realized that I actually did not show a little bit of the example. But let’s keep going through the questions, and if I have time at the end, I will continue with examples.
Dane Christensen:
Okay, I’m sorry, Dimitry, did you say you were ready for more questions?
Dimitry Setrakyan:
Yes, if you have any more. Otherwise I can continue and show one more piece that I wanted to show.
Dane Christensen:
Yeah, there are a couple more questions. And by the way, for the audience, feel free to ask your questions. Because normally we take them at the end. But it looks like we’re taking a few as we wait for some things.
Here’s a question for you. Do you – actually, this might be related to that last one. Do you tag the information coming into the network?
Dimitry Setrakyan:
That actually I’m not sure I understood the question. Do we tag the information coming in? Yeah, I’m going to pass on that one. I’m not sure I understand the question.
Dane Christensen:
Okay, no problem. There’s a couple other questions here. Here’s one. Is there a book coming out on Ignite, like Apache Ignite in Action?
Dimitry Setrakyan:
Yes, it’s currently in the works. So at some point it will come out. It just has to go through editing and a few releases before it is going to be released. But yes, it is currently in the works.
Dane Christensen:
Okay, great. Here’s another question. Run time environments. Do you support running in a platform as a service such as Cloud Foundry?
Dimitry Setrakyan:
So the question is whether or not Ignite can run in a cloud-based environment or a PaaS environment. And the answer is yes. It may not be integrated automatically with all PaaS providers, but it can always be easily deployed into any cloud environment. Our clustering works across AWS, Google Compute Engine, and any jclouds-compliant cloud implementation. jclouds actually works with most of the clouds out there, and we integrate with jclouds as well. So via that integration we can provide cloud cluster discovery for most of the cloud providers out there.
And once you do that, you can deploy Ignite as an individual cluster, or you can deploy it as a service. When you deploy it as a service, you would need some isolation, and tenant isolation is provided by caches: you can create different caches for different tenants, and caches can be created on the fly. So it’s very easy operationally, and you can implement it at runtime. For every new tenant, or for every [unintelligible], for example, you could create a new cache. This way you can implement multi-tenancy quite naturally within Ignite as well.
Dane Christensen:
Okay, great. And just let me know when you’re done with Q&A here, Dimitry. But we’ve got a number of questions coming in. So I’ll go ahead and keep asking the questions as they come in.
So this is related to the question that you answered just a little bit ago on the data grid. What if you don’t know the filters or attributes beforehand? And did you want me to repeat the previous question?
Dimitry Setrakyan:
No, actually – if you want to specify the topology for the cache you need to deploy, you have to provide us with that information. If you do not know it beforehand but you can get it at runtime, then you can dynamically create a cache at runtime and specify the servers on which the cache is deployed.
Once the cache has been deployed on a topology specified with a filter – and the filter is, by the way, dynamic, right? So as new nodes come in, they may also be accepted by the filter, and in that case the cache will automatically be deployed on those nodes as well. But you have to let us know what kind of nodes; otherwise we do not know where exactly you want to deploy the cache.
Dane Christensen:
Okay, great. Here is a good one. Are you included in Apache Bigtop?
Dimitry Setrakyan:
In Apache Bigtop? Yes. Bigtop is a very cool project which allows easy installation of different big data products – Hadoop being one of them, and Apache Ignite is one of them as well. So you can use Bigtop to easily install Apache Ignite. And when you do that, it integrates with the Bigtop Hadoop installation automatically and provides immediate acceleration for Hadoop, using Apache Ignite’s in-memory MapReduce and in-memory file system.
As a matter of fact, another webinar will be coming up where we’ll be showing a cool demonstration: you can execute the pi calculation example provided by Hadoop using standard Hadoop, or using a version of Hadoop accelerated with Apache Ignite. And Apache Ignite can provide more than 30x faster execution.
Dane Christensen:
All right. More questions, Dimitry? We’ve got more questions coming in.
Dimitry Setrakyan:
Well, if you have any. I’m not waiting for anything. I can show a little bit – like an additional portion of the example. But if you have more questions, I would rather spend time answering questions. If not, I can move to the example as well.
Dane Christensen:
You bet. We’re on a roll here, so let’s go ahead and take a few more of these questions.
In the streamer example, were you querying only the past one second of data in the cache? Additionally, was the data being added all retained in the cache or just passing through?
Dimitry Setrakyan:
So let me go back to the streamer example. Here we actually use the configuration of the cache, and we set the expiration policy on the cache to one second. So the data was passing through the cache: new data was coming in, and older data that was more than one second old was evicted. The reason I defined the sliding window to be this small is because my laptop is small. Generally you would probably go into minutes, hours, days, or maybe weeks. Or maybe you wish to limit it by size and say no more than ten million [unintelligible]. You can do that as well and implement it in a FIFO fashion.

So to answer the question, the data was passing through. The new data was coming in, the older data was evicted from the cache. But the query was real-time. You were querying the real-time data.
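The two sliding-window variants mentioned above – a time-based window and a size-based FIFO window – can both be expressed through the cache configuration. A minimal sketch, assuming Apache Ignite is on the classpath; the cache name is made up, and the exact eviction-policy setter may vary between Ignite releases:

```java
import java.util.concurrent.TimeUnit;

import javax.cache.expiry.CreatedExpiryPolicy;
import javax.cache.expiry.Duration;

import org.apache.ignite.cache.eviction.fifo.FifoEvictionPolicy;
import org.apache.ignite.configuration.CacheConfiguration;

public class SlidingWindowConfig {
    /** Time-based window: each entry expires one second after creation,
     *  so queries always see roughly the last second of streamed data. */
    public static CacheConfiguration<String, Long> timeWindow() {
        CacheConfiguration<String, Long> cfg = new CacheConfiguration<>("words");

        cfg.setExpiryPolicyFactory(
            CreatedExpiryPolicy.factoryOf(new Duration(TimeUnit.SECONDS, 1)));

        return cfg;
    }

    /** Size-based alternative: keep at most N entries and evict the
     *  oldest ones first, in FIFO fashion. */
    public static CacheConfiguration<String, Long> sizeWindow() {
        CacheConfiguration<String, Long> cfg = new CacheConfiguration<>("words");

        cfg.setEvictionPolicy(new FifoEvictionPolicy<>(10_000_000));

        return cfg;
    }
}
```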
Dane Christensen:
Okay, great. And so we do have a couple more questions. I want to be sure we get all the content in that you were planning to get in, Dimitry. And oftentimes we don’t have time for all the questions. So if there is something else that you –
Dimitry Setrakyan:
Let me go ahead and finish the example then.
So one last thing I wanted to show – we forgot to demonstrate how to use the API of the service that we defined. So I am going to go ahead and print out the number of words per second we were processing. And for that I am going to get an instance of Ignite services, first of all.

And I am going to get a proxy to our service, which was called word streamer service. I’m going to specify the interface that we defined for our service, the streamer service. And the last parameter doesn’t matter for a cluster singleton, so it will be ignored. So we are tying it to the streamer service API. And let’s print out the number of words per second as well – so, a System.out.println of the number of words per second.
So here we are actually accessing our service from a client node. Even though the service is deployed on a server node, we create a proxy for the service interface and access it from the client node.
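The pattern described here – deploying a cluster singleton and then calling it through a typed proxy from a client node – can be sketched as follows. This assumes Apache Ignite is on the classpath; the `StreamerService` interface, its implementation, and the service name are hypothetical stand-ins for what was shown on screen:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.services.Service;
import org.apache.ignite.services.ServiceContext;

public class ServiceProxyExample {
    /** Hypothetical public API of the streamer service. */
    public interface StreamerService {
        int getWordsPerSecond();
    }

    /** Server-side implementation, deployed as a cluster singleton. */
    public static class StreamerServiceImpl implements Service, StreamerService {
        @Override public void init(ServiceContext ctx)    { /* set up streaming */ }
        @Override public void execute(ServiceContext ctx) { /* stream words into caches */ }
        @Override public void cancel(ServiceContext ctx)  { /* stop streaming */ }

        @Override public int getWordsPerSecond() { return 0; /* real counter here */ }
    }

    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            // Deploy exactly one instance of the service somewhere in the cluster.
            ignite.services().deployClusterSingleton("wordStreamerService", new StreamerServiceImpl());

            // On the client side, get a proxy typed to the service interface.
            // Each proxy call is routed to the node hosting the singleton; the
            // last ("sticky") flag is ignored for cluster singletons.
            StreamerService svc = ignite.services()
                .serviceProxy("wordStreamerService", StreamerService.class, false);

            System.out.println("Words/sec: " + svc.getWordsPerSecond());
        }
    }
}
```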
So I am going to go ahead and start my example again. So now two nodes are started. I’m going to go ahead and deploy our service. So streamer service example was deploying our service. I’m reinitializing my example. And we are deploying service on one of the server instances. We’re going to take a look. So service has been deployed on this instance, and the streaming has started. I am going to look at the CPU load. And the CPU load is coming in. So my laptop is again fully busy doing multiple processes [unintelligible] processes that we’ve started, and we’re streaming words into the caches at a high rate from within one of the services. So within one of the cluster singleton services we just defined.
And now if we issue query words, start querying, we are going to print out the number of words per second as well. So let’s go ahead and do that. So it’s about 18,000 words per second while I’m running the webinar. When I wasn’t running the webinar, I was getting about 35,000 words per second. So the webinar is definitely adding some overhead to what we are doing here. But this essentially gives you a feel for the rate of data you can pump into the Ignite streamer. So we are actually pumping over 20,000 words per second right now.
And again, this should continue if we kill the service. And it didn’t work again – so something is wrong with my example. But the querying part that I have configured somewhere still works, so we can continue querying. The data is continuously being streamed into the cache and we can continue with the queries until we kill the service. And now we can still see that we are processing about 23,000 words per second.
All right, so this was the last part of the example that I wanted to show: access to the service API from client nodes. To summarize what we’ve done here, we defined a proxy for our service. We defined it from our client node, the same node that is issuing the queries. And we are accessing the public API of that service, deployed somewhere in the cluster, from a client node by proxying the calls to the server. So each invocation of get words per second actually goes to the node where the singleton service is deployed and executes the method there.
So that concludes everything I wanted to show. If you have any more questions, I can take them.
Dane Christensen:
Excellent. Dimitry, there are – just a couple more questions. And I think we have just enough time to answer those. Here is a good one: can we have a cache loading data from an underlying DB store? Can we optimize the data stored in the cache and still have queries working through some ORM layer? Did that make sense?
Dimitry Setrakyan:
So I think there are several parts to this question. The first part is: can we load the cache from an underlying data store? And the answer is yes. Ignite can be configured in a read-through, write-through, or write-behind scenario for any kind of underlying data store, be that a relational database, a NoSQL database, or a plain file system. You can read from that store and write to that store automatically. So that was the first part of the question.
So you can pre-load the cache from that store as well. And Ignite, again, has a cache store API responsible for integration with any kind of store out there. And from that cache store API you can load the data automatically from the underlying data store in bulk as well. As far as queries [unintelligible] once the data is loaded, you can actually start querying it. I’m not sure what the last part of the question was, but once the data ends up in memory, you can query it immediately. And you can query it very fast, as you saw: we were querying data coming in at over 20,000 entries per second just on my MacBook Pro laptop. So the data can be coming in at very fast rates and we can concurrently query it in a continuous fashion as needed.
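The read-through/write-through setup and bulk pre-loading described above are configured on the cache. Here is a minimal sketch, assuming Apache Ignite is on the classpath; the cache name and the stub `MyJdbcStore` class are hypothetical, standing in for a real store implementation that talks to the underlying database:

```java
import java.util.Collection;

import javax.cache.Cache;
import javax.cache.configuration.FactoryBuilder;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.store.CacheStoreAdapter;
import org.apache.ignite.configuration.CacheConfiguration;

public class ReadWriteThroughExample {
    /** Hypothetical store; a real one would issue SQL (or NoSQL / file I/O). */
    public static class MyJdbcStore extends CacheStoreAdapter<Long, String> {
        @Override public String load(Long key) { return null; /* SELECT by key */ }
        @Override public void write(Cache.Entry<? extends Long, ? extends String> e) { /* INSERT/UPDATE */ }
        @Override public void delete(Object key) { /* DELETE by key */ }
    }

    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            CacheConfiguration<Long, String> cfg = new CacheConfiguration<>("persons");

            cfg.setCacheStoreFactory(FactoryBuilder.factoryOf(MyJdbcStore.class));
            cfg.setReadThrough(true);          // cache misses fall through to the store
            cfg.setWriteThrough(true);         // puts are persisted synchronously
            // cfg.setWriteBehindEnabled(true); // or asynchronously, in batches

            IgniteCache<Long, String> cache = ignite.getOrCreateCache(cfg);

            // Bulk pre-load from the store; delegates to CacheStore.loadCache(...).
            cache.loadCache(null);
        }
    }
}
```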
Dane Christensen:
Okay, great. And one more quick question here: what is the difference between extending the cache store adapter and implementing the cache store?
Dimitry Setrakyan:
So the cache store essentially has methods like write and writeAll, load and loadAll, and delete and deleteAll. The cache store adapter will essentially provide the whole implementation for the bulk methods, like deleteAll and writeAll, by iterating through the collection and invoking the individual counterpart. So the writeAll method will iterate over all the keys and invoke write for each of those keys. So it’s just a default implementation. But my recommendation, for performance reasons: you probably want to provide an implementation of the writeAll or loadAll method that will do a bulk write into the underlying storage versus many individual ones.
So to get started quickly you can use the store adapter, but you will probably want to override the bulk operations if you’re using them.
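The adapter-versus-interface trade-off can be sketched like this: extend `CacheStoreAdapter` to implement only the three single-entry methods, and override a bulk method where batching pays off. This assumes Apache Ignite is on the classpath; the class name and the comments describing the SQL are illustrative, not from the webinar:

```java
import java.util.Collection;

import javax.cache.Cache;

import org.apache.ignite.cache.store.CacheStoreAdapter;

/** Minimal store extending CacheStoreAdapter: only the three single-entry
 *  methods are required; the adapter derives loadAll/writeAll/deleteAll by
 *  looping over them one entry at a time. */
public class WordStore extends CacheStoreAdapter<Long, String> {
    @Override public String load(Long key) {
        return null; // e.g. SELECT ... WHERE id = key
    }

    @Override public void write(Cache.Entry<? extends Long, ? extends String> e) {
        // e.g. single-row INSERT/UPDATE
    }

    @Override public void delete(Object key) {
        // e.g. single-row DELETE
    }

    /** Recommended for performance: replace the adapter's one-by-one loop
     *  with a true bulk write (e.g. a JDBC batch) in one round trip. */
    @Override public void writeAll(Collection<Cache.Entry<? extends Long, ? extends String>> entries) {
        // batch INSERT/UPDATE of all entries at once
    }
}
```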
Dane Christensen:
Okay, great. And then the only other question was that one about Visor, which you said you would cover in the subsequent webinar. So that is going to conclude the questions and the webinar for today.