GridGain is a software middleware that enables development of high performance compute and data intensive distributed applications for real-time Big Data processing.
This book provides in-depth knowledge of how to use GridGain software.
1. Introduction
First of all, the whole GridGain team thanks you for picking up this book and devoting your time to learning more about the GridGain project. Although this book is being written primarily by Nikita Ivanov and Dmitriy Setrakyan - all of us here at GridGain are pitching in with reviews, tests, proof-reading, examples and creative ideas.
The GridGain project has been an amazing journey for all of us to this point, and as you read these lines we are continuing our work on adding new features and improving existing ones, fixing bugs (happens to the best of us) and thinking about how to make GridGain more enjoyable and productive to use.
The GridGain open source project started in the spring of 2005 with just Nikita and Dmitriy working on it in their spare time. We managed to get our first official release out in the summer of 2007. In the three years since then - in late 2010 (when this introduction is being written) - our software is being started every 10 seconds around the globe and is undoubtedly one of the most popular distributed programming frameworks in the JVM ecosystem - so we must be doing something right.
This is even more exciting for us since GridGain Systems is an engineering company first and foremost. We remain small as we believe in small "surgical" teams, and every member of our company still writes code (some of us less so as we need to travel, speak and write books like this one - which we enjoy greatly). We’ve visited more than 50 conferences and Java User Groups around the globe in the last 3 years to talk about GridGain - and we are grateful to each and every one of you who came to our talks. That was the only "marketing" that we could afford, but an hour and a half was usually enough to convince folks to try GridGain out. In fact, where else could you see a full-fledged MapReduce application running on multiple nodes written from scratch in front of your eyes in less than 5 minutes?
We want to thank you again and we hope that you’ll find this book a useful and effective guide for discovering GridGain.
1.1. What is this book about?
This book is about how to use GridGain software to develop innovative distributed compute and data intensive applications that run on any managed infrastructure - from the simple laptop on which this book was written to large grids and all types of clouds. When developing with GridGain and reading this book you can use the Java, Scala or Groovy programming languages. At the time of this writing GridGain was the only distributed computing middleware with native Scala support.
This book does not replace the API reference manuals, and you will still need to consult them from time to time for up-to-date method signatures, parameter descriptions, etc. As we always say in our project - documentation is code too, and we pay a great deal of attention to the API references (Javadoc, Scaladoc and Groovydoc) - in fact, we have some of the best organized and maintained code-level documentation of any comparable project.
This book is short considering its subject - and that is on purpose. We strongly believe that one of the main reasons for the slow adoption of grid and cloud computing in the last decade was over-complication and unnecessary "dramatization" of the entire subject. In fact, the original idea behind GridGain came from a rather unproductive experience developing an application with the Globus toolkit - an innovative piece of software for the early 90s but awfully over-engineered and out of place by the turn of the century, given the rapid advancements in server-side JVM-based programming.
We’ve designed this book in a way that you can take a first pass over it during the weekend and have a pretty good grasp on all major APIs and concepts. As you continue to work with GridGain you’ll be coming back to specific chapters for more details and to refresh on some of the less obvious parts. This is a perfectly normal way to read this book and we highly encourage it.
As you will discover in this book, distributed computing is not necessarily as complex or unwieldy as it may seem from the outside - in fact, most of its concepts are readily familiar to most of you. Where it usually got complicated in the past was the tooling and framework support. That was a sorry state of affairs for a long time - most of the engineering community just accepted that distributed programming (grid and cloud computing included) just has to be "involved" because… it is so, right?
In reality - that’s largely inaccurate. With the right tooling and framework support distributed computing can be relatively simple and very productive. One of the main goals of this book is to convince you that this is so. Every month or so we at GridGain receive an email or a forum post where someone relates his or her experience of downloading GridGain during the day and by midnight having the first MapReduce application running on Amazon EC2. By the time you finish this book it won’t seem like an exaggeration, and you will have your first application running on EC2 much, much quicker.
This book covers GridGain starting with version 3.0 and provides an in-depth manual for all three main technologies that are tightly integrated into the GridGain cloud application platform (as of October 2010 GridGain is the only distributed middleware that provides all three technologies in the same platform - let alone for the Java, Scala and Groovy languages):
-
computational grids (a.k.a. MapReduce)
-
data grid (a.k.a. distributed caching)
-
zero deployment with auto-scaling (a.k.a. elasticity on clouds)
Each of these main topics is covered in depth with plenty of examples in Java, Scala and Groovy.
All in all, this book’s character perfectly reflects what we think modern distributed programming should be - simple, effective and amazingly productive.
That is if you use GridGain…
1.2. About Authors…
This book is written primarily by Nikita Ivanov and Dmitriy Setrakyan. As always, we encourage you to contact us directly should you have any questions about this book or about GridGain. You can reach Nikita at nivanov@gridgain.com and Dmitriy at dsetrakyan@gridgain.com.
Nikita and Dmitriy have over 30 years of combined experience in distributed programming, mostly in Java (some MPI C/C++ back in the 90s - ouch) and now Scala. Nikita and Dmitriy were there at the beginning of the project and to this day coordinate and lead most of the development work.
They both write code almost every day for GridGain despite heavy travel and frequent speaking engagements. And when they don’t write code - you’ll find them rooting for the San Jose Sharks - the favorite hockey team of GridGain.
There are many ways to stay in touch with the GridGain project:
-
Quite naturally, http://www.gridgain.com is an excellent starting point for anything GridGain related.
-
A great source of information is our public forum at http://jive.gridgain.org.
-
Follow us on @twitter
-
Follow us on Facebook
-
Nikita’s blog: http://gridgaintech.wordpress.com/
-
Dmitriy’s blog: http://gridgain.blogspot.com/
2. Overview
In this chapter we’ll lay down some of the basic ideas about grid and cloud computing and how GridGain fits into them. The goal of this chapter is to make sure we are on the same page with you, the reader, as far as the fundamentals of grid and cloud computing are concerned (and you won’t believe how far apart we can be on these…)
2.1. What is GridGain?
In a nutshell - GridGain is JVM-based middleware that enables the development of compute and data intensive High Performance Distributed Applications. Applications developed with GridGain can scale up on any infrastructure - from a single Android device to a large cloud.
GridGain provides two major areas of functionality:
-
Compute Grids
-
In-Memory Data Grids
On top of that it provides a multitude of surrounding technologies, many of which are frequently used by our clients on their own.
With GridGain your applications can:
-
Work in a zero-deployment mode.
-
Scale up or down based on demand.
-
Cache distributed data in data grid.
-
Co-locate data and computations.
-
Run SQL queries against cached data.
-
Store and query JSON objects.
-
Speed up tasks using MapReduce processing.
-
Use distributed thread pools.
-
Distribute the workload on the grid.
-
Use distributed queues and atomics.
-
Effectively exchange messages.
-
Auto-discover all grid resources.
-
Execute closures on the grid.
-
Grid-enable Java, Groovy and Scala code.
-
… and much more
2.2. Why Compute and In-Memory Data Grid?
Compute and In-Memory Data Grids act as the two main axiomatic technologies of modern distributed programming. They are fundamental because they solve two underlying problems faced by any distributed system:
-
distribution of computations
-
distribution of data
I always like to provide this analogy: every computing device - from the Turing machine to the latest iPod - contains memory and a processing unit. Think about it… memory and processing form the foundation of our computing capabilities. In the same way, the ability to distribute computations and data forms the foundation of distributed programming.
And just as in the late 1960s we got the first "system on a chip", where memory and processing units were finally integrated on the same chip, providing for cheaper, more energy efficient and much faster overall systems - GridGain has pioneered integrated middleware that combines compute and in-memory data grids in one cohesive distributed middleware software. This has resulted in similar benefits: a simplified programming model, easier applicability and unified configuration and management.
Compute and in-memory data grids are the key topics in this book and we’ll talk a lot more about these two in the following chapters.
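To make the two problems concrete, here is a minimal single-JVM sketch in plain Java. It does not use any GridGain API: the class `DistributionSketch`, the hash-based `ownerOf` placement and the thread pool standing in for grid nodes are all assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class DistributionSketch {
    // Distribution of data: choose the owning "node" for a key by hashing it.
    // (Hypothetical scheme for illustration only, not GridGain's partitioning.)
    public static int ownerOf(String key, int nodeCount) {
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    // Distribution of computations: split a sum of squares into one job per
    // worker, run the jobs in parallel, then combine the partial results.
    public static long sumOfSquares(int upTo, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                final int start = w + 1;
                // Each job handles the numbers start, start + workers, ...
                Callable<Long> job = () -> {
                    long partial = 0;
                    for (long i = start; i <= upTo; i += workers) {
                        partial += i * i;
                    }
                    return partial;
                };
                parts.add(pool.submit(job));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // wait for each partial result
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("key 'user-42' lives on node " + ownerOf("user-42", 3));
        System.out.println("sum of squares 1..100 = " + sumOfSquares(100, 4));
    }
}
```

In a real grid the jobs would travel to other machines and the keys would live in their memory, but the shape of the two problems stays the same.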
2.3. Why High Performance and Cloud Computing?
You may have noticed in the previous chapter that we call GridGain a software middleware for developing High Performance Cloud Computing applications. But why do we focus on High Performance and what do we mean by that? And what does it have to do with Cloud Computing?
GridGain Philosophy
The term High Performance Cloud Computing really came about in 2010, after almost 5 years of GridGain development. We believe it reflects perfectly the design goals that we originally had. Many, if not all, of GridGain’s features, designs and approaches stem from these goals.
Let’s talk about Cloud Computing first.
2.3.1. Cloud Computing
Despite all the buzz about cloud computing we believe strongly that, from the software development perspective, cloud computing is almost synonymous with traditional distributed programming.
In fact, that “almost” above accounts simply for the fact that, unlike the traditional data centers, grids and clusters of the last decade, clouds offer more fine-grained resource virtualization and more management options. In a nutshell - your application development principles remain largely the same - but you have more options and more choices in how your application is deployed and how it utilizes the available computing resources. The rest of the parallel distributed programming challenges of the last 25 years remain fully intact.
10 Years Ago…
While ten years ago the most you had to account for was a new server coming up in your local grid - today you need to be prepared not just for a new server but for extra CPUs, extra disk storage or extra RAM appearing for your application, which may be snapshotted and migrated potentially halfway across the globe (just look at RackSpace Flavors, for example).
Still, the absolute majority of the problems and challenges you face today while developing distributed software systems revolve around parallelization of computing and high availability of data in the distributed context - both of which are absolutely critical for any scalable distributed software system.
So, when we say Cloud Computing we mean Distributed Computing plus a few new important details.
When We Say "Cloud Computing"
Cloud Computing = Distributed Computing + Data Center Virtualization
2.3.2. High Performance (HP)
The High Performance aspect is equally interesting. When we present GridGain at conferences we inevitably get asked about this… why High Performance and Cloud in the same sentence?
The answer is very simple: not every cloud application needs to be high performance. In fact, most of today’s cloud applications (i.e. the applications that are deployed on the clouds) are not high performance.
Cloud Applications
Most of today’s cloud applications (i.e. applications deployed in the clouds) are not high performance.
We use the term High Performance to specifically categorize applications that use distribution as means for processing parallelization, i.e. to achieve the scalability and/or performance that is theoretically unattainable on a single processing unit.
On the other hand, distributed applications that are not High Performance use cloud deployment as a more convenient or economical deployment option without much need for improved scalability or performance.
High Performance vs. Not High Performance
The distinction is very important:
Naturally, some HP applications use cloud deployment because of convenience and economy as well.
So, tying it all together, GridGain is a High Performance Cloud Application platform because:
2.3.3. Real-Time Cloud Applications
For another take on High Performance Cloud Application look at this blog entry I wrote in the middle of 2011:
2.4. Grid and Cloud Computing
To start off this sub-chapter I’ll tell you a story that happened to me a few years ago.
I was presenting GridGain at the Java User Group in Dayton, Ohio. For those of you who don’t know - Dayton, OH is essentially the "sleeping quarters" for Wright-Patterson Air Force Base, one of the largest military research facilities in the world. It employs almost 30,000 people, houses the massive Air Force Institute of Technology and the Air Force Research Laboratory to boot, and is known to have one of the largest supercomputing centers in the world too. So, I had a few people from the base at my presentation…
Wright-Patterson Air Force Base is rumored to be the center of UFO research, or so many people believe, after Nevada’s Area 51 was closed a few decades ago…
After the presentation I chatted with some folks, and the conversation drifted into the general topic of grid computing and what we all understand by that (and by the then relatively new concept of cloud computing). It was a real surprise to me (to say the least) that I got three very different answers from three guys working on the base - essentially working in one big company.
One answer was that grid is really nothing new, alluding to the parallel Fortran of almost 50 years ago; another was more in line with the common understanding of grid computing as just a re-tooling of traditional parallel processing; and yet another was that the whole thing is just hype and multi-core CPUs will displace it altogether in 5-10 years max.
I remember that on my flight back I was a bit perplexed by how far apart those guys were while working side by side in an albeit large organization - and not just on technicalities, but on their fundamental view of distributed programming (grid or cloud - doesn’t matter). From my next presentation onward I’ve made it a rule to always state what I believe about grids and clouds, to at least make sure we have a common frame of reference. Whether or not you agree with it is another question…
So, here’s my take. I don’t particularly like the terms grid and cloud computing. There’s nothing that resembles a "grid" in grid computing and obviously there’s nothing that is performed in the "cloud" when it comes to cloud computing. Both marketing terms, "grid computing" and "cloud computing", represent slight variations of traditional distributed programming.
Put in more canonical form, if you have more than one computing resource working on the same problem in parallel - you have a grid and you are engaged in grid computing. If these computing resources (all or in part) are virtualized and available to you on demand - they represent a cloud and you are doing cloud computing. In a nutshell - that’s it.
As you can see the difference between grid computing and cloud computing is only in representation of computing resources which in most cases should be irrelevant. The lines between grids and clouds are getting blurrier by the day and we use both terms interchangeably throughout this book (as long as context is clear).
It is, however, important to know the difference and new challenges that cloud infrastructure brings to the table for you as a software developer.
2.5. IaaS, PaaS, and SaaS
Only at the peak of the hype around cloud computing can you get a chapter named like that…
Nowadays these terms are often thrown around without much regard or understanding, and while IaaS and SaaS are somewhat well defined, PaaS is poorly defined, if at all. Let us try to define these terms and see where GridGain fits in the picture.
Understanding this nomenclature is not essential for everyday usage of GridGain, and most of the concepts in GridGain steer clear of these high-level marketing terms. Yet - a cursory look is worthwhile and it will help you navigate the plethora of marketing literature that surrounds cloud computing today.
The picture below provides a basic overview of how GridGain relates to IaaS and PaaS:
2.5.1. IaaS - Infrastructure As a Service
IaaS stands for "Infrastructure As a Service". IaaS is often (wrongly) treated as synonymous with cloud computing. It essentially means providing virtualized computing resources as a service. Think of Amazon EC2, for example.
And who would have ever thought that a nascent online book seller and one of the few dotcom survivors would spearhead a revolution that is so much bigger than just online retailing - a revolution that is radically changing the way we think about information systems!
Amazon had its own data center and some time ago decided to earn extra money by renting out its often unused computing capacity. So, they put hardware virtualization (like VMs from VMWare or Citrix) on their servers and exposed the management of these VMs via a Web browser so that anyone could create an account and start managing VM instances.
In a nutshell - that’s all there’s to it.
It is important to note that IaaS (or clouds) can be public, private or hybrid. Public clouds are based on infrastructure that is publicly available, i.e. the IaaS provider generally gives access to its data center to anyone. Private clouds are built by individual organizations for their internal use. And hybrid clouds exhibit both types of behavior.
While public and private clouds are usually physical infrastructures, with the difference being who gets access to them - hybrid clouds are almost always virtual clouds. Similar to Virtual Private Networks (VPN), virtual clouds are created on top of one or more physical public and private clouds and provide their end users with seamless cloud/IaaS transparency. These hybrid clouds are often created for business applications by either PaaS or software middleware like GridGain.
Clouds can be public, private, and hybrid. While public and private clouds are usually physical infrastructures - hybrid clouds are always virtualized clouds built via software on top of one or more physical public or private clouds. The analogy between hybrid clouds and VPNs is almost one-to-one.
There are plenty of IaaS providers following in Amazon’s footsteps, all providing different sets of functionality and different twists. A quick look at the Amazon AWS offerings will show how complex and diverse they have become in recent years.
It is also a strong sign that hardware virtualization and the services around it will be rapidly advancing. We are just a few years away from being able to add an extra core to our application or acquire extra network bandwidth capacity on demand for our system’s peak time, and scale it down the moment it isn’t needed anymore.
These capabilities will undoubtedly change the way we develop our applications.
2.5.2. PaaS - Platform As a Service
PaaS stands for "Platform As a Service". PaaS essentially provides an abstraction over various IaaS providers and adds additional services. Additional services mostly consist of some set of deployment and provisioning services aiming at supporting application multi-tenancy, i.e. ability to host multiple applications in secure isolation on the same VM or a set of VMs (don’t confuse it with Java VM).
VMs vs. Java VMs
Throughout this book we’ll use the term VM to denote a hardware virtualization virtual machine. To denote the Java Virtual Machine we’ll use the term JVM.
The problem with PaaS is that no one has a precise definition of what PaaS really is… Its definition is largely based on specific vendor capabilities. There is, however, one clear trait of PaaS: it frees its users from worrying about the specifics of various IaaS providers and the differences in their operations and functionality.
PaaS also gave rise to the notion of DevOps - a symbiosis of application development and traditional IT functions. It is often said that PaaS provides abstraction over IaaS and DevOps services.
PaaS provides abstraction over IaaS and DevOps services.
Most of the PaaS vendors today (early 2011) concentrate mainly on providing deployment and application provisioning services. PaaS from VMWare/Spring, Google AppEngine, CloudBees and RedHat/JBoss, for example, do exactly that. They all allow you to take your whole application and, through a series of manual steps, move or deploy it onto IaaS infrastructure with some limited, if any, scale-out functionality.
PaaS as a technology is in its very early stages today. It is clear that PaaS as a concept and technology will likely see the greatest amount of change in the coming years - and by the time you read this book some of these changes may significantly affect your understanding of what PaaS can do for you.
IaaS abstracts out data center and exposes it as a service. PaaS abstracts out IaaS providers and adds DevOps.
2.5.3. SaaS - Software As a Service
SaaS stands for "Software As a Service". Surprisingly for most casual observers, SaaS has relatively little to do with cloud computing or with IaaS and PaaS specifically. Essentially, if you run your "application" in the browser - it is a SaaS application.
That’s it.
Historically, SaaS came from ASP (Application Service Provider) businesses and it shares almost everything with ASP (except for a catchier name). Interestingly enough, SaaS was the first "as a service" abbreviation, long before IaaS and PaaS came to light. But when hardware virtualization and the surrounding services became popular, the "as a service" moniker was a logical progression for providing computing Infrastructure and Platforms "as a service".
2.5.4. How GridGain Fits?
By looking at the picture a few paragraphs above you can see that GridGain can easily work directly with IaaS (like Amazon AWS) or through PaaS. In fact, GridGain is completely independent from either PaaS or IaaS - it can work without any specific cloud, grid or cluster infrastructure.
This ability, this lightweight approach, is one of the key design advantages of GridGain. It provides you, the developer, with exactly the same services whether you run GridGain on a simple Android device, a laptop, a few servers, a small grid or a large cloud.
So, you can select the desired DevOps approach for your application and GridGain will happily support it!
2.6. License
GridGain is dual-licensed:
-
GridGain Community Edition is free, open source and licensed under GPL version 3.
-
GridGain Enterprise Edition is closed source and commercially licensed.
The EULA (end-user license agreement) for Enterprise Edition is available in the GridGain root installation folder after you install GridGain, and it is also available on the download page of the http://www.gridgain.com website.
GridGain Systems also provides OEM, Enterprise and Academic licenses, with further details available upon request at sales@gridgain.com
GridGain 1.x-2.x
Note also that previous versions of GridGain (1.x-2.x) were licensed under LGPL.
Note that throughout the book we don’t directly distinguish between features available in Enterprise and Community Edition. In cases where it is important we make a note.
2.7. Support
The best way to get free support for GridGain software is to dip into our active community, with its wealth of information, on our free support forum: http://jive.gridgain.org. This forum is closely monitored by GridGain Systems’ engineers and we try our best to provide free support there when applicable.
GridGain Systems, Inc., as the company behind the GridGain project, also provides a full spectrum of commercial services around GridGain software including:
-
Commercial subscription for Community and Enterprise editions
-
Consulting and professional services
-
OEM licensing
-
"Bronze", "Silver" and "Gold" levels of support
-
GridGain Training seminars
When it comes to training and support - we at GridGain have a very simple philosophy: we do our own heavy lifting. We believe that we are the best people to support our own software - and who better to learn about GridGain from than the people who develop it daily?
All information about services provided by GridGain Systems can be found at http://www.gridgain.com/services.html
3. Taste of GridGain
Before we start digging into the nitty-gritty details of GridGain functionality let’s quickly look at what we can accomplish in 10-15 minutes. We’ll create one application in Java and one in Scala utilizing Compute and In-Memory Data Grids.
Let’s see how quickly we can do both.
Installation
We have a whole chapter dedicated to installation. However, installation of GridGain is rather trivial - and if you haven’t done it already here are 3 quick steps:
You are done!
Unix
For the rest of this chapter (and most of this book) we will assume JetBrains IDEA 10 and a Unix-like environment such as Unix, Linux or Mac OS X. Most of the steps and instructions apply almost verbatim to Eclipse, Emacs or NetBeans running on Windows, with obvious changes to paths and certain project management capabilities of the IDEs.
Now - the first step when developing a distributed application is to have a… grid. With GridGain you can have it anywhere, but for the purpose of this example we will create one right on the same computer where you will be running the main example.
Multiple GridGain Nodes
One of the coolest capabilities of GridGain is its ability to run multiple GridGain nodes on the same computer or even… inside the same JVM. Think about it - you can launch an entire cloud in a single JVM and enjoy local debugging of your application while it is running in the local virtualized cloud. Now - that’s pretty powerful, and that’s exactly how we at GridGain debug and test most of our complex internal distributed logic.
To have a local grid we are going to run two standalone GridGain nodes, and a third node will be embedded into the applications that we will develop. When our application starts it will join the grid (i.e. join the topology), making a grid of three nodes.
Open a command shell and, assuming you are in the GRIDGAIN_HOME folder, just type this:
$ bin/ggstart.sh
If everything is fine (you have set the GRIDGAIN_HOME environment variable properly and you have Java installed) you will see output similar to this:
[11:52:50] _____ _ _______ _ ____ ____
[11:52:50] / ___/____(_)___/ / ___/___ _(_)___ |_ / / __/
[11:52:50] / (_ // __/ // _ / (_ // _ `/ // _ \ _/_ <_ /__ \
[11:52:50] \___//_/ /_/ \_,_/\___/ \_,_/_//_//_/ /____(_)____/
[11:52:50]
[11:52:50] ---==++ HIGH PERFORMANCE CLOUD COMPUTING ++==---
[11:52:50] ver. x.x.x-DDMMYYYY
[11:52:50] Copyright (C) 2005-2011 GridGain Systems, Inc.
[11:52:50]
[11:52:50] Quiet mode.
[11:52:50] ^-- To disable add -DGRIDGAIN_QUIET=false or "-v" to ggstart.{sh|bat}
[11:52:50] << Enterprise Edition >>
[11:52:50] Daemon mode: off
[11:52:50] Language runtime: Java Platform API Specification ver. 1.6
[11:52:50] Remote Management [restart: on, REST: on, JMX (remote: on, port: 49113, auth: off, ssl: off)]
[11:52:50] GRIDGAIN_HOME=/Users/nivanov/svnroot/gg-trunk
[11:52:50] (!) SMTP is not configured - email notifications are off.
[11:52:50] (!) Cache is not configured - data grid is off.
[11:52:53] Topology snapshot [nodes=1, CPUs=4, hash=0xB12A5F18]
[11:52:54] License info:
[11:52:54] Licensed to 'GridGain Systems, Internal Development Only' on Feb 3, 2011
[11:52:54] License [ID=7D5CB773-225C-4165-8162-3BB67337894B, type=ENT]
[11:52:54] ^--License limits [<none>]
[11:52:54] System info:
[11:52:54] JVM: Apple Inc., Java(TM) SE Runtime Environment ver. 1.6.0_26-b03-383-11A511
[11:52:54] OS: Mac OS X 10.7.1 x86_64, nivanov
[11:52:54] VM name: 4837@NIKITA-IVANOVs-MacBook-Pro.local
[11:52:54] Local ports used [TCP:8080 TCP:47100 UDP:47200 TCP:47300]
[11:52:54] GridGain started OK
[11:52:54] ^-- [grid=default, nodeId8=a26743ce, order=1315939970778, CPUs=4, addrs=[192.168.1.103]]
[11:52:54] ZZZzz zz z...
Start another command shell and type the same command again:
$ bin/ggstart.sh
This time you get almost identical output with a few important changes:
[12:34:09] _____ _ _______ _ ____ ____
[12:34:09] / ___/____(_)___/ / ___/___ _(_)___ |_ / / __/
[12:34:09] / (_ // __/ // _ / (_ // _ `/ // _ \ _/_ <_ /__ \
[12:34:09] \___//_/ /_/ \_,_/\___/ \_,_/_//_//_/ /____(_)____/
[12:34:09]
[12:34:09] ---==++ HIGH PERFORMANCE CLOUD COMPUTING ++==---
[12:34:09] ver. x.x.x-DDMMYYYY
[12:34:09] Copyright (C) 2005-2011 GridGain Systems, Inc.
[12:34:09]
[12:34:09] Quiet mode.
[12:34:09] ^-- To disable add -DGRIDGAIN_QUIET=false or "-v" to ggstart.{sh|bat}
[12:34:09] << Enterprise Edition >>
[12:34:09] Daemon mode: off
[12:34:09] Language runtime: Java Platform API Specification ver. 1.6
[12:34:09] Remote Management [restart: on, REST: on, JMX (remote: on, port: 49114, auth: off, ssl: off)]
[12:34:09] GRIDGAIN_HOME=/Users/nivanov/svnroot/gg-trunk
[12:34:09] (!) SMTP is not configured - email notifications are off.
[12:34:09] (!) Cache is not configured - data grid is off.
[12:34:10] Node JOINED [nodeId8=a26743ce, addr=[192.168.1.103], CPUs=4]
[12:34:12] Topology snapshot [nodes=2, CPUs=4, hash=0xC287D25B]
[12:34:14] (!) Jetty failed to start (retrying every 3000 ms). Another node on this host?
[12:34:14] License info:
[12:34:14] Licensed to 'GridGain Systems, Internal Development Only' on Feb 3, 2011
[12:34:14] License [ID=7D5CB773-225C-4165-8162-3BB67337894B, type=ENT]
[12:34:14] ^--License limits [<none>]
[12:34:14] System info:
[12:34:14] JVM: Apple Inc., Java(TM) SE Runtime Environment ver. 1.6.0_26-b03-383-11A511
[12:34:14] OS: Mac OS X 10.7.1 x86_64, nivanov
[12:34:14] VM name: 5227@NIKITA-IVANOVs-MacBook-Pro.local
[12:34:14] Local ports used [TCP:47101 UDP:47200 TCP:47301]
[12:34:14] GridGain started OK
[12:34:14] ^-- [grid=default, nodeId8=b8aac044, order=1315942449848, CPUs=4, addrs=[192.168.1.103]]
[12:34:14] ZZZzz zz z...
This output is a bit more interesting as it shows that both nodes discovered each other:
-
Node JOINED - the event of a node joining the topology
-
Topology snapshot - a snapshot of the topology showing the total number of nodes and CPUs
So at this point we have two nodes running (they are simply idling since we are not processing anything). Notice how we didn’t have to specify any configuration properties or configure anything at all. Everything works out-of-the-box as expected.
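If you want to check the size of the grid from a script or a test, the topology snapshot line in the console output can be parsed. The sketch below is plain Java and assumes the line format shown in the output above; the format may differ between GridGain versions, and the class `TopologyLineSketch` and its method are hypothetical helpers, not part of GridGain.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TopologyLineSketch {
    // Matches e.g. "[12:34:12] Topology snapshot [nodes=2, CPUs=4, hash=0xC287D25B]".
    // The pattern is an assumption based on the console output shown in this chapter.
    private static final Pattern SNAPSHOT = Pattern.compile(
        "Topology snapshot \\[nodes=(\\d+), CPUs=(\\d+), hash=0x([0-9A-Fa-f]+)\\]");

    /** Returns the node count from a topology snapshot line, or -1 for any other line. */
    public static int nodeCount(String line) {
        Matcher m = SNAPSHOT.matcher(line);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        String line = "[12:34:12] Topology snapshot [nodes=2, CPUs=4, hash=0xC287D25B]";
        System.out.println("nodes in topology: " + nodeCount(line)); // prints 2
    }
}
```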
SPIs
As you will learn later in the book, GridGain is composed of almost a dozen different SPIs, each providing pluggable kernel-level functionality. Two of these are the discovery and communication SPIs, which are responsible for maintaining the distributed topology and exchanging data between nodes. When you start GridGain with the default configuration (like we just did) it starts with the default SPI implementations (IP-multicast discovery and TCP/IP-based communication respectively) - and they work perfectly in our case.
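The general idea of pluggable SPIs can be sketched in a few lines of plain Java. The interfaces below (`DiscoverySpi`, `CommunicationSpi`) and the `Kernel` class are hypothetical and deliberately tiny; they are not GridGain’s actual SPI interfaces, which are covered later in the book. The point is only that the kernel talks to interfaces, so implementations can be swapped without touching it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SpiSketch {
    /** Hypothetical discovery SPI: knows which nodes are in the topology. */
    public interface DiscoverySpi {
        List<String> nodes();
    }

    /** Hypothetical communication SPI: delivers a message to one node. */
    public interface CommunicationSpi {
        void send(String nodeId, String message);
    }

    /** A tiny "kernel" that depends only on the SPI interfaces. */
    public static final class Kernel {
        private final DiscoverySpi discovery;
        private final CommunicationSpi communication;

        public Kernel(DiscoverySpi discovery, CommunicationSpi communication) {
            this.discovery = discovery;
            this.communication = communication;
        }

        /** Sends the message to every discovered node and returns how many were sent. */
        public int broadcast(String message) {
            int sent = 0;
            for (String node : discovery.nodes()) {
                communication.send(node, message);
                sent++;
            }
            return sent;
        }
    }

    public static void main(String[] args) {
        // In-memory stand-ins play the role of the default implementations here.
        List<String> delivered = new ArrayList<>();
        DiscoverySpi fixedTopology = () -> Arrays.asList("a26743ce", "b8aac044");
        CommunicationSpi inMemory = (node, msg) -> delivered.add(node + " <- " + msg);

        Kernel kernel = new Kernel(fixedTopology, inMemory);
        System.out.println(kernel.broadcast("ping") + " messages sent: " + delivered);
    }
}
```

Replacing `fixedTopology` with, say, a multicast-based implementation would not require any change to `Kernel`, which is the property the SPI design is after.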
3.1. First GridGain Java App
Now that we have the topology set up, we are going to switch to writing the actual code of our first application. The code examples below don’t depend on any IDE, so you can follow along using any text editor of your choice.
The first app we are going to write is a computational MapReduce. It will calculate the number of non-space characters in a given string by splitting the string into individual words, calculating each word’s length on the remote nodes and aggregating the results back.
We’ll use the FP-based approach that GridGain natively supports (even in Java):
import org.gridgain.grid.*;
import org.gridgain.grid.typedef.*;
import java.util.*;
import static org.gridgain.grid.GridClosureCallMode.*;

public class GridFunctionalMapReduceExample {
    public static void main(final String[] args) throws GridException {
        if (args.length == 1 && args[0].length() > 0)
            GridFactory.in(new GridInClosureX<Grid>() {
                @Override public void applyx(Grid g) throws GridException {
                    System.out.println("Length of input argument is " + g.reduce(
                        SPREAD,
                        GridFunc.<String, Integer>cInvoke("length"),
                        Arrays.asList(args[0].split(" ")),
                        GridFunc.sumIntReducer()
                    ));
                }
            });
    }
}
Note that there are many ways to write this particular program in GridGain:
-
We can use direct Grid Task and Grid Jobs approach
-
We can use AOP-based grid enablement
-
We can use GridGain 3.0 imperative APIs
-
Even using GridGain FP APIs there are different ways to code this program
The FP-based approach above, however, probably yields the shortest program, but it can be confusing at first since Java doesn’t really support FP natively. So let me explain step by step how this works.
Lines 1-4
We start by importing all necessary classes and constants into the scope.
Line 7
We define a main(…) method that takes the input string.
Line 9
Method GridFactory.in(…) simply takes a closure and:
-
Starts the default GridGain node (custom configuration can be passed in as a parameter)
-
Executes passed in closure
-
Stops the GridGain node
So, essentially, it allows for quick execution of a piece of code within the context of a running GridGain node.
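The start/execute/stop sequence listed above can be pictured with a small plain-Java sketch. It does not use GridGain at all: `Node`, `InClosure` and `in(…)` are hypothetical stand-ins invented for this illustration, and stopping the node in a `finally` block is an assumption about sensible behavior rather than a statement about how GridFactory.in(…) is implemented.

```java
import java.util.ArrayList;
import java.util.List;

public class RunInNodeSketch {
    /** Hypothetical stand-in for a grid node that can be started and stopped. */
    static final class Node {
        final List<String> log = new ArrayList<String>();

        void start() { log.add("start"); }

        void stop() { log.add("stop"); }
    }

    /** Hypothetical stand-in for a closure that takes one argument and returns nothing. */
    interface InClosure<T> {
        void apply(T t);
    }

    /** Starts the node, runs the closure against it, then always stops the node. */
    static void in(Node node, InClosure<Node> body) {
        node.start();
        try {
            body.apply(node);
        }
        finally {
            node.stop();
        }
    }

    /** Runs one closure through in(...) and returns the recorded sequence of steps. */
    static List<String> demo() {
        Node node = new Node();
        in(node, new InClosure<Node>() {
            @Override public void apply(Node n) {
                n.log.add("closure");
            }
        });
        return node.log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The point of the pattern is that the calling code only supplies the closure body; the node lifecycle around it is handled in one place.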
Line 9, 10
As a parameter to GridFactory.in(…) method call we are passing a newly created closure of
type GridInClosureX<Grid>. The body of this closure will be executed within context of a running
GridGain node.
Line 11-15
GridInClosureX<Grid> has one method, applyx(Grid), which is passed a Grid interface instance.
Inside of applyx(Grid) we call reduce(…) method that performs MapReduce operation. It accepts
four parameters:
-
Distribution mode. In our case we use GridClosureCallMode.SPREAD to spread the processing to all available nodes
-
A closure to execute on every remote node. We use utility method GridFunc.cInvoke(…) that creates closure via reflection based on the method name
-
A list of arguments that will be passed to each closure on the remote nodes
-
Reducing closure that takes results from the remote nodes and aggregates them into one final result. We, again, use pre-defined integer accumulator returned by GridFunc.sumIntReducer() method.
The logic of this computational MapReduce should be clear by now. We split input string by spaces into individual words, we then send every word to a remote node where a method length will be called on that word, and results of these calls will be returned back to reducer that will simply sum them up.
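To make the data flow concrete, here is a minimal single-process sketch of the same split, map and sum steps in plain Java. Nothing is distributed here and no GridGain API is used; `mapWord`, `reduce` and `nonSpaceLength` are hypothetical names chosen for this illustration, standing in for the remote closure, the reducer and the overall call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LocalMapReduceSketch {
    /** Map step: stands in for the closure that calls length() on one word. */
    static int mapWord(String word) {
        return word.length();
    }

    /** Reduce step: stands in for the integer-summing reducer. */
    static int reduce(List<Integer> partials) {
        int sum = 0;
        for (int p : partials)
            sum += p;
        return sum;
    }

    /** Splits the input by spaces, maps every word, then reduces the partial results. */
    static int nonSpaceLength(String input) {
        List<Integer> partials = new ArrayList<Integer>();
        for (String word : Arrays.asList(input.split(" ")))
            partials.add(mapWord(word));
        return reduce(partials);
    }

    public static void main(String[] args) {
        // "hello" (5) + "grid" (4) + "world" (5)
        System.out.println(nonSpaceLength("hello grid world"));
    }
}
```

In the grid version each call to the map step may run on a different node, while the reduce step runs on the node that initiated the call.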
Now that we have a basic understanding of what is happening inside this code, let’s run the example. Whether you are using an IDE, Maven or an Ant build, you simply need to include in your classpath the gridgain.jar file located in the GRIDGAIN_HOME directory and all JARs under the GRIDGAIN_HOME/libs folder. Also, if you use an IDE, make sure that either the GRIDGAIN_HOME environment variable is inherited by the IDE process or a system property with the same name, GRIDGAIN_HOME, is set up in your runtime configuration.
With everything set up, I passed the "GridGain is awesome" string in my IDEA 10 runtime configuration and got the following output in my IDEA output window:
/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/bin/java -ea -DGRIDGAIN_HOME=...
[12:28:11] _____ _ _______ _ ____ ____
[12:28:11] / ___/____(_)___/ / ___/___ _(_)___ |_ / / __/
[12:28:11] / (_ // __/ // _ / (_ // _ `/ // _ \ _/_ <_ /__ \
[12:28:11] \___//_/ /_/ \_,_/\___/ \_,_/_//_//_/ /____(_)____/
[12:28:11]
[12:28:11] ---==++ HIGH PERFORMANCE CLOUD COMPUTING ++==---
[12:28:11] ver. x.x.x-DDMMYYYY
[12:28:11] Copyright (C) 2005-2011 GridGain Systems, Inc.
[12:28:11]
[12:28:11] Quiet mode.
[12:28:11] ^-- To disable add -DGRIDGAIN_QUIET=false or "-v" to ggstart.{sh|bat}
[12:28:11] << Enterprise Edition >>
[12:28:11] (!) SMTP is not configured - email notifications are off.
[12:28:11] (!) Cache is not configured - data grid is off.
[12:28:11] Daemon mode: off
[12:28:11] Language runtime: Java Platform API Specification ver. 1.6
[12:28:11] Remote Management [restart: off, REST: on, JMX (remote: off)]
[12:28:11] GRIDGAIN_HOME=/Users/nivanov/svnroot/gg-trunk
[12:28:13] Node JOINED [nodeId8=b324d6c3, addr=[192.168.1.103], CPUs=4]
[12:28:15] Topology snapshot [nodes=2, CPUs=4, hash=0xCFFF5AA0]
[12:28:15] Node JOINED [nodeId8=e854d435, addr=[192.168.1.103], CPUs=4]
[12:28:15] Topology snapshot [nodes=3, CPUs=4, hash=0xF7C10287]
[12:28:15] License info:
[12:28:15] Licensed to 'GridGain Systems, Internal Development Only' on Feb 3, 2011
[12:28:15] License [ID=7D5CB773-225C-4165-8162-3BB67337894B, type=ENT]
[12:28:15] ^--License limits [<none>]
[12:28:15] New version is available at www.gridgain.com: 3.2.1c.05082011
[12:28:15] System info:
[12:28:15] JVM: Apple Inc., Java(TM) SE Runtime Environment ver. 1.6.0_26-b03-383-11A511
[12:28:15] OS: Mac OS X 10.7.1 x86_64, nivanov
[12:28:15] VM name: 9791@NIKITA-IVANOVs-MacBook-Pro.local
[12:28:15] Local ports used [TCP:8080 TCP:47102 UDP:47200 TCP:47302]
[12:28:15] GridGain started OK
[12:28:15] ^-- [grid=default, nodeId8=1cdf9436, order=1316028492427, CPUs=4, addrs=[192.168.1.103]]
[12:28:15] ZZZzz zz z...
Length of input argument is 16
[12:28:17] GridGain stopped OK [uptime=00:00:01:363]
As you can see, the output is very similar to that of the standalone nodes we started a few minutes ago. But at the end we have the output of our MapReduce task, which says:
Length of input argument is 16
correctly computing the number of non-space characters in the input string "GridGain is awesome". Notice also that we’ve had a topology snapshot with three nodes (as expected). If you check the other standalone nodes you will see output similar to this:
[12:28:13] Node JOINED [nodeId8=1cdf9436, addr=[192.168.1.103], CPUs=4]
[12:28:13] Topology snapshot [nodes=3, CPUs=4, hash=0xF7C10287]
[12:28:17] Node LEFT [nodeId8=1cdf9436, addr=[192.168.1.103], CPUs=4]
[12:28:17] Topology snapshot [nodes=2, CPUs=4, hash=0x71DE65CC]
indicating that when our application started we’ve had three nodes in the topology and when our application completed and stopped we were back to two nodes (i.e. two standalone nodes).
|
|
Zero Deployment
Did you notice any deployment steps? Any Ant or Maven build? Did we create any JAR files to copy to remote nodes? As you probably guessed - the answer is no. We didn’t need to do any of these awkward and expensive steps because GridGain has a fairly unique technology that allows it to deploy the necessary classes on demand, in a distributed fashion, completely transparently to the developer. In fact - you just write the code as if it were completely local and GridGain takes care of proper distribution, versioning, class loading, etc. |
Now, let’s advance our example. As you probably noticed, we don’t see any evidence on the remote nodes that any processing is happening there. In fact, by default GridGain starts in QUIET mode and most of the output is suppressed (if you need to start in normal mode, use the -v flag for the ggstart.sh script).
Let’s modify our example so that we can see what is being processed and where:
import org.gridgain.grid.*;
import org.gridgain.grid.typedef.*;
import java.util.*;
import static org.gridgain.grid.GridClosureCallMode.*;

public class GridFunctionalMapReduceExample {
    public static void main(final String[] args) throws GridException {
        if (args.length == 1 && args[0].length() > 0)
            GridFactory.in(new GridInClosureX<Grid>() {
                @Override public void applyx(Grid g) throws GridException {
                    System.out.println("Length of input argument is " + g.reduce(
                        SPREAD,
                        new GridClosure<String, Integer>() {
                            @Override public Integer apply(String s) {
                                System.out.println("Calculating for: " + s);
                                return s.length();
                            }
                        },
                        //GridFunc.<String, Integer>cInvoke("length"),
                        Arrays.asList(args[0].split(" ")),
                        GridFunc.sumIntReducer()
                    ));
                }
            });
    }
}
We’ve commented out the reflection-based closure and added direct closure creation that prints out what string it is working on and returns its length. If we re-run our application we’ll now get the following output on three nodes:
Remote Node 1
[14:14:27] Node JOINED [nodeId8=6e77be3c, addr=[192.168.1.103], CPUs=4]
[14:14:27] Topology snapshot [nodes=3, CPUs=4, hash=0x89DEB0E3]
Calculating for: is
[14:14:31] Node LEFT [nodeId8=6e77be3c, addr=[192.168.1.103], CPUs=4]
[14:14:31] Topology snapshot [nodes=2, CPUs=4, hash=0x71DE65CC]
Remote Node 2
[14:14:27] Node JOINED [nodeId8=6e77be3c, addr=[192.168.1.103], CPUs=4]
[14:14:27] Topology snapshot [nodes=3, CPUs=4, hash=0x89DEB0E3]
Calculating for: awesome
[14:14:31] Node LEFT [nodeId8=6e77be3c, addr=[192.168.1.103], CPUs=4]
[14:14:31] Topology snapshot [nodes=2, CPUs=4, hash=0x71DE65CC]
Local node in IDE
[14:14:29] GridGain started OK
[14:14:29] ^-- [grid=default, nodeId8=6e77be3c, order=1316034866664, CPUs=4, addrs=[192.168.1.103]]
[14:14:29] ZZZzz zz z...
Calculating for: GridGain
Length of input argument is 16
[14:14:31] GridGain stopped OK [uptime=00:00:01:237]
-
The "Calculating for: …" lines are the output from our closure executing on remote nodes.
-
The "Length of input argument is …" line is the output from the reduction step executing on the local (initiating) node.
Note that the local node (i.e. the node running in the IDE) performs part of the calculation as well as the final reduction step. If we don’t want it to participate in the actual calculation and only perform the final reduction, we can simply change this line:
System.out.println("Length of input argument is " + g.reduce(
to this:
System.out.println("Length of input argument is " + g.remoteProjection().reduce(
|
|
Consider this…
Look at these 20 lines of code and consider that this application includes:
|
Pretty neat, right? And so in just about two dozen lines of code and 10 minutes we’ve got our first MapReduce application running.
3.2. First GridGain Scala App
Now let’s move on to an In-Memory Data Grid application. We’ll use Scala for that - more specifically, Scalar, our Scala-based DSL for GridGain.
|
|
Scalar DSL
The idea behind Scalar is simply to adapt the Java-side APIs for use in Scala. By design, Scalar does not add any new functionality to GridGain; it only adapts the Java APIs to Scala. It is important to understand that no functionality is added or left out when you switch between Java and Scala - 100% of GridGain is available in both languages (and natively so). Note that GridGain also comes with Grover - a Groovy++ DSL. |
Since we are going to use the In-Memory Data Grid in this example, we need to restart our standalone nodes with the data grid enabled. Note that by default there are no caches configured (for obvious performance reasons). GridGain ships with a handy example configuration, examples/config/spring-cache.xml, that has three caches configured.
You can stop the existing nodes by simply pressing Ctrl-C and then start them again using:
bin/ggstart.sh examples/config/spring-cache.xml
The output from the nodes is almost the same as in the previous example, with one notable change:
...
[13:28:05] Configured caches ['partitioned', 'replicated', 'local']
...
indicating that we now have three configured caches that are named after their type.
Now that we have standalone nodes running with the necessary configuration, let’s turn to writing an application that utilizes the In-Memory Data Grid side of GridGain (we’ll use the shorter term data grid going forward). We are going to create an application that populates the data grid with a set of key-value pairs and then executes a set of closures, where each closure has an affinity with a specific key in the data grid - and is therefore co-located with the data for that key instead of being executed on some random node in the grid.
|
|
Affinity Co-Location
Affinity co-location is an extremely important use case in real-time processing, as it underpins system designs that can scale linearly regardless of the size of the data set. |
The Scalar-based application looks pretty simple:
import org.gridgain.scalar.scalar
import scalar._
import org.gridgain.grid.cache.GridCache

object ScalarCacheAffinitySimpleExample {
    /** Number of keys. */
    private val KEY_CNT = 20

    def main(args: Array[String]) {
        scalar("examples/config/spring-cache.xml") {
            val c = grid$.cache[Int, String]("partitioned")

            populate(c) // Comment out on subsequent runs.
            colocate(c)
        }
    }

    private def populate(c: GridCache[Int, String]) {
        (0 until KEY_CNT).foreach(i => c += (i -> i.toString))
    }

    private def colocate(c: GridCache[Int, String]) {
        (0 until KEY_CNT).foreach(i =>
            grid$.affinityRun("partitioned", i,
                () => println("Co-located [key= " + i + ", value=" + c.peek(i) + ']'))
        )
    }
}
Even if you are not familiar with Scala, the code is pretty self-explanatory. Let’s go line by line like we did for the Java example above.
Line 1-3
Necessary imports including import for Scalar
Line 7
Defines number of keys we’ll be storing in the data grid and number of closures we’ll be executing later.
Line 10
Initializes Scalar with the same configuration file as we used for standalone nodes
examples/config/spring-cache.xml. Note that initializing Scalar essentially means starting up
the local node.
Line 11
We are getting an instance of the cache named partitioned (which is a partitioned cache named so for
clarity). Cache is typed for Int keys and String values.
Line 13, 14
Calls functions populate() and colocate() that are defined later.
Line 18-20
Function populate() simply puts key/value pairs into the data grid. Note that a partitioned cache will store
a particular key/value pair on one of the nodes (potentially including the local one) as well as on one backup
node. Since we have three nodes in the topology (remember - one local node and two standalone nodes) - each
key/value pair will be stored on two nodes.
|
|
Running Multiple Times
Note that if you run this example multiple times you need to comment out line 13, since we don’t need to overwrite values that are already in the data grid. Don’t get confused here: even though we stop the local node when our application finishes, and all data stored on that node is lost, the key/value pairs are duplicated on backup nodes (i.e. stored twice in the data grid). When we start our application again, the pre-loading process will optimally reshuffle the data from the two existing nodes across the new three-node topology. Note that the number of backup nodes and the details of the pre-loading process are fully configurable. |
Line 22-26
Function colocate() executes a number of closures, where each closure is affinity co-located with some key
in the data grid. Note that the closure itself simply prints a trace message and uses function peek(), which
gets the value only if it’s locally available - which it should be, since we are co-locating the closure with the node
where the data is stored (the so-called master node).
|
|
Affinity Co-Location
The colocate() function is the key functionality here. Look how simple it is to co-locate the computational logic (a closure) with the data this logic needs to process (data in cache). |
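The placement rules described above (one primary copy, one backup copy, and a local-only peek()) can be simulated in a few lines of plain Java. This is only a sketch under stated assumptions: the modulo-based `primary` function, the "next node" `backup` rule and every class and method name below are hypothetical and do not reflect how GridGain actually maps keys to nodes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AffinitySketch {
    /** One simulated node: nothing more than a local key/value store. */
    static final class Node {
        final Map<Integer, String> local = new HashMap<Integer, String>();

        /** Returns the value only if it is stored on this node, otherwise null. */
        String peek(int key) {
            return local.get(key);
        }
    }

    final List<Node> nodes = new ArrayList<Node>();

    AffinitySketch(int nodeCnt) {
        for (int i = 0; i < nodeCnt; i++)
            nodes.add(new Node());
    }

    /** Hypothetical affinity function: maps a key to the index of its primary node. */
    int primary(int key) {
        return Math.floorMod(key, nodes.size());
    }

    /** Hypothetical backup placement: the node right after the primary one. */
    int backup(int key) {
        return (primary(key) + 1) % nodes.size();
    }

    /** Stores the pair on the primary node and on one backup node. */
    void put(int key, String val) {
        nodes.get(primary(key)).local.put(key, val);
        nodes.get(backup(key)).local.put(key, val);
    }

    /** Simulates a co-located closure: peek() runs on the key's primary node. */
    String colocatedPeek(int key) {
        return nodes.get(primary(key)).peek(key);
    }

    /** Counts how many nodes hold a copy of the key. */
    int copies(int key) {
        int cnt = 0;
        for (Node n : nodes)
            if (n.local.containsKey(key))
                cnt++;
        return cnt;
    }

    public static void main(String[] args) {
        AffinitySketch grid = new AffinitySketch(3);
        for (int i = 0; i < 20; i++)
            grid.put(i, Integer.toString(i));
        for (int i = 0; i < 20; i++)
            System.out.println("Co-located [key=" + i + ", value=" + grid.colocatedPeek(i)
                + ", copies=" + grid.copies(i) + ']');
    }
}
```

Because the read is routed with the same function that placed the data, a co-located peek() never comes back null in this simulation, while a peek() on a node that holds neither the primary nor the backup copy does.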
Let’s go ahead and start our application. Starting a Scala application is no different from starting a Java application (at least if you use an IDE). The local node running in IDEA prints out the following log (abbreviated):
[22:57:01] _____ _ _______ _ ____ ____
[22:57:01] / ___/____(_)___/ / ___/___ _(_)___ |_ / / __/
[22:57:01] / (_ // __/ // _ / (_ // _ `/ // _ \ _/_ <_ /__ \
[22:57:01] \___//_/ /_/ \_,_/\___/ \_,_/_//_//_/ /____(_)____/
[22:57:01]
[22:57:01] ---==++ HIGH PERFORMANCE CLOUD COMPUTING ++==---
[22:57:01] ver. x.x.x-DDMMYYYY
[22:57:01] Copyright (C) 2005-2011 GridGain Systems, Inc.
[22:57:01]
[22:57:01] Quiet mode.
[22:57:01] ^-- To disable add -DGRIDGAIN_QUIET=false or "-v" to ggstart.{sh|bat}
[22:57:01] << Enterprise Edition >>
...
[22:57:02] Topology snapshot [nodes=3, CPUs=4, hash=0xAB10A0C]
...
[22:57:05] ZZZzz zz z...
Co-located [key= 0, value=0]
Co-located [key= 1, value=1]
Co-located [key= 7, value=7]
Co-located [key= 10, value=10]
Co-located [key= 18, value=18]
[22:57:09] GridGain stopped OK [uptime=00:00:03:171]
and the remote nodes print:
Remote Node 1
[22:56:40] _____ _ _______ _ ____ ____
[22:56:40] / ___/____(_)___/ / ___/___ _(_)___ |_ / / __/
[22:56:40] / (_ // __/ // _ / (_ // _ `/ // _ \ _/_ <_ /__ \
[22:56:40] \___//_/ /_/ \_,_/\___/ \_,_/_//_//_/ /____(_)____/
[22:56:40]
[22:56:40] ---==++ HIGH PERFORMANCE CLOUD COMPUTING ++==---
[22:56:40] ver. x.x.x-DDMMYYYY
[22:56:40] Copyright (C) 2005-2011 GridGain Systems, Inc.
[22:56:40]
[22:56:40] Quiet mode.
[22:56:40] ^-- To disable add -DGRIDGAIN_QUIET=false or "-v" to ggstart.{sh|bat}
[22:56:40] << Enterprise Edition >>
...
[22:56:42] Topology snapshot [nodes=2, CPUs=4, hash=0x99D66AF5]
...
[22:57:02] Node JOINED [nodeId8=3a890c51, addr=[127.0.0.1], CPUs=4]
[22:57:02] Topology snapshot [nodes=3, CPUs=4, hash=0xAB10A0C]
Co-located [key= 2, value=2]
Co-located [key= 3, value=3]
Co-located [key= 4, value=4]
Co-located [key= 5, value=5]
Co-located [key= 11, value=11]
Co-located [key= 12, value=12]
Co-located [key= 13, value=13]
Co-located [key= 16, value=16]
[22:57:08] Node LEFT [nodeId8=3a890c51, addr=[127.0.0.1], CPUs=4]
[22:57:08] Topology snapshot [nodes=2, CPUs=4, hash=0x99D66AF5]
Remote Node 2
[22:56:40] _____ _ _______ _ ____ ____
[22:56:40] / ___/____(_)___/ / ___/___ _(_)___ |_ / / __/
[22:56:40] / (_ // __/ // _ / (_ // _ `/ // _ \ _/_ <_ /__ \
[22:56:40] \___//_/ /_/ \_,_/\___/ \_,_/_//_//_/ /____(_)____/
[22:56:40]
[22:56:40] ---==++ HIGH PERFORMANCE CLOUD COMPUTING ++==---
[22:56:40] ver. x.x.x-DDMMYYYY
[22:56:40] Copyright (C) 2005-2011 GridGain Systems, Inc.
[22:56:40]
[22:56:40] Quiet mode.
[22:56:40] ^-- To disable add -DGRIDGAIN_QUIET=false or "-v" to ggstart.{sh|bat}
[22:56:40] << Enterprise Edition >>
...
[22:56:39] Topology snapshot [nodes=1, CPUs=4, hash=0xF15A46A8]
...
[22:56:42] Node JOINED [nodeId8=de5c9bb0, addr=[127.0.0.1], CPUs=4]
[22:56:42] Topology snapshot [nodes=2, CPUs=4, hash=0x99D66AF5]
[22:57:02] Node JOINED [nodeId8=3a890c51, addr=[127.0.0.1], CPUs=4]
[22:57:02] Topology snapshot [nodes=3, CPUs=4, hash=0xAB10A0C]
Co-located [key= 6, value=6]
Co-located [key= 8, value=8]
Co-located [key= 9, value=9]
Co-located [key= 14, value=14]
Co-located [key= 15, value=15]
Co-located [key= 17, value=17]
Co-located [key= 19, value=19]
[22:57:08] Node LEFT [nodeId8=3a890c51, addr=[127.0.0.1], CPUs=4]
[22:57:08] Topology snapshot [nodes=2, CPUs=4, hash=0x99D66AF5]
As you can see, the key/value pairs got distributed roughly equally (the more keys, the better the distribution will be, obviously). What’s also important to note is that we didn’t get any null values, which proves that co-location works (remember: we’ve used function peek(), which only returns a locally stored value, or null if the value for the given key is not stored locally).
All in all, these were two quick examples of Java and Scala based applications that demonstrate some of the basics of GridGain functionality. The following chapters of the book will explain why these examples only scratch the surface…
4. Getting and Installing
Now that we’ve looked briefly at what GridGain can do let’s start… from the beginning: how to get GridGain software and how to install it.
|
|
Screenshots
Note that most of the website screenshots will have changed by the time you read this book. However, you can still easily navigate the website, as its main parts have remained largely the same. |
4.1. Download
There are three ways to get GridGain:
-
you can download it from http://www.gridgain.com,
-
or you can use Maven2 repository to get it,
-
or you can get Community Edition from GitHub
We highly recommend using the first method and simply downloading the ZIP file from the http://www.gridgain.com website. To do so, open http://www.gridgain.com in your favorite browser and locate the download link, which is usually on the right side:
Once you click the download link you’ll be taken to the download page, where you’ll need to enter your name and email:
Keep in mind several things:
-
There are two editions available for the download - enterprise and community
-
Community Edition is licensed under GPLv3 and Enterprise Edition comes with evaluation license
-
There is a link on top of the page for past downloads that contains selected previous releases of GridGain
-
In the download table (see above) you can see date of the build, its version, and the link to Release Notes
There are six downloads (as of version 3.0.2):
-
Enterprise and Community for Windows
-
Enterprise and Community for Linux/Unix/Mac OS.
-
Amazon AMI images for Enterprise and Community editions
All downloads are simple ZIP files. The ZIP files are versioned and clearly named to indicate the OS family they are intended for.
4.1.1. Maven2
The Maven repository is available only for version 3.0.0c-RC1 and up.
If you decide to use Maven please keep in mind:
-
Only a Maven2 repository is currently available (as of version 3.0.2)
-
Maven3 is not supported yet
-
Only the community edition is available in our public repository
-
The enterprise edition can only be downloaded directly from the http://www.gridgain.com website
-
A Maven2 POM file is included with the distribution
-
Only the main GridGain JAR file is available in the Maven repository
-
Depending on your usage of GridGain you may need configuration files, a working directory, etc. that won’t be created when using Maven to get GridGain.
To utilize our Maven repository you’ll need to make the following changes. In your POM file you need to add dependency for GridGain:
<dependencies>
    .
    .
    .
    <dependency>
        <groupId>org.gridgain</groupId>
        <artifactId>gridgain</artifactId>
        <version>3.0.0c-rc1</version> <!-- CHANGE IT! -->
    </dependency>
</dependencies>
Make sure to change the version to the GridGain release you are actually using.
You will need to add GridGain repository to your POM file as well:
<repositories>
    .
    .
    .
    <repository>
        <id>gridgain</id>
        <url>http://www.gridgainsystems.com/maven2/</url>
    </repository>
</repositories>
Once this is done, you are ready for Maven-based usage of GridGain.
|
|
Internal Repository
We recommend using internal Maven repositories for your projects if Maven is something you like to use. You can download GridGain as usual through the www.gridgain.com website and deploy the necessary files to your local repository for the rest of the team to use. This way you have full control over how GridGain is made available via Maven for your particular project. |
4.1.2. Versions
GridGain follows traditional rules on versioning and what specific version number means:
| Version | Description |
|---|---|
| X.X.1…9 | Point release. |
| X.1…9.X | Mid-point release. |
| 1…9.X.X | Major release. |
In general, we target one major release every 12 months and a mid-point release every 6 months. Point releases are cut whenever we see a need to patch issues or provide hot bug fixes.
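As a small illustration of the numbering scheme, the sketch below compares two version strings and names the kind of release that separates them. It assumes plain three-part numeric versions such as 3.0.2 (suffixed versions like 3.0.0c-RC1 are deliberately not handled), and `ReleaseKindSketch`, `parse` and `kind` are hypothetical names used only here.

```java
public class ReleaseKindSketch {
    /** Parses a plain "major.mid.point" version such as "3.0.2" into three numbers. */
    static int[] parse(String ver) {
        String[] parts = ver.split("\\.");
        if (parts.length != 3)
            throw new IllegalArgumentException("Expected X.X.X but got: " + ver);
        int[] nums = new int[3];
        for (int i = 0; i < 3; i++)
            nums[i] = Integer.parseInt(parts[i]);
        return nums;
    }

    /** Names the kind of release that separates two versions. */
    static String kind(String from, String to) {
        int[] a = parse(from);
        int[] b = parse(to);
        if (a[0] != b[0])
            return "major";      // first number changed
        if (a[1] != b[1])
            return "mid-point";  // second number changed
        if (a[2] != b[2])
            return "point";      // only the last number changed
        return "same";
    }

    public static void main(String[] args) {
        System.out.println(kind("3.0.2", "3.0.9"));
        System.out.println(kind("3.0.9", "3.1.1"));
        System.out.println(kind("2.1.0", "3.0.0"));
    }
}
```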
4.1.3. Supported Operating Systems
GridGain is actively developed and tested on three major operating systems:
-
Mac OS X
-
Windows 7
-
Linux (Ubuntu & Fedora)
Being JVM-based software, GridGain has minimal dependency on any particular operating system (as long as Java is available for it). Most of the dependencies are in scripts. With every release of GridGain we thoroughly test the software against the following versions of operating systems:
-
Mac OS X 10.x
-
Linux Ubuntu (current active release)
-
Linux Fedora (current active release)
-
Windows XP/Vista/7 (as of 3.0.2 version)
Note that we do not actively test against the following operating systems, but it has been independently verified that GridGain 3.0 or later works stably and correctly on them:
-
Solaris (current release)
-
HP-UX (current release)
-
Windows 2003, Windows 2000
|
|
Less Tested
GridGain is less tested on: Solaris, HP-UX, Windows 2003, and Windows 2000. |
In general, with extremely rare exceptions, GridGain will work out-of-the-box on any Windows or Linux/Unix-based system as long as Java 6 (and Scala 2.9 or later) is available on it.
4.1.4. Java, Scala and Groovy
As of version 3.0 GridGain requires Java 6. Note that GridGain 3.0 has not been tested with upcoming Java 7 as of May 2011.
Starting with version 3.0.9 GridGain requires Scala 2.9 or later (if Scala is used which is optional). Note that original release of GridGain 3.0.0 came before Scala 2.8 GA was released and was compatible only with Scala 2.7.
Starting with version 3.1.1 GridGain requires Groovy 1.8 and corresponding version of Groovy++ (if Groovy is used which is optional).
Keep in mind that you can develop with Java, Scala, Groovy or any combination thereof. Specifically, Scala is not required to develop with GridGain, but some of the tools, like GridGain Visor - the monitoring and interpreting tool in Enterprise Edition - use the Scala REPL, and therefore Scala is required to use them.
Note also that as of GridGain 3.0.2 none of the functionality in the community edition explicitly requires Scala or Groovy.
|
|
Java, Scala and Groovy
As of version 3.1.1 GridGain requires Java 6, Scala 2.9 and Groovy 1.8. |
As of November 2010 you can download Java, Scala, and Groovy from:
-
Latest Java: http://www.oracle.com/technetwork/java/javase/downloads/index.html
-
Latest Scala: http://www.scala-lang.org/node/165
-
Latest Groovy: http://groovy.codehaus.org/Download
-
Latest Groovy++: http://code.google.com/p/groovypptest/downloads/listp
|
|
Java on Mac OS X
Note that the Java download for Mac OS X may change its location, as its development is shifting from Apple to Oracle as of November 2010. |
4.2. Installation
Once you have downloaded the ZIP file you selected, the installation process is rather trivial:
-
Unzip it to any location you prefer.
-
Set up the GRIDGAIN_HOME environment variable to point to the installation folder
Note that the installation does not perform any new-line translation, so text files may have the wrong line endings depending on the OS on which the installation is performed.
The Unix/Linux/Mac OS X ZIP file has all shell scripts with the executable flag set so that they can be called directly.
|
|
GRIDGAIN_HOME
Note that strictly speaking GRIDGAIN_HOME is not required for GridGain operation - and if you know that your setup won’t require it (explained later in the book) - you can skip it. If you are new to GridGain, we strongly advise setting GRIDGAIN_HOME right after unzipping the downloaded file. |
|
|
Trailing Spaces
Make sure there is no trailing \ in the GRIDGAIN_HOME path. |
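A quick way to sanity-check your setup is a few lines of plain Java like the sketch below. It looks up GRIDGAIN_HOME as a system property first and as an environment variable second, then warns about a trailing separator. The lookup order is an assumption made for this sketch, not documented GridGain behavior; checking for a trailing forward slash in addition to the backslash is likewise an extra precaution, and the class and method names are hypothetical.

```java
public class GridGainHomeSketch {
    /** Returns the system property value if set, otherwise the environment value (assumed order). */
    static String resolve(String sysProp, String envVar) {
        if (sysProp != null && !sysProp.isEmpty())
            return sysProp;
        if (envVar != null && !envVar.isEmpty())
            return envVar;
        return null;
    }

    /** True if the path ends with a backslash or a forward slash. */
    static boolean hasTrailingSeparator(String path) {
        return path.endsWith("\\") || path.endsWith("/");
    }

    public static void main(String[] args) {
        String home = resolve(System.getProperty("GRIDGAIN_HOME"), System.getenv("GRIDGAIN_HOME"));
        if (home == null)
            System.out.println("GRIDGAIN_HOME is not set");
        else if (hasTrailingSeparator(home))
            System.out.println("GRIDGAIN_HOME has a trailing separator: " + home);
        else
            System.out.println("GRIDGAIN_HOME=" + home);
    }
}
```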
4.2.1. Installing On Shared Location
One good practice for testing, staging or production setups is to install GridGain into shared location like a network share or shared hard drive. This way multiple grid nodes can share single configuration, libraries and working directory. This significantly simplifies management of GridGain installation in a distributed environment.
4.3. Uninstallation
Uninstalling GridGain is even simpler than installing it - you simply remove the GRIDGAIN_HOME folder where GridGain was installed. If GridGain was configured to use paths outside of GRIDGAIN_HOME, you will need to delete them too (if necessary).
4.4. Upgrading
Due to the complexity of GridGain (mostly due to its distributed nature) we have decided not to provide incremental upgrade (or patching) capabilities. We recommend upgrading GridGain by cleanly uninstalling the old version and installing the new one.
5. Configuration
5.1. Overview
The GridConfiguration interface defines the grid runtime configuration. This configuration is passed to the GridFactory.start(GridConfiguration) method. It defines all configuration parameters required to start a grid instance. Usually, a special class called a "loader" will create an instance of this interface and call the GridFactory.start(GridConfiguration) method to initialize a GridGain instance.
Note that absolutely every configuration property in GridConfiguration is optional. You can simply create a new instance of GridConfigurationAdapter, for example, and pass it to GridFactory.start(GridConfiguration) as is to start the grid with the default configuration. See the GridFactory documentation for information about the default configuration properties used and more information on how to start the grid.
The following configuration parameters can be used to configure a grid node with GridConfigurationAdapter:
| Setter Method | Description | Optional | Default |
|---|---|---|---|
setGridName(String)doc |
Grid name. |
Yes |
null |
setGridGainHome(String)doc |
GridGain installation folder. |
Yes |
GRIDGAIN_HOME system property or environment variable. |
setLocalHost(String)doc |
System-wide local address or host for all GridGain components to bind to. |
Yes |
null |
setNodeId(UUID)doc |
Unique identifier for local node. |
Yes |
Random UUID. |
setNetworkTimeout(long)doc |
Maximum timeout in milliseconds for network requests. |
Yes |
5000ms |
setLicenseUrl(String)doc |
License URL different from the default location of the license file. |
Yes |
GRIDGAIN_HOME/gridgain-license.xml |
setUserAttributes(Map<String,? extends Serializable>)doc |
User specific attributes to attach to this node. Available via GridNode.getAttribute(String)doc method. Very useful for segmenting grid nodes into subgroups or identifying nodes based on certain property. |
Yes |
All System Properties and Environment Variables are set as node attributes automatically by GridGain. |
setDaemon(boolean)doc |
Daemon flag. |
Yes |
false |
setIncludeProperties(String…)doc |
Array of system or environment property names to include into node attributes. |
Yes |
All properties are included by default. |
setIncludeEventTypes(int…)doc |
Array of event types, which will be recorded by GridEventStorageManager. Note, that either the include event types or the exclude event types can be established. |
Yes |
All events are recorded by default. |
setExcludeEventTypes(int…)doc |
Array of event types, which will not be recorded by GridEventStorageManager. Note, that either the include event types or the exclude event types can be established. |
Yes |
All events are recorded by default. |
setLifecycleBeans(GridLifecycleBean…)doc |
Collection of lifecycle beans. |
Yes |
null |
setLifeCycleEmailNotification(boolean)doc |
Whether or not to enable lifecycle email notifications. |
Yes |
false |
setDiscoveryStartupDelay(long)doc |
Time in milliseconds after which a certain metric value is considered expired. |
Yes |
1 minute |
setGridLogger(GridLogger)doc |
Logger to use within grid. |
Yes |
GridLog4jLoggerdoc |
setMarshaller(GridMarshaller)doc |
Marshaller to use for serialization/deserialization of objects (available from ver. 2.1). |
Yes |
GridOptimizedMarshallerdoc |
setDeploymentMode(GridDeploymentMode)doc |
Deployment mode for task/query requests initiated from this node (available from ver. 2.1). |
Yes |
SHAREDdoc |
setPeerClassLoadingEnabled(boolean)doc |
Enables/disables peer class loading. |
Yes |
true |
setPeerClassLoadingMissedResourcesCacheSize(int)doc |
Specifies internal cache size for missed resources. If attempt to load a resource failed, then it will be cached, and following attempts will not make remote calls (available from ver. 2.1). |
Yes |
100 |
setP2PLocalClassPathExclude(List<String>)doc |
List of packages in a system class path that should be P2P loaded even if they exist locally. |
Yes |
null |
setMetricsExpireTime(long)doc |
Time in milliseconds after which a certain metric value is considered expired. |
Yes |
600000ms |
setMetricsHistorySize(int)doc |
Number of metrics kept in history to compute totals and averages. |
Yes |
10000 |
setMetricsLogFrequency(int)doc |
Frequency of metrics log print out. |
Yes |
0, which means that metrics print out is disabled. |
setExecutorService(ExecutorService)doc |
Thread pool to use mainly for task and job execution. |
Yes |
GridThreadPoolExecutordoc with 100 threads. |
setExecutorServiceShutdown(boolean)doc |
Executor service shutdown flag. |
Yes |
true |
setSystemExecutorService(ExecutorService)doc |
Thread pool to use for processing job and task session asynchronous responses (available from ver. 2.1). |
Yes |
GridThreadPoolExecutordoc with 100 threads. |
setSystemExecutorServiceShutdown(boolean)doc |
System executor service shutdown flag. |
Yes |
true |
setPeerClassLoadingExecutorService(ExecutorService)doc |
Thread pool to use for processing peer class loading requests and responses (available from ver. 2.1). |
Yes |
GridThreadPoolExecutordoc with 20 threads. |
setPeerClassLoadingExecutorServiceShutdown(boolean)doc |
Peer class loading executor service shutdown flag. |
Yes |
true |
setMBeanServer(MBeanServer)doc |
MBean server for exposing GridGain MBeans. |
Yes |
The default MBean Server provided by JDK. |
setSegmentationPolicy(GridSegmentationPolicy)doc |
Segmentation policy. |
Yes |
STOPdoc |
setSegmentationResolvers(GridSegmentationResolver…)doc |
Segmentation resolvers. |
Yes |
null |
setSegmentCheckFrequency(long)doc |
Network segment check frequency. |
Yes |
10000ms |
setWaitForSegmentOnStart(boolean)doc |
Wait for segment on start flag. |
Yes |
true |
setAllSegmentationResolversPassRequired(boolean)doc |
All segmentation resolvers pass required flag. |
Yes |
true |
setRestEnabled(boolean)doc |
Flag indicating whether external REST access is enabled or not. |
Yes |
true |
setRestJettyPath(String)doc |
Path, either absolute or relative to GRIDGAIN_HOME, to JETTY XML configuration file. |
Yes |
null |
setRestSecretKey(String)doc |
Secret key to authenticate REST requests. |
Yes |
null, which means that authentication is disabled. |
setSmtpHost(String)doc |
SMTP host. |
Yes |
null, which disables sending emails. |
setSmtpPort(int)doc |
SMTP port. |
Yes |
25 |
setSmtpUsername(String)doc |
SMTP username. |
Yes |
null |
setSmtpPassword(String)doc |
SMTP password. |
Yes |
null |
setAdminEmails(String[]) |
Set of admin emails where email notifications will be sent. |
Yes |
null |
setSmtpFromEmail(String)doc |
FROM email address for email notifications. |
Yes |
info@gridgain.com |
setSmtpSsl(boolean)doc |
Whether or not SMTP uses SSL. |
Yes |
false |
setSmtpStartTls(boolean)doc |
Whether or not SMTP uses STARTTLS. |
Yes |
false |
setLocalEventListeners(Map<GridLocalEventListener, int[]>)doc |
Pre-configured local event listeners. |
Yes |
null |
setLoadBalancingSpi(GridLoadBalancingSpi…)doc |
Fully configured instances of GridLoadBalancingSpi. Starting with GridGain 2.1 you can provide multiple instances of Load Balancing SPIs and then specify which one to use on per-task level via @GridTaskSpisdoc annotation attached to your GridTask implementation. |
Yes |
GridRoundRobinLoadBalancingSpidoc |
setCheckpointSpi(GridCheckpointSpi…)doc |
Fully configured instances of GridCheckpointSpi. Starting with GridGain 2.1 you can provide multiple instances of Checkpoint SPIs and then specify which one to use on per-task level via @GridTaskSpisdoc annotation attached to your GridTask implementation. |
Yes |
GridSharedFsCheckpointSpidoc |
setCollisionSpi(GridCollisionSpi)doc |
Fully configured instance of GridCollisionSpi. |
Yes |
GridFifoQueueCollisionSpidoc |
setCommunicationSpi(GridCommunicationSpi)doc |
Fully configured instance of GridCommunicationSpi. |
Yes |
GridTcpCommunicationSpidoc |
setDeploymentSpi(GridDeploymentSpi)doc |
Fully configured instance of GridDeploymentSpi. |
Yes |
GridLocalDeploymentSpidoc |
setDiscoverySpi(GridDiscoverySpi)doc |
Fully configured instance of GridDiscoverySpi. |
Yes |
GridMulticastDiscoverySpidoc |
setEventStorageSpi(GridEventStorageSpi)doc |
Fully configured instance of GridEventStorageSpi. |
Yes |
GridMemoryEventStorageSpidoc |
setFailoverSpi(GridFailoverSpi…)doc |
Fully configured instances of GridFailoverSpi. Starting with GridGain 2.1 you can provide multiple instances of Failover SPIs and then specify which one to use on per-task level via @GridTaskSpisdoc annotation attached to your GridTask implementation. |
Yes |
GridAlwaysFailoverSpidoc |
setTopologySpi(GridTopologySpi…)doc |
Fully configured instances of GridTopologySpi. Starting with GridGain 2.1 you can provide multiple instances of Topology SPIs and then specify which one to use on per-task level via @GridTaskSpisdoc annotation attached to your GridTask implementation. |
Yes |
GridBasicTopologySpidoc |
setMetricsSpi(GridLocalMetricsSpi)doc |
Fully configured instance of GridLocalMetricsSpi. |
Yes |
GridJdkLocalMetricsSpidoc |
Some of the most commonly used configuration properties are explained in more detail below.
5.1.1. Grid Name
Use the grid name configuration property whenever you would like to identify your grid by name. Usually, if you have only one grid node within your VM, you don’t have to configure the grid name explicitly and can use the default no-name grid node. However, if you start multiple grid node instances in the same VM, say for unit testing or debugging, then properly configuring the grid name for every grid node instance will allow you to access multiple grid nodes by name via the GridFactory.getGrid(String gridName) method.
5.1.2. User Attributes
User attributes allow you to attach various custom attributes to your nodes. These attributes can then be used to identify node topology for your task execution or load balancing, to segment your grid into multiple sub-grids, etc. By default, GridGain will automatically attach all System and Environment properties to your node.
You can query node attributes practically from anywhere in your code, be that your task or job logic, or an implementation of a topology or load-balancing SPI. Simply get a handle on GridNode and check its attributes via the GridNode.getAttribute(String) method.
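As a minimal sketch of attribute-based segmentation, the following self-contained Java models nodes as plain objects that carry an attribute map. The AttributeNode and AttributeSegments classes are hypothetical stand-ins written for illustration and are not part of the GridGain API; only the getAttribute(String) accessor mirrors the method named above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a grid node; only getAttribute(String) mirrors the text. */
final class AttributeNode {
    private final String id;
    private final Map<String, Object> attrs;

    AttributeNode(String id, Map<String, Object> attrs) {
        this.id = id;
        this.attrs = new HashMap<>(attrs);
    }

    String id() {
        return id;
    }

    /** Returns the attribute value, or null if the node does not carry it. */
    Object getAttribute(String name) {
        return attrs.get(name);
    }
}

final class AttributeSegments {
    /** Selects the IDs of nodes whose attribute equals the expected value. */
    static List<String> select(List<AttributeNode> nodes, String name, Object expected) {
        List<String> ids = new ArrayList<>();
        for (AttributeNode n : nodes) {
            if (expected.equals(n.getAttribute(name))) {
                ids.add(n.id());
            }
        }
        return ids;
    }

    /** Convenience helper building a single-entry attribute map. */
    static Map<String, Object> attrs(String key, Object val) {
        Map<String, Object> m = new HashMap<>();
        m.put(key, val);
        return m;
    }

    public static void main(String[] args) {
        List<AttributeNode> nodes = Arrays.asList(
            new AttributeNode("node1", attrs("segment", "A")),
            new AttributeNode("node2", attrs("segment", "B")),
            new AttributeNode("node3", attrs("segment", "A")));
        System.out.println(select(nodes, "segment", "A")); // prints [node1, node3]
    }
}
```

The same check could sit inside task logic or a custom SPI; the sketch only shows the filtering step itself.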
5.1.3. Grid Logger
Configuring a proper grid logger allows you to integrate your logging with any environment. By default, GridLog4jLogger is used, which gets its logging configuration from GRIDGAIN_HOME/config/default-log4j.xml.
Below is the list of supported loggers:
- GridLog4jLogger - Log4j-based implementation for logging. This logger should be used by loaders that prefer log4j-based logging. By default, GridGain will use this logger with configuration from GRIDGAIN_HOME/config/default-log4j.xml.
- GridJavaLogger - Logger to use with Java logging. Implementation simply delegates to Java Logging.
- GridJbossLogger - Logger to use in JBoss loaders. Implementation simply delegates to JBoss logging.
- GridJclLogger - This logger wraps any JCL (Jakarta Commons Logging) loggers. Implementation simply delegates to the underlying JCL logger. This logger should be used by loaders that have JCL-based internal logging (e.g., Websphere).
5.1.4. Grid Marshaller
Starting with the GridGain 2.1 release you are able to configure different marshallers and, if needed, provide your own. GridMarshaller allows you to marshal or unmarshal objects. It provides the serialization/deserialization mechanism for all instances that are sent across the network or are otherwise serialized.
GridGain provides the following GridMarshaller implementations:
- GridJBossMarshaller - this marshaller uses the JBoss implementation of java.io.ObjectOutputStream for object serialization. All marshalled instances must implement java.io.Serializable.
- GridJdkMarshaller - this marshaller uses the standard JDK java.io.ObjectOutputStream for object serialization. All marshalled instances must implement java.io.Serializable.
- GridXstreamMarshaller - this marshaller uses Codehaus XStream for serialization of objects into XML. It does not require that marshalled instances implement java.io.Serializable; however, it performs slower than other marshaller implementations as XML is a verbose protocol.
- GridOptimizedMarshaller - this is the default marshaller, as listed in the configuration table above. Unlike GridJdkMarshaller, which is based on the standard ObjectOutputStream, this marshaller does not enforce that all serialized objects implement java.io.Serializable. It is also generally much faster as it removes a lot of the serialization overhead that exists in the default JDK implementation.
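To make the java.io.Serializable requirement concrete, here is a minimal sketch of the standard JDK serialization round trip that the text describes for GridJdkMarshaller. The JdkMarshalSketch class and its marshal, unmarshal and canMarshal helpers are hypothetical names used only for this illustration; they are not GridGain classes and say nothing about how the other marshallers are implemented.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class JdkMarshalSketch {
    /** Serializable payload: can be marshalled with plain JDK serialization. */
    static final class Job implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;

        Job(String name) {
            this.name = name;
        }
    }

    /** Does not implement Serializable: JDK serialization rejects it. */
    static final class Plain {
        final String name = "plain";
    }

    /** Writes the object into a byte array using ObjectOutputStream. */
    static byte[] marshal(Object obj) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Reads the object back using ObjectInputStream. */
    static Object unmarshal(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns false when the object is rejected for not being Serializable. */
    static boolean canMarshal(Object obj) {
        try {
            marshal(obj);
            return true;
        } catch (UncheckedIOException e) {
            if (e.getCause() instanceof NotSerializableException) {
                return false;
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        Job copy = (Job) unmarshal(marshal(new Job("job-1")));
        System.out.println(copy.name);               // prints job-1
        System.out.println(canMarshal(new Plain())); // prints false
    }
}
```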
5.1.5. Executor Services
Starting with version 2.1, GridGain exposes configuration for three thread pools:
- ExecutorService - Implementation of java.util.concurrent.ExecutorService to be used for task and job executions. By default, a standard ThreadPoolExecutor thread pool is provided and is configured to use 100 threads. Change this configuration parameter whenever you need to change the number of threads participating in GridTask/GridJob execution.
- SystemExecutorService - Implementation of java.util.concurrent.ExecutorService to be used for processing of asynchronous job and task session responses. By default, a standard ThreadPoolExecutor thread pool is provided and is configured to use 100 threads. Change this configuration parameter whenever you set task session attributes frequently or feel that responses are not processed fast enough.
- PeerClassLoadingExecutorService - Implementation of java.util.concurrent.ExecutorService to be used for processing of all Peer Class Loading requests. By default, a standard ThreadPoolExecutor thread pool is provided and is configured to use 20 threads. Change this configuration parameter whenever you feel that class-loading requests don’t get processed fast enough.
Note: Do not confuse the executor services provided in configuration for thread pooling with the grid-enabled executor service provided by GridGain.
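Since all three pools are ordinary java.util.concurrent.ExecutorService instances, sizing them is plain JDK work. The sketch below builds fixed-size pools with the thread counts quoted above (100 and 20) and runs a few jobs on one of them. The PoolSketch class and its helpers are illustrative assumptions; handing a pool to GridGain would go through the corresponding setter listed in the configuration table, which is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class PoolSketch {
    /** Creates a fixed-size pool backed by an unbounded queue. */
    static ThreadPoolExecutor fixedPool(int threads) {
        return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
            new LinkedBlockingQueue<Runnable>());
    }

    /** Runs all jobs on the given pool and sums their results. */
    static int runAll(ExecutorService pool, List<Callable<Integer>> jobs) {
        int total = 0;
        try {
            for (Future<Integer> f : pool.invokeAll(jobs)) {
                total += f.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
        return total;
    }

    public static void main(String[] args) {
        ThreadPoolExecutor jobPool = fixedPool(100); // size quoted for task and job execution
        ThreadPoolExecutor p2pPool = fixedPool(20);  // size quoted for peer class loading
        try {
            List<Callable<Integer>> jobs = new ArrayList<>();
            for (int i = 1; i <= 10; i++) {
                final int n = i;
                jobs.add(() -> n * n);
            }
            System.out.println(runAll(jobPool, jobs)); // prints 385 (sum of squares 1..10)
        } finally {
            jobPool.shutdown();
            p2pPool.shutdown();
        }
    }
}
```

Keeping the pools separate means that slow class-loading requests cannot starve job execution threads, which is presumably why the three pools are configured independently.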
5.1.6. Grid Lifecycle Beans
See Grid Lifecycle Beans documentation for information on how to specify lifecycle beans and examples.
5.1.7. SPIs - Service Provider Interfaces
Service Provider Interfaces allow you to configure virtually every aspect of GridGain, such as communication, discovery, topology, failover, load balancing, etc., in LEGO-like fashion. For information on available SPIs and their configuration, refer to the SPI documentation.
5.2. Specifying Different SPIs Per GridTask
Starting with GridGain 2.1 you can start multiple instances of Topology SPI, Load Balancing SPI, Failover SPI and Checkpoint SPI. If you do that, you need to tell a task which SPI to use.
Add the @GridTaskSpis annotation to your task to specify which SPIs it should use. If this annotation is omitted, GridGain will pick the first corresponding SPI implementation from the array of SPIs provided in configuration.
This example shows how to configure different SPI’s for different tasks. Let’s assume that you have two worker nodes, Node1 and Node2. Let’s also assume that you configure Node1 to belong to SegmentA and Node2 to belong to SegmentB. Here is a sample configuration for Node1:
    <bean id="grid.cfg" class="org.gridgain.grid.GridConfigurationAdapter" scope="singleton">
        <property name="userAttributes">
            <map>
                <entry key="segment" value="A"/>
            </map>
        </property>
    </bean>
Node2 configuration looks similar to Node1 with segment attribute set to B:
    <bean id="grid.cfg" class="org.gridgain.grid.GridConfigurationAdapter" scope="singleton">
        <property name="userAttributes">
            <map>
                <entry key="segment" value="B"/>
            </map>
        </property>
    </bean>
Now, if you have Task1 and Task2 starting from some master node NodeM, you can easily configure Task1 to only run on SegmentA and Task2 to only run on SegmentB. Here is what the configuration on master node NodeM would look like:
    <bean id="grid.cfg" class="org.gridgain.grid.GridConfigurationAdapter" scope="singleton">
        <!--
            Topology SPIs. We have two named SPIs: One picks up nodes
            that have attribute "segment" set to "A" and another one sees
            nodes that have attribute "segment" set to "B".
        -->
        <property name="topologySpi">
            <list>
                <bean class="org.gridgain.grid.spi.topology.nodefilter.GridNodeFilterTopologySpi">
                    <property name="name" value="topologyA"/>
                    <property name="filter">
                        <bean class="org.gridgain.grid.GridJexlNodeFilter">
                            <property name="expression" value="node.attributes['segment'] == 'A'"/>
                        </bean>
                    </property>
                </bean>
                <bean class="org.gridgain.grid.spi.topology.nodefilter.GridNodeFilterTopologySpi">
                    <property name="name" value="topologyB"/>
                    <property name="filter">
                        <bean class="org.gridgain.grid.GridJexlNodeFilter">
                            <property name="expression" value="node.attributes['segment'] == 'B'"/>
                        </bean>
                    </property>
                </bean>
            </list>
        </property>
    </bean>
Then your Task1 and Task2 would look as follows (note the @GridTaskSpis annotation):
    @GridTaskSpis(topologySpi="topologyA")
    public class GridSegmentATask extends GridTaskSplitAdapter<String, Integer> {
        ...
    }
and
    @GridTaskSpis(topologySpi="topologyB")
    public class GridSegmentBTask extends GridTaskSplitAdapter<String, Integer> {
        ...
    }
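The selection rule described in this section (use the SPI named in the annotation, otherwise fall back to the first configured one) can be sketched with a plain runtime annotation and reflection. Everything in the block below is hypothetical: TaskSpis is a simplified stand-in for @GridTaskSpis that models only the topologySpi attribute, SPIs are represented by their names, and the rejection of an unknown name is an assumption rather than documented GridGain behavior.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for @GridTaskSpis; only topologySpi is modeled. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface TaskSpis {
    String topologySpi() default "";
}

final class SpiSelectionSketch {
    /** Task that asks for a specific named topology SPI. */
    @TaskSpis(topologySpi = "topologyB")
    static final class SegmentBTask { }

    /** Task without the annotation: the first configured SPI applies. */
    static final class DefaultTask { }

    /** Resolves the topology SPI name for a task class. */
    static String resolveTopologySpi(Class<?> taskCls, List<String> configured) {
        TaskSpis ann = taskCls.getAnnotation(TaskSpis.class);
        if (ann == null || ann.topologySpi().isEmpty()) {
            return configured.get(0); // default: first SPI in the configured list
        }
        if (!configured.contains(ann.topologySpi())) {
            // Assumption of this sketch: an unknown name is treated as an error.
            throw new IllegalArgumentException("Unknown topology SPI: " + ann.topologySpi());
        }
        return ann.topologySpi();
    }

    public static void main(String[] args) {
        List<String> configured = Arrays.asList("topologyA", "topologyB");
        System.out.println(resolveTopologySpi(SegmentBTask.class, configured)); // prints topologyB
        System.out.println(resolveTopologySpi(DefaultTask.class, configured));  // prints topologyA
    }
}
```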
5.3. GridSpringBean
Grid Spring bean allows you to bypass GridFactory methods. In other words, this bean class allows you to inject a new grid instance from a Spring configuration file directly, without invoking static GridFactory methods. This class can be wired directly from Spring and can be referenced from within other Spring beans. By virtue of implementing the org.springframework.beans.factory.DisposableBean and org.springframework.beans.factory.InitializingBean interfaces, GridSpringBean automatically starts and stops the underlying grid instance.
The following configuration parameters are optional:
- Grid configuration (see setConfiguration(GridConfiguration))
5.3.1. Spring Configuration Example
    <bean id="mySpringBean" class="org.gridgain.grid.GridSpringBean" scope="singleton">
        <property name="configuration">
            <bean id="grid.cfg" class="org.gridgain.grid.GridConfigurationAdapter" scope="singleton">
                <property name="gridName" value="mySpringGrid"/>
            </bean>
        </property>
    </bean>
Or use default configuration:
    <bean id="mySpringBean" class="org.gridgain.grid.GridSpringBean" scope="singleton"/>
5.3.2. Java Example
Here is how you may access this bean from code:
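    import org.gridgain.grid.Grid;
    import org.springframework.context.support.AbstractApplicationContext;
    import org.springframework.context.support.FileSystemXmlApplicationContext;

    AbstractApplicationContext ctx = new FileSystemXmlApplicationContext("/path/to/spring/file");
    // Register Spring hook to destroy bean automatically.
    ctx.registerShutdownHook();
    Grid grid = (Grid)ctx.getBean("mySpringBean");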
5.4. Examples
GridConfiguration may be defined in code:
    import org.gridgain.grid.GridConfigurationAdapter;
    import org.gridgain.grid.GridFactory;

    GridConfigurationAdapter cfg = new GridConfigurationAdapter();
    // Override default values for grid node.
    cfg.setGridName("mygrid");
    ...
    // Start grid.
    GridFactory.start(cfg);
or from Spring configuration file (default Spring configuration file can be found in GRIDGAIN_HOME/config/default-spring.xml file):
    <bean id="grid.cfg" class="org.gridgain.grid.GridConfigurationAdapter" scope="singleton">
        ...
        <property name="gridName" value="mygrid"/>
        ...
    </bean>
5.5. AOP Configuration
To use annotation-based grid-enabling with @Gridify, the following AOP configuration needs to be in place, depending on which AOP implementation you choose to use. Note that you only need to pick one AOP implementation.
5.5.1. JBoss AOP
Standalone application
Note that GridGain is not shipped with JBoss and doesn’t include necessary JBoss libraries. We assume that if you choose to use JBoss AOP you would have these libraries anyways. The following configuration needs to be applied to enable JBoss byte code weaving:
- The following JVM configuration must be present (make sure to replace com.foo.bar with your domain package):
  - -javaagent:[path to jboss-aop-jdk50-4.x.x.jar]
  - -Djboss.aop.class.path=[path to gridgain.jar]
  - -Djboss.aop.exclude=org,com -Djboss.aop.include=com.foo.bar
- The following JARs should be in the classpath:
  - javassist-4.x.x.jar
  - jboss-aop-jdk50-4.x.x.jar
  - jboss-aspect-library-jdk50-4.x.x.jar
  - jboss-common-4.x.x.jar
  - trove-1.0.x.jar
JBoss AOP with JBoss AS
- Install the JBoss AOP deployer.
  - Remove the jboss-aop-jdk50.deployer directory from "server/your_server_name/deploy" in your JBoss AS.
  - Download the latest stable version of JBoss AOP (1.5.5 GA).
  - Unzip it and make sure that all directories were unzipped case-sensitively.
  - Copy the appropriate jboss-aop-jdk50.deployer directory from your JBoss AOP installation to your "server/your_server_name/deploy".
  - Edit jboss-aop-jdk50.deployer/jboss-service.xml, setting "EnableLoadtimeWeaving" to true, as follows:
        <attribute name="EnableLoadtimeWeaving">true</attribute>
        <attribute name="Exclude">java,javax,org,com,net,sun,oracle,EDU,antlr</attribute>
        <attribute name="Include">com.foo.bar</attribute>
    Make sure to replace com.foo.bar with your domain package. Also make sure to edit the exclude list if it does not have some packages that you would not like to weave.
  - Follow the instructions in the jboss-aop-jdk50.deployer/ReadMe.txt file.
  - Copy the pluggable-instrumentor.jar file (located in the lib-50 directory of your JBoss AOP installation) to the bin directory of your server.
  - Edit your run.sh or run.bat to include -javaagent:pluggable-instrumentor.jar in JAVA_OPTS.
- Deploy GridGain as SAR.
  - Copy the gridgain.sar directory from the GRIDGAIN_HOME/config/jboss folder into your "server/your_server_name/deploy" folder.
  - Make sure to update the classpath in gridgain.sar/META-INF/jboss-service.xml to point to all libs under GRIDGAIN_HOME and GRIDGAIN_HOME/libs.
Note (JBoss AOP and JSP): The JBoss AOP CFLOW pointcut does not work properly with JSP-compiled classes (it does not properly handle JSP classes on the stack). The workaround is to include pre-compiled JSP classes in your WAR file. Tomcat provides instructions on how to do that with JSPC in its Web Application Compilation documentation.
5.5.2. AspectJ AOP
The following configuration needs to be applied to enable AspectJ byte code weaving:
- JVM configuration should include: -javaagent:[GRIDGAIN_HOME]/libs/aspectjweaver-1.5.3.jar
- Classpath should contain the [GRIDGAIN_HOME]/config/aop/aspectj folder.
5.5.3. Spring AOP
The Spring AOP framework is based on a dynamic proxy implementation and doesn’t require any specific runtime parameters for online weaving. All weaving is on-demand and is performed by calling the GridifySpringEnhancer.enhance(Object) method for the object that has a method with the Gridify annotation.
Note that since this method of weaving requires manual enhancing of participating classes, it is rather inconvenient in most cases, and AspectJ or JBoss AOP are recommended over it. Spring AOP can be used in situations where code augmentation is undesired or cannot be used. It also allows for very fine-grained control of what gets woven.
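To illustrate what proxy-based, on-demand weaving means in general, the sketch below wraps an object in a JDK dynamic proxy and intercepts only the methods that carry a marker annotation. It uses java.lang.reflect.Proxy directly rather than Spring, and all names in it (Intercepted, Greeter, ProxySketch, proxyFor) are hypothetical. It does not reproduce what GridifySpringEnhancer does internally; it only shows why the caller has to enhance each participating object explicitly.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical marker annotation standing in for a grid-enabling annotation. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Intercepted { }

interface Greeter {
    @Intercepted
    String greet(String name);

    String plain(String name);
}

final class ProxySketch {
    /** Names of intercepted calls; a real aspect would redirect the call instead. */
    static final List<String> INTERCEPTED = new ArrayList<>();

    static final class SimpleGreeter implements Greeter {
        @Override public String greet(String name) {
            return "Hello, " + name;
        }

        @Override public String plain(String name) {
            return name;
        }
    }

    /** Wraps the target in a dynamic proxy; nothing is woven until this is called. */
    static <T> T proxyFor(final T target, Class<T> iface) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.isAnnotationPresent(Intercepted.class)) {
                INTERCEPTED.add(method.getName());
            }
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the target's own exception
            }
        };
        return iface.cast(Proxy.newProxyInstance(
            iface.getClassLoader(), new Class<?>[] {iface}, handler));
    }

    public static void main(String[] args) {
        Greeter g = proxyFor(new SimpleGreeter(), Greeter.class);
        System.out.println(g.greet("grid")); // prints Hello, grid
        g.plain("grid");
        System.out.println(INTERCEPTED);     // prints [greet]
    }
}
```

Calls made on the original SimpleGreeter instance bypass the handler entirely, which is the inconvenience the paragraph above refers to.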
BEA Weblogic AS
The Weblogic application server does not officially support AspectJ or JBoss AOP, so the only way to use AOP is Spring AOP. One needs to enhance classes as described above using Spring AOP. See http://springide.org/blog/2006/05/24/implementing-jee-with-spring-and-weblogic for details.
6. Main Abstractions
This chapter lists some of the main concepts in GridGain that you need to understand to move forward. Most of them will be discussed in greater depth later on in the book - but it’s helpful to lay them out upfront so that you can follow the examples. This also gives you a bird’s-eye view of GridGain architecture and API design.
Note: Keep in mind that we don’t expect you to fully understand each topic below just yet - all of them will be discussed in-depth much later in the book. In fact, you can skim this chapter quickly - but we recommend at least that.
GridGain has several key abstractions that are essential for understanding pretty much everything else in GridGain. We’ll begin with them first.
6.1. GridNode Interface
The GridNode interface defines a logical grid node in the network topology. Note that a physical node (like a computer on the network) can have multiple logical grid nodes running on it. In fact, a single JVM can run multiple logical grid nodes - note that GridGain is the only software in the world allowing this unique capability.
The GridNode interface has a very concise API and deals only with the notion of a logical network endpoint, a node, in the topology: it has a globally unique ID, node metrics, a set of static attributes provided by the user and a few other parameters.
Note (GridNode): GridNode has a globally unique ID, a set of static attributes provided by the user, node metrics and a few other parameters.
The unique characteristic of GridGain is that it uses Peer-To-Peer (P2P) topology, meaning that all nodes in GridGain are equal. There are no master or server nodes, and there are no worker or client nodes either. All nodes are equal from GridGain’s point of view - yet all these and any other roles can be assigned logically to the nodes.
This unique design gives GridGain tremendous flexibility: not only are you not limited to the master-worker mold - you can define any application-specific roles and assign them to the nodes dynamically. Moreover, since these roles are logical, they can change and "migrate" from node to node as topology changes or based on your application logic.
The GridNode interface is used primarily by internal kernel code and by discovery and communication SPI implementations, and is rarely used directly. The GridRichNode interface, its rich counterpart, is what is used instead for the majority of GridGain operations. More on this below.
6.2. Local Grid Node
Local grid node is an instance of GridNode interface that is instantiated in the local JVM runtime for a specific grid. In general, JVM process that runs GridGain runtime can have zero, one or more local grid nodes, but only one local node per specific grid’s topology.
6.3. Grid Topology
As the logical extension of the network endpoint - a grid node as defined above - we use the term topology throughout the GridGain documentation to reference a set of all logical grid nodes where each node "knows" every other node (in other words, topology is a fully connected graph of all grid nodes including the local node). We often refer to such a topology as simply a grid.
Note (Topology is Associative): The key characteristic of the topology is its associative property, i.e. the fact that each node "knows" every other node.
Depending on the configured discovery SPI implementation (discussed later), the topology can be guaranteed to be consistent on all nodes at any point in time or be eventually consistent only. Eventual consistency means that all nodes will eventually arrive at a fully consistent view, but there is a short time window where nodes can have a different view of the global topology. Guaranteed consistency is expensive to implement and it is optional in GridGain.
Note, however, that for the data grid, for example, the configured SPI must provide a consistent topology (i.e. support guaranteed discovery, discussed later).
It is important to note that a single GridGain runtime can support any number of topologies or grids at the same time. Nodes from one topology have no knowledge about the nodes in other topologies. In such cases, a single GridGain runtime (JVM process) will have several local grid nodes where each local node belongs to a different grid. Again, there is an important distinction between the GridGain runtime (a JVM process) and a logical grid node running inside of that runtime.
Note (GridGain Runtime vs. Grid Node): There is an important distinction between the GridGain runtime (a JVM process) and a logical grid node running inside of that runtime.
Note that we often refer to virtual sub-grid or sub-topology which is essentially just a subset of grid nodes from one topology. More on all that later.
6.4. GridProjection Interface
One of the major additions in GridGain 3.0 was the introduction of a grid projection (and the corresponding GridProjection interface). The important observation that led to this addition was the fact that there is a large set of GridGain operations that can be defined uniformly on any arbitrary set of grid nodes.
Put differently, each such operation can be performed on zero, one, two or any other number of grid nodes. For example, you can send a message to one or more nodes in the grid, just as you can listen for messages from one, two or any other number of nodes. To use functional programming terminology - these are the monadic operations defined on a set of grid nodes (which correspondingly makes GridProjection a monad).
Note (GridProjection is a Monad): GridProjection exposes a monadic set of operations defined on an arbitrary non-empty set of grid nodes.
To logically group such operations, the GridProjection interface is introduced; it defines all major operations in GridGain that can be performed on an arbitrary set of nodes. You can think of a grid projection as a specific view on the topology. A projection can be static or dynamic, and there are many ways in which a projection can be defined.
As you will see throughout this book, the elements of functional programming or functional API design are central to GridGain. You’ll discover that even when working with Java APIs you are dealing with functional constructs most of the time - even though Java is not a functional language to begin with! This is one of the unique sides of GridGain and it leads to extremely elegant and simple to use APIs.
We are going to cover projections and the functional programming framework in Java in much greater detail in subsequent chapters.
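As a rough sketch of the idea of a projection as a view on the topology, the block below models the topology as a list of node names and a projection as a predicate over it. It is not the GridProjection API: ProjectionSketch, snapshot, dynamic and send are hypothetical names, and reading "static" as a one-time snapshot and "dynamic" as a view re-evaluated on every use is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;

final class ProjectionSketch {
    /** Full topology, modeled as a growing list of node names. */
    private final List<String> topology = new ArrayList<>();

    void join(String node) {
        topology.add(node);
    }

    /** Static projection: the nodes matching the filter at the time of the call. */
    List<String> snapshot(Predicate<String> filter) {
        List<String> out = new ArrayList<>();
        for (String n : topology) {
            if (filter.test(n)) {
                out.add(n);
            }
        }
        return out;
    }

    /** Dynamic projection: re-evaluated against the topology on every get(). */
    Supplier<List<String>> dynamic(final Predicate<String> filter) {
        return () -> snapshot(filter);
    }

    /** The same operation works for zero, one or many nodes. */
    static List<String> send(List<String> projection, String msg) {
        List<String> delivered = new ArrayList<>();
        for (String n : projection) {
            delivered.add(n + " <- " + msg);
        }
        return delivered;
    }

    public static void main(String[] args) {
        ProjectionSketch grid = new ProjectionSketch();
        grid.join("a1");
        grid.join("b1");

        List<String> fixed = grid.snapshot(n -> n.startsWith("a"));
        Supplier<List<String>> live = grid.dynamic(n -> n.startsWith("a"));

        grid.join("a2"); // topology changes after the projections were defined

        System.out.println(send(fixed, "ping"));      // prints [a1 <- ping]
        System.out.println(send(live.get(), "ping")); // prints [a1 <- ping, a2 <- ping]
    }
}
```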
6.5. Rich Interfaces
Once we have introduced the grid projection, it is pretty logical to extend the definition of a grid node as a grid projection with just that one node in it. Similarly, one can extend the grid cloud definition as a grid projection that contains all nodes belonging to that specific physical cloud. And finally, it is only logical to provide a conveniently defined global projection that contains all the nodes in the topology.
This is exactly what the GridRichNode and Grid interfaces do. They all extend the GridProjection interface and add all necessary additional operations that are specific to a grid node, grid cloud or a global topology.
The idea of rich interfaces is central in Scala and Ruby libraries, for example. By having both thin and rich interfaces (GridNode and GridRichNode) we can satisfy both types of interface usages:
- thin interface that needs to be implemented by the end user (and therefore should be as simple as possible), and
- rich interface that is actually used by the end user (and therefore should be as rich as possible).
Note (Rich vs. Thin Interfaces): A thin interface needs to be implemented by the end user and therefore should be as simple as possible. A rich interface is actually used by the end user and therefore should be as rich as possible.
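The thin/rich split can be sketched in a few lines of Java. The ThinNode and RichNode interfaces below are hypothetical and much smaller than GridNode and GridRichNode; using Java 8 default methods to layer the rich operations on top of the thin ones is a choice made for this sketch, not a description of how GridGain implements its rich interfaces.

```java
import java.util.Collections;
import java.util.Map;

/** Thin interface: the minimum an implementor has to provide. */
interface ThinNode {
    String id();

    Map<String, Object> attributes();
}

/** Rich interface: convenience operations derived from the thin ones. */
interface RichNode extends ThinNode {
    default Object attribute(String name) {
        return attributes().get(name);
    }

    default boolean hasAttribute(String name, Object expected) {
        return expected.equals(attribute(name));
    }

    default String describe() {
        return id() + attributes();
    }

    /** Adapts any thin implementation to the rich interface. */
    static RichNode of(final ThinNode thin) {
        return new RichNode() {
            @Override public String id() {
                return thin.id();
            }

            @Override public Map<String, Object> attributes() {
                return thin.attributes();
            }
        };
    }
}

final class RichThinSketch {
    public static void main(String[] args) {
        ThinNode thin = new ThinNode() {
            @Override public String id() {
                return "n1";
            }

            @Override public Map<String, Object> attributes() {
                return Collections.<String, Object>singletonMap("segment", "A");
            }
        };

        RichNode rich = RichNode.of(thin);
        System.out.println(rich.hasAttribute("segment", "A")); // prints true
        System.out.println(rich.describe());                   // prints n1{segment=A}
    }
}
```

The implementor writes two methods; the caller gets the whole convenience layer for free.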
Historically, the Grid interface, being a global all-inclusive projection, has an additional special purpose: it acts as the main entry point for the entire GridGain functionality. In fact, most of the operations you perform on GridGain originate on the Grid interface. That is where you can obtain an instance of the data grid, get an instance of the rich cloud interface for a specific cloud, create and manage grid projections, manually deploy grid tasks and perform a multitude of other operations.
To get an instance of Grid interface you need to use GridFactory.
6.6. GridFactory Class
The GridFactory class is a life-cycle factory for Grid instances. Its purpose is to provide various ways to start and stop instances of the Grid interface. Note that the Grid interface - being the main entry point for GridGain APIs - has a strict life-cycle and state machine that is controlled by GridFactory. As noted before, a single GridGain runtime (JVM process) can have zero or more Grid instances, each providing a local view on a different grid.
The usual way to work with GridGain is to use the GridFactory class to start a Grid instance using a specific (or default) configuration file (usually at the beginning of your application). Starting a Grid instance means, among other things, starting a local grid node and having it join the topology. Once you have started a Grid instance you can use any APIs provided by GridGain. When GridGain is no longer needed, you use GridFactory to stop the Grid instance and its node will leave the topology (usually at the end of your application).
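The life-cycle described above (start by name, use, stop) can be sketched as a small registry with a two-state machine. LifecycleFactorySketch and its nested Instance class are hypothetical and only mimic the shape of the pattern; in particular, rejecting a second start under the same name and rejecting calls on a stopped instance are assumptions of this sketch rather than documented GridFactory behavior.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class LifecycleFactorySketch {
    enum State { STARTED, STOPPED }

    /** Hypothetical stand-in for a started grid instance. */
    static final class Instance {
        private final String name;
        private volatile State state = State.STARTED;

        Instance(String name) {
            this.name = name;
        }

        State state() {
            return state;
        }

        /** Any API call is only valid between start and stop. */
        String localNode() {
            if (state != State.STARTED) {
                throw new IllegalStateException("Grid is stopped: " + name);
            }
            return "node@" + name;
        }
    }

    /** One instance per grid name within this JVM. */
    private static final Map<String, Instance> INSTANCES = new ConcurrentHashMap<>();

    static Instance start(String name) {
        Instance inst = new Instance(name);
        if (INSTANCES.putIfAbsent(name, inst) != null) {
            throw new IllegalStateException("Grid already started: " + name);
        }
        return inst;
    }

    static Instance get(String name) {
        Instance inst = INSTANCES.get(name);
        if (inst == null) {
            throw new IllegalStateException("Grid not started: " + name);
        }
        return inst;
    }

    /** Returns true if an instance with this name was running and is now stopped. */
    static boolean stop(String name) {
        Instance inst = INSTANCES.remove(name);
        if (inst == null) {
            return false;
        }
        inst.state = State.STOPPED;
        return true;
    }

    public static void main(String[] args) {
        start("gridA");
        start("gridB"); // two grids in the same JVM, each with its own local node
        System.out.println(get("gridA").localNode()); // prints node@gridA
        System.out.println(stop("gridA"));            // prints true
        System.out.println(stop("gridB"));            // prints true
    }
}
```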
6.7. GridCache Interface
The GridCache interface represents the main API entry point for data grid functionality. A GridCache instance always refers to a single named cache. You can configure as many named caches as you like. You receive the GridCache instance from the Grid instance (as with anything else in GridGain). GridCache is a rich interface and represents the global cache projection (see below).
6.8. GridCacheProjection Interface
The GridCacheProjection interface is analogous to GridProjection, but it defines a cache projection over a specified set of key-value pairs. In fact, the GridCache interface extends GridCacheProjection and simply represents a global cache projection, i.e. the projection over all key-value pairs in this cache.
Cache projections are an extremely powerful technique in GridGain’s data grid. They provide a monadic set of operations that is defined on any arbitrary set of:
- keys,
- values, or
- key-value pairs
giving the data grid a distinct functional flavor and providing consistent API design between compute and data grids.
Note (Compute and In-Memory Data Grid Design Unification): This logical and design unification between compute and data grids around the functional monadic concept is one of the unique characteristics of GridGain architecture.
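A cache projection can be pictured as a filtered view over the same underlying key-value store. The sketch below models that with a java.util.Map and a BiPredicate; CacheProjectionSketch is a hypothetical class, and the choice to reject puts that fall outside the projection is an assumption made for the example, not a statement about GridCacheProjection.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiPredicate;

final class CacheProjectionSketch<K, V> {
    private final Map<K, V> store;
    private final BiPredicate<K, V> filter;

    /** Global projection: every key-value pair in the store is visible. */
    CacheProjectionSketch(Map<K, V> store) {
        this(store, (k, v) -> true);
    }

    private CacheProjectionSketch(Map<K, V> store, BiPredicate<K, V> filter) {
        this.store = store;
        this.filter = filter;
    }

    /** Narrows this projection further; the underlying store is shared. */
    CacheProjectionSketch<K, V> projection(BiPredicate<K, V> extra) {
        return new CacheProjectionSketch<>(store, filter.and(extra));
    }

    /** Returns the value only if the pair is visible through this projection. */
    V get(K key) {
        V val = store.get(key);
        return val != null && filter.test(key, val) ? val : null;
    }

    /** Stores the pair only if it belongs to this projection. */
    boolean put(K key, V val) {
        if (!filter.test(key, val)) {
            return false;
        }
        store.put(key, val);
        return true;
    }

    /** Counts the pairs visible through this projection. */
    int size() {
        int n = 0;
        for (Map.Entry<K, V> e : store.entrySet()) {
            if (filter.test(e.getKey(), e.getValue())) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        CacheProjectionSketch<String, Integer> all =
            new CacheProjectionSketch<>(new HashMap<String, Integer>());
        all.put("a", 1);
        all.put("b", 20);

        CacheProjectionSketch<String, Integer> big = all.projection((k, v) -> v >= 10);
        System.out.println(big.get("a"));    // prints null
        System.out.println(big.get("b"));    // prints 20
        System.out.println(big.put("c", 5)); // prints false
        System.out.println(all.size());      // prints 2
    }
}
```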
6.9. MapReduce
MapReduce is a relatively new name for a very old concept. Strictly speaking, the term MapReduce refers to the patented algorithm introduced by Google in their internal distributed data processing systems and closely mirrored by the Hadoop project developed by competing Yahoo!.
We (as well as the distributed programming community in general) tend to use the term MapReduce in a wider sense since it was extensively publicized, and we refer to any divide-and-conquer design strategy or traditional parallel computing as MapReduce. In fact, if you have a long running task, you can split this task into multiple sub-tasks, execute these sub-tasks in parallel, aggregate their results back and get your final result in a fraction of the time. That’s classic parallel programming, or compute grid - and to avoid a myriad of names we call it MapReduce too.
Note (Google and MapReduce): It is important to note, however, that the specific implementation that we chose in GridGain has relatively little, if anything, to do with the algorithm used by Google (or Hadoop). Not only is Google’s algorithm patented, but it is also very specific to Google’s needs and rarely applicable outside of extreme big-data use cases.
Hadoop, an Apache project, provides one implementation of MapReduce that closely matches Google’s definition. Note that Google granted Hadoop a license to use the patented Google algorithm.
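The split, execute in parallel and aggregate pattern in the wider sense used here can be sketched on a single machine with a thread pool standing in for remote nodes. SplitReduceSketch is a hypothetical class; it splits a phrase into words, computes each word's length as a separate job and sums the results. It illustrates the pattern only and does not use GridGain's task and job APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SplitReduceSketch {
    /** Counts non-whitespace characters by mapping words to jobs and reducing their lengths. */
    static int countLetters(String phrase, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Split ("map"): one job per word.
            List<Callable<Integer>> jobs = new ArrayList<>();
            for (final String word : phrase.trim().split("\\s+")) {
                jobs.add(word::length);
            }

            // Execute the jobs in parallel and aggregate ("reduce") their results.
            int total = 0;
            for (Future<Integer> f : pool.invokeAll(jobs)) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(countLetters("Hello GridGain World", 3)); // prints 18
    }
}
```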
6.10. Streaming MapReduce
Streaming MapReduce is a less well-defined term, but it often refers to a type of processing similar to MapReduce (i.e. tasks get split and reduced) in which the input data is, in general, not finite. A typical example is a search in a live video feed: the obvious problem is that you can’t load the entire feed first and then partition it into small parts to be processed in parallel (as you would in traditional MapReduce); you need to somehow map and reduce the incoming data as it arrives and at the same time keep providing failover, topology resolution, collision resolution, back pressure control, and all other necessary services.
GridGain provides several unique ways to implement this type of processing.
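To make the difference from traditional MapReduce concrete, here is a minimal plain-Java sketch of reducing an unbounded input window by window. It is not one of GridGain’s implementations: the class StreamingReduceSketch, its methods and the "count even numbers" logic are all assumptions chosen for illustration, and the windows are processed sequentially where a grid would dispatch them to nodes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: incremental map and reduce over a feed that may never end.
class StreamingReduceSketch {
    // An endless feed 0, 1, 2, ... standing in for live data.
    static Iterator<Integer> naturals() {
        return new Iterator<Integer>() {
            private int next;

            @Override public boolean hasNext() { return true; }
            @Override public Integer next() { return next++; }
            @Override public void remove() { throw new UnsupportedOperationException(); }
        };
    }

    // "Map" step for one window: here, count the even numbers in it.
    static long mapWindow(List<Integer> window) {
        long matches = 0;

        for (int v : window)
            if (v % 2 == 0)
                matches++;

        return matches;
    }

    // Consumes the feed one window at a time and returns the running total
    // after each window. The whole input is never held in memory.
    static List<Long> process(Iterator<Integer> feed, int windowSize, int maxWindows) {
        List<Long> runningTotals = new ArrayList<Long>();
        List<Integer> window = new ArrayList<Integer>(windowSize);
        long running = 0;

        while (runningTotals.size() < maxWindows && feed.hasNext()) {
            window.add(feed.next());

            if (window.size() == windowSize) {
                running += mapWindow(window); // Reduce into the running result.
                runningTotals.add(running);
                window.clear();
            }
        }

        // Flush a trailing partial window if a finite feed ended early.
        if (!window.isEmpty())
            runningTotals.add(running + mapWindow(window));

        return runningTotals;
    }
}
```

The maxWindows parameter exists only so that the sketch terminates; a real streaming job would keep publishing running results for as long as the feed lasts.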
6.11. Real-Time Processing
GridGain is all about processing large data sets (a.k.a. BigData) in real-time. But what real-time are we talking about?
In GridGain we are talking about perceptual real-time, or software real-time (the Java Virtual Machine doesn’t technically support hardware real-time processing). Perceptual real-time is defined by the maximum response time that a typical user will wait for a task he or she expects to be "instant" before cancelling it. For example, when a typical user clicks the "Add to Basket" button on a website, anything beyond a couple of seconds will probably make that user click "Back" or otherwise cancel the task. For an online trading application, a delay of a few seconds on submitting an order indicates that something is wrong with the system. A portfolio management application that takes 10 seconds to open a last-minute moving average chart is practically unusable. And so on…
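One way to think about a perceptual real-time budget in code is a call that gives up once the user would have given up. The following plain-Java sketch is an illustration only: the class ResponseBudgetSketch, its method names and the fallback value are hypothetical, and nothing here is GridGain API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: enforce a response-time budget on a task.
class ResponseBudgetSketch {
    // Returns the task result, or 'fallback' if the budget is exceeded.
    static <T> T callWithin(Callable<T> task, long budgetMs, T fallback) {
        ExecutorService exec = Executors.newSingleThreadExecutor();
        Future<T> fut = exec.submit(task);

        try {
            return fut.get(budgetMs, TimeUnit.MILLISECONDS);
        }
        catch (TimeoutException e) {
            fut.cancel(true); // The user would have clicked "Back" by now.

            return fallback;
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();

            return fallback;
        }
        catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
        finally {
            exec.shutdownNow();
        }
    }

    // Helper producing a task that takes 'delayMs' to answer.
    static Callable<String> slowCall(final long delayMs, final String value) {
        return new Callable<String>() {
            @Override public String call() throws Exception {
                Thread.sleep(delayMs);

                return value;
            }
        };
    }
}
```

For example, callWithin(slowCall(5000, "chart"), 2000, "timed out") returns the fallback after roughly two seconds instead of making the caller wait for five.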
6.12. Closures & Predicates
We mention closures and predicates here only because we provide their full implementation on the Java side. Unlike Scala or Groovy, where closures (functions) are part of the language, in Java they are not - and we had to develop an entire state of the art distributed functional framework for Java that is included with GridGain.
A closure is a block of code that encloses its body and any outside variables used inside of it as a function object. You can then pass such a function object anywhere you can pass a variable, and execute it. A predicate is a special type of closure that simply returns a boolean value.
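The definition above can be illustrated in plain Java with anonymous classes. This is a minimal sketch, not GridGain’s functional framework: the interfaces Closure and Predicate below are hypothetical and declared only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of closures and predicates as function objects.
class ClosureSketch {
    // A closure takes a value and returns a result.
    interface Closure<E, R> { R apply(E e); }

    // A predicate is a closure that returns a boolean.
    interface Predicate<E> { boolean apply(E e); }

    // Function objects are passed around like any other variable.
    static <E, R> List<R> filterAndMap(List<E> in, Predicate<E> p, Closure<E, R> c) {
        List<R> out = new ArrayList<R>();

        for (E e : in)
            if (p.apply(e))
                out.add(c.apply(e));

        return out;
    }

    // 'minLen' and 'suffix' are outside variables enclosed by the function objects.
    static List<String> demo(final int minLen, final String suffix) {
        Predicate<String> longEnough = new Predicate<String>() {
            @Override public boolean apply(String s) {
                return s.length() >= minLen;
            }
        };

        Closure<String, String> decorate = new Closure<String, String>() {
            @Override public String apply(String s) {
                return s + suffix;
            }
        };

        return filterAndMap(Arrays.asList("grid", "gain", "fp"), longEnough, decorate);
    }
}
```

Calling demo(3, "!") keeps the two strings of at least three characters and returns "grid!" and "gain!".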
Scala, as a hybrid OOP and FP language, offers a natural advantage over Java since it provides native in-language support for closures, which enables the much more concise and elegant APIs provided by GridGain’s Scalar DSL. However, with GridGain’s functional framework we brought functional usage in Java as close to Scala as possible.
Below is a simple example that broadcasts and prints a string on all nodes in the topology. Just compare how close the Scala, Groovy and Java implementations are:
Scala:
    ...
    object Test {
        // Broadcast "Howdy!" string to all nodes.
        def main(args: Array[String]) = scalar {
            grid$ *< (BROADCAST, () => println("Howdy!"))
        }
    }
    ...
Groovy:
    ...
    @Typed
    @Use(GroverProjectionCategory)
    class Test {
        // Broadcast "Howdy!" string to all nodes.
        static void main(String[] args) {
            grover {
                Grid g -> g.run(BROADCAST) { println("Howdy!") }
            }
        }
    }
    ...
Java:
    ...
    public class Test {
        // Broadcast "Howdy!" string to all nodes.
        public static void main(String[] args) {
            G.in(null, new CIX1<Grid>() {
                @Override public void applyx(Grid g) throws GridException {
                    g.run(BROADCAST, F.println("Howdy!"));
                }
            });
        }
    }
    ...
Closures and predicates are used extensively in GridGain APIs and are at the core of our design. You will discover plenty of examples later in the book of how closures and predicates - in Java, Groovy and Scala - are used throughout GridGain.
6.13. Type Aliases & Typedefs
One of the unusual features of our Java side APIs is the introduction of type aliases (also known as typedefs) in GridGain 3.0. With the introduction of functional design in GridGain 3.0 we quickly discovered that Java APIs are just way too "chatty" due to the lack of any type inference in the Java compiler. The only way to combat that problem in Java is to introduce a type alias - a subclass with a shorter name.
This works best for static, factory-type classes, and it also works for object instantiation. Unfortunately, you can’t use type aliases in method signatures, since an alias and its original are different types. Such is life in the Java world…
Aliases Applicability
Due to the Java-based implementation, type aliases or typedefs are used only for static, factory-type classes.
We’ve introduced aliases only for key GridGain types and there are only about a dozen aliases in GridGain APIs. It’s pretty easy to memorize the key ones that are used most frequently. Usage of aliases is, of course, optional, as you can always use the full (original) names of the types. But we are pretty sure that you’ll find aliases on the Java side useful for making your code more readable.
Here’s a good example demonstrating how aliases can make Java code slightly more readable. Without aliases:
    GridFunc.copy(res, goods,
        GridFunc.<Item>and(
            GridFunc.<Item>notNull(),
            GridFunc.<Item>or(
                new GridPredicate<Item>() {
                    @Override public boolean apply(Item item) {
                        return item.novelty;
                    }
                },
                new GridPredicate<Item>() {
                    @Override public boolean apply(Item item) {
                        return item.price < 150;
                    }
                }
            )
        )
    );
    GridFunc.forEach(res, GridFunc.<Item>println());
With aliases:
    F.copy(res, goods,
        F.<Item>and(
            F.<Item>notNull(),
            F.<Item>or(
                new P1<Item>() {
                    @Override public boolean apply(Item item) {
                        return item.novelty;
                    }
                },
                new P1<Item>() {
                    @Override public boolean apply(Item item) {
                        return item.price < 150;
                    }
                }
            )
        )
    );
    F.forEach(res, F.<Item>println());
6.14. Grid Task and Job
Grid task and job are the main abstractions in the compute grid. As you recall, the compute grid is about parallelizing the processing, i.e. splitting a long running task into multiple sub-tasks, executing those sub-tasks in parallel and aggregating their results into one final result.
GridTask defines the descriptor for the overall task to be processed, and a grid job defines a sub-task, i.e. the piece of code that will travel to remote nodes for execution. Essentially, a grid task defines the mapping and reducing logic, while GridJob is very similar to the java.util.concurrent.Callable interface in that it defines a simple executable body.
GridTask and GridJob
GridTask defines the overall task descriptor as well as the mapping and reducing logic. GridJob defines the units of work that tasks get split into and that travel to remote nodes for execution. The results of their execution are reduced by the task into the final result.
Note that anything that gets executed on the compute grid should be defined as a grid task. Closures and AOP-based executions get converted to grid tasks automatically when needed.
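The division of responsibilities between a task and its jobs can be sketched without any GridGain classes. In the plain-Java illustration below, the Task interface is a hypothetical stand-in (it is not the real GridTask signature), java.util.concurrent.Callable plays the role of a job, and a local thread pool stands in for remote nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of the task/job split; not the GridGain API.
class TaskJobSketch {
    // The task owns the mapping and the reducing logic.
    interface Task<A, P, R> {
        List<Callable<P>> map(A arg); // Split the argument into jobs.

        R reduce(List<P> partials);   // Aggregate job results.
    }

    // Runs the jobs on local threads; on a grid they would travel to remote nodes.
    static <A, P, R> R execute(Task<A, P, R> task, A arg) {
        ExecutorService pool = Executors.newFixedThreadPool(4);

        try {
            List<P> partials = new ArrayList<P>();

            for (Future<P> fut : pool.invokeAll(task.map(arg)))
                partials.add(fut.get());

            return task.reduce(partials);
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();

            throw new IllegalStateException(e);
        }
        catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
        finally {
            pool.shutdown();
        }
    }

    // Example task: total number of characters in the words of a phrase.
    static class CharCountTask implements Task<String, Integer, Integer> {
        @Override public List<Callable<Integer>> map(String phrase) {
            List<Callable<Integer>> jobs = new ArrayList<Callable<Integer>>();

            for (final String word : phrase.split(" "))
                jobs.add(new Callable<Integer>() {
                    @Override public Integer call() {
                        return word.length(); // The job body is a simple executable unit.
                    }
                });

            return jobs;
        }

        @Override public Integer reduce(List<Integer> partials) {
            int sum = 0;

            for (int p : partials)
                sum += p;

            return sum;
        }
    }
}
```

With this sketch, execute(new CharCountTask(), "hello grid world") creates three jobs and reduces their results to 14.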
6.15. Service Provider Interface (SPI)
The Service Provider Interface (SPI) is at the core of the GridGain architecture, as it provides pluggable modularization to GridGain. The SPI concept is simply a component interface with multiple pluggable implementations that all share unified life cycle management. There are two key benefits to this approach:
- The rest of GridGain (including all user application code) only uses the SPI interface and is totally independent of its specific implementation.
- Users can provide their own pluggable implementations for SPIs and are therefore not limited to the ones provided by GridGain itself.
An example of an SPI is the communication subsystem. It has a simple interface, GridCommunicationSPI, and half a dozen implementations that GridGain provides out of the box. These implementations can be set in the grid configuration GridConfiguration (the TCP/IP-based implementation is always set by default, so you don’t have to set anything to get GridGain working right away).
GridGain SPI - Service Provider Implementation
The experienced reader will quickly notice that SPI is very similar to OSGi. Although we looked carefully at OSGi a number of years ago, we quickly concluded that we needed more custom functionality than OSGi provided at the time. At the same time, our SPI architecture and OSGi share the same goals of componentization and modularity.
GridGain is "sliced" into a dozen different SPIs for all major subsystems, and each such subsystem can be fully replaced by a user-specific implementation - an extremely powerful feature of GridGain.
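The SPI idea (one interface, interchangeable implementations, a shared life cycle) can be shown with a small plain-Java sketch. All names below (Spi, CommunicationSpi, InMemoryCommunicationSpi, Kernel) are hypothetical and chosen for illustration; they do not reproduce the real GridGain interfaces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a pluggable SPI; not the GridGain API.
class SpiSketch {
    // Unified life cycle shared by every pluggable component.
    interface Spi {
        void start();

        void stop();
    }

    // Communication contract; the rest of the code sees only this interface.
    interface CommunicationSpi extends Spi {
        void send(String nodeId, String msg);
    }

    // One pluggable implementation: records messages in memory.
    static class InMemoryCommunicationSpi implements CommunicationSpi {
        final List<String> sent = new ArrayList<String>();

        private boolean started;

        @Override public void start() { started = true; }

        @Override public void stop() { started = false; }

        @Override public void send(String nodeId, String msg) {
            if (!started)
                throw new IllegalStateException("SPI is not started.");

            sent.add(nodeId + ":" + msg);
        }
    }

    // "Kernel" code depends on the interface only; the implementation is configured from outside.
    static class Kernel {
        private final CommunicationSpi comm;

        Kernel(CommunicationSpi comm) { this.comm = comm; }

        void start() { comm.start(); }

        void stop() { comm.stop(); }

        void greet(String nodeId) { comm.send(nodeId, "Howdy!"); }
    }

    static List<String> demo() {
        InMemoryCommunicationSpi spi = new InMemoryCommunicationSpi();
        Kernel kernel = new Kernel(spi);

        kernel.start();
        kernel.greet("node-1");
        kernel.stop();

        return spi.sent;
    }
}
```

Swapping InMemoryCommunicationSpi for another implementation requires no change to Kernel, which is the point of the design.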
7. Functional Programming Framework
The introduction of Functional Programming (FP) into Java-based GridGain has its own unique story that is worth repeating here.
It was anything but a straightforward decision… In early 2008, when we released GridGain 2.0, we started looking for new ways to simplify the usability of GridGain. We had a pretty good story with AOP-based grid enabling and direct API support for MapReduce type of processing - in fact we were way ahead of everyone else in this department. But we felt there was still a lot of plumbing exposed in many use cases where such exposure was clearly unnecessary.
The obvious thought for us was to look at the Domain Specific Language (DSL) route. We quickly realized that a Java-based DSL was simply out of the question (it would be just another set of Java APIs not much different from what we already had). An XML-based DSL (or any type of external DSL) was considered a non-starter even in the heyday of DSLs in 2006-2007.
So, we started looking at other JVM-based languages that would be much more appropriate for DSL development yet let us reuse GridGain’s Java-based APIs. During a surprisingly short evaluation period (which we’ll chronicle in the Scala section later on) we quickly and decisively settled on Scala - a relatively new language at the time that so powerfully combined OOP and FP into one cohesive and expressive language.
As we dived into Scala-based DSL development with renewed energy we quickly realized that in order to provide a truly powerful DSL in Scala (utilizing Scala’s functional core including partial functions, closures, etc.) we essentially had to re-implement most of the main APIs in Scala, i.e. duplicate GridGain in Scala. That was a pretty rude awakening for us, as it was simply a no-go to have GridGain implemented in two parallel tracks: one in Java and one in Scala.
And that’s where the functional story for Java begins. After some research on our part we noticed that if we could make the Java side APIs functional in their design, we could largely reduce the Scala-based DSL to a collection of implicit conversions from functional Java parts to Scala parts (and back). That would also allow us to keep all implementation in Java (where it originally was) and keep the Scala APIs as a layer on top without any duplication of code whatsoever.
We set off to develop the first truly distributed Functional Programming framework for Java.
After a few false starts we got a first working prototype (that didn’t suck) and started refactoring our Java APIs into the functional mold. What we started noticing along the way is that new APIs (newly added or refactored) were becoming much more powerful and elegant when used with functional constructs such as predicate-based node filters or closure-based executions. At about that time we also came up with grid and cache projections that truly revolutionized GridGain usability. Many implementations became shorter and yet more expressive when we started using our GridFunc class as well as the newly introduced typedefs. All these positive effects on our internal software development solidified our resolve to provide the same capabilities to the users of GridGain.
Scala Leads to FP in Java
All in all, the introduction of Scala support and the Scala DSL into GridGain led us to develop one of the most comprehensive Functional Programming frameworks in Java, which is at the core of most of the functionality in GridGain.
What is even more interesting (and exciting for us) is that the FP focus on the Java side made the development of Grover, our Groovy++ based DSL for GridGain, simple if not downright trivial. It took just a week to release the first beta version of it, and it was already mighty useful for the large Groovy community.
More on that later…
7.1. Type Aliases and Typedefs
Type aliases or typedefs are used in programming languages to give a shorter name to an existing type, making the code that uses it more readable and easier to understand.
Many languages provide built-in support for type aliases or typedefs.
C-based languages (C/C++/Objective-C) provide direct support for them:
    typedef enum {A, B, C} myEnum;
    typedef int myInt, yoursInt;
Scala also provides excellent built-in support for type aliases that goes beyond C-based capabilities. You can declare a type alias while importing the original type:
    import foo.bar.{Original => O} // 'O' becomes alias of 'Original'
You can also declare a type alias in your code in much the same way as you declare a method or a field (and similar to C/C++):
    type Call[R] = () => R // A shorthand for function
    type OneWayTicket = () => Unit // Another shorthand for function
You often use structural types in Scala with type aliases:
    type T = { def foo: Unit }
declares T as an alias for any type that has a method foo with return type Unit.
Java, unfortunately, does not provide any support for type aliases… Yet Java needs them more than any language above due to its syntactic bloat and lack of any reasonable type inference. In fact, look at this "typical" Java code:
    private HashMap<Collection<String>, Set<Integer>> map =
        new HashMap<Collection<String>, Set<Integer>>();
Since there is no type inference, you have to repeat the bloated HashMap definition twice in this declaration for no reason whatsoever - it only makes the code look busier and less readable. And if this type of HashMap is used frequently - and there’s no way to shorthand it - you have to repeat this over and over again in every place where it is used.
Java needs typedefs more than any language above due to its syntactic bloat and lack of any reasonable type inference.
Surprisingly enough, this shortcoming of Java is often cited as one of the major reasons Java code "feels" bloated and unnecessarily verbose. This gets even more pronounced when you start using Scala, which, like Java, is fully statically typed but removes most, if not all, bloat and unnecessary repetition from the code.
With the introduction of Functional Programming in GridGain 3.0 (explained in the following chapter) we were faced with this very problem in our Java APIs: we had plenty of new interfaces and classes that were used literally everywhere in our code, and it was becoming unwieldy in many places, since many of them require parametrization and the code was becoming simply ridiculous in some places… Needless to say, users of GridGain APIs would have been faced with the same problems.
To solve this problem (somewhat) we have introduced typedefs to our Java APIs (Scala APIs naturally use Scala language type aliases).
Typedef
Essentially, a typedef is simply a subclass with a shorter name - as there is no other way to do that in Java.
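Applied to the HashMap example above, the "subclass with a shorter name" technique might look like the following sketch. The class names TypedefSketch and Index are hypothetical and are not part of GridGain; the sketch also shows why the alias is a different type from its original.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a Java "typedef"; not a GridGain class.
class TypedefSketch {
    // The typedef: a subclass whose only purpose is a shorter name.
    static class Index extends HashMap<Collection<String>, Set<Integer>> {
        private static final long serialVersionUID = 0L;
    }

    // An alias instance can be passed wherever the original type is expected...
    static int sizeOf(HashMap<Collection<String>, Set<Integer>> map) {
        return map.size();
    }

    // ...but the reverse does not hold: a plain HashMap is not an Index.
    static boolean isAlias(Object o) {
        return o instanceof Index;
    }

    static int demo() {
        Index idx = new Index(); // No repeated type parameters here.

        idx.put(Arrays.asList("a", "b"), new HashSet<Integer>(Arrays.asList(1, 2)));

        return sizeOf(idx);
    }
}
```

The declaration of idx is noticeably shorter than the original, while isAlias makes the cost visible: code that asks for an Index will not accept an ordinary HashMap with the same type parameters.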
In the package org.gridgain.grid.typedef we have a few dozen typedefs defined as sub-classes with short one- or two-letter names for various frequently used types in GridGain. The following table shows all typedefs shipped with GridGain:
| Typedef or Type Alias | Original Type |
|---|---|
| C1<E1,R> | org.gridgain.grid.lang.GridClosure<E1,R> |
| C2<E1,E2,R> | org.gridgain.grid.lang.GridClosure2<E1,E2,R> |
| C3<E1,E2,E3,R> | org.gridgain.grid.lang.GridClosure3<E1,E2,E3,R> |
| CA | org.gridgain.grid.lang.GridAbsClosure |
| CAX | org.gridgain.grid.lang.GridAbsClosureX |
| CI1<T> | org.gridgain.grid.lang.GridInClosure<T> |
| CI2<E1,E2> | org.gridgain.grid.lang.GridInClosure2<E1,E2> |
| CI3<E1,E2,E3> | org.gridgain.grid.lang.GridInClosure3<E1,E2,E3> |
| CIX1<T> | org.gridgain.grid.lang.GridInClosureX<T> |
| CIX2<E1,E2> | org.gridgain.grid.lang.GridInClosure2X<E1,E2> |
| CIX3<E1,E2,E3> | org.gridgain.grid.lang.GridInClosure3X<E1,E2,E3> |
| CO<T> | org.gridgain.grid.lang.GridOutClosure<T> |
| COX<T> | org.gridgain.grid.lang.GridOutClosureX<T> |
| CX1<E1,R> | org.gridgain.grid.lang.GridClosureX<E1,R> |
| CX2<E1,E2,R> | org.gridgain.grid.lang.GridClosure2X<E1,E2,R> |
| CX3<E1,E2,E3,R> | org.gridgain.grid.lang.GridClosure3X<E1,E2,E3,R> |
| F | org.gridgain.grid.lang.GridFunc |
| G | org.gridgain.grid.GridFactory |
| P1<E1> | org.gridgain.grid.lang.GridPredicate<E1> |
| P2<T1,T2> | org.gridgain.grid.lang.GridPredicate2<T1,T2> |
| P3<T1,T2,T3> | org.gridgain.grid.lang.GridPredicate3<T1,T2,T3> |
| PA | org.gridgain.grid.lang.GridAbsPredicate |
| PCE<K,V> | org.gridgain.grid.lang.GridPredicate<GridCacheEntry<K, V>> |
| PE | org.gridgain.grid.lang.GridPredicate<GridEvent> |
| PKV<K,V> | org.gridgain.grid.lang.GridPredicate2<K, V> |
| PN | org.gridgain.grid.lang.GridPredicate<GridRichNode> |
| PX1<E1> | org.gridgain.grid.lang.GridPredicateX<E1> |
| PX2<T1,T2> | org.gridgain.grid.lang.GridPredicate2X |
| PX3<T1,T2,T3> | org.gridgain.grid.lang.GridPredicate3X |
| R1<E1,R> | org.gridgain.grid.lang.GridReducer |
| R2<E1,E2,R> | org.gridgain.grid.lang.GridReducer2 |
| R3<E1,E2,E3,R> | org.gridgain.grid.lang.GridReducer3 |
| RX1<E1,R> | org.gridgain.grid.lang.GridReducerX |
| RX2<E1,E2,R> | org.gridgain.grid.lang.GridReducer2X |
| RX3<E1,E2,E3,R> | org.gridgain.grid.lang.GridReducer3X |
As you can see, typedefs are defined primarily for functional classes (tuples, closures, and predicates) as well as for a few factory classes like GridFactory and GridFunc. Here’s a short sub-list of the most frequently used typedefs in GridGain:
| Typedef or Type Alias | Original Type |
|---|---|
| C1<E1,R> | org.gridgain.grid.lang.GridClosure<E1,R> |
| CA | org.gridgain.grid.lang.GridAbsClosure |
| CO<T> | org.gridgain.grid.lang.GridOutClosure<T> |
| F | org.gridgain.grid.lang.GridFunc |
| P1<E1> | org.gridgain.grid.lang.GridPredicate<E1> |
| PA | org.gridgain.grid.lang.GridAbsPredicate |
| PN | org.gridgain.grid.lang.GridPredicate<GridRichNode> |
Here’s a code snippet from the GridFunctionalCopyExample example that is shipped with GridGain. The first version does not use typedefs and uses the full names of the types:
    GridFunc.copy(res, goods,
        GridFunc.<Item>and(
            GridFunc.<Item>notNull(),
            GridFunc.<Item>or(
                new GridPredicate<Item>() {
                    @Override public boolean apply(Item item) {
                        return item.novelty;
                    }
                },
                new GridPredicate<Item>() {
                    @Override public boolean apply(Item item) {
                        return item.price < 150;
                    }
                }
            )
        )
    );
    GridFunc.forEach(res, GridFunc.<Item>println());
The second version uses typedefs:
    F.copy(res, goods,
        F.<Item>and(
            F.<Item>notNull(),
            F.<Item>or(
                new P1<Item>() {
                    @Override public boolean apply(Item item) {
                        return item.novelty;
                    }
                },
                new P1<Item>() {
                    @Override public boolean apply(Item item) {
                        return item.price < 150;
                    }
                }
            )
        )
    );
    F.forEach(res, F.<Item>println());
I would argue that the second version is more readable and easier to understand, since we don’t have to repeat GridFunc and GridPredicate ad nauseam on every line.
Note also that we have a couple of typedefs that shorten parameterized types, which allows for greater brevity and more concise code:
| Typedef or Type Alias | Original Type |
|---|---|
| PCE<K,V> | org.gridgain.grid.lang.GridPredicate<GridCacheEntry<K, V>> |
| PE | org.gridgain.grid.lang.GridPredicate<GridEvent> |
| PKV<K,V> | org.gridgain.grid.lang.GridPredicate2<K, V> |
| PN | org.gridgain.grid.lang.GridPredicate<GridRichNode> |
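Limitation
This approach obviously has a limitation: as noted in section 6.13, a typedef and its original are different types, so typedefs cannot be used in method signatures.
Scala
You can freely use Java-based typedefs in Scala code - but we suggest using the native type alias support provided by Scala.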
7.1.1. Typedefs vs. Factory Methods
Now, you may ask: why not use factory methods, a standard Java idiom, instead? GridGain actually provides plenty of factory methods in the GridFunc class (which itself has the F typedef). But factory methods often tend to be more verbose and sometimes hide the "creation of new instance" context. Compare:
    Foobar v = FoobarFactory.newFoobar(...);
or
    Foobar v = new T(...); // 'T' is a typedef for 'Foobar'
In our experience working with GridGain source code we’ve found that typedefs generally provide for the most terse code without losing context or readability.
7.1.2. Where To Use Typedefs
The answer here is simple - everywhere, unless they make your code less readable.
Do Not Trade In Readability
We strongly believe that you should never trade a few saved characters for poorer code readability.
Once you get familiar with some of the most frequently used typedefs, you will start using them freely, and in most situations they will improve your code - make it less bloated and focus the reader’s attention on the actual business logic and away from unnecessary repetitive declarations.
8. GridGain Basics
In this pretty long chapter we’ll cover all the basic functionality available in GridGain apart from the "big two" - compute grids and data grids - which will be covered in subsequent individual chapters. Both big subsystems are fundamentally based on the functionality explained in this chapter, and therefore the following material is pretty important.
What’s more interesting is the fact that some GridGain-based applications don’t even use either of the two main technologies we have - but utilize, for example, actor-based message passing, distributed functional programming, zero deployment or event-based processing provided by GridGain.
8.1. Logging
GridGain provides pluggable logging by allowing users to specify their own logging framework. This is especially convenient when GridGain runs inside a hosting environment such as a servlet container or application server.
In such a case, GridGain can easily be configured to route all its logging through the host’s logging framework, eliminating the need for multiple log file locations. This also dramatically simplifies debugging, since multiple log files don’t have to be synchronized line by line.
To provide this pluggability GridGain relies on its own interface, GridLogger, which provides an absolutely minimal API for logging. GridGain uses this interface throughout the entire product and also provides complete out-of-the-box implementations of it for popular logging frameworks such as Java logging (java.util.logging), Log4j, JCL and SLF4J, all covered in the configuration section below.
Users, of course, are free to provide their own implementations, and many often do for integration with existing log analysis or health monitoring solutions.
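The pluggability described above is essentially the adapter pattern: a small logging interface with one adapter per back-end. The sketch below illustrates this with java.util.logging. The MiniLogger interface is hypothetical and much smaller than the real GridLogger, whose methods are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical sketch of a pluggable logger; not the GridLogger API.
class LoggerAdapterSketch {
    // Minimal logging contract used by the rest of the product.
    interface MiniLogger {
        void info(String msg);

        void error(String msg, Throwable e);

        boolean isInfoEnabled();
    }

    // Adapter that routes everything to a java.util.logging logger supplied by the host.
    static class JulAdapter implements MiniLogger {
        private final Logger impl;

        JulAdapter(Logger impl) { this.impl = impl; }

        @Override public void info(String msg) { impl.info(msg); }

        @Override public void error(String msg, Throwable e) { impl.log(Level.SEVERE, msg, e); }

        @Override public boolean isInfoEnabled() { return impl.isLoggable(Level.INFO); }
    }

    // Logs through the adapter and returns what the host's handler received.
    static List<String> demo() {
        Logger hostLog = Logger.getLogger("sketch.host");

        hostLog.setUseParentHandlers(false);
        hostLog.setLevel(Level.INFO);

        final List<String> seen = new ArrayList<String>();

        Handler capture = new Handler() {
            @Override public void publish(LogRecord rec) {
                seen.add(rec.getLevel() + ":" + rec.getMessage());
            }

            @Override public void flush() { /* No-op. */ }

            @Override public void close() { /* No-op. */ }
        };

        hostLog.addHandler(capture);

        MiniLogger log = new JulAdapter(hostLog);

        log.info("node started");
        log.error("job failed", new RuntimeException("boom"));

        hostLog.removeHandler(capture);

        return seen;
    }
}
```

Because callers only ever see MiniLogger, replacing JulAdapter with an adapter for another framework changes nothing in the code that logs.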
8.1.1. Configuration
The GridGain logger can be configured either from code, by modifying GridConfiguration when starting GridGain, or via Spring XML. The following examples demonstrate both ways, using the Log4J logger from code and the JCL logger in Spring XML:
    GridConfiguration cfg = new GridConfigurationAdapter();
    ...
    // Log4J logger.
    URL xml = U.resolveGridGainUrl("modules/tests/config/log4j-test.xml");
    GridLogger log = new GridLog4jLogger(xml);
    ...
    cfg.setGridLogger(log);

    ...
    <property name="gridLogger">
        <bean class="org.gridgain.grid.logger.jcl.GridJclLogger">
            <constructor-arg type="org.apache.commons.logging.Log">
                <bean class="org.apache.commons.logging.impl.Log4JLogger">
                    <constructor-arg type="java.lang.String" value="config/default-log4j.xml"/>
                </bean>
            </constructor-arg>
        </bean>
    </property>
    ...
Configuring Java Logging
Here is an example of configuring the Java logger in the GridGain Spring configuration file, wrapping the global java.util.logging.Logger. Note that java.util.logging.Logger has no public constructor, so the inner bean is created through its static getLogger factory method:
    ...
    <property name="gridLogger">
        <bean class="org.gridgain.grid.logger.java.GridJavaLogger">
            <constructor-arg type="java.util.logging.Logger">
                <bean class="java.util.logging.Logger" factory-method="getLogger">
                    <constructor-arg type="java.lang.String" value="global"/>
                </bean>
            </constructor-arg>
        </bean>
    </property>
    ...
or
    ...
    <property name="gridLogger">
        <bean class="org.gridgain.grid.logger.java.GridJavaLogger"/>
    </property>
    ...
And the same configuration if you’d like to configure GridGain in your code:
    GridConfiguration cfg = new GridConfigurationAdapter();
    ...
    GridLogger log = new GridJavaLogger(Logger.global);
    ...
    cfg.setGridLogger(log);
or, which is actually the same:
    GridConfiguration cfg = new GridConfigurationAdapter();
    ...
    GridLogger log = new GridJavaLogger();
    ...
    cfg.setGridLogger(log);
Configuring Log4j
Here is a typical example of configuring the log4j logger in the GridGain configuration file:
    <property name="gridLogger">
        <bean class="org.gridgain.grid.logger.log4j.GridLog4jLogger">
            <constructor-arg type="java.lang.String" value="config/default-log4j.xml"/>
        </bean>
    </property>
and from your code:
    GridConfiguration cfg = new GridConfigurationAdapter();
    ...
    URL xml = U.resolveGridGainUrl("modules/tests/config/log4j-test.xml");
    GridLogger log = new GridLog4jLogger(xml);
    ...
    cfg.setGridLogger(log);
Configuring JBoss logging
Information about configuring JBoss logging with GridGain can be found at http://docs.jboss.org/process-guide/en/html/logging.html.
Configuring Tomcat logging
Please refer to http://tomcat.apache.org/tomcat-6.0-doc/logging.html for more information on how to configure GridGain with Tomcat logging.
Configuring JCL
This logger wraps any JCL (Jakarta Commons Logging) logger. The implementation simply delegates to the underlying JCL logger. This logger should be used by loaders that have JCL-based internal logging (e.g., Websphere).
Here is an example of configuring the JCL logger in the GridGain Spring configuration file to work over the Log4J implementation. Note that we use the same configuration file as we provide by default:
    ...
    <property name="gridLogger">
        <bean class="org.gridgain.grid.logger.jcl.GridJclLogger">
            <constructor-arg type="org.apache.commons.logging.Log">
                <bean class="org.apache.commons.logging.impl.Log4JLogger">
                    <constructor-arg type="java.lang.String" value="config/default-log4j.xml"/>
                </bean>
            </constructor-arg>
        </bean>
    </property>
    ...
If you are using system properties to configure the JCL logger, use the following configuration:
    ...
    <property name="gridLogger">
        <bean class="org.gridgain.grid.logger.jcl.GridJclLogger"/>
    </property>
    ...
And the same configuration if you’d like to configure GridGain in your code:
    GridConfiguration cfg = new GridConfigurationAdapter();
    ...
    GridLogger log = new GridJclLogger(new Log4JLogger("config/default-log4j.xml"));
    ...
    cfg.setGridLogger(log);
or the following for configuration by means of system properties:
    GridConfiguration cfg = new GridConfigurationAdapter();
    ...
    GridLogger log = new GridJclLogger();
    ...
    cfg.setGridLogger(log);
Configuring SLF4J
This logger should be used by hosts that have SLF4J-based logging.
Here is an example of configuring the SLF4J logger in the GridGain Spring configuration file:
    <property name="gridLogger">
        <bean class="org.gridgain.grid.logger.slf4j.GridSlf4jLogger"/>
    </property>
8.1.2. Injection vs. Instantiation
An instance of the GridLogger interface can be obtained at any point via the Grid.log() method. However, when a logger is needed in grid tasks and/or jobs it is preferable to use resource injection via the @GridLoggerResource annotation, which annotates a field or a setter method for injection of a GridLogger instance.
A logger can be injected into instances of the following classes:
- GridTask
- GridJob
- GridSpi (and all its implementations)
- GridLifecycleBean
- Any object annotated with the @GridUserResource annotation
Here is how injection would typically happen:
    public class MyGridJob implements GridJob {
        ...
        @GridLoggerResource
        private GridLogger log;
        ...
    }
or
    public class MyGridJob implements GridJob {
        ...
        private GridLogger log;
        ...
        @GridLoggerResource
        public void setGridLogger(GridLogger log) {
            this.log = log;
        }
        ...
    }
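For readers curious about what happens behind such an annotation, the following plain-Java sketch shows one way a container could perform field injection using reflection. It is an assumption-laden illustration, not GridGain’s implementation: the annotation LoggerResource, the SimpleLogger class and the inject method are all hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

// Hypothetical sketch of annotation-driven injection; not GridGain internals.
class InjectionSketch {
    // Marker annotation standing in for a logger resource annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface LoggerResource {
        // No attributes.
    }

    // Trivial logger that remembers what was logged.
    static class SimpleLogger {
        final StringBuilder out = new StringBuilder();

        void info(String msg) { out.append(msg).append('\n'); }
    }

    // A "job" that never creates its own logger: the container supplies it.
    static class MyJob {
        @LoggerResource
        private SimpleLogger log;

        void execute() { log.info("job executed"); }
    }

    // Container side: set every annotated field of the matching type.
    static void inject(Object target, SimpleLogger logger) {
        for (Field f : target.getClass().getDeclaredFields()) {
            if (f.isAnnotationPresent(LoggerResource.class) && f.getType() == SimpleLogger.class) {
                f.setAccessible(true);

                try {
                    f.set(target, logger);
                }
                catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
    }

    static String demo() {
        SimpleLogger logger = new SimpleLogger();
        MyJob job = new MyJob();

        inject(job, logger); // Done by the container before the job runs.
        job.execute();

        return logger.out.toString();
    }
}
```

The job code stays free of any lookup logic, which is why injection is preferable inside tasks and jobs that may be created and executed on remote nodes.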
8.1.3. Quiet Mode
GridGain 3.0 introduced a quiet logging mode. Essentially, this mode suppresses most of the INFO and all DEBUG logging and provides very concise logging output. It is very useful for examples and demonstrations as well as for everyday development where the full INFO or DEBUG output is not necessary.
Starting with version 3.0, GridGain starts in quiet mode by default, suppressing INFO and DEBUG log output. If the system property GRIDGAIN_QUIET is set to false then GridGain will operate in normal, unsuppressed logging mode (with whatever logging back-end is configured). Note that all output in quiet mode is done through standard output (STDOUT).
Note that GridGain’s standard startup scripts $GRIDGAIN_HOME/bin/ggstart.{sh|bat} start in quiet mode by default. Both scripts accept the -v argument to turn off quiet mode.
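The default-on behavior of the GRIDGAIN_QUIET property can be expressed in a few lines of Java. This sketch only mirrors the rule stated above (quiet unless the property is set to false); how GridGain itself parses the property is an assumption, and the class QuietModeSketch is hypothetical.

```java
// Hypothetical sketch of the quiet-mode rule; not GridGain's own parsing code.
class QuietModeSketch {
    // Quiet mode is on unless the value is explicitly "false".
    static boolean isQuiet(String propValue) {
        return propValue == null || !"false".equalsIgnoreCase(propValue.trim());
    }

    // Reads the system property named in the text.
    static boolean isQuiet() {
        return isQuiet(System.getProperty("GRIDGAIN_QUIET"));
    }
}
```

A JVM started with -DGRIDGAIN_QUIET=false would make isQuiet() return false, while leaving the property unset keeps quiet mode on.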
8.2. Loaders
Grid loaders are used to start the grid in different environments. Loaders provide the basic boilerplate code for starting GridGain in various environments. For example, when starting within application servers such as JBoss, Weblogic or Websphere, the provided loaders will configure GridGain to use "native" logging, JMX facility, discovery and execution services (JSR-237), which makes GridGain basically blend into the hosting environment. Loaders do not need to implement any interface, and their sole responsibility is to configure and start the grid.
8.2.1. Command Line Loader
The command line loader is located in the org.gridgain.grid.loaders.cmdline package.
The command line loader is used to start the grid from a command line script. GridGain comes with the ggstart.{sh|bat} startup script located in the $GRIDGAIN_HOME/bin folder. By default this script uses the configuration defined in $GRIDGAIN_HOME/config/default-spring.xml. This configuration picks the default configuration for all grid internal components and SPIs, which is sufficient for running examples and doing your own development and testing.
If you wish to provide your own configuration file, simply pass its path as a parameter to the script:
    ggstart.bat C:\myfolder\mygrid.xml
To stop the grid, simply press CTRL-C, which will initiate the GridGain stop routine.
Script Startup
Note that in addition to starting grid nodes on separate physical machines, GridGain supports starting multiple grid nodes on the same machine as well as in the same VM. The only requirement for the default configuration is that IP-Multicast is supported.
Custom Jars
Starting from 2.1.0 you can add your libraries to the class path without changing the startup scripts. Just put them into the $GRIDGAIN_HOME/libs/ext directory and GridGain will pick them up automatically.
GRIDGAIN_HOME Environment Variable
If you get the following error: Exception in thread "main" java.lang.NoClassDefFoundError: org/gridgain/grid/loaders/cmdline/GridCommandLineLoader, then your $GRIDGAIN_HOME environment variable is not set or is set incorrectly. Please set the $GRIDGAIN_HOME environment variable to your GridGain installation folder.
8.2.2. GlassFish Loader
The GlassFish loader is located in the org.gridgain.grid.loaders.glassfish package.
The GlassFish loader is used to start GridGain within the GlassFish application server. The GridGain loader is implemented as a GlassFish life-cycle listener module. The GlassFish loader should be used to provide tight integration between GridGain and GlassFish AS. The current loader implementation works on both GlassFish v1 and GlassFish v2 servers.
The following steps should be taken to configure this loader:
- Add the GridGain libraries to the GlassFish common loader. See GlassFish Class Loaders.
- Create a life-cycle listener module. Use the command line or the administration GUI:
    asadmin> create-lifecycle-module --user admin --passwordfile ../adminpassword.txt --classname "org.gridgain.grid.loaders.glassfish.GridGlassfishLoader" --property cfgFilePath="config/default-spring.xml" GridGain
For more information consult the GlassFish Project - Documentation Home Page.
Note that GlassFish is not shipped with GridGain. If you don’t have GlassFish, you need to download it separately. See https://glassfish.dev.java.net for more information.
8.2.3. Tomcat Loader
The Tomcat loader is located in the org.gridgain.grid.loaders.tomcat package.
The Tomcat loader is used to start GridGain within the Tomcat server. The GridGain loader is implemented as a Tomcat LifecycleListener. The Tomcat loader should be used to provide tight integration between GridGain and the Tomcat web container (logging, MBean server).
The following steps should be taken to configure this loader:
- Add the GridGain libraries to the Tomcat common loader: in the file $TOMCAT_HOME/conf/catalina.properties, add $GRIDGAIN_HOME/gridgain.jar,$GRIDGAIN_HOME/libs/*.jar to the common.loader property (replace $GRIDGAIN_HOME with the absolute path).
- Add the GridGain LifecycleListener to $TOMCAT_HOME/conf/server.xml:
    <!-- GridGain loader -->
    <Listener className="org.gridgain.grid.loaders.tomcat.GridTomcatLoader"
        configurationFile="config/default-spring.xml"/>
Note that Tomcat is not shipped with GridGain. If you don’t have Tomcat, you need to download it separately. See http://tomcat.apache.org for more information.
8.2.4. JBoss Loader
The JBoss loader is located in the org.gridgain.grid.loaders.jboss package.
The JBoss loader starts GridGain within JBoss as a JBoss service. Note that jboss-service.xml has a configuration parameter pointing to the Spring XML configuration. At startup, the JBoss loader looks for the Spring configuration XML file specified in jboss-service.xml.
GridGain ships with a pre-built SAR directory, located in the $GRIDGAIN_HOME/config/jboss folder. You can deploy GridGain into JBoss in 2 steps:
- Change the codebase in jboss-service.xml in the META-INF sub-folder to point to the correct location.
- Copy the entire SAR directory from the $GRIDGAIN_HOME/config/jboss folder to the deploy directory of JBoss.
Here is how $GRIDGAIN_HOME/config/jboss/jboss-service.xml looks (note we use 1.5.0 version as an example):
<!DOCTYPE server PUBLIC "-//JBoss//DTD MBean Service 4.0//EN" "http://www.jboss.org/j2ee/dtd/jboss-service_4_0.dtd">
<!--
JBoss service descriptor for GridGain JBoss Loader.
Classpath should contain the following libraries:
- $GRIDGAIN_HOME/libs/*.jar
- $GRIDGAIN_HOME/gridgain_1.5.0.jar
For example, if GridGain is installed into /opt/gridgain-1.5.0 then
you can use the following classpath settings to includes all
necessary JARs:
<classpath codebase="/opt/gridgain-1.5.0/gridgain-1.5.0.jar"/>
<classpath codebase="/opt/gridgain-1.5.0/libs" archives="*"/>
-->
<server>
<classpath codebase=".." /> <!-- FIX IT BEFORE USING. -->
<mbean code="org.gridgain.grid.loaders.jboss.GridJbossLoader" name="gridgain:service=loader">
<!--
config/default-spring.xml - Default GridGain configuration.
config/jboss/ha/jboss-gridgain-ha-spring.xml - JBoss specific configuration that
will use JBoss SPIs for communication and discovery. Requires JBoss HA enabled.
-->
<attribute name="ConfigurationFile">config/default-spring.xml</attribute>
</mbean>
</server>
Note: The currently provided JBoss loader doesn’t work with JBoss 7. Use the servlet context listener loader instead.
8.2.5. WebLogic Loader
The WebLogic loader is located in the org.gridgain.grid.loaders.weblogic package.
The WebLogic loader starts GridGain within the WebLogic application server. It is implemented as a pair of startup and shutdown classes; consult the WebLogic documentation on how to configure startup classes. The loader is used for tight integration with WebLogic AS. Specifically, it integrates GridGain with WebLogic logging, the MBean server, and the work manager (JSR-237).
The following steps should be taken to configure startup and shutdown classes:
- Add a startup class and a shutdown class in the admin console (Environment → Startup & Shutdown Classes → New).
- Add the following parameters for the startup class:
  - Name: GridWeblogicStartup
  - Classname: org.gridgain.grid.loaders.weblogic.GridWeblogicStartup
  - Arguments: cfgFilePath=config/default-spring.xml
- Add the following parameters for the shutdown class:
  - Name: GridWeblogicShutdown
  - Classname: org.gridgain.grid.loaders.weblogic.GridWeblogicShutdown
- Change the classpath for the WebLogic server in the startup script: CLASSPATH="$CLASSPATH:$GRIDGAIN_HOME/gridgain.jar:$GRIDGAIN_HOME/libs/"
For more information on WebLogic startup/shutdown classes see http://edocs.bea.com/wls/docs100/ConsoleHelp/taskhelp/startup_shutdown/UseStartupAndShutdownClasses.html.
Note that WebLogic is not shipped with GridGain. If you don’t have WebLogic, you need to download it separately. See http://www.bea.com for more information.
8.2.6. WebSphere Loader
The WebSphere loader is located in the org.gridgain.grid.loaders.websphere package.
The WebSphere loader starts GridGain within the WebSphere application server. It is implemented as a WebSphere custom service (MBean) and should be used when you need tight integration between GridGain and WebSphere AS. Specifically, it integrates GridGain with WebSphere logging, the MBean server, and the work manager (JSR-237).
The following steps should be taken to configure this loader:
- Add a CustomService in the admin console (Application Servers → server1 → Custom Services → New).
- Add a custom property for this service: cfgFilePath=config/default-spring.xml.
- Add the following parameters:
  - Classname: org.gridgain.grid.loaders.websphere.GridWebsphereLoader
  - Display Name: GridGain
  - Classpath (replace $GRIDGAIN_HOME with the absolute path): "$GRIDGAIN_HOME/gridgain.jar:$GRIDGAIN_HOME/libs/". Note that the forward slash (/) at the end is critical.
For more information consult http://publib.boulder.ibm.com/infocenter/wasinfo/v6r1/index.jsp?topic=/com.ibm.websphere.base.doc/info/aes/ae/trun_customservice.html
Note that WebSphere is not shipped with GridGain. If you don’t have WebSphere, you need to download it separately. See http://www.ibm.com/software/websphere/ for more information.
8.2.7. Servlet context listener loader
The servlet context listener loader is located in the org.gridgain.grid.loaders.servlet package.
This loader can be used to start a GridGain grid inside any web container as a servlet context listener. The loader must be defined in the web.xml file:
<context-param>
    <param-name>cfgFilePath</param-name>
    <param-value>config/default-spring.xml</param-value>
</context-param>
<listener>
    <listener-class>org.gridgain.grid.loaders.servlet.GridServletContextListenerLoader</listener-class>
</listener>
The servlet-based loader may be used in any web container, such as Tomcat or Jetty. Depending on how the loader is deployed, the GridGain instance can be accessed either by all web applications or by only one; this is determined by the class loading architecture of your web container.
To start GridGain in a web container, create a WAR file with the following structure:
gridgain.war
|-- WEB-INF/
    |-- lib/
    |   |-- gridgain.jar
    |   `-- GridGain libraries (contents of $GRIDGAIN_HOME/libs folder)
    `-- web.xml (shipped with GridGain in $GRIDGAIN_HOME/config/servlet folder)
Copy this file to the deployments directory.
8.3. Life Cycle Beans
GridLifecycleBean reacts to the grid lifecycle events defined in GridLifecycleEventType. Use this bean whenever you need to plug custom logic in before or after the grid startup and stopping routines.
There are four events you can react to:
| Event | Description |
|---|---|
| GridLifecycleEventType.BEFORE_GRID_START | Invoked before the grid startup routine is initiated. Note that the grid is not available during this event, so if you injected a grid instance via the GridInstanceResource annotation, you cannot use it yet. |
| GridLifecycleEventType.AFTER_GRID_START | Invoked right after the grid has started. At this point, if you injected a grid instance via the GridInstanceResource annotation, you can start using it. |
| GridLifecycleEventType.BEFORE_GRID_STOP | Invoked right before the grid stop routine is initiated. The grid is still available at this stage, so if you injected a grid instance via the GridInstanceResource annotation, you can use it. |
| GridLifecycleEventType.AFTER_GRID_STOP | Invoked right after the grid has stopped. Note that the grid is not available during this event. |
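The four events can be illustrated with a small, self-contained sketch. It does not use the GridGain API: EventType and LifecycleBean below are local stand-ins for GridLifecycleEventType and GridLifecycleBean, and the onLifecycleEvent method name is an assumption made for illustration. The bean tracks whether the grid would be usable after each event, following the availability rules listed above.

```java
import java.util.ArrayList;
import java.util.List;

public class LifecycleSketch {
    // Local stand-in for GridLifecycleEventType; constant names come from the text.
    public enum EventType { BEFORE_GRID_START, AFTER_GRID_START, BEFORE_GRID_STOP, AFTER_GRID_STOP }

    // Local stand-in for GridLifecycleBean; the method name is an assumption.
    public interface LifecycleBean {
        void onLifecycleEvent(EventType evt);
    }

    // Records every event together with the grid availability after that event.
    public static class RecordingBean implements LifecycleBean {
        public final List<String> log = new ArrayList<>();
        private boolean gridUsable = false; // not usable before startup

        @Override
        public void onLifecycleEvent(EventType evt) {
            if (evt == EventType.AFTER_GRID_START)
                gridUsable = true;   // grid has started, injected instance may be used
            else if (evt == EventType.AFTER_GRID_STOP)
                gridUsable = false;  // grid has stopped
            // BEFORE_GRID_START and BEFORE_GRID_STOP leave availability unchanged.
            log.add(evt + "=" + gridUsable);
        }
    }

    // Fires the events in the order a node start followed by a node stop would.
    public static List<String> simulate() {
        RecordingBean bean = new RecordingBean();
        for (EventType evt : EventType.values())
            bean.onLifecycleEvent(evt);
        return bean.log;
    }

    public static void main(String[] args) {
        System.out.println(simulate());
    }
}
```

In a real bean the two AFTER/BEFORE branches are where application start-up and clean-up logic would typically go, since those are the only points at which an injected grid instance is available.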
8.3.1. Resource Injection
Lifecycle beans can be injected with grid resources using IoC (dependency injection). Both field-based and method-based injection are supported. The following grid resources can be injected:
8.3.2. Usage
If you need to tie your application logic into the GridGain lifecycle, you can configure lifecycle beans via the standard grid configuration, add your application library dependencies to the GRIDGAIN_HOME/libs/ext folder, and start nodes with the GRIDGAIN_HOME/ggstart.(sh|bat) scripts.
8.3.3. Configuration
Grid lifecycle beans can be configured programmatically as follows:
GridConfigurationAdapter cfg = new GridConfigurationAdapter();

cfg.setLifecycleBeans(new FooBarLifecycleBean1(), new FooBarLifecycleBean2());

GridFactory.start(cfg);
or from Spring XML configuration file as follows:
<bean id="grid.cfg" class="org.gridgain.grid.GridConfigurationAdapter" scope="singleton">
    ...
    <property name="lifecycleBeans">
        <list>
            <bean class="foo.bar.FooBarLifecycleBean1"/>
            <bean class="foo.bar.FooBarLifecycleBean2"/>
        </list>
    </property>
    ...
</bean>
8.4. Metadata & Meta Programming
TODO
8.5. Marshaling
8.6. Messaging
Messaging, the exchange of messages between grid nodes, is one of the main functional areas of GridGain and is often used standalone (without the main Compute and Data Grid capabilities). Given GridGain’s sophisticated topology management and auto-discovery, it makes sense for many applications to piggy-back on this functionality and use GridGain as an intelligent message bus.
Intelligent Message Bus: GridGain messaging support provides unique features that make it an advanced message bus:
8.7. Events
TODO
8.8. Grid-Enabled Executor Service
The Grid.executor() method creates an ExecutorService that executes all submitted Callable and Runnable tasks on the grid. You can run Callable and Runnable tasks just as you normally would with java.util.concurrent.ExecutorService, but these tasks must implement the Serializable interface.
Execution happens either locally or remotely, depending on the configuration of the Load Balancing SPI and Topology SPI. The distributed ExecutorService delegates command execution to an already started Grid instance. Every submitted task is serialized and may be transferred to any node in the grid.
Here is an example that shows how the ExecutorService can be used.
// Declared as 'throws Exception' because invokeAll(..) and Future.get()
// throw InterruptedException and ExecutionException in addition to GridException.
public static void main(String[] args) throws Exception {
    GridFactory.start();

    try {
        Grid grid = GridFactory.grid();

        ExecutorService srvc = grid.executor();

        List<Callable<String>> cmds = new ArrayList<Callable<String>>(2);

        String testVal1 = "test-value-1";
        String testVal2 = "test-value-2";

        cmds.add(new FooCallable<String>(testVal1));
        cmds.add(new FooCallable<String>(testVal2));

        List<Future<String>> futures = srvc.invokeAll(cmds);

        // Wait for command completion.
        String res1 = futures.get(0).get();
        String res2 = futures.get(1).get();

        // Print out results.
        System.out.println("Results [res1=" + res1 + ", res2=" + res2 + ']');
    }
    finally {
        GridFactory.stop(true);
    }
}
where simple FooCallable is:
private static class FooCallable<T> implements Callable<T>, Serializable {
    /** */
    private T data = null;

    /**
     * @param data Some data.
     */
    FooCallable(T data) {
        this.data = data;
    }

    /**
     * {@inheritDoc}
     */
    public T call() throws Exception {
        System.out.println("Message: " + data);

        return data;
    }
}
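The Serializable requirement can be checked without a grid. The following JDK-only sketch round-trips a task through standard Java serialization, roughly what has to happen before a task can run on another node, and shows that a plain lambda Callable is rejected. It is an illustration only: the class and method names are hypothetical, and GridGain may use a different marshaller than java.io serialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.Callable;

public class SerializableTaskSketch {
    // A task that can travel between JVMs: Callable for the result, Serializable for transport.
    public static class EchoCallable implements Callable<String>, Serializable {
        private static final long serialVersionUID = 1L;
        private final String data;

        public EchoCallable(String data) { this.data = data; }

        @Override
        public String call() { return data; }
    }

    // Serializes the task to bytes, as a sender would before shipping it.
    static byte[] toBytes(Object task) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(task);
        }
        return buf.toByteArray();
    }

    // Serializes an EchoCallable, rebuilds it from bytes and runs the copy.
    @SuppressWarnings("unchecked")
    public static String roundTrip(String value) {
        try {
            byte[] bytes = toBytes(new EchoCallable(value));
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
                Callable<String> copy = (Callable<String>) in.readObject();
                return copy.call();
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns true if the object survives Java serialization.
    public static boolean isShippable(Object task) {
        try {
            toBytes(task);
            return true;
        } catch (IOException e) {
            return false; // includes NotSerializableException
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("test-value-1"));
        Callable<String> lambda = () -> "not serializable";
        System.out.println(isShippable(lambda));
    }
}
```

Running a check like isShippable on task classes in a unit test is a cheap way to catch a non-serializable field before the task is ever submitted to the grid.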
8.9. Segmenting Grid Nodes
8.9.1. Why Segment Nodes?
In many deployments you need to segment your grid nodes into several groups, with each group performing only a subset of jobs. For example, suppose some nodes only submit jobs to the grid (masters) and other groups of nodes only execute these jobs (workers). You would then segment your grid into 2 groups, masters and workers, and have each group do only what it is supposed to do.
Multiple Sub-Grids
Node segmentation allows you to create multiple sub-grids within your grid. Every sub-grid may have its own static physical characteristics and logical responsibilities. All static node characteristics, physical or logical, can be specified in the Spring configuration and used in your Topology SPI or GridTask.map(..) logic to implement the segmentation (as shown in the example below).
Note that, based on its attributes, every node can participate in one or multiple segments.
Dynamic Sub-Grids
You may also wish to segment your grid based on dynamic rather than static characteristics. For example, you may want to include only nodes that have less than 50% CPU utilization. In GridGain you can achieve this with the dynamic GridNodeMetrics provided by GridNode: read the current CPU utilization from the node metrics and, in your GridTask.map(..) method, pick only the nodes whose CPUs are loaded under 50%.
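The selection logic for such a dynamic sub-grid is a plain filter over the topology. The sketch below shows it with a hypothetical NodeInfo stand-in (an id and a CPU load between 0.0 and 1.0) instead of the real GridNode and GridNodeMetrics types, so that it runs without GridGain on the class path.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DynamicSubGridSketch {
    // Hypothetical stand-in for a grid node and its current CPU load metric.
    public static final class NodeInfo {
        final String id;
        final double cpuLoad;

        public NodeInfo(String id, double cpuLoad) {
            this.id = id;
            this.cpuLoad = cpuLoad;
        }
    }

    // Keeps only the nodes whose CPU load is strictly below the threshold.
    public static List<String> pickNodes(List<NodeInfo> topology, double maxCpuLoad) {
        List<String> picked = new ArrayList<>();
        for (NodeInfo node : topology)
            if (node.cpuLoad < maxCpuLoad)
                picked.add(node.id);
        return picked;
    }

    // Applies the 50% rule from the text to a small sample topology.
    public static List<String> demo() {
        List<NodeInfo> topology = Arrays.asList(
            new NodeInfo("node-a", 0.20),
            new NodeInfo("node-b", 0.75),
            new NodeInfo("node-c", 0.49));
        return pickNodes(topology, 0.5);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [node-a, node-c]
    }
}
```

Because metrics change between calls, the same task can map to a different set of nodes on every execution, which is what distinguishes a dynamic sub-grid from the attribute-based static segments.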
8.9.2. Node Segmentation Example
This example shows how you can segment your grid into static segments using GridGain. Such segmentation can be easily achieved with node attributes (see GridNode.getAttribute(String)). Let’s say that you want to segment your grid into 3 segments: master, worker1, and worker2.
At startup, every node should have the relevant attributes assigned to it. Here is how this can be done from the Spring XML configuration:
<bean id="grid.cfg" class="org.gridgain.grid.GridConfigurationAdapter" scope="singleton">
    ...
    <property name="userAttributes">
        <map>
            <!--
                In our example, segment value can be either
                'master', 'worker1', or 'worker2'.
            -->
            <entry key="segment" value="worker1"/>
        </map>
    </property>
    ...
</bean>
You can then restrict the topology passed into the GridTask.map(..) method by configuring GridNodeFilterTopologySpi to include only nodes from segments worker1 and worker2 and always exclude nodes belonging to the master segment. Here is an example:
<bean id="grid.custom.cfg" class="org.gridgain.grid.GridConfigurationAdapter" singleton="true">
    ...
    <property name="topologySpi">
        <bean class="org.gridgain.grid.spi.topology.nodefilter.GridNodeFilterTopologySpi">
            <property name="filter">
                <bean class="org.gridgain.grid.lang.GridJexlPredicate2">
                    <constructor-arg index="0">
                        <value>
                            <![CDATA[
                                node.attributes().get('segment') == 'worker1' ||
                                node.attributes().get('segment') == 'worker2'
                            ]]>
                        </value>
                    </constructor-arg>
                    <constructor-arg index="1" value="node"/>
                </bean>
            </property>
        </bean>
    </property>
    ...
</bean>
Alternatively, you can implement your GridTask.map(..) method to map jobs only to worker node segments. You can check which segment a node belongs to by reading its attributes via the GridNode.getAttribute(String) method. Here is an example:
public class FooBarGridTask extends GridTaskAdapter<String, String> {
    ...
    // Declares GridException because the method throws it for nodes without a segment.
    public Map<GridJob, GridNode> map(List<GridNode> topology, String arg) throws GridException {
        Map<GridJob, GridNode> jobs = new HashMap<GridJob, GridNode>(topology.size());

        for (GridNode node : topology) {
            String segment = node.attribute("segment");

            if (segment != null) {
                if (segment.equals("worker1"))
                    // This type of job should only execute on 'worker1' segment.
                    jobs.put(new FooBarWorker1Job(arg), node);
                else if (segment.equals("worker2"))
                    // This type of job should only execute on 'worker2' segment.
                    jobs.put(new FooBarWorker2Job(arg), node);
            }
            else
                throw new GridException("Node does not belong to any segment.");
        }

        return jobs;
    }
    ...
}
8.9.3. Grid Node Filters
You can filter nodes by providing your own implementation of the GridPredicate<? super GridRichNode> interface. Instances of classes that implement this interface are used to filter nodes in the GridProjection.nodes(GridPredicate<? super GridRichNode>…) method. They are also used by GridNodeFilterTopologySpi to provide a task topology based on user-defined node filters.
GridGain also comes with the GridJexlPredicate implementation, which allows you to conveniently filter nodes with Apache JEXL expressions. For the specifics of the JEXL expression language, refer to the Apache JEXL documentation.
Together with GridNodeFilterTopologySpi, GridJexlPredicate2 provides a fairly simple way to express complex SLA-based task topology specifications. For example, the expression below shows how the SPI can be configured with GridJexlPredicate2 to include all Windows XP nodes that have more than one processor or core and are not loaded over 50%.
GridNodeFilterTopologySpi topSpi = new GridNodeFilterTopologySpi();

GridJexlPredicate2<GridRichNode> filter = new GridJexlPredicate2<GridRichNode>(
    "node.metrics().availableProcessors > 1 && " +
    "node.metrics().averageCpuLoad < 0.5 && " +
    "node.attributes().get('os.name') == 'Windows XP'",
    "node");

// Add filter.
topSpi.setFilter(filter);

GridConfigurationAdapter cfg = new GridConfigurationAdapter();

// Override topology SPI.
cfg.setTopologySpi(topSpi);

// Start grid.
GridFactory.start(cfg);
or from Spring configuration file:
<bean id="grid.custom.cfg" class="org.gridgain.grid.GridConfigurationAdapter" singleton="true">
    ...
    <property name="topologySpi">
        <bean class="org.gridgain.grid.spi.topology.nodefilter.GridNodeFilterTopologySpi">
            <property name="filter">
                <bean class="org.gridgain.grid.lang.GridJexlPredicate2">
                    <constructor-arg index="0">
                        <value>
                            <![CDATA[
                                node.metrics().availableProcessors > 1 &&
                                node.metrics().averageCpuLoad < 0.5 &&
                                node.attributes().get('os.name') == 'Windows XP'
                            ]]>
                        </value>
                    </constructor-arg>
                    <constructor-arg index="1" value="node"/>
                </bean>
            </property>
        </bean>
    </property>
    ...
</bean>
8.9.4. GridProjection-based Segmentation
GridGain 3.0 introduced another way to segment the topology: GridProjection. Described in more detail later, GridProjection represents a dynamic view of the global topology filtered by a predicate. GridProjection is also a monad, providing a monadic set of operations for any arbitrary set of nodes in the projection.
9. Deployment
Before it can be used, a Grid Task needs to be deployed:
- If peer class loading is enabled (see property GridConfiguration.isPeerClassLoadingEnabled() in Grid Configuration):
  - The task class is loaded from the local class path, unless it is defined as Local P2P Exclude.
  - If the task class is not in the local class path, or it needs to be peer loaded, it is downloaded from the task originating node.
- If peer class loading is disabled:
  - GridGain checks whether the task class was deployed. If you are using GAR Deployment, your task is implicitly deployed every time the GAR file or directory changes. Otherwise, the task can be deployed explicitly in code via the Grid.deployTask(Class<? extends GridTask<?>>) method.
  - If the task class was not deployed, GridGain tries to find it in the local class path by task name. If you are not using the @GridTaskName annotation to provide a custom task name, the task name defaults to the actual class name of the task, and the task is auto-deployed the first time it is executed (no explicit deployment step is required in this case).
  - If the task has a custom name (one that does not correspond to the task class name) and was not deployed before, an exception is thrown.
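The rules above can be read as a small decision function. The sketch below encodes them in plain Java; it is a reading aid, not GridGain code. The Source enum, the resolve method and its boolean parameters are hypothetical names, and treating "no custom name, but class not on the local class path" as an error is an assumption, since the text does not cover that case.

```java
public class DeploymentDecisionSketch {
    // Where the task class ends up coming from (hypothetical labels).
    public enum Source { LOCAL_CLASS_PATH, PEER_NODE, ALREADY_DEPLOYED, AUTO_DEPLOYED }

    /**
     * @param p2pEnabled  peer class loading is enabled
     * @param onLocalPath task class is present in the local class path
     * @param p2pExcluded task class is listed as Local P2P Exclude
     * @param deployed    task was deployed before (GAR or explicit deployment)
     * @param customName  task has a custom name that differs from its class name
     */
    public static Source resolve(boolean p2pEnabled, boolean onLocalPath, boolean p2pExcluded,
                                 boolean deployed, boolean customName) {
        if (p2pEnabled) {
            // Local class path wins unless the class is excluded from local loading.
            if (onLocalPath && !p2pExcluded)
                return Source.LOCAL_CLASS_PATH;
            return Source.PEER_NODE;
        }
        if (deployed)
            return Source.ALREADY_DEPLOYED;
        // Task name equals class name, so the class can be looked up and auto-deployed.
        if (!customName && onLocalPath)
            return Source.AUTO_DEPLOYED;
        // Custom-named task that was never deployed (or, by assumption, class not found).
        throw new IllegalStateException("Task was not deployed");
    }

    public static void main(String[] args) {
        System.out.println(resolve(true, true, false, false, false));
        System.out.println(resolve(true, true, true, false, false));
        System.out.println(resolve(false, true, false, false, false));
    }
}
```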
9.1. Peer Class Loading
Peer class loading (P2P) is turned on by default. To turn it off, set the GridConfiguration.isPeerClassLoadingEnabled() property to false in Grid Configuration.
Although the internals of peer class loading are rather complex, in a nutshell it means that when a JVM on a remote node needs to find a certain class as part of grid task execution, it checks the local class loader first and, if the class cannot be found, asks the node that originated the grid task execution (which should have this class by design) to provide it. In essence, GridGain class loading becomes grid-aware.
This technique is invaluable during grid application development. It allows for a grid-transparent development cycle: you write your application as you normally do (in a single-node environment in Eclipse, IDEA, etc.), compile and run it, and it runs on the grid without any extra deployment or build steps whatsoever. Moreover, GridGain supports hot redeployment, so you don’t have to restart GridGain every time you change the grid task code: just modify the code, compile and run, and all your changes will be picked up on the grid.
Peer class loading sequence works as follows:
- GridGain checks whether the class was loaded at system startup; if it was, that class is returned. No class loading from a peer node takes place in this case.
- If the class is not loaded locally, a request is sent to the task originating node to provide the class definition. The originating node sends the class byte code and the class is loaded on the peer node.
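The two-step sequence maps naturally onto the standard parent-first delegation of java.lang.ClassLoader. The sketch below is a simplified model of that idea, not GridGain's implementation: the peer is a hypothetical Function from class name to byte code (returning null when the peer does not have the class), standing in for the network request to the task originating node.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class PeerClassLoaderSketch extends ClassLoader {
    // Hypothetical stand-in for the request to the task originating node.
    private final Function<String, byte[]> peer;

    // Names requested from the peer, kept for demonstration.
    public final List<String> peerRequests = new ArrayList<>();

    public PeerClassLoaderSketch(ClassLoader local, Function<String, byte[]> peer) {
        super(local); // parent-first delegation gives the "local first" step
        this.peer = peer;
    }

    // Called by loadClass(..) only after the local (parent) loader failed to find the class.
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        peerRequests.add(name);
        byte[] code = peer.apply(name);
        if (code == null)
            throw new ClassNotFoundException(name);
        return defineClass(name, code, 0, code.length);
    }

    // Returns true if the class could be loaded either locally or from the peer.
    public boolean canLoad(String name) {
        try {
            loadClass(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        PeerClassLoaderSketch ldr = new PeerClassLoaderSketch(
            PeerClassLoaderSketch.class.getClassLoader(), name -> null);
        System.out.println(ldr.canLoad("java.lang.String"));    // found locally, peer not asked
        System.out.println(ldr.canLoad("foo.bar.MissingTask")); // peer asked, class not provided
        System.out.println(ldr.peerRequests);
    }
}
```

The peer in main always answers null, so the second lookup fails; a real peer would return the byte code and defineClass would make the class available on the requesting node.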
Peer class loading should be used in most situations, especially during development with Java IDEs. It dramatically reduces the overhead of grid-enabled application development, making it as quick and productive as local application development. You simply change code and run, and your modified application runs on the grid.
Note: When using peer class loading, be aware of which libraries get loaded from peer nodes and which are already available locally in the class path. Our suggestion is to include all 3rd party libraries in the class path of every node. This way you will not transfer megabytes of 3rd party classes to remote nodes every time you change a line of code.
Error Messages: Some frameworks, such as Spring or CGLib, probe for certain classes to detect whether other frameworks are available. For example, Spring looks for the Groovy framework at startup, so when the peer-to-peer feature is on, this class/resource request may be sent to a remote node. If no such class/resource is available, you may see the message "Requested resource not found" on the remote (task originating) node and "Failed to get resource due to remote failure" on the local node. These messages are printed for your information only.
9.1.1. Local P2P Exclude
Note that giving preference to local deployment (as GridGain does by default) does not always work. For example, GridGain uses Spring in its own implementation, so Spring is always loaded locally by the system class loader at startup. This may create a problem if the user also uses Spring to load some beans reflectively. For example, Spring Hibernate support will attempt to load Hibernate classes with its own class loader (the system class loader in this case), and if Hibernate is not in the local class path, the class definitions will not be found.
There are 2 ways to solve this problem:
- Include the Hibernate jars in the class path on every node. This performs better, as the Hibernate jars do not have to be loaded with every task deployment.
- If the above does not work, you can make sure that Spring and Hibernate classes are always loaded from a peer node by specifying their packages in the GridConfiguration.getP2PLocalClassPathExclude() configuration property in Grid Configuration.
9.2. Deployment Modes
The deployment mode is specified at grid startup via the GridConfiguration.getDeploymentMode() configuration property (it can also be specified in the Spring XML configuration file). The main difference between the deployment modes is how classes and user resources are loaded on remote nodes via the peer class loading mechanism. User resources can be instances of caches, database connections, or any other class specified by the user with the @GridUserResource annotation.
The following deployment modes are supported:
| Mode | Description |
|---|---|
| GridDeploymentMode.PRIVATE | In this mode, deployed classes do not share user resources (see @GridUserResource). User resources are created once per deployed task class and then reused for all executions. Note that classes deployed within the same class loader on the master node will still share the same class loader remotely on worker nodes. However, tasks deployed from different master nodes will not share the same class loader on worker nodes, which is useful in development when different developers are working on different versions of the same classes. Also note that resources are associated with task deployment, not task execution: if the same deployed task is executed multiple times, it keeps reusing the same user resources. |
| GridDeploymentMode.ISOLATED | Unlike PRIVATE mode, where different deployed tasks never use the same instance of user resources, in ISOLATED mode tasks or classes deployed within the same class loader share the same instances of user resources (see @GridUserResource). This means that if multiple task classes are loaded by the same class loader on the master node, they share instances of user resources on worker nodes. In other words, user resources are initialized once per class loader and then reused for all subsequent executions. Note that classes deployed within the same class loader on the master node will still share the same class loader remotely on worker nodes. However, tasks deployed from different master nodes will not share the same class loader on worker nodes, which is especially useful when different developers are working on different versions of the same classes. |
| GridDeploymentMode.SHARED | Same as GridDeploymentMode.ISOLATED, but tasks from different master nodes with the same user version and same class loader share the same class loader on remote nodes. Classes are undeployed whenever all master nodes leave the grid or the user version changes. The advantage of this approach is that tasks coming from different master nodes can share the same instances of user resources (see @GridUserResource) on worker nodes. This allows all tasks executing on remote nodes to reuse, for example, the same instances of connection pools or caches. When using this mode, you can start multiple stand-alone GridGain worker nodes, define user resources on master nodes, and have them initialized once on worker nodes regardless of which master node they came from. This mode is particularly useful in production: whereas GridDeploymentMode.ISOLATED has the scope of a single class loader on a single master node, GridDeploymentMode.SHARED broadens the deployment scope to all master nodes. The user version can be specified in the META-INF/gridgain.xml file as a Spring bean property with the name userVersion. This file has to be in the class path of the class used for task execution. SHARED is the default deployment mode used by the grid. |
| GridDeploymentMode.CONTINUOUS | Same as SHARED deployment mode, but user resources (see @GridUserResource) are not undeployed even after all master nodes have left the grid. Tasks from different master nodes with the same user version and same class loader share the same class loader on remote worker nodes. Classes are undeployed only when the user version changes. The advantage of this approach is that tasks coming from different master nodes can share the same instances of user resources (see @GridUserResource) on worker nodes. This allows all tasks executing on remote nodes to reuse, for example, the same instances of connection pools or caches. When using this mode, you can start multiple stand-alone GridGain worker nodes, define user resources on master nodes, and have them initialized once on worker nodes regardless of which master node they came from. This mode is particularly useful in production: whereas ISOLATED has the scope of a single class loader on a single master node, CONTINUOUS broadens the deployment scope to all master nodes. The user version can be specified in the META-INF/gridgain.xml file as a Spring bean property with the name userVersion. This file has to be in the class path of the class used for task execution. |
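The practical difference between SHARED and CONTINUOUS is the undeploy trigger, which can be written down as a one-line rule per mode. The sketch below is only a restatement of the table in code; the Mode enum and shouldUndeploy method are hypothetical names, and the PRIVATE and ISOLATED modes are left out because the table does not spell out their undeploy rule in these terms.

```java
public class UndeployRuleSketch {
    // Only the two modes whose undeploy rule is stated explicitly in the table.
    public enum Mode { SHARED, CONTINUOUS }

    // Returns true if classes deployed in the given mode would be undeployed.
    public static boolean shouldUndeploy(Mode mode, boolean allMastersLeft, boolean userVersionChanged) {
        switch (mode) {
            case SHARED:
                // Undeployed when all master nodes leave or the user version changes.
                return allMastersLeft || userVersionChanged;
            case CONTINUOUS:
                // Survives master nodes leaving; only a user version change undeploys.
                return userVersionChanged;
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        System.out.println(shouldUndeploy(Mode.SHARED, true, false));     // true
        System.out.println(shouldUndeploy(Mode.CONTINUOUS, true, false)); // false
        System.out.println(shouldUndeploy(Mode.CONTINUOUS, false, true)); // true
    }
}
```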
9.2.1. User Version
User version comes into play whenever you would like to redeploy tasks deployed in SHARED or CONTINUOUS modes. By default, GridGain will automatically detect if class-loader changed or a node is restarted. However, if you would like to change and redeploy code on a subset of nodes, or in case of CONTINUOUS mode to kill the ever living deployment, you should change the user version.
User version is specified in META-INF/gridgain.xml file as follows:
<!-- User version. -->
<bean id="userVersion" class="java.lang.String">
    <constructor-arg value="0"/>
</bean>
By default, all GridGain startup scripts (ggstart.sh or ggstart.bat) pick up the user version from the GRIDGAIN_HOME/config/userversion folder. Usually it is enough to update the user version under that folder. However, in the case of GAR or JAR deployment, remember to provide a META-INF/gridgain.xml file with the desired user version in it.
9.2.2. Always-Local Development
GridGain deployment (regardless of mode) allows you to develop everything as you would locally. You never need to write code specifically for remote nodes. For example, if you need to use a distributed cache from your GridJob, you can do the following:
- Start stand-alone GridGain nodes by executing the GRIDGAIN_HOME/ggstart.{sh|bat} scripts.
- Inject your cache instance into your jobs via the @GridUserResource annotation. The cache can be initialized and destroyed with the @GridUserResourceOnDeployed and @GridUserResourceOnUndeployed annotations.
- Now all jobs executing locally or remotely can have a single instance of the cache on every node, and all jobs can access instances stored by any other job without any need for explicit deployment.
9.3. JEE Deployment
When deploying grid tasks into a JEE container, you can keep using standard JEE deployment artifacts. For example, if you are deploying a WAR file into a JEE container, simply add your grid task classes to the WAR file.
9.4. GAR Deployment
GAR deployment is a traditional deployment model, similar to JAR/WAR/EAR deployment in JEE, in which you create a Grid ARchive (GAR) file that contains all the classes necessary for the grid task and deploy it. GridGain comes with a URL-based GridDeploymentSpi implementation, so you can deploy your GAR files at any URL accessible via the FTP, HTTP(S), POP3 or FILE protocols.
For example, when properly configured, you can just drop your GARs into a certain folder on your web server and they will be deployed on the grid.
9.4.1. GAR File
A GAR file is the deployable unit used by GridUriDeploymentSpi. Like a JAR file, it is based on the ZLIB compression format, and its structure is similar to that of a WAR archive. A GAR file has the .gar extension.
GAR file structure (file or directory ending with .gar):
META-INF/
    |
    - gridgain.xml
    - ...
lib/
    |
    - some-lib.jar
    - ...
xyz.class
...
- The META-INF entry may contain a gridgain.xml file, which is a task descriptor XML file. The purpose of the task descriptor is to specify all tasks to be deployed. This file is a regular Spring XML definition file. The META-INF entry may also contain any other file specified by the JAR format.
- The lib entry contains all library dependencies.
- Compiled Java classes must be placed in the root of the GAR file.
A GAR file may be deployed without a descriptor file. If there is no descriptor file, GridDeploymentSpi scans all classes in the archive and instantiates those that implement the GridTask interface. In that case, all grid task classes must have a public no-argument constructor (you can always use the GridTaskAdapter adapter for convenience when creating grid tasks).
Note: The gridgain.xml GAR descriptor is optional. If it is not provided, GridDeploymentSpi will scan all classes in the GAR archive.
By default, all downloaded GAR files that have a digital signature in the META-INF folder are verified and deployed only if the signature is valid.
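Since a GAR file follows the layout described above and is ZIP-compatible like a JAR, the structure can be reproduced with the JDK's java.util.zip classes. The following sketch builds a minimal archive in memory and lists its entries. It only illustrates the layout: the entry contents are placeholders rather than real descriptors, libraries or class files, and the class and method names are hypothetical. For real builds, the GAR Ant task shipped with GridGain is the documented route.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class GarLayoutSketch {
    // Writes one entry with the given content into the archive.
    private static void put(ZipOutputStream zip, String name, byte[] content) throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        zip.write(content);
        zip.closeEntry();
    }

    // Builds an in-memory archive with the three parts of the GAR layout.
    public static byte[] buildGar() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buf)) {
            // Optional task descriptor (placeholder content).
            put(zip, "META-INF/gridgain.xml", "<beans/>".getBytes(StandardCharsets.UTF_8));
            // Library dependencies go under lib/ (placeholder content).
            put(zip, "lib/some-lib.jar", new byte[0]);
            // Compiled classes sit at the archive root (placeholder content).
            put(zip, "xyz.class", new byte[0]);
        } catch (IOException ex) {
            throw new IllegalStateException(ex);
        }
        return buf.toByteArray();
    }

    // Lists entry names in the order they were written.
    public static List<String> listEntries(byte[] gar) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(gar))) {
            for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry())
                names.add(entry.getName());
        } catch (IOException ex) {
            throw new IllegalStateException(ex);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(listEntries(buildGar()));
    }
}
```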
gridgain.xml
gridgain.xml GAR descriptor file is a standard Spring XML that should contain zero or more java.util.List beans. Each list should contain fully qualified class names for grid tasks. Here’s an example of gridgain.xml:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!--
    Spring configuration file for test classes in gar-file.
-->
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:util="http://www.springframework.org/schema/util"
       xsi:schemaLocation="
           http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.0.xsd
           http://www.springframework.org/schema/util http://www.springframework.org/schema/util/spring-util-2.0.xsd">
    <description>Gridgain Spring configuration file in gar-file.</description>
    <!--
        Test tasks specification.
    -->
    <util:list id="tasks">
        <value>foo.bar.SomeGridTask1</value>
        <value>foo.bar.SomeGridTask2</value>
    </util:list>
</beans>
```
9.4.2. Ant GAR Task
GridGain ships with a GAR Ant task, GridGarAntTask. This task extends the zip Ant task and can be used exactly like the standard jar Ant task. It archives class files and their dependencies (such as resource files and libraries) together with an optional descriptor file (gridgain.xml). GridGain also comes with an example of GAR deployment that includes a build.xml. Here is an example of how to use the GAR Ant task in a typical build.xml:
```xml
<!--
    Special task for creating GAR files.
-->
<taskdef name="gar" classname="org.gridgain.grid.tools.ant.gar.GridGarAntTask"
         classpathref="gg.libs.path"/>

<!-- Create GAR file. -->
<gar destfile="${examples.gar.deploy.dir}/${gar.name}"
     descrdir="${examples.gar.dir}/META-INF"
     basedir="${examples.gar.deploy.dir}/tmpgar"/>
```
10. Network Segmentation
10.1. Overview
Network segmentation (a.k.a. "split-brain" problem) does happen in production and must be accounted for. The cluster may become segmented because of temporary network problems when nodes (or groups of nodes) become isolated from the rest of topology. If not addressed, such segmentation can cause inconsistent clusters and, what’s worse, inconsistent data. GridGain addresses this issue by making sure that no inconsistent write is allowed into the system if segmentation occurred, while writes that are certain to be consistent are still allowed to proceed.
Each node checks its network segment individually, using the configured segmentation resolvers. Segmentation resolvers are pluggable and can be tailored to any environment. GridGain comes with several implementations out of the box. A segment check is performed in the following cases:
- Before discovery SPI start.
- When any node leaves the topology.
- When any node in the topology fails.
- Periodically (see GridConfigurationAdapter.setSegmentCheckFrequency(long)).
Each segmentation resolver checks the segment for validity. Typically, a resolver should run a single light-weight check (e.g. one IP address or one shared folder). Compound segment checks can be performed by using several resolvers. If a segmentation resolver determines that the local grid node belongs to an incorrect segment, the node acts in accordance with the configured segmentation policy. For details on the available policies, refer to the GridSegmentationPolicy documentation.
10.2. Configuration
Here is the list of GridConfiguration properties intended to configure segmentation logic. All configuration parameters below are optional.
Note: Segment check is disabled by default.
| Setter Method | Description | Optional | Default |
|---|---|---|---|
| setSegmentationPolicy(GridSegmentationPolicy) | Segmentation policy. | Yes | STOP |
| setSegmentationResolvers(GridSegmentationResolver…) | | Yes | null |
| setSegmentCheckFrequency(long) | Network segment check frequency. | Yes | 10000ms |
| setWaitForSegmentOnStart(boolean) | Wait for segment on start flag. | Yes | true |
| setAllSegmentationResolversPassRequired(boolean) | All segmentation resolvers pass required flag. | Yes | true |
10.3. Events
GridGain has the following built-in event types to notify on segmentation events:
10.4. Segmentation Policies
GridGain has the following built-in segmentation policies.
| Policy | Description |
|---|---|
| RESTART_JVM | |
| STOP | |
| RECONNECT | When the segmentation policy is RECONNECT, all listeners receive the EVT_NODE_SEGMENTED event, and then the discovery manager tries to reconnect the discovery SPI to the topology, issuing the EVT_NODE_RECONNECTED event on reconnect. Note that this policy is not allowed when the distributed data grid is enabled. |
| NOOP | |
10.5. Segmentation Resolvers
GridGain has the following built-in segmentation resolvers. You can specify multiple resolvers, in which case all the specified resolvers will be checked. Use setAllSegmentationResolversPassRequired(boolean) to require that every resolver passes the segmentation check; otherwise the segment is declared valid if at least one of the resolver checks passes.
| Resolver | Description |
|---|---|
| GridTcpSegmentationResolver | Segmentation resolver implementation that checks whether the node is in the correct segment by establishing a TCP connection to the configured host and port and immediately closing it. This is a multi-purpose resolver, as it can be used to check connectivity to any web server (e.g. try to connect to port 80), database (e.g. establish a TCP connection to the JDBC port), etc. |
| GridSharedFsSegmentationResolver | Segmentation resolver implementation that checks whether the node is in the correct segment by writing to and reading from a shared directory. |
| GridReachabilitySegmentationResolver | Segmentation resolver implementation that uses java.net.InetAddress.isReachable(NetworkInterface, int, int) to check whether the node is in the correct segment. |
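To make the resolver idea concrete, here is a minimal plain-Java sketch of a TCP-style check combined with the "all resolvers must pass" versus "any resolver may pass" rule. It does not use GridGain classes: SegmentCheck, TcpSegmentCheck and SegmentChecks are hypothetical names, and treating an empty resolver list as a valid segment is an assumption of this sketch.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a segmentation resolver: one light-weight check. */
interface SegmentCheck {
    boolean isValidSegment();
}

/** Checks the segment by opening a TCP connection and closing it right away. */
final class TcpSegmentCheck implements SegmentCheck {
    private final String host;
    private final int port;
    private final int timeoutMs;

    TcpSegmentCheck(String host, int port, int timeoutMs) {
        this.host = host;
        this.port = port;
        this.timeoutMs = timeoutMs;
    }

    @Override public boolean isValidSegment() {
        try (Socket sock = new Socket()) {
            sock.connect(new InetSocketAddress(host, port), timeoutMs);

            return true; // Connection established; try-with-resources closes it.
        }
        catch (IOException e) {
            return false; // Unreachable, refused or timed out.
        }
    }
}

/** Combines several checks into one verdict. */
final class SegmentChecks {
    static boolean isValid(List<SegmentCheck> checks, boolean allPassRequired) {
        if (checks.isEmpty())
            return true; // Assumption of this sketch: no resolvers means no segment check.

        return allPassRequired
            ? checks.stream().allMatch(SegmentCheck::isValidSegment)
            : checks.stream().anyMatch(SegmentCheck::isValidSegment);
    }

    public static void main(String[] args) throws IOException {
        // A local listener plays the role of the "well-known" host for the demo.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            SegmentCheck tcp = new TcpSegmentCheck(
                server.getInetAddress().getHostAddress(), server.getLocalPort(), 1000);

            System.out.println("Segment valid: " + isValid(Arrays.asList(tcp), true));
        }
    }
}
```

Keeping each check small and combining them outside the check itself mirrors the advice to run one light-weight test per resolver.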
11. Compute Grid
11.1. MapReduce
11.1.1. Overview
MapReduce is a distributed computing paradigm that lets you split your task into smaller jobs based on some key, execute these jobs on grid nodes, and reduce the multiple job results into one task result.
Here is a diagram that explains how MapReduce works, based on the Shape Counter example. Given a collection of Shapes, we split this collection into 2 parts and send every part to a grid node. Each node counts the number of Shapes it was given and returns the count to the caller. The caller then adds up the results received from the remote nodes and provides the reduced result back to the user (the counts are displayed next to every shape).
In GridGain, the MapReduce paradigm is implemented via the GridTask interface.
Map Operation
Result Operation
Upon completion of any job, the GridTask.result(..) method is invoked. It tells GridGain whether to Wait for more job results, Reduce now, or Failover this job to another node.
Reduce Operation
This operation is responsible for taking multiple results from remote jobs and reducing them into one aggregate result. This aggregated result will be returned to the user.
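The Shape Counter flow can be sketched in plain Java without any GridGain classes. In this illustration threads of a local thread pool stand in for grid nodes; ShapeCounterSketch and its method names are hypothetical and are not part of the GridGain API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical Shape Counter: local threads stand in for grid nodes. */
final class ShapeCounterSketch {
    /** Map: split the input into at most {@code parts} chunks, one counting job per chunk. */
    static List<Callable<Integer>> map(List<String> shapes, int parts) {
        List<Callable<Integer>> jobs = new ArrayList<>();
        int chunk = (shapes.size() + parts - 1) / parts; // Ceiling division.

        for (int from = 0; from < shapes.size(); from += chunk) {
            final List<String> part = shapes.subList(from, Math.min(from + chunk, shapes.size()));

            jobs.add(part::size); // The job simply counts the shapes it was given.
        }

        return jobs;
    }

    /** Reduce: add up the counts returned by all jobs. */
    static int reduce(List<Integer> results) {
        int sum = 0;

        for (int res : results)
            sum += res;

        return sum;
    }

    /** Runs map, executes every job on its own "node", then reduces. */
    static int count(List<String> shapes, int nodes) {
        ExecutorService pool = Executors.newFixedThreadPool(nodes);

        try {
            List<Integer> results = new ArrayList<>();

            for (Future<Integer> fut : pool.invokeAll(map(shapes, nodes)))
                results.add(fut.get()); // Wait for every job result.

            return reduce(results);
        }
        catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("Job execution failed", e);
        }
        finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> shapes = Arrays.asList("circle", "square", "triangle", "circle", "square");

        System.out.println("Total shapes: " + count(shapes, 2));
    }
}
```

This sketch always waits for all results before reducing; the choice between waiting, reducing early and failing over is what the result operation adds on a real grid.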
11.1.2. Pull vs. Push MapReduce
One of the fundamental differences between GridGain's implementation of MapReduce and the ones in existing or legacy systems like Sun GridEngine, GigaSpaces, Hadoop and Globus is the cardinality, or the type, of the mapping operation. In the conventional approach, the worker nodes pull the sub-tasks for execution. In GridGain, sub-tasks are pushed to the worker nodes, and this process is initially controlled by the task. The push approach has a fundamental advantage that was largely missing in grid computing frameworks before GridGain.
Note: The GridGain approach of giving the task control over sub-task distribution enables early and late load balancing algorithms. This effectively helps to adapt task execution to the non-deterministic nature of execution on the grid. Not having this capability significantly narrows the deployment options where optimal performance and scalability can be achieved.
This unique property of GridGain's MapReduce implementation has a profound effect on the ability to develop grid applications with advanced load balancing, failover and collision resolution logic.
See Early And Late Load Balancing for more information.
11.2. GridTask and GridJob
11.2.1. GridTask And GridJob Interfaces
To create a grid task you need to implement the GridTask interface. When implementing this interface you also need to be aware of the GridJob interface. Together, these two interfaces define practically everything you need to know to create a grid task. In a nutshell, GridTask is responsible for splitting business logic into multiple grid jobs, receiving results from the individual grid jobs executing on remote nodes, and reducing (aggregating) the received job results into the final grid task result.
A grid task gets split into jobs when the GridTask.map(List, Object) method is called. This method returns all jobs for the task, mapped to their corresponding grid nodes for execution. The grid then serializes these jobs and sends them to the requested nodes for execution.
11.2.2. Executing Grid Tasks
Grid-enabling is the process of making a piece of Java code execute on the grid. In GridGain, there are two ways to grid-enable code: API-based and annotation-based.
Note (Direct Execution vs. Annotation-Based AOP): Neither of these two methods is better or worse than the other. They both have their areas of applicability. When creating a grid task you basically have the same programming and development model as in JEE: you create a component, deploy it and execute it. With annotation-based grid-enabling you have the extra option of transparently attaching grid-enabling logic to existing code without modifying it (except for the additional annotation).
API-Based Grid Task Execution
This method lets you grid-enable any arbitrary Java code. You have full control over the split and aggregate logic and all other aspects of grid task execution. Here is an example of direct grid task execution:
```java
public static void main(String[] args) throws GridException {
    GridFactory.start();

    try {
        Grid grid = GridFactory.getGrid();

        // Execute task.
        GridTaskFuture<String> future = grid.execute(FooBarTask.class, "Argument");

        // Wait for task completion.
        String result = future.get();

        // Print out task result.
        System.out.println("Task result: " + result);
    }
    finally {
        GridFactory.stop(true);
    }
}
```
Annotate Existing Method With Gridify Annotation
The only difference between this method and directly executing a grid task is that you annotate a regular Java method and it becomes grid-enabled. Using this technique you can still have a custom grid task that handles annotation-based grid-enabling (including split and aggregate logic or passing state to remote jobs), but you are limited to the boundaries of the method you are grid-enabling. Here is an example of such usage:
```java
@Gridify(taskClass = FooBarTask.class, timeout = 3000)
public void sayIt(String arg) {
    // Some business logic.
}
```
For information on how to configure AOP, refer to AOP Configuration section.
Note (Serializable State): When using the @Gridify annotation on non-static methods without specifying an explicit grid task, the state of the whole instance will be serialized and sent out to the remote node. Therefore the class must implement the java.io.Serializable interface. If you cannot make the class Serializable, then you must implement a custom grid task which will take care of proper state initialization. In either case, GridGain must be able to serialize the state passed to the remote node.
11.2.3. Configuring Grid Tasks
Starting with GridGain 2.1 you can start multiple instances of the Topology SPI, Load Balancing SPI, Failover SPI and Checkpoint SPI. If you do that, you need to tell a task which SPI to use (by default it will use the first SPI in the list).
Add the @GridTaskSpis annotation to your task to specify which SPIs it wants to use. If this annotation is omitted, GridGain picks the first corresponding SPI implementation from the array provided in the configuration.
For more information and examples refer to Specifying Different SPIs Per GridTask documentation.
11.2.4. Grid Task Execution Sequence
The sequence of task execution can be described as follows:
- Upon a request to execute a grid task with a given task name, the system finds the deployed task with that name.
- The system creates a new Distributed Grid Task Session. Also see GridTaskSession.
- The system injects all annotated resources (including the Distributed Grid Task Session) into the grid task instance. See Resources Injection for more information.
- The system calls method map(…) on the GridTask interface. This method is responsible for splitting the business logic of the grid task into multiple grid jobs (units of execution) and mapping them to grid nodes. Method map(…) returns a map of grid jobs keyed by grid nodes. Consider using @GridLoadBalancerResource to inject a load balancer into the task for assigning jobs to the best available nodes.
- The system starts sending grid jobs to their respective nodes.
- Upon arrival at the remote node, a grid job is put on the waiting list, which is passed to the underlying GridCollisionSpi SPI.
- The Collision SPI on the remote node decides on one of the following scheduling policies:
| Policy | Description |
|---|---|
| WAIT | The grid job is kept on the waiting list. In this case, the job will not get a chance to execute until the next time the Collision SPI is called. The Collision SPI gets called every time a new job arrives or an active one completes. |
| EXECUTE | The grid job is moved to the active list (i.e. activated). In this case the system proceeds with job execution. |
| REJECT | Jobs on the waiting list can be rejected before they get a chance to start executing. In this case the GridJobResult passed into the GridTask.result(GridJobResult, List) method will contain a GridExecutionRejectedException exception. If you are using any of the task adapters shipped with GridGain, the job will be failed over automatically for execution on another node. |
| CANCEL | If a GridJob is on the active list and is currently executing, it can be canceled by calling the GridJob.cancel() method. Note that in this case the job will still complete and return a result from the GridJob.execute() method. |
- For activated jobs on remote nodes, the system injects all annotated resources (including the Distributed Grid Task Session) into the grid job instance. See Resources Injection for more information.
- Remote nodes execute the jobs by calling the GridJob.execute() method.
- If a job gets canceled while executing on a remote node, the GridJob.cancel() method is called. Note that, just like with the Thread.interrupt() method, grid job cancellation serves as a hint that a job should stop executing or exhibit some other user-defined behavior. Generally it is up to the job to decide whether it wants to react to cancellation or ignore it. Job cancellation can happen for several reasons:
  - The Collision SPI has canceled an active job.
  - The parent task has completed without waiting for this job's result.
  - The user canceled the task by calling the GridTaskFuture.cancel() method.
- Once job execution is complete, the return value is sent back to the parent task and passed into the GridTask.result(GridJobResult, List) method. If job execution resulted in a checked exception, the GridJobResult.getException() method will return that exception. If job execution threw a runtime exception or error, it will be wrapped into a GridUserUndeclaredException exception.
- Method GridTask.result(GridJobResult, List) is called for each job result and decides, based on the returned policy, whether to continue waiting for the remaining results, fail over the current result, or reduce immediately.

| Policy | Description |
|---|---|
| GridJobResultPolicy.WAIT | If this policy is returned, the grid task continues to wait for other job results. If this result is the last job result, the GridTask.reduce(List) method is called. |
| GridJobResultPolicy.REDUCE | If this policy is returned, method GridTask.reduce(List) is called right away without waiting for the other jobs to complete (all remaining jobs receive a cancel request). |
| GridJobResultPolicy.FAILOVER | If this policy is returned, the job is failed over to another node for execution. The node to which the job gets failed over is decided by the GridFailoverSpi SPI implementation. Note that if you use any of the task adapters, they will automatically fail over jobs to other nodes for 2 known failure cases: node crash and job rejection. |
- When enough results are received, method GridTask.reduce(List) is called to aggregate (reduce) these results into one final grid task result.
- After reduce(…) is complete, the result is returned to the user as the grid task result and can be retrieved from the GridTaskFuture.get() method.
- The system cleans up all task session resources (such as checkpoints with session scope). Execution of the grid task is considered finished at this point.
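The interplay of the WAIT, REDUCE and FAILOVER result policies in the sequence above can be sketched in plain Java. This is a simplified, sequential model rather than GridGain code: ResultPolicySketch, its Job interface, the retry limit and the "enough results" threshold are all assumptions made for illustration, and a real grid executes jobs in parallel on different nodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, sequential model of the result-policy loop. */
final class ResultPolicySketch {
    enum Policy { WAIT, REDUCE, FAILOVER }

    /** Hypothetical job; {@code attempt} grows by one after every failover. */
    interface Job {
        int execute(int attempt);
    }

    /** Plays the role of result(..): inspects one job outcome and picks a policy. */
    static Policy result(Integer data, RuntimeException err, List<Integer> received, int enough) {
        if (err != null)
            return Policy.FAILOVER; // Failed job: run it again elsewhere.

        received.add(data);

        return received.size() >= enough ? Policy.REDUCE : Policy.WAIT;
    }

    /** Plays the role of reduce(..): sums all received results. */
    static int reduce(List<Integer> received) {
        int sum = 0;

        for (int res : received)
            sum += res;

        return sum;
    }

    static int run(List<Job> jobs, int enough) {
        List<Integer> received = new ArrayList<>();

        for (Job job : jobs) {
            for (int attempt = 0; ; attempt++) {
                Integer data = null;
                RuntimeException err = null;

                try {
                    data = job.execute(attempt);
                }
                catch (RuntimeException e) {
                    err = e;
                }

                Policy policy = result(data, err, received, enough);

                if (policy == Policy.REDUCE)
                    return reduce(received); // Remaining jobs are never started.

                if (policy == Policy.WAIT)
                    break; // Move on to the next job result.

                if (attempt >= 2) // FAILOVER, but the sketch gives up after three attempts.
                    throw new IllegalStateException("Job failed on every attempt", err);
            }
        }

        return reduce(received);
    }

    /** Four jobs, the second one fails once; three results are "enough". */
    static int demo() {
        List<Job> jobs = Arrays.asList(
            attempt -> 1,
            attempt -> {
                if (attempt == 0)
                    throw new IllegalStateException("Simulated node crash");

                return 2;
            },
            attempt -> 3,
            attempt -> 4);

        return run(jobs, 3);
    }

    public static void main(String[] args) {
        System.out.println("Reduced result: " + demo());
    }
}
```

In the demo the second job fails once and is retried, and the fourth job never runs because the third result already triggers the reduce step.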
11.2.5. Grid Task Coding Guidelines
There are certain known patterns and anti-patterns to be aware of when developing grid task and jobs.
Serialization and Deserialization
Jobs created by a task are moved from one grid node to another. Before being sent they are serialized into a byte stream and thus need to implement the java.io.Serializable interface. On the remote node every job is deserialized with a class loader that depends on the deployment method (see Grid Deployment).
Prior to GridGain 2.1, every grid job class member (including those of super classes), except for static members, needs to implement java.io.Serializable. Static class members are not sent to the remote node and should be initialized on the remote node. Note also that task parameters passed into the GridJob.execute() method are sent to remote nodes and need to implement java.io.Serializable as well.
Starting with GridGain 2.1, you can configure different Grid Marshallers, and depending on the marshaller, serialization may or may not be required.
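The effect of standard Java serialization on job state can be checked locally before a job ever reaches the grid. The sketch below uses only JDK classes; SerializationSketch, WordJob and BadJob are hypothetical names, and the round trip through a byte array merely imitates what happens when a job is shipped to another node with java.io serialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical local check of what survives Java serialization. */
final class SerializationSketch {
    /** Job whose non-transient state travels with it. */
    static final class WordJob implements Serializable {
        private static final long serialVersionUID = 1L;

        final String word;                                   // Sent to the "remote node".
        transient StringBuilder scratch = new StringBuilder(); // Not sent; null after the trip.

        WordJob(String word) {
            this.word = word;
        }
    }

    /** Job that declares Serializable but holds a non-serializable member. */
    static final class BadJob implements Serializable {
        private static final long serialVersionUID = 1L;

        final Thread worker = new Thread(); // java.lang.Thread is not Serializable.
    }

    /** Serializes the object to bytes and reads it back, as a transfer would. */
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();

            try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
                out.writeObject(obj);
            }

            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
                return (T)in.readObject();
            }
        }
        catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    /** Returns false if any reachable member prevents serialization. */
    static boolean canSerialize(Serializable obj) {
        try {
            roundTrip(obj);

            return true;
        }
        catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        WordJob copy = roundTrip(new WordJob("hello"));

        System.out.println("word=" + copy.word + ", scratch=" + copy.scratch);
        System.out.println("BadJob serializable: " + canSerialize(new BadJob()));
    }
}
```

A check like this in a unit test catches non-serializable members early, which is cheaper than discovering them when the first job is sent out.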
Inner and Anonymous Classes
Any kind of inner classes or anonymous classes are allowed. Write your code as you usually do and GridGain will distribute it. You can implement your job as anonymous class within grid task class and use task class members inside your job. Here is an example of anonymous job:
```java
import java.io.*;
import java.util.*;
import org.gridgain.grid.*;

/**
 * Test task with anonymous job which uses method scope variable.
 */
public class TestGridTask extends GridTaskSplitAdapter<String> {
    /** Dummy multiplier. */
    private int multiplier = 3;

    /**
     * This method is responsible for splitting a task into multiple jobs.
     */
    @Override
    protected Collection<? extends GridJob> split(int gridSize, final String arg) throws GridException {
        List<GridJobAdapter<String>> jobs = new ArrayList<GridJobAdapter<String>>(gridSize);

        for (int i = 0; i < gridSize; i++) {
            jobs.add(new GridJobAdapter<String>() {
                /**
                 * Every job simply multiplies number of characters in the argument by some multiplier.
                 */
                public Serializable execute() throws GridException {
                    return multiplier * arg.length();
                }
            });
        }

        return jobs;
    }

    /**
     * Reduces multiple job results into one task result.
     */
    public Object reduce(List<GridJobResult> results) throws GridException {
        int sum = 0;

        // For the sake of this example, let's sum all results.
        for (GridJobResult res : results) {
            sum += (Integer)res.getData();
        }

        return sum;
    }
}
```
Here the anonymous job class created inside split(..) uses both the method-scope variable arg, declared final in the split(..) signature, and the task class member multiplier. Both are referenced in the job's execute() method.
Overriding Methods with Gridify Annotation
If you have the following code:
```java
public class A {
    @Gridify
    protected void methodA() {
        ...
    }
}

public class B extends A {
    @Override
    protected void methodA() {
        ...
        super.methodA();
        ...
    }
}
```
and use aspects, B.methodA() will be called twice, first on your local node and a second time on the remote node, regardless of class or method modifiers. This is a feature of the aspects implementation, and we don't recommend using @Gridify in parent classes.
Here is a step-by-step explanation:
- You create an object of class B.
- You make a call to B.methodA(), and since this method does not have the annotation in class B, aspects will not work.
- Your B.methodA() executes and calls super.methodA().
- A.methodA() has the annotation, and thus the aspect calls GridGain and distributes your object of class B and the method call to a grid node.
- On the grid node (local or remote), B.methodA() is called again (note that you have an object of class B).
- Your B.methodA() executes and calls super.methodA().
- Method A.methodA() has the annotation, but GridGain catches this situation: the call is not distributed a second time and is simply executed.
As you can see, we have 2 executions of B.methodA() and only one of A.methodA().
11.2.6. Resources Injection
GridTask and GridJob implementations can be injected with grid resources using IoC (dependency injection). Both field-based and method-based injection are supported.
The following grid resources can be injected:
| Resource | Description |
|---|---|
| @GridTaskSessionResource | Injects the Distributed Grid Task Session. |
| @GridInstanceResource | Injects the actual instance of Grid this task is executed on. |
| @GridLoggerResource | Injects an instance of the GridLogger logger used by this grid instance. |
| @GridHomeResource | Injects a path to the GridGain installation home. |
| @GridExecutorServiceResource | Injects an instance of java.util.concurrent.ExecutorService used by this grid. |
| @GridLocalNodeIdResource | Injects the local grid node ID of type java.util.UUID. |
| @GridMBeanServerResource | Injects an instance of javax.management.MBeanServer used by this grid node. |
| @GridJobIdResource | This resource can only be injected into Grid Jobs and not Grid Tasks. It injects a unique job execution ID of type java.util.UUID into an instance of a Grid Job. |
| @GridSpringApplicationContextResource | This resource injects the Spring application context into tasks and jobs. You can use it for accessing Spring beans or any other information available in the Spring application context. By default, this application context is the same as the one used for configuring GridGain, but you can pass a custom one by calling the GridFactory.start(GridConfiguration, ApplicationContext) method. |
| @GridUserResource | Use this annotation to inject custom resources into tasks and jobs. The scope of this resource is per-task, so it will be initialized once the task is deployed and de-initialized once the task is undeployed. Also see @GridUserResourceOnDeployed and @GridUserResourceOnUndeployed for controlling the resource life cycle. |
| @GridMarshallerResource | This resource can be injected into a task, job or SPI and gives you a simple way of marshalling/unmarshalling data or objects (since 2.1.0). |
| @GridSpringBeanResource | Injects any custom resource declared in the provided Spring ApplicationContext. It can be injected into grid tasks and grid jobs. The resource is picked up from the provided Spring ApplicationContext by name value. Note that the injected Spring bean must be declared in the Spring ApplicationContext on every grid node where it gets accessed (since 2.1.0). |
Refer to Resources Injection for more information.
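For readers unfamiliar with annotation-driven injection, the following plain-Java sketch shows the general mechanism behind field-based injection using reflection. It is not how GridGain implements it: the @LoggerResourceSketch annotation, the InjectionSketch class and the use of java.util.logging.Logger are all assumptions chosen to keep the example self-contained.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.logging.Logger;

/** Hypothetical marker annotation, kept at runtime so reflection can see it. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface LoggerResourceSketch {
}

/** Hypothetical container-side code that fills annotated fields. */
final class InjectionSketch {
    /** Job with a private field that the "container" populates. */
    static final class HelloJob {
        @LoggerResourceSketch
        private Logger log; // Stays null until inject(..) is called.

        boolean hasLogger() {
            return log != null;
        }
    }

    /** Sets every field annotated with @LoggerResourceSketch that can hold a Logger. */
    static void inject(Object target, Logger logger) {
        for (Field fld : target.getClass().getDeclaredFields()) {
            if (!fld.isAnnotationPresent(LoggerResourceSketch.class))
                continue;

            if (!fld.getType().isAssignableFrom(Logger.class))
                continue; // Annotated field of an incompatible type: skipped in this sketch.

            try {
                fld.setAccessible(true); // The field is private.
                fld.set(target, logger);
            }
            catch (IllegalAccessException e) {
                throw new IllegalStateException("Failed to inject " + fld.getName(), e);
            }
        }
    }

    public static void main(String[] args) {
        HelloJob job = new HelloJob();

        inject(job, Logger.getLogger("sketch"));

        System.out.println("Logger injected: " + job.hasLogger());
    }
}
```

The practical consequence for task and job code is the same as with the real annotations: the field is declared and left unset, and the framework fills it in before the task or job runs.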
11.2.7. Convenience Adapters
| Adapter | Description |
|---|---|
| GridTaskAdapter | Grid Task adapter that provides a default implementation of the GridTask.result(GridJobResult, List) method, which implements automatic failover to another node if a remote job has failed due to a node crash (detected by a GridTopologyException exception) or due to job execution rejection (detected by a GridExecutionRejectedException exception). |
| GridTaskSplitAdapter | Grid Task adapter that hides the job-to-node mapping logic from the user and provides a convenient GridTaskSplitAdapter.split(int, Object) method for splitting a task into sub-jobs in homogeneous environments. |
| GridJobAdapter | Grid Job adapter that provides a default empty implementation of the GridJob.cancel() method and also allows the user to set and get the job argument, if there is one. |
Refer to corresponding adapter documentation for more information.
11.2.8. Distributed Session Attributes And Checkpoints
Both Grid Tasks and Grid Jobs can use the Distributed Grid Task Session to coordinate with each other via session attributes and checkpoints.
Session Attributes
Jobs can communicate with the parent task and with sibling jobs from the same task by setting session attributes (see GridTaskSession). Other jobs can wait for an attribute to be set either synchronously or asynchronously. This lets jobs synchronize their execution with other jobs at any point, and it is useful when other jobs within the task need to be made aware of a certain event or state change that occurred during job execution.
Saving Checkpoints
Long running jobs may wish to save intermediate checkpoints to protect themselves from failures. There are three checkpoint management methods available on Grid Task Session which allow user to save, load, and remove checkpoints.
Jobs that use checkpoint functionality should attempt to load a checkpoint at the beginning of execution. If a non-null value is returned, the job can continue from where it failed last time; otherwise it starts from scratch. Throughout its execution the job should periodically save its intermediate state to avoid starting from scratch in case of a failure.
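The load, save and remove pattern described here can be sketched in plain Java. The Map passed to the method is a hypothetical stand-in for the checkpoint methods of the task session, and CheckpointSketch, the checkpoint key and the simulated failure are assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical long-running job that resumes from its last checkpoint. */
final class CheckpointSketch {
    /**
     * Sums the numbers 1..n, saving progress after every step.
     *
     * @param store Stand-in for checkpoint storage; values are {nextStep, partialSum}.
     * @param key Checkpoint key.
     * @param n Upper bound of the sum.
     * @param failAt Step at which a failure is simulated, or 0 for no failure.
     */
    static long sumWithCheckpoints(Map<String, long[]> store, String key, int n, int failAt) {
        // Load: a non-null value means a previous run got part of the way.
        long[] state = store.get(key);

        long next = state == null ? 1 : state[0];
        long sum = state == null ? 0 : state[1];

        for (long i = next; i <= n; i++) {
            if (i == failAt)
                throw new IllegalStateException("Simulated failure at step " + i);

            sum += i;

            // Save: record the next step to run and the sum so far.
            store.put(key, new long[] {i + 1, sum});
        }

        // Remove: the job is done, the checkpoint is no longer needed.
        store.remove(key);

        return sum;
    }

    public static void main(String[] args) {
        Map<String, long[]> store = new HashMap<>();

        try {
            sumWithCheckpoints(store, "sum-job", 10, 6);
        }
        catch (IllegalStateException e) {
            System.out.println(e.getMessage() + ", resuming from step " + store.get("sum-job")[0]);
        }

        System.out.println("Result after resume: " + sumWithCheckpoints(store, "sum-job", 10, 0));
    }
}
```

Saving after every step is only reasonable for a toy example; a real job would pick a checkpoint interval that balances lost work against the cost of saving.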
Refer to Distributed Grid Task Session documentation for more information.
11.2.9. MapReduce Paradigm
The design of GridTask is heavily influenced by Google's MapReduce paradigm. For more information about MapReduce, refer to the article "MapReduce: Simplified Data Processing on Large Clusters" from Google.
11.2.10. Example
Below is a grid task implementation that is responsible for the split and aggregate (a.k.a. map/reduce) logic. Note that this implementation uses GridTaskSplitAdapter, which simplifies the API for grid tasks in homogeneous grids (which is often the case). The two main methods implemented here are split and reduce. Method reduce aggregates the results (number of characters in each word) returned from every node.
```java
package org.gridgain.examples.helloworld.api;

import org.gridgain.grid.*;
import org.gridgain.grid.logger.*;
import org.gridgain.grid.resources.*;
import java.util.*;
import java.io.*;

public class GridHelloWorldTask extends GridTaskSplitAdapter<String, Integer> {
    /** Auto-injected grid logger. */
    @GridLoggerResource
    private GridLogger log = null;

    @Override
    public Collection<? extends GridJob> split(int gridSize, String phrase) throws GridException {
        // Split the passed in phrase into multiple words separated by spaces.
        String[] words = phrase.split(" ");

        List<GridJob> jobs = new ArrayList<GridJob>(words.length);

        for (String word : words) {
            // Every job gets its own word as an argument.
            jobs.add(new GridJobAdapter<String>(word) {
                /*
                 * Simply prints the word passed into the job and
                 * returns number of letters in that word.
                 */
                public Serializable execute() {
                    String word = getArgument();

                    if (log.isInfoEnabled()) {
                        log.info(">>>");
                        log.info(">>> Printing '" + word + "' on this node from grid job.");
                        log.info(">>>");
                    }

                    // Return number of letters in the word.
                    return word.length();
                }
            });
        }

        return jobs;
    }

    /**
     * Sums up all letters from all jobs and returns a
     * total number of letters in the phrase.
     *
     * @param results Job results.
     * @return Number of letters for the phrase passed into
     *      <tt>split(gridSize, phrase)</tt> method above.
     * @throws GridException If reduce failed.
     */
    public Integer reduce(List<GridJobResult> results) throws GridException {
        int totalCharCnt = 0;

        for (GridJobResult res : results) {
            // Every job returned a number of letters
            // for the word it was responsible for.
            Integer charCnt = res.getData();

            totalCharCnt += charCnt;
        }

        // Account for spaces. For simplicity we assume one space between words.
        totalCharCnt += results.size() - 1;

        // Total number of characters in the phrase
        // passed into task execution.
        return totalCharCnt;
    }
}
```
11.3. GridProjection
TODO
11.4. GridTaskSession
11.4.1. Overview
A distributed task session is created for every task execution. It is defined by the GridTaskSession interface. The task session is distributed across the parent task and all grid jobs spawned by it, so attributes set on the task or on a job can be viewed by the other jobs. Correspondingly, attributes set on any of the jobs can also be viewed by the task.
The session has 2 main features: attribute management and checkpoint management (see Checkpoint SPI for more details). Both attributes and checkpoints can be used from the task itself and from the jobs belonging to this task, and they can be set from any task or job method. Session attribute and checkpoint consistency is fault tolerant and is preserved whenever a job gets failed over to another node for execution. Whenever task execution ends, all checkpoints saved within the session with the GridTaskSessionScope.SESSION_SCOPE scope are removed from checkpoint storage. Checkpoints saved with GridTaskSessionScope.GLOBAL_SCOPE outlive the session and can be viewed by other tasks.
The sequence in which session attributes are set is consistent across the task and all job siblings within it. There will never be a case when one job sees attribute A before attribute B, and another job sees attribute B before A. Attribute order is identical across all session participants. Attribute order is also fault tolerant and is preserved whenever a job gets failed over to another node.
11.4.2. Connected Tasks
Note that apart from setting and getting session attributes, tasks or jobs can choose to wait for a certain attribute to be set using any of the GridTaskSession.waitForAttribute(..) methods. Tasks and jobs can also receive asynchronous notifications about a certain attribute being set through the GridTaskSessionAttributeListener listener. This feature allows grid jobs and tasks to remain connected in order to synchronize their execution with each other, and it opens up solutions for a whole new range of problems.
Imagine for example that you need to compress a very large file (let’s say terabytes in size). To do that in grid environment you would split such file into multiple sections and assign every section to a remote job for execution. Every job would have to scan its section to look for repetition patterns. Once this scan is done by all jobs in parallel, jobs would need to synchronize their results with their siblings so compression would happen consistently across the whole file. This can be achieved by setting repetition patterns discovered by every job into the session. Once all patterns are synchronized, all jobs can proceed with compressing their designated file sections in parallel, taking into account repetition patterns found by all the jobs in the split. Grid task would then reduce (aggregate) all compressed sections into one compressed file. Without session attribute synchronization step this problem would be much harder to solve.
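The "publish, then wait for siblings" step of such a connected task can be sketched in plain Java. SessionSketch below is a hypothetical in-memory stand-in for a shared task session (it is not GridTaskSession), and threads stand in for jobs on different nodes; each job publishes its own value, waits for every sibling's value, and only then continues.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical in-memory stand-in for a session shared by sibling jobs. */
final class SessionSketch {
    private final Map<String, Object> attrs = new HashMap<>();

    synchronized void setAttribute(String key, Object val) {
        attrs.put(key, val);

        notifyAll(); // Wake up every job waiting for an attribute.
    }

    /** Blocks until the attribute is set; returns null on timeout or interruption. */
    synchronized Object waitForAttribute(String key, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;

        try {
            while (!attrs.containsKey(key)) {
                long left = deadline - System.currentTimeMillis();

                if (left <= 0)
                    return null;

                wait(left);
            }

            return attrs.get(key);
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();

            return null;
        }
    }
}

/** Each job publishes its word, then waits for the words of all siblings. */
final class ConnectedJobsSketch {
    static List<String> run(final List<String> words) {
        final SessionSketch ses = new SessionSketch();

        ExecutorService pool = Executors.newFixedThreadPool(words.size());

        try {
            List<Future<String>> futs = new ArrayList<>();

            for (int i = 0; i < words.size(); i++) {
                final int idx = i;

                futs.add(pool.submit(() -> {
                    ses.setAttribute("word-" + idx, words.get(idx)); // Publish own result.

                    StringBuilder all = new StringBuilder();

                    for (int j = 0; j < words.size(); j++) // Wait for every sibling.
                        all.append(j == 0 ? "" : " ").append(ses.waitForAttribute("word-" + j, 5000));

                    return all.toString();
                }));
            }

            List<String> seen = new ArrayList<>();

            for (Future<String> fut : futs)
                seen.add(fut.get());

            return seen;
        }
        catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("Job execution failed", e);
        }
        finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("hello", "grid", "world")));
    }
}
```

The thread pool is sized to the number of jobs on purpose: if fewer jobs could run at once than need to meet, the waiting jobs would block the ones that have not started yet.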
11.4.3. Session Injection
The session can be injected into a task or a job using IoC (dependency injection). See Resources Injection for additional details.
11.4.4. Example
Below is a grid task implementation that is responsible for the split and aggregate (a.k.a. map/reduce) logic. Note that this implementation uses GridifyTaskSplitAdapter, which simplifies the API for grid tasks in homogeneous grids (which is often the case). The two main methods implemented here are split and reduce. Method reduce aggregates the results (sums up all numbers returned by the jobs) to calculate the length of the initial string.
This task splits the passed-in string into separate words and then passes each word to its own job for execution on different nodes. Every job does the following:
- Execute the grid-enabled method with the argument passed in.
- Add its argument to the session.
- Wait for other jobs to add their arguments to the session.
- Execute the grid-enabled method with all session attributes concatenated into one string as an argument.
package org.gridgain.examples.helloworld.gridify.session;

import java.io.*;
import java.util.*;
import org.gridgain.grid.*;
import org.gridgain.grid.gridify.*;
import org.gridgain.grid.logger.*;
import org.gridgain.grid.resources.*;

/**
 * Grid task for {@link GridifyHelloWorldSessionExample} example. It handles splitting
 * this example into multiple jobs for execution on remote nodes.
 * <p>
 * Every job will do the following:
 * <ol>
 * <li>Execute grid-enabled method with argument passed in.</li>
 * <li>Add its argument to the session.</li>
 * <li>Wait for other jobs to add their arguments to the session.</li>
 * <li>Execute grid-enabled method with all session attributes concatenated into one string as an argument.</li>
 * </ol>
 */
public class GridifyHelloWorldSessionTask extends GridifyTaskSplitAdapter<Integer> {
    /** Grid task session will be injected. */
    @GridTaskSessionResource
    private GridTaskSession ses = null;

    /** Grid logger will be injected (used by the jobs below to log session attributes). */
    @GridLoggerResource
    private GridLogger log = null;

    /**
     * {@inheritDoc}
     */
    @Override
    protected Collection<? extends GridJob> split(int gridSize, GridifyArgument arg) throws GridException {
        String[] words = ((String)arg.getMethodParameters()[0]).split(" ");

        List<GridJobAdapter<String>> jobs = new ArrayList<GridJobAdapter<String>>(words.length);

        for (String word : words) {
            jobs.add(new GridJobAdapter<String>(word) {
                /** Job context will be injected. */
                @GridJobContextResource
                private GridJobContext jobCtx = null;

                /**
                 * Executes grid-enabled method once with all
                 * session attributes concatenated into string
                 * as an argument and again with passed in argument.
                 */
                public Serializable execute() throws GridException {
                    String word = getArgument();

                    // Set session attribute with value of this job's word.
                    ses.setAttribute(jobCtx.getJobId(), word);

                    try {
                        // Wait for all other jobs within this task to set their attributes on
                        // the session.
                        for (GridJobSibling sibling : ses.getJobSiblings()) {
                            // Waits for attribute with sibling's job ID as a key.
                            if (ses.waitForAttribute(sibling.getJobId()) == null) {
                                throw new GridException("Failed to get session attribute from job: " +
                                    sibling.getJobId());
                            }
                        }
                    }
                    catch (InterruptedException e) {
                        throw new GridException("Got interrupted while waiting for session attributes.", e);
                    }

                    // Create a string containing all attributes set by all jobs
                    // within this task (in this case an argument from every job).
                    StringBuilder msg = new StringBuilder();

                    // Formatting.
                    msg.append("All session attributes [ ");

                    for (Serializable jobArg : ses.getAttributes().values()) {
                        msg.append(jobArg).append(' ');
                    }

                    // Formatting.
                    msg.append(']');

                    // For the purpose of example, we simply log session attributes.
                    log.info(msg.toString());

                    // Execute gridified method again and return the number
                    // of characters in the passed in word.
                    return GridifyHelloWorldSessionExample.sayIt(word);
                }
            });
        }

        return jobs;
    }

    /**
     * Sums up all characters from all jobs and returns a
     * total number of characters in the initial phrase.
     *
     * @param results Job results.
     * @return Number of letters for the word passed into
     * {@link GridifyHelloWorldSessionExample#sayIt(String)} method.
     * @throws GridException If reduce failed.
     */
    public Integer reduce(List<GridJobResult> results) throws GridException {
        int totalCharCnt = 0;

        for (GridJobResult res : results) {
            // Every job returned a number of letters
            // for the word it was responsible for.
            Integer charCnt = res.getData();

            totalCharCnt += charCnt;
        }

        // Account for spaces. For simplicity we assume one space between words.
        totalCharCnt += results.size() - 1;

        // Total number of characters in the phrase
        // passed into task execution.
        return totalCharCnt;
    }
}
11.5. Zero Deployment
Zero Deployment is a unique GridGain feature: deployed resources are monitored automatically across the grid, and all necessary JVM classes and resources are loaded on demand. This enables users to simply launch default GridGain nodes, which then immediately become part of the data and compute grid topology without any need for explicit deployment of the user’s classes or resources. The resources of these nodes are then utilized automatically.
Zero Deployment technology seamlessly delivers code updates throughout the grid/cloud topology - eliminating any need for re-building, re-deployment, restarting, awkward IDE plugins or tool chains. All you do is keep writing and changing your code, and whenever you need to execute it, just hit the Run button in your IDE. Your new code will be automatically deployed on all grid nodes. This feature works with both the compute grid and the data grid. GridGain further provides three different modes of peer-to-peer deployment, supporting the most complex deployment environments like custom class loaders or WAR/EAR files.
Distributed class loading and class sharing are supported and allow fine-grained control over how classes and user resources are loaded and shared on remote nodes. You can find details in the section about Deployment Modes.
Provisioning on cloud infrastructure, such as Amazon AWS, is dramatically simplified by GridGain’s CloudBoot technology. It minimizes dependencies on cloud provider images by dynamically loading necessary parts of an application during image startup. This eliminates the need to rebuild cloud images every time an application changes.
11.5.1. Example
// Create a new Runnable and execute it on all grid nodes, local and remote.
G.grid().run(BROADCAST, new Runnable() {
    @Override public void run() {
        // Send the text string to all nodes and print it out on each.
        System.out.println("Hello World from all nodes");
    }
});
The console output looks like this:
[15:17:11] Node JOINED [nodeId8=72d78a0b, addr=[10.1.10.23], order=1340662631022, CPUs=4]
[15:17:11] Topology snapshot [nodes=4, CPUs=4, hash=0xD2ED1710]
Hello World from all nodes
[15:17:16] Node LEFT [nodeId8=72d78a0b, addr=[10.1.10.23], order=1340662631022, CPUs=4]
[15:17:16] Topology snapshot [nodes=3, CPUs=4, hash=0x4CF83855]
Note that the new Runnable() anonymous class has been created and deployed onto the grid automatically. The remote nodes were started with their default configuration without prior knowledge of the new class - the class was loaded and deployed on demand at execution time.
11.6. Resource Injection
GridGain allows dependency injection of both internal GridGain resources as well as user resources. It supports field-based and method-based injection. Any resources with the proper annotations will be injected into the corresponding task, job or SPI before it is initialized.
The following internal resources can be injected:
User resources can be injected with these annotations:
11.6.1. Examples
The complete source code for the examples is located on GitHub
Logger Resource Example
The grid logger is provided to the grid via GridConfiguration. The logger resource provides a handle on the configured logger from tasks, jobs, or SPIs. Use the @GridLoggerResource annotation to inject this resource. Here is how field injection would typically look:
public class MyGridJob implements GridJob {
    ...
    @GridLoggerResource
    private GridLogger log = null;
    ...
}
Method injection looks like this:
public class MyGridJob implements GridJob {
    ...
    private GridLogger log = null;
    ...
    @GridLoggerResource
    public void setLogger(GridLogger log) {
        this.log = log;
    }
    ...
}
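To give a feel for how annotation-driven field and method injection can work in general, here is a minimal reflection-based sketch. It is not how GridGain implements injection: the LoggerResource annotation, the inject method and the use of StringBuilder as a stand-in "logger" are all hypothetical names chosen for illustration.

```java
import java.lang.annotation.*;
import java.lang.reflect.*;

final class InjectionSketch {
    /** Hypothetical marker annotation (assumption: plays the role of @GridLoggerResource). */
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.FIELD, ElementType.METHOD})
    @interface LoggerResource {}

    /** Job that uses field-based injection. */
    static final class FieldJob {
        @LoggerResource
        private StringBuilder log;

        StringBuilder log() { return log; }
    }

    /** Job that uses method-based injection. */
    static final class SetterJob {
        private StringBuilder log;

        @LoggerResource
        public void setLogger(StringBuilder log) { this.log = log; }

        StringBuilder log() { return log; }
    }

    /** Injects the resource into every annotated field and single-argument method of the target. */
    static void inject(Object target, Object rsrc) {
        try {
            for (Field f : target.getClass().getDeclaredFields()) {
                if (f.isAnnotationPresent(LoggerResource.class)) {
                    f.setAccessible(true); // private fields are allowed
                    f.set(target, rsrc);
                }
            }
            for (Method m : target.getClass().getDeclaredMethods()) {
                if (m.isAnnotationPresent(LoggerResource.class) && m.getParameterCount() == 1) {
                    m.setAccessible(true);
                    m.invoke(target, rsrc);
                }
            }
        }
        catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Injection failed", e);
        }
    }
}
```

The container calls inject before the job is initialized, so by the time the job body runs the annotated field or setter has already received the resource.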
User Resource
@GridUserResource injects a custom user resource into grid tasks or grid jobs. Use it when you would like to use something like a JDBC connection pool from your tasks or jobs - this way your connection pool will be instantiated only once per task and then reused for all executions of this task.
The resource will be created based on the resourceClass value. If resourceClass is not specified, then the field type or setter parameter type will be used to infer the class type of the resource. Set resourceClass to a specific value if the class of resource cannot be inferred from field or setter declaration (for example, if field is an interface).
The user resource will be instantiated once on every node where a given task is deployed. For resource deployment and undeployment callbacks use the GridUserResourceOnDeployed and GridUserResourceOnUndeployed annotations. These are typically used to initialize and release the injectable resource, for example by opening and closing a database connection or a network connection, or by reading configuration settings.
User resources are never serialized (they are instantiated on each node instead) and should always be declared as transient. For the same reason, resources should not be sent to remote nodes.
The scope of a user resource depends on the Deployment Mode used. You can configure your user resources to be deployed on a per-task, per-class-loader, or per-grid basis. Take a look at the Deployment Mode documentation for more information.
GridNodeLocal can also be used to create a singleton local state per grid node that is reused between various job executions.
Use the @GridUserResource annotation to inject this resource. Here is how field injection would typically look:
public class MyGridJob implements GridJob {
    ...
    @GridUserResource
    private transient MyUserResource rsrc = null;
    ...
}
where the corresponding resource class can look like this:
public class MyUserResource {
    ...
    // Connection shared between the deploy and undeploy callbacks.
    private Connection conn;
    ...
    // Establish a connection to be shared.
    @GridUserResourceOnDeployed private void deploy() {
        conn = connPool.getConnection();
    }
    ...
    // Close the connection when the resource is undeployed.
    @GridUserResourceOnUndeployed private void undeploy() {
        connPool.closeConnection(conn);
    }
    ...
}
11.7. GridNodeLocal
When working in a distributed environment you often need a consistent local state per grid node that is reused between various job executions. For example, what if multiple jobs require a database connection pool for their execution - how do they get this connection pool to be initialized once and then reused by all jobs running on the same grid node? Essentially you can think about it as a per-grid-node singleton service, but the idea is not limited to services only: it can be just a regular Java bean that holds some state to be shared by all jobs running on the same grid node.
Before GridGain 3.0 this was handled by using the @GridUserResource annotation on fields within GridTask or GridJob classes to specify singleton beans. However, this approach depended on the GridDeploymentMode configuration and, for the ISOLATED or PRIVATE deployment modes, the resource could be initialized multiple times, once per GridTask. This forced users to use various hacks in their logic and generally was not very convenient.
Starting with GridGain 3.0 the GridNodeLocal per-grid-node local cache was introduced. The name was borrowed from the ThreadLocal class in Java, because just like ThreadLocal provides a unique space per thread in Java, GridNodeLocal provides a unique space per grid node in GridGain. GridNodeLocal implements the java.util.concurrent.ConcurrentMap interface and is safe for concurrent use without external locking. In fact, it simply extends the java.util.concurrent.ConcurrentHashMap implementation and, therefore, inherits all the methods available there.
Here is an example of how GridNodeLocal could be used to create some user specific singleton connection pool from a simple GridGain job:
final Grid grid = G.start(..);
...
// Execute runnable job on some remote grid node.
grid.run(GridClosureCallMode.BALANCE, new Runnable() {
    public void run() {
        GridNodeLocal<String, MySingletonConnectionPool> nodeLocal = grid.nodeLocal();

        // 1. First see if someone already stored connection pool in node-local storage.
        MySingletonConnectionPool pool = nodeLocal.get("connPool");

        if (pool == null) {
            // 2. Create new connection pool and store it in node-local storage.
            MySingletonConnectionPool old = nodeLocal.putIfAbsent("connPool", pool = new MySingletonConnectionPool(..));

            // 3. If another job stored a pool in the meantime, use that one instead.
            if (old != null)
                pool = old;
        }

        // Perform operations with connection pool.
        ...
    }
});
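Because GridNodeLocal extends ConcurrentHashMap, the get-then-putIfAbsent idiom used above can be tried out with a plain ConcurrentMap. The following is a minimal sketch under that assumption; the Pool class and the getOrCreate helper are hypothetical names, not part of GridGain.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.*;

final class NodeLocalSketch {
    /** Hypothetical stand-in for an expensive per-node singleton such as a connection pool. */
    static final class Pool {
        /** Counts how many instances were constructed (more than one is possible under a race). */
        static final AtomicInteger created = new AtomicInteger();

        Pool() { created.incrementAndGet(); }
    }

    /** Returns the pool stored under the key, creating and publishing it if it is absent. */
    static Pool getOrCreate(ConcurrentMap<String, Pool> nodeLocal, String key) {
        Pool pool = nodeLocal.get(key);

        if (pool == null) {
            Pool old = nodeLocal.putIfAbsent(key, pool = new Pool());

            // Another thread won the race: discard ours and use the published instance.
            if (old != null)
                pool = old;
        }

        return pool;
    }
}
```

Note the trade-off of this idiom: under contention several Pool objects may be constructed, but only one is ever published and returned to callers. If construction is expensive or has side effects, a lazily initialized holder is a better fit.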
11.8. Failover
In case of a node crash, jobs are automatically failed over to another node. However, in GridGain you can also treat any result that comes back from a remote job execution as a failure. The remote node can still be alive, but it may be running low on CPU, I/O, disk space, etc. There are many conditions that may result in a failure within your application and that you can use to trigger a failover. GridGain allows you to optionally fail over a job based on any job result. Moreover, you can choose the node to which a job should be failed over, as this could be different for different applications or different computations within the same application.
The Failover SPI is responsible for handling the failed execution of a grid job. Failover is triggered when the method GridTask.result(GridJobResult, List) returns the GridJobResultPolicy.FAILOVER policy. The SPI then takes the failed job and a list of all grid nodes and selects another node on which the job execution will be retried. The Failover SPI ensures that the job is not re-mapped to the same node it had failed on. GridGain comes with different built-in customizable Failover SPI implementations.
11.8.1. Example
This example illustrates how the Failover SPI can be invoked by returning the Failover policy. See the Failover SPI for examples on how to configure it.
...
// In case of any exception, the GridJobResultPolicy.FAILOVER policy is returned.
@Override public GridJobResultPolicy result(GridJobResult result, List<GridJobResult> received)
    throws GridException {
    return result.getException() != null ? GridJobResultPolicy.FAILOVER : GridJobResultPolicy.WAIT;
}
...
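What happens after the FAILOVER policy is returned can be illustrated with a small sketch of node selection. This is not one of the built-in Failover SPI implementations: the failover method, the string node IDs and the "first node not yet tried" rule are assumptions made for illustration only.

```java
import java.util.*;

final class FailoverSketch {
    /**
     * Picks a node for the retry, never one the job has already failed on.
     * Returns null when no untried node is left in the topology.
     */
    static String failover(String failedNode, List<String> topology, Set<String> alreadyTried) {
        // Remember the node the job just failed on so it is never selected again.
        alreadyTried.add(failedNode);

        for (String node : topology)
            if (!alreadyTried.contains(node))
                return node;

        return null; // nothing left to fail over to
    }
}
```

A caller would typically treat a null result as a final failure of the job and propagate the original exception to the task.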
11.9. Topology Management
TODO
11.10. Collision Resolution
Custom logic may be used to determine how grid jobs should be scheduled and executed when they arrive on a destination grid node. In general a grid node will have multiple jobs arriving to it for execution and potentially multiple jobs that are already executing or waiting for execution on it. There are multiple possible strategies dealing with this situation, which can be configured with the Collision SPI.
Every time a new job arrives, it is placed on the waiting queue, and it is up to the Collision SPI to either reject or activate a waiting job, cancel an active job, or do nothing. Generally, the Collision SPI gets invoked in the following cases:
-
A new job has arrived.
-
An existing job has finished.
-
A node metrics update has been received.
-
The SPI is actively invoked
For more on how to configure collision resolution, refer to the Collision SPI. In the following section there is more on how collision resolution can help with job stealing for late load balancing.
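The waiting-queue behavior described above can be sketched with a simple policy that runs at most a fixed number of jobs in parallel and activates the rest in FIFO order. This is a minimal sketch, not the Collision SPI interface: the class, its callback names and the parallelLimit parameter are hypothetical.

```java
import java.util.*;

final class CollisionSketch {
    private final int parallelLimit;
    private final Deque<String> waiting = new ArrayDeque<>();
    private final Set<String> active = new LinkedHashSet<>();

    CollisionSketch(int parallelLimit) { this.parallelLimit = parallelLimit; }

    /** A new job has arrived: it always goes to the waiting queue first. */
    void onJobArrived(String jobId) {
        waiting.addLast(jobId);
        resolve();
    }

    /** An existing job has finished: a slot may have become free. */
    void onJobFinished(String jobId) {
        active.remove(jobId);
        resolve();
    }

    /** Activates waiting jobs in arrival order while there are free slots. */
    private void resolve() {
        while (active.size() < parallelLimit && !waiting.isEmpty())
            active.add(waiting.pollFirst());
    }

    List<String> active() { return new ArrayList<>(active); }

    List<String> waiting() { return new ArrayList<>(waiting); }
}
```

With a limit of 1 this policy serializes jobs on the node; with a very large limit every job starts immediately.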
11.11. Load Balancing
11.11.1. Overview
In the MapReduce pattern, mapping is the process of splitting the initial task into sub-tasks and assigning them to the grid nodes. Mapping generally involves the splitting logic itself, mapping sub-tasks to the nodes including load balancing, and potential failover and collision resolution. In the conventional approach the worker nodes pull the sub-tasks for execution. In GridGain, sub-tasks are pushed to the worker nodes and this process is initially controlled by the task. The latter has a fundamental advantage that was largely missing in grid computing frameworks before GridGain.
The GridGain approach of giving the task control of sub-task distribution enables early and late load balancing algorithms. This effectively helps to adapt task execution to the non-deterministic nature of execution on the grid. Not having this capability significantly narrows the deployment options where optimal performance and scalability can be achieved.
11.11.2. Early And Late Load Balancing
The sequence of steps described below shows when Early and Late load balancing policies come into play:
-
Someone calls one of the GridProjection.execute(..) methods, passing a grid task and its argument, to initiate grid task execution in the system.
-
Method GridTask.map(..) is called on the task to perform the initial mapping. This method is responsible for taking a task, splitting it into a number of sub-tasks and mapping every sub-task to a grid node. It returns a set of sub-task:node pairs. This is what we call Early Load Balancing, as it is done during the initial mapping operation and with only the information available at execution initiation time (see the Load Balancing SPI documentation).
-
Once mapping is done the sub-tasks travel to their respective remote nodes for execution.
-
When a sub-task arrives at the destination grid node it is subject to collision (scheduling) resolution via the Collision SPI. This SPI is called every time a new sub-task arrives, an existing sub-task finishes its execution, or a metrics update is received (with every heartbeat). The Collision SPI looks into the queue of its sub-tasks (including a newly received one, if any) and can either cancel a sub-task, leave it waiting in the queue, transfer it to another node for execution, or start its execution locally. This is what we call Late Load Balancing: it happens later in the process of execution, on the destination node, right where the sub-task is about to be executed.
The important characteristic of late load balancing is that there can be a significant time difference between mapping (early load balancing) and the actual time when execution of the sub-task commences on the remote node. Late load balancing makes it possible to account for this non-deterministic aspect of grid execution and potentially re-balance the sub-task on the grid.
For example, our Job Stealing Collision SPI does exactly that. It monitors the number of queued sub-tasks on each node and preemptively moves waiting sub-tasks from a "busy" node to an "idle" node for execution.
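The idea behind job stealing can be sketched as a rebalancing step over per-node waiting queues. This is a simplified illustration and not the Job Stealing Collision SPI itself: the rebalance method, the string node and job IDs, and the "queues may differ by at most one job" stopping rule are assumptions.

```java
import java.util.*;

final class JobStealingSketch {
    /**
     * Moves waiting jobs from the longest queue to the shortest one until the two
     * differ by at most one job. Returns the number of jobs moved.
     */
    static int rebalance(Map<String, Deque<String>> queues) {
        int moved = 0;

        while (true) {
            String busy = null;
            String idle = null;

            // Find the node with the most and the node with the fewest waiting jobs.
            for (Map.Entry<String, Deque<String>> e : queues.entrySet()) {
                if (busy == null || e.getValue().size() > queues.get(busy).size())
                    busy = e.getKey();
                if (idle == null || e.getValue().size() < queues.get(idle).size())
                    idle = e.getKey();
            }

            if (busy == null || queues.get(busy).size() - queues.get(idle).size() <= 1)
                return moved;

            // "Steal" the most recently queued job from the busy node.
            queues.get(idle).addLast(queues.get(busy).pollLast());
            moved++;
        }
    }
}
```

In a real grid the decision is made from node metrics received with heartbeats, and only jobs that have not started executing can be moved.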
Load balancing capabilities in GridGain are among its more advanced features and not everyone will need them. For example, in a homogeneous grid with homogeneous tasks load balancing is achieved naturally. However, in many other cases, when conditions are closer to real life, sophisticated load balancing capabilities are about the only way to get the most out of your grid.
For more information on MapReduce refer to the article "MapReduce: Simplified Data Processing on Large Clusters" from Google.
Early Load Balancing
Load balancing is the process of assigning jobs, in the best possible way, to the nodes on which they will be executed. Like almost all kernel-level functionality in GridGain, load balancing is designed as an SPI (Service Provider Interface). It consists of the public SPI and several implementations. A number of pre-built implementations are shipped with GridGain, and users can easily develop their own.
The Load Balancing SPI provides the next best balanced node for job execution. This SPI is used either implicitly or explicitly whenever a job gets mapped to a node during GridTask.map(..) invocation.
This load balancing is usually referred to as early load balancing as it happens early in the process of the grid task execution, during the mapping phase of the MapReduce process. Note that late load balancing happens during collision resolution and is handled by the Collision SPI.
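As an illustration of "providing the next best balanced node" at mapping time, here is a minimal round-robin sketch. Round-robin is used only because it is the simplest policy to show; the class and method names are hypothetical and do not reproduce the Load Balancing SPI interface.

```java
import java.util.*;
import java.util.concurrent.atomic.*;

final class RoundRobinSketch {
    private final AtomicInteger idx = new AtomicInteger();

    /** Returns the next node in rotation for the given topology snapshot. */
    String balancedNode(List<String> topology) {
        if (topology.isEmpty())
            throw new IllegalArgumentException("Empty topology");

        // floorMod keeps the index valid even after the counter overflows.
        return topology.get(Math.floorMod(idx.getAndIncrement(), topology.size()));
    }

    /** Early load balancing: every job is assigned to a node during mapping. */
    Map<String, String> map(List<String> jobs, List<String> topology) {
        Map<String, String> res = new LinkedHashMap<>();

        for (String job : jobs)
            res.put(job, balancedNode(topology));

        return res;
    }
}
```

The assignment uses only what is known at mapping time (the topology snapshot), which is exactly why a later correction step on the destination node can still be useful.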
Late Load Balancing
Grid jobs are said to be in collision when a job arrives on a node that already has one or more jobs either waiting or executing on it. Job collision resolution provides the means to resolve this collision by making it possible to:
-
put newly arrived job into the waiting queue
-
schedule it for immediate execution
-
cancel it (and preempt it by failing it over to another node)
-
wake up already waiting job from the queue and schedule it for immediate execution
Like almost all kernel-level functionality in GridGain, collision resolution is designed as an SPI (Service Provider Interface). It consists of the public SPI and several implementations. As always, several pre-built implementations are shipped with GridGain and available to the developer - and custom ones can be easily built.
The Collision SPI regulates how grid jobs get executed when they arrive on a destination node for execution. In general a grid node will have multiple jobs arriving to it for execution and potentially multiple jobs that are already executing or waiting for execution on it. There are multiple possible strategies for dealing with this situation: all jobs can proceed in parallel, or jobs can be serialized (i.e., only one job can execute at any given point in time), or only a certain number or certain types of grid jobs can proceed in parallel, etc.
The Collision SPI doesn’t expose any public APIs to application code and works implicitly behind the scenes. As with any SPI, developers can provide their own implementation and plug it into GridGain.
Collision resolution is generally referred to as late load balancing as it happens late in the execution process, when the job has already arrived on the destination node. In fact, it makes it possible to load balance jobs in the context of the given node. Note that early load balancing is handled by the Load Balancing SPI and occurs during the initial mapping phase of the MapReduce process.
11.12. AOP-Based Grid-Enabling
TODO
11.13. Closure Execution
TODO
11.14. Executor Service
TODO
11.15. Cron-Based Scheduling
TODO
11.16. Remote Actors vs. GridGain and Concurrency Unification
This chapter is a bit off-topic: it discusses the differences between the popular Actors concept (remote actors specifically) and the functionality provided by GridGain. I was convinced to write about it after I got questions on Actors vs. GridGain Scalar at almost every conference where I spoke about our Scalar DSL.
When talking about Actors there is always the bigger topic of the Concurrency Unification trend, which aims to combine the principles of local multithreading concurrency and distributed programming. We at GridGain are strong supporters of concurrency unification. The example later on in this chapter will show some of our current work in this direction.
actor vs. Actor
We use lowercase actor to denote an instance of actor class or type, and uppercase Actor to denote the concept of actors.
Back to Actors… After my presentation at GeeCON in the spring of 2011 I got an email asking for a GridGain version of the Pi-calculation example from Akka, a very popular (and deservedly so) Actor framework in Scala. The sender was asking for help in comparing the Akka/Scala actors approach to basic distributed programming with GridGain Scalar’s approach.
I hadn’t seen the Akka example before, so I first downloaded Akka 1.1 and looked at it…
Now, before we get to it I want to reiterate a few points related to Actor-based concurrency (note that these points are implementation-agnostic and apply equally to Scala actors or Akka actors).
I believe that Actors are an important "new" abstraction for elegantly resolving multithreading concurrency. I do not, however, subscribe to the idealistic view that they are a drop-in replacement for threads and java.util.concurrent utilities. Most of the real-life examples and applications that I’ve seen use all of these mechanisms together with actors.
It is often repeated that Actors work best when they are used throughout the application - not just for solving one particular synchronization problem, but to build an entire subsystem or module based on Actors. I tend to agree. Mixing and matching shared-state concurrency with actors produces a rather awkward combination and largely negates the advantages Actors bring.
There are, of course, many use cases where Actors simply don’t work well. Anytime you need fine-grained control, general performance fine-tuning or determinism on threading, or when you need more sophisticated locking algorithms (read/write, counting critical sections, etc.), or when shared state is unavoidable - Actors tend to produce a more verbose and less flexible solution. For example, I have several times seen attempts to implement pseudo-semaphore synchronization on a group of actors - and the result was rather ugly.
Now, despite my positive outlook on Actors in general for better concurrency management, I have more reservations about Remote Actors - i.e. applying the same Actors concept in the distributed context.
11.16.1. Remote Actors and Concurrency Unification
Remote Actors basically allow actors in different JVMs to exchange messages.
One of the major appeals of remote actors is that they attempt to bridge local JVM multithreading and distributed concurrency. At first glance it seems rather elegant to extend the share-nothing message-passing metaphor into the distributed context to provide the long sought-after Concurrency Unification.
The obvious contention is that in the distributed systems:
-
State is by default not shared between JVMs
-
State that is not shared is not exposed to parallel access from multiple JVMs
-
Data is already passed as serialized messages
Actors In Distributed Context
The key features of actors in local JVM multithreading are already present in distributed systems.
Yet distributed systems introduce a host of their own challenges compared to local JVM multithreading (JVM-M):
-
much larger latencies
-
cost of message passing is not negligible anymore and can easily exceed the processing time
-
resource starvation and deadlocking due to conditions not present in JVM-M
-
topology management & discovery
-
heterogeneous environment (different CPUs, number of cores, memory sizes, OSes, networking, language runtime, etc.)
-
failover is very different conceptually from JVM-M
-
distributed load balancing not present in JVM-M
-
data sharing (a.k.a In-Memory Data Grid) is fundamentally different from sharing data in JVM-M
-
compute sharing (a.k.a. Compute Grid) is fundamentally different from sharing computations in JVM-M
-
deployment and provisioning of the code is dramatically different from JVM-M
It should be pretty obvious that the challenges of distributed systems are far wider and more complex in nature than those of local JVM multithreading, and therefore it makes more sense to adapt distributed practices to local JVM multithreading (and not vice versa) to gain true Concurrency Unification. In fact, you need to design from more general to more specific APIs when attempting to unify two related concepts.
11.16.2. Back To Example
So, here is the example of Pi calculation verbatim from the Akka 1.1 tutorial. I took the liberty to remove some excessive comments to make the code shorter:
package akka.tutorial.first.scala

import akka.actor.{Actor, PoisonPill}
import Actor._
import akka.routing.{Routing, CyclicIterator}
import Routing._
import java.util.concurrent.CountDownLatch

object Pi extends App {
  calculate(nrOfWorkers = 4, nrOfElements = 10000, nrOfMessages = 10000)

  sealed trait PiMessage
  case object Calculate extends PiMessage
  case class Work(start: Int, nrOfElements: Int) extends PiMessage
  case class Result(value: Double) extends PiMessage

  class Worker extends Actor {
    // define the work
    def calculatePiFor(start: Int, nrOfElements: Int): Double = {
      var acc = 0.0
      for (i <- start until (start + nrOfElements))
        acc += 4.0 * (1 - (i % 2) * 2) / (2 * i + 1)
      acc
    }

    def receive = {
      case Work(start, nrOfElements) =>
        self reply Result(calculatePiFor(start, nrOfElements)) // perform the work
    }
  }

  class Master(nrOfWorkers: Int, nrOfMessages: Int, nrOfElements: Int, latch: CountDownLatch)
    extends Actor {
    var pi: Double = _
    var nrOfResults: Int = _
    var start: Long = _

    // create the workers
    val workers = Vector.fill(nrOfWorkers)(actorOf[Worker].start())

    // wrap them with a load-balancing router
    val router = Routing.loadBalancerActor(CyclicIterator(workers)).start()

    // message handler
    def receive = {
      case Calculate =>
        // schedule work
        for (i <- 0 until nrOfMessages) router ! Work(i * nrOfElements, nrOfElements)
        // send a PoisonPill to all workers telling them to shut down themselves
        router ! Broadcast(PoisonPill)
        // send a PoisonPill to the router, telling him to shut himself down
        router ! PoisonPill
      case Result(value) =>
        // handle result from the worker
        pi += value
        nrOfResults += 1
        if (nrOfResults == nrOfMessages) self.stop()
    }

    override def preStart() {
      start = System.currentTimeMillis
    }

    override def postStop() {
      // tell the world that the calculation is complete
      println("\n\tPi estimate: \t\t%s\n\tCalculation time: \t%s millis"
        .format(pi, (System.currentTimeMillis - start)))
      latch.countDown()
    }
  }

  def calculate(nrOfWorkers: Int, nrOfElements: Int, nrOfMessages: Int) {
    // this latch is only plumbing to know when the calculation is completed
    val latch = new CountDownLatch(1)
    // create the master
    val master = actorOf(new Master(nrOfWorkers, nrOfMessages, nrOfElements, latch)).start()
    // start the calculation
    master ! Calculate
    // wait for master to shut down
    latch.await()
  }
}
Now, there are several observations about this example:
-
It obviously works only on a single local JVM.
-
Making it use Remote Actors will require more code, more configuration, more build and more deployment steps.
-
It’s probably not the best example of Actors applicability as you can quite easily re-write it with, for example, parallel collection from Scala 2.9 - but nonetheless that’s one code snippet that’s featured in Akka tutorial as a prime example.
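To make the last observation concrete: the formula being computed is the Leibniz series, in which term i equals 4 * (-1)^i / (2i + 1), and its partial sums can be added up in any order. The following local-only sketch uses Java parallel streams as a stand-in for Scala parallel collections; the class and method names are hypothetical and the chunking mirrors the start/nrOfElements scheme of the example above.

```java
import java.util.stream.IntStream;

final class LocalPiSketch {
    /** Partial sum of the Leibniz series over the index range [start, start + n). */
    static double calcPi(int start, int n) {
        double acc = 0.0;

        for (int i = start; i < start + n; i++)
            acc += 4.0 * (1 - (i % 2) * 2) / (2 * i + 1);

        return acc;
    }

    /** Computes all chunks in parallel on the local JVM and sums the partial results. */
    static double estimate(int chunks, int n) {
        return IntStream.range(0, chunks)
            .parallel()
            .mapToDouble(c -> calcPi(c * n, n))
            .sum();
    }
}
```

For example, LocalPiSketch.estimate(4, 10000) sums the first 40,000 terms. Like the Akka version, this runs on a single JVM only.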
When I looked at this example for the first time I was really surprised by its complexity, because in a nutshell this is simply a multithreaded calculation of a trivial math formula.
It took me about 15 minutes to write this equivalent in GridGain (Scala, Groovy and Java versions below):
import org.gridgain.scalar._
import scalar._
import org.gridgain.grid.GridClosureCallMode._
import org.gridgain.grid.Grid

object ScalarPiCalculationExample {
    /** Number of calculations per node. */
    private val N = 10000

    // Entry point.
    def main(args: Array[String]) = scalar { g: Grid =>
        println("Pi estimate: " +
            g.@<[Double, Double](SPREAD, for (i <- 0 until g.size()) yield () => calcPi(i * N), _.sum))
    }

    // Basic Pi formula.
    def calcPi(start: Int): Double =
        (start until (start + N)) map (i => 4.0 * (1 - (i % 2) * 2) / (2 * i + 1)) sum
}

import org.gridgain.grid.*
import static org.gridgain.grid.GridClosureCallMode.*
import static org.gridgain.grover.Grover.*
import org.gridgain.grover.categories.*

@Typed
@Use(GroverProjectionCategory)
class GroverPiCalculationExample {
    private static int N = 10000

    static void main(String[] args) {
        grover { Grid g ->
            println("Pi estimate: " +
                g.reduce$(SPREAD, (0..<g.size()).collect { { -> calcPi(it * N) } }, { it.sum() } )
            )
        }
    }

    private static double calcPi(int start) {
        (start..<(start + N)).inject(0) { double sum, int i ->
            sum + (4.0 * (1 - (i % 2) * 2) / (2 * i + 1))
        }
    }
}

import org.gridgain.grid.*;
import org.gridgain.grid.typedef.*;

import static org.gridgain.grid.GridClosureCallMode.*;

/**
 * This example calculates Pi number in parallel on the grid.
 */
public final class GridPiCalculationExample {
    /** Number of calculations per node. */
    private static final int N = 1000;

    // Basic Pi formula.
    private static double calcPi(int start) {
        double acc = 0.0;

        for (int i = start; i < start + N; i++)
            acc += 4.0 * (1 - (i % 2) * 2) / (2 * i + 1);

        return acc;
    }

    // Entry point.
    public static void main(String[] args) throws GridException {
        G.start();

        try {
            Grid g = G.grid();

            System.out.println("Pi estimate: " +
                g.reduce(SPREAD, F.yield(F.range(0, g.size()), new C1<Integer, Double>() {
                    @Override public Double apply(Integer i) {
                        return calcPi(i * N);
                    }
                }), F.sumDoubleReducer())
            );
        }
        finally {
            G.stop(true);
        }
    }
}
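For reference, the "trivial math formula" behind all three versions is the Leibniz series for Pi. Each job sums one block of N consecutive terms and the reducer adds up the partial sums. Writing n for the number of nodes (our shorthand for g.size(), not a name used in the code), the estimate printed by the examples can be sketched as:

```latex
\pi \;\approx\; \sum_{k=0}^{n-1} \; \sum_{i=kN}^{(k+1)N-1} \frac{4\,(-1)^{i}}{2i+1}
```

The factor `1 - (i % 2) * 2` in calcPi is simply (-1)^i written with integer arithmetic, and the inner sum is exactly what a single call to calcPi(k * N) computes.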
As I see it, the GridGain version has a distinct set of advantages:
-
It’s shorter… almost 4 times shorter for the Scala version, and it’s much easier to understand
-
It’s distributed by default - it will work equally well on one node (like Akka’s version) and on thousands of nodes without any code or configuration change or any deployment steps
-
It doesn’t require the use of any low-level synchronization utilities like the latch in Akka’s version
-
Its implementation is a lot more Scala-friendly - you just use closures as if you were doing everything locally - and they get automatically deployed and distributed on-demand
GridGain’s Version is Fully Distributed
What is startling is that the fully distributed GridGain version is almost 4 times shorter and much easier to understand than a local-only, non-distributed Akka version.
Rich Hickey, the inventor of the Clojure programming language, provides another critical view of the actor model (albeit specifically in a local context). In his own words:
-
It is a much more complex programming model, requiring 2-message conversations for the simplest data reads, and forcing the use of blocking message receives, which introduce the potential for deadlock. Programming for the failure modes of distribution means utilizing timeouts etc. It causes a bifurcation of the program protocols, some of which are represented by functions and others by the values of messages.
-
It doesn’t let you fully leverage the efficiencies of being in the same process. It is quite possible to efficiently directly share a large immutable data structure between threads, but the actor model forces intervening conversations and, potentially, copying. Reads and writes get serialized and block each other, etc.
-
It reduces your flexibility in modeling - this is a world in which everyone sits in a windowless room and communicates only by mail. Programs are decomposed as piles of blocking switch statements. You can only handle messages you anticipated receiving. Coordinating activities involving multiple actors is very difficult. You can’t observe anything without its cooperation/coordination - making ad-hoc reporting or analysis impossible, instead forcing every actor to participate in each protocol.
-
It is often the case that taking something that works well locally and transparently distributing it doesn’t work out - the conversation granularity is too chatty or the message payloads are too large or the failure modes change the optimal work partitioning, i.e. transparent distribution isn’t transparent and the code has to change anyway.
And just in case you really, really need a local-only version of the same code, you can easily "pin" the execution to the local node simply by replacing the global monadic projection with the local node. Just replace this line in the Scala example:
g.@<[Double, Double](SPREAD, for (i <- 0 until g.size()) yield () => calcPi(i * N), _.sum))
with this line:
g.localNode.@<[Double, Double](SPREAD, for (i <- 0 until g.size()) yield () => calcPi(i * N), _.sum))
and this will provide the exact functionality of the Akka-based example (in case you really need it).
12. In-Memory Data Grid
Data Grid, or In-Memory Data Grid, is a fancy word for distributed data caching. In a nutshell, it gives applications the ability to keep data in memory for high availability rather than constantly fetching it from slower storage elsewhere, like an RDBMS or shared file systems.
Another way to look at In-Memory Data Grids is to see their complementary value to Compute Grids. As you recall, Compute Grids are responsible for parallelization of processing (or computations). Once processing is distributed, it is only natural to aim for distribution and partitioning of the data that will be processed by the Compute Grid as well - otherwise the non-distributed or centralized data storage (like an RDBMS) will quickly become a performance bottleneck in your system.
Compute and In-Memory Data Grid
There is a great deal of synergy between Compute Grids and In-Memory Data Grids. In fact, almost any real-life high performance distributed system will have both to some degree.
On top of high availability, data grids generally allow you to scale to large amounts of data as well, which is also called data partitioning. When data is partitioned, every key/value pair stored on the data grid is assigned to a designated primary node and, optionally, a configurable number of designated backup nodes (which can optionally be active or inactive). Data should never be lost as long as at least one backup node for it remains. This approach makes it possible to use the memory available on all nodes within the data grid as one whole shared memory, with each node responsible for caching the portion of data allocated to it.
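To make the primary/backup idea more tangible, here is a minimal plain-Java sketch. It is not GridGain code and does not reflect GridGain’s actual affinity function: the class PartitionSketch, the method nodesFor and the "next nodes in the ring are the backups" rule are all assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of GridGain: maps a key to node indices. */
final class PartitionSketch {
    /**
     * Returns the primary node index followed by the backup node indices.
     * Nodes are numbered 0..nodeCount-1; backups are simply the next nodes in the ring.
     */
    static List<Integer> nodesFor(Object key, int nodeCount, int backups) {
        int primary = Math.floorMod(key.hashCode(), nodeCount);

        List<Integer> owners = new ArrayList<>();
        owners.add(primary);

        // Never assign more copies than there are nodes.
        for (int b = 1; b <= Math.min(backups, nodeCount - 1); b++)
            owners.add((primary + b) % nodeCount);

        return owners;
    }

    public static void main(String[] args) {
        // Integer.hashCode() is the value itself, so key 42 on 4 nodes has primary 42 % 4 = 2.
        System.out.println(nodesFor(42, 4, 1)); // [2, 3]
    }
}
```

With one backup, losing node 2 still leaves a copy of key 42 on node 3, which is the "data should never be lost as long as a backup remains" property in miniature.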
Picture below illustrates these basic points about In-Memory Data Grids:
GridGain’s In-Memory Data Grid provides very comprehensive functionality that includes these key features:
-
Local, Replicated, and Partitioned cache modes
-
Collocation of computations and data
-
Extremely rich post-functional APIs
-
Zero Provisioning and Deployment for cached data
-
Support for batch reads and writes
-
Synchronous and asynchronous modes (including commits)
-
Replication and invalidation modes
-
Concurrent CAS-like atomic operations
-
Advanced data querying, including support for SQL, TEXT, and FULL SCAN queries
-
Local and remote reducers and transformers for distributed data queries
-
Pluggable persistent storage with support for read-through and write-through semantics
-
Pluggable data affinity for data partitioning and collocation of computations with data
-
Synchronous and Asynchronous data preloading
-
Pluggable Data Overflow or Swap to disk for effective memory management
-
Optimistic and Pessimistic transactions with Read-Committed, Repeatable-Read, and Serializable isolation levels
-
Extremely scalable, feather-weight Eventually-Consistent transactions
-
Pluggable Eviction Policies, including out of the box support for LIRS, LRU, LFU, FIFO, and RANDOM eviction modes
-
Flexible Cache Projections for fine grained control over cache behavior and custom cache views
-
JEE/JTA integration
-
REST-based operations support
-
Write-behind cache (asynchronous cache store updates)
-
Eventually consistent behavior support
-
Full support for Document-style data structures such as JSON
In the following section we’ll discuss some of the key concepts of GridGain’s In-Memory Data Grid.
12.1. Key Concepts
In this chapter we will give an overview of some of the key features of In-Memory Data Grid. However, this documentation is not meant to replace the main Javadoc API Documentation, and you should still refer to Javadoc for detailed information on APIs.
Cache vs. Data Grid vs. In-Memory Data Grid
We’ll be using the terms cache, data grid and in-memory data grid, in both upper case and lower case, interchangeably throughout this chapter and later on.
12.1.1. Collocation of Computations and Data
One of the major scalability problems in utilizing data grids is unnecessary noise traffic, which may consume a significant amount of bandwidth and can often bring a server to its knees. Imagine a scenario in which you are using a partitioned cache and have to constantly retrieve various key-value pairs from the cache and perform some computation on them. In partitioned mode, however, a given key-value pair may or may not be cached on the local node, so it often needs to be fetched from remote nodes. Once the data is fetched and brought to the local node, you perform the computation on it and, once you are done, the data you just requested is most likely discarded.
It may be cached in the Near cache on the local node (which is the default GridGain behavior), but Near caches are generally much smaller than partitioned caches (size limitation) and have more aggressive eviction policies than partitioned caches. So to summarize, most of the data fetched from caches is either immediately discarded or will be discarded shortly after - thus creating unnecessary noise traffic.
It is much more effective to bring the computation exactly to the node where the data resides, as opposed to bringing the data to the computation. This is because in the vast majority of cases computations are much smaller to transport over the network and they change much less frequently, if at all.
Collocation
Collocation between computations and data is often called Affinity Routing, highlighting that computations and data have affinity between them and computation jobs will be routed based on this affinity.
In GridGain such collocation is easily achieved via compute and data grid integration. Here is how this can be achieved using the @GridCacheAffinityMapped annotation.
Grid g = G.grid();

final GridCache<Integer, String> cache = g.cache(); // Get default cache.
final Integer key = 1;

String result = g.call(BALANCE, new Callable<String>() {
    // Affinity key for the job. The job will travel
    // to the node where the key is cached.
    @GridCacheAffinityMapped
    public Integer affinityKey() {
        return key;
    }

    // The logic below will be executed on remote node which
    // is responsible for caching the specified affinity key.
    @Override public String call() throws Exception {
        // Get locally cached value.
        String val = cache.get(key);

        // Perform some computation on retrieved value.
        ...

        return "OK";
    }
});
12.1.2. Zero Deployment
As we already discussed, zero deployment is a GridGain feature which automatically monitors deployed resources on the grid and redeploys them whenever they change. With GridGain you can basically start up several grid nodes and just leave them running. You will never have to deploy or redeploy anything on them. All you do is keep writing and changing your code, and whenever you need to execute it, just hit the Run button in your IDE and your new code will be automatically deployed on all grid nodes.
This feature works with both compute and data grid. However, unlike compute grid, data grid keeps auto-deployed objects cached on remote nodes, and hence behaves a little differently. GridGain cache can be deployed in two different GridDeploymentModes - SHARED and CONTINUOUS.
Zero Deployment in Data Grid
Unlike compute grid, data grid keeps auto-deployed objects cached on remote nodes, and hence behaves a little differently.
In SHARED mode, objects will be auto-deployed on remote nodes, but they will be automatically undeployed (hence, removed from cache) whenever either the code changes or the last node from which a resource has been deployed leaves. The class loader on remote nodes is shared only among nodes that have common classes; thus, if nodes don’t share the same code base, their class loaders will not be shared remotely. This mode is ideal for development, when code changes quite frequently and after every change it is generally best to start off fresh.
In CONTINUOUS mode objects are also automatically deployed on remote nodes, but they all share the same class loader remotely and never get automatically undeployed or removed unless the user explicitly requests it by changing userVersion in the META-INF/gridgain.xml file. This mode is ideal for production, when it is generally undesirable to undeploy (and hence remove) any object from cache.
12.1.3. Cache Modes
GridGain data grid can be deployed in any of the following 3 modes defined by GridCacheMode: LOCAL, REPLICATED, or PARTITIONED.
Cache Modes
You can have as many named caches as you like and each can be of LOCAL, REPLICATED, or PARTITIONED type.
LOCAL mode is the most lightweight mode of cache operation, as no data is distributed to other cache nodes. It is ideal for scenarios where data is either read-only or can be periodically refreshed at some expiration frequency. It also works very well with read-through behavior, where data is loaded from persistent storage on misses. Apart from distribution, local caches still have all the features of a distributed cache, such as automatic data eviction, expiration, disk swapping, data querying, transactions, and more.
REPLICATED mode provides the utmost availability, as data is available on every grid node. However, in this mode every data update must be propagated to all other nodes, which can have an impact on performance and scalability. As the same data is stored on all grid nodes, the size of a replicated cache is limited by the amount of memory available on a single node. This mode is ideal for scenarios where updates are infrequent and data availability is most important.
PARTITIONED cache is the most scalable distributed cache mode. In this mode the overall data set is divided equally into partitions and all partitions are split equally between participating nodes, essentially creating one huge distributed memory for caching data. This approach allows for storing as much data as can fit in the total memory available across all nodes, hence allowing for loading gigabytes and terabytes of data into cache memory. A partitioned cache is always fronted by a smaller local cache, also known as the Near cache, which stores the most recently or most frequently accessed data. This combination provides high availability for data that is accessed often, together with the high scalability of the partitioned cache. This mode is ideal for scenarios where data volumes are large and updates are relatively frequent.
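The capacity trade-off between the three modes can be put into a rough back-of-the-envelope model. The sketch below is plain Java, not a GridGain API; the class CacheCapacitySketch and its method are hypothetical, and the model assumes identical nodes, treats every backup as one more full copy of each entry, and ignores near caches, indexes and per-entry overhead.

```java
/** Hypothetical back-of-the-envelope capacity model; not a GridGain API. */
final class CacheCapacitySketch {
    enum Mode { LOCAL, REPLICATED, PARTITIONED }

    /**
     * Rough amount of distinct data (in GB) that one cache can hold.
     * Assumes identical nodes and ignores near caches, indexes and per-entry overhead.
     */
    static double distinctDataGb(Mode mode, int nodes, double perNodeGb, int backups) {
        switch (mode) {
            case PARTITIONED:
                // Total memory is shared, but every backup is one more copy of each entry.
                return nodes * perNodeGb / (1 + backups);
            case REPLICATED: // Every node stores the full data set.
            case LOCAL:      // Every node stores only its own, non-distributed data.
            default:
                return perNodeGb;
        }
    }

    public static void main(String[] args) {
        // 10 nodes with 8 GB each and 1 backup per entry.
        for (Mode m : Mode.values())
            System.out.println(m + ": " + distinctDataGb(m, 10, 8.0, 1) + " GB");
    }
}
```

Under these assumptions a replicated cache never grows beyond what one node can hold, while a partitioned cache grows with every node added, at the price of dividing the total by the number of copies kept.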
12.1.4. Rich Post-Functional APIs
The majority of data grid products provide a simple java.util.concurrent.ConcurrentMap API for working with data grids. However, the plain ConcurrentMap API is quite limiting and often does not provide the desired convenience or usability. For example, imagine that you need to store objects of different types in cache, say Person and Organization, keyed by an Integer.
Using plain ConcurrentMap<K, V> generics you would have to lose the strong typing provided by generics and declare the map as ConcurrentMap<Integer, Object>… ouch! Or take a look at methods like Map.put(…), ConcurrentMap.putIfAbsent(…), or Map.remove(…). If you follow the standard Map API, then all of these methods have to return a previous value. However, when working with caches, returning the previous value is expensive, as it may involve a trip to the persistent data store or to a neighboring node - why make that extra network trip when the previous value is not needed?
To address these issues, and many others, GridCacheProjection, which is the main caching API in GridGain, has over 200 methods, which together cover pretty much every use case you can think of. Here is some of the functionality available on the GridCacheProjection API:
-
Various get(…) methods to synchronously or asynchronously get values from cache.
-
Various put(…), putIfAbsent(…), and replace(…) methods to synchronously or asynchronously put single or multiple entries into cache.
-
Various remove(…) methods to synchronously or asynchronously remove single or multiple keys from cache.
-
Various contains(…) methods to check if cache contains certain keys or values.
-
Various forEach(…), forAny(…), and reduce(…) methods to visit every cache entry within this projection.
-
Various flagsOn(…), flagsOff(…), and projection(…) methods to set specific flags and filters on a cache projection.
-
Methods like keySet(…), values(…), and entrySet(…) to provide views on cache keys, values, and entries.
-
Various peek(…) methods to peek at values in global or transactional memory, swap storage, or persistent storage.
-
Various reload(…) methods to reload latest values from persistent storage.
-
Various unswap(…) methods to load specified keys from swap storage into global cache memory.
-
Various invalidate(…) methods to set cached values to null.
-
Various lock(…), unlock(…), and isLocked(…) methods to acquire, release, and check on distributed locks on a single or multiple keys in cache.
-
Various clear(…) methods to clear elements from cache, and optionally from swap storage.
-
Various evict(…) methods to evict elements from cache, and optionally store them in underlying swap storage for later access.
-
Various txStart(…) and inTx(…) methods to perform various cache operations within a transaction.
-
Various createXxxQuery(…) methods to query cache using either SQL, LUCENE, H2TEXT text search, or SCAN for filter-based full scan.
-
Various mapKeysToNodes(…) methods which provide node affinity mapping for given keys.
-
Various gridProjection(…) methods which provide GridProjection only for nodes on which given keys reside.
12.1.5. Extended put and remove Operations
All methods that end with x, such as putx(…) or removex(…), provide the same functionality as their sibling methods that don’t end with x, however, instead of returning a previous value, they return a boolean flag indicating whether operation succeeded or not. Returning a previous value may involve a network trip or a persistent store lookup and should be avoided whenever not needed.
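The difference in return contracts can be shown with a small plain-Java sketch. The wrapper PutxSketch below is hypothetical and sits on top of a local ConcurrentHashMap, so it only mirrors the method shapes; in a local map there is nothing to save, whereas in a distributed cache skipping the previous value is what avoids the extra network trip or store lookup.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical wrapper contrasting Map-style put with a putx-style variant. */
final class PutxSketch<K, V> {
    private final ConcurrentMap<K, V> map = new ConcurrentHashMap<>();

    /** Map-style: the previous value always has to be fetched and returned. */
    V put(K key, V val) {
        return map.put(key, val);
    }

    /** putx-style: only reports success, the previous value is never handed back. */
    boolean putx(K key, V val) {
        map.put(key, val);
        return true;
    }

    /** removex-style: true if an entry was actually removed. */
    boolean removex(K key) {
        return map.remove(key) != null;
    }

    public static void main(String[] args) {
        PutxSketch<String, Integer> cache = new PutxSketch<>();

        System.out.println(cache.put("k1", 1));  // null: no previous value
        System.out.println(cache.put("k1", 2));  // 1: previous value returned
        System.out.println(cache.putx("k1", 3)); // true
        System.out.println(cache.removex("k1")); // true
        System.out.println(cache.removex("k1")); // false: nothing left to remove
    }
}
```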
12.1.6. Cache Projection
Cache projections, defined by the GridCacheProjection API, are responsible for providing the above-mentioned rich API for GridGain data grid. However, you can also use projections to create various views on cache data or to enable/disable certain cache features programmatically. For example, here is how you would create cache views that work explicitly on objects of type Person or objects of type Company:
// Only objects of type Person.
GridCacheProjection<Integer, Person> people = grid.cache().projection(Integer.class, Person.class);

// Only objects of type Company.
GridCacheProjection<Integer, Company> companies = grid.cache().projection(Integer.class, Company.class);
Or here is how you would programmatically enable synchronousCommit mode for the view on objects of type Person defined above:
GridCacheProjection<Integer, Person> syncCommitPeople = people.flagsOn(GridCacheFlag.SYNC_COMMIT);
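To illustrate what a type-based view buys you over a ConcurrentMap<Integer, Object>, here is a minimal plain-Java sketch. It is a hypothetical stand-in, not the GridGain implementation: ProjectionSketch.projection returns a filtered snapshot copy, whereas a real cache projection is a view over the cache itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a typed cache view; not the GridGain implementation. */
final class ProjectionSketch {
    /** Returns a type-safe copy of the entries whose value is an instance of the given type. */
    static <K, V> Map<K, V> projection(Map<K, Object> cache, Class<V> valType) {
        Map<K, V> view = new LinkedHashMap<>();

        for (Map.Entry<K, Object> e : cache.entrySet())
            if (valType.isInstance(e.getValue()))
                view.put(e.getKey(), valType.cast(e.getValue()));

        return view;
    }

    public static void main(String[] args) {
        // One untyped cache holding values of two different types.
        Map<Integer, Object> cache = new LinkedHashMap<>();
        cache.put(1, "a string value");
        cache.put(2, 42L);
        cache.put(3, "another string");

        // The caller gets strong typing back without any casts.
        Map<Integer, String> strings = projection(cache, String.class);

        System.out.println(strings.keySet()); // [1, 3]
    }
}
```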
12.1.7. Cache Transactions
Most data grid products support transactions. However, in many cases they only provide automatic enlisting into an ongoing JEE/JTA transaction, which is quite limiting, especially when not running in a JEE container. In many cases it is a lot more convenient to use cache transactions directly. GridGain supports both automatic enlistment into an ongoing JEE transaction and explicit cache transactions. Explicit transactions are supported via GridCacheTx.
GridGain supports the following concurrency levels:
-
OPTIMISTIC
-
PESSIMISTIC
-
EVENTUALLY_CONSISTENT
as well as the following isolation levels:
-
READ_COMMITTED
-
REPEATABLE_READ
-
SERIALIZABLE
Such wide support for concurrency and isolation levels makes it possible to model any kind of concurrent access pattern on any set of data.
Here are examples of how transactions can be used:
GridCache<String, Integer> cache = G.grid().cache();
...
GridCacheTx tx = cache.txStart();

try {
    // Perform transactional operations.
    Integer v1 = cache.get("k1");
    Integer old1 = cache.put("k2", 2);

    cache.removex("k3");

    // Commit the transaction.
    tx.commit();
}
finally {
    tx.end(); // Rollback, if was not committed.
}
Or, the same logic as above can be executed by passing one or more closures to any of the GridCache.inTx(..) methods as follows:
GridCache<String, Integer> cache = G.grid().cache();
...
cache.inTx(new CI1<GridCacheProjection<String, Integer>>() {
    @Override public void apply(GridCacheProjection<String, Integer> cache) {
        // Perform transactional operations.
        Integer v1 = cache.get("k1");
        Integer old1 = cache.put("k2", 2);

        cache.removex("k3");
    }
});
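The commit-in-try, end-in-finally idiom used above is worth a closer look. The following single-threaded plain-Java sketch is hypothetical and is not how GridCacheTx is implemented; it assumes writes are buffered until commit and says nothing about locking, isolation levels or distribution. It only shows why calling end() in the finally block amounts to a rollback when commit() was never reached.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical single-threaded transaction sketch; not the GridCacheTx implementation. */
final class TxSketch {
    private final Map<String, Integer> cache;
    private final Map<String, Integer> pending = new HashMap<>();
    private boolean committed;

    TxSketch(Map<String, Integer> cache) { this.cache = cache; }

    /** Buffers the write; nothing is visible in the cache until commit(). */
    void put(String key, Integer val) { pending.put(key, val); }

    /** Applies all buffered writes to the cache. */
    void commit() {
        cache.putAll(pending);
        committed = true;
    }

    /** Ends the transaction; buffered writes are dropped if commit() was never called. */
    void end() {
        if (!committed)
            pending.clear();
    }

    public static void main(String[] args) {
        Map<String, Integer> cache = new HashMap<>();
        TxSketch tx = new TxSketch(cache);
        boolean simulateFailure = true;

        try {
            tx.put("k2", 2);

            if (simulateFailure)
                throw new IllegalStateException("simulated failure before commit");

            tx.commit();
        }
        catch (IllegalStateException e) {
            System.out.println("Failed: " + e.getMessage());
        }
        finally {
            tx.end(); // Rolls back, because commit() was never reached.
        }

        System.out.println(cache); // {}
    }
}
```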
12.1.8. Cache Queries
Why would you ever query cached data if you can query your persistent store, such as a database? Well, the answer is the same as for accessing data by key from cache vs. getting it from the database - for performance and scalability. However, querying a cache is not exactly the same as querying your database - the main difference is that if the cache only holds a subset of the data stored in the database, then you will only be querying that subset, so the query result will reflect only the in-memory state. Does this matter? It depends on your application requirements and on the amount of data you are able to store in cache.
With the introduction of cloud computing and virtual instances, the amount of memory available to your grid in the cloud becomes virtually limitless. Adding nodes to your grid has become as simple as calling the AWS API on EC2 whenever your application demands it. On top of that, if GridGain swap space is configured, all the data that cannot fit in memory on a single node will overflow to disk. Also, your application may not even have that much data, or querying cached data, which usually contains data that has been accessed relatively recently, is often good enough. Thus in many cases querying a cache is starting to look more and more like querying your database.
Now that you have decided to query cached data in your project, the next question becomes how to cache query results. Most of us are familiar with Hibernate and its support for 2nd Level Caching, which also comes with a Query Cache. The way the query cache works in Hibernate is generally the way we are used to thinking of caching queried data.
In a nutshell, a query is issued against the database and the results of the query are then stored in cache in a single collection. If you have multiple queries, then multiple collections containing query results are stored. Now if you ever update a single bean in Hibernate which can potentially affect the query result (pretty much any change to the queried tables), Hibernate is forced to invalidate (remove) the cached query results from cache and reload them on-demand next time. This significantly increases memory consumption, and frequent cache invalidations of query results perform horribly and do not scale at all. Even Hibernate itself discourages its users from using it. Here is the quote from Hibernate documentation:
So, how does querying of cached data help? It helps by removing the need for a query result cache altogether. SQL queries on your indexed cached data are executed in memory and perform very fast, so there is no more need to cache query results. Just run your SQL query on your cached data and get the results whenever you need them. However, it is important to note that without rich SQL support, cache queries will not be able to replace database queries within your project. In the example below, where Person relates to Organization, if your cache did not support SQL joins, you would not be able to find all people working for the same organization, which may be quite limiting.
Querying Cache
Querying cached data removes the need for query result cache altogether.
In GridGain, cache queries come with virtually no limitations. If you know SQL, you can run queries against cached data with support for any type of join, any where clause keywords, order by, group by, etc. In addition to SQL queries, GridGain also supports text queries using Lucene or H2 TEXT as the underlying indexing. You can also run predicate-based FULL SCAN queries, which iterate over all cache elements on remote nodes and include only the ones that pass the predicate filter.
Cache Query Types
GridGain supports four types of cache queries: SQL, LUCENE, H2TEXT, and SCAN.
SQL Queries
The GridCacheQueryType.SQL query type allows you to execute distributed cache queries using standard SQL syntax. All values participating in where clauses or joins must be annotated with the GridCacheQuerySqlField annotation for indexing. There are almost no restrictions as to which SQL syntax can be used. All inner, outer, or full joins are supported, as well as a rich set of SQL grammar and functions. GridGain relies on the H2 SQL Engine for SQL compilation and indexing. For the full set of supported Numeric, String, and Date/Time SQL functions please refer to the H2 Functions documentation directly. For the full set of supported SQL syntax refer to the H2 SQL Select Grammar.
Note that whenever using group by queries, only individual page results will be sorted and not the full result sets. However, if a single node is queried, then the result set will be accurate.
Text Queries
GridGain supports two types of text queries:
-
GridCacheQueryType.LUCENE
-
GridCacheQueryType.H2TEXT
All fields that are expected to show up in text query results must be annotated with GridCacheQueryLuceneField or GridCacheQueryH2TextField accordingly. The Lucene-based text search utilizes Apache Lucene internally for text indexing, and the H2 TEXT search stores text indexes in special H2 index tables.
Scan Queries
Sometimes when it is known in advance that SQL query will cause a full data scan, or whenever data set is relatively small, the GridCacheQueryType.SCAN query type may be used. With this query type GridGain will iterate over all cache entries, skipping over the entries that don’t pass the optionally provided key or value filters. In this mode the query clause should not be provided.
Execute vs. Visit
If there is no need to return results to the caller node, you can save on potentially significant network overhead by visiting all query results directly on remote nodes by calling the GridCacheQuery.visit(GridPredicate, GridProjection…) method. With this method, all the logic is performed inside the query predicate directly on the queried nodes. If the predicate returns false while visiting, then visiting will finish immediately.
Optional Key and Value Filters
Note that all query results may be additionally filtered by specifying predicates for key and value filtering via the GridCacheQuery.remoteKeyFilter(GridOutClosure) and GridCacheQuery.remoteValueFilter(GridOutClosure) methods. These additional filters are useful whenever filtering is based on logic or methods not available in SQL or TEXT queries. For SCAN queries these filters should usually be provided, as they are used directly to filter the query results during the full scan.
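The semantics of scanning with key/value filters and of visiting with early termination can be modelled on a single node in a few lines of plain Java. This is a hypothetical sketch, not GridGain code: ScanQuerySketch, scan and visit are names invented here, and the real query API additionally deals with remote nodes, paging and futures.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical single-node model of SCAN queries and visiting; not GridGain code. */
final class ScanQuerySketch {
    /** Full scan: keeps only the entries that pass both the key and the value filter. */
    static <K, V> List<Map.Entry<K, V>> scan(Map<K, V> cache, Predicate<K> keyFilter, Predicate<V> valFilter) {
        List<Map.Entry<K, V>> res = new ArrayList<>();

        for (Map.Entry<K, V> e : cache.entrySet())
            if (keyFilter.test(e.getKey()) && valFilter.test(e.getValue()))
                res.add(e);

        return res;
    }

    /** Visit: processes entries in place and stops as soon as the visitor returns false. */
    static <K, V> int visit(Map<K, V> cache, Predicate<Map.Entry<K, V>> visitor) {
        int visited = 0;

        for (Map.Entry<K, V> e : cache.entrySet()) {
            visited++;

            if (!visitor.test(e))
                break;
        }

        return visited;
    }

    public static void main(String[] args) {
        Map<Long, Double> salaries = new LinkedHashMap<>();
        salaries.put(1L, 900.0);
        salaries.put(2L, 1500.0);
        salaries.put(3L, 2500.0);

        // Entries with an even key or a salary of at most 1000 are filtered out.
        System.out.println(scan(salaries, k -> k % 2 == 1, v -> v > 1000).size()); // 1

        // Visiting stops at the first salary above 1000, i.e. after two entries.
        System.out.println(visit(salaries, e -> e.getValue() <= 1000)); // 2
    }
}
```

Note that visit returns only a count here; the point is that no entries travel back to the caller, which is where the network savings come from in the distributed case.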
Query Future Iterators
Note that GridCacheQueryFuture implements the Iterable interface directly and therefore can be used in regular iterator or foreach loops. The iterator will immediately return all query results that are currently available and will block on page boundaries whenever the next page is not available yet. Whenever the full result set is needed as a collection, the GridCacheQuery.keepAll(boolean) flag should be set to true and any of the future’s get(…) methods should be called.
Query Example
As an example, suppose we have a data model consisting of Person and Organization classes defined as follows:
public class Organization {
    @GridCacheQuerySqlField(unique = true)
    private long id;

    @GridCacheQuerySqlField
    private String name;
    ...
}

public class Person {
    // Unique index.
    @GridCacheQuerySqlField(unique=true)
    private long id;

    @GridCacheQuerySqlField
    private long orgId; // Organization ID.

    // Not indexed.
    private String name;

    // Non-unique index.
    @GridCacheQuerySqlField
    private double salary;

    // Index for text search.
    @GridCacheQueryLuceneField
    private String resume;
    ...
}
Then you can create and execute queries that check various salary ranges like so:
GridCache<Long, Person> cache = G.grid().cache();
...
// Create query which selects salaries based on range for all employees
// that work for a certain company.
GridCacheQuery<Long, Person> qry = cache.createQuery(SQL, Person.class,
    "from Person, Organization where Person.orgId = Organization.id " +
    "and Organization.name = ? and Person.salary > ? and Person.salary <= ?");

// Query all nodes to find all cached GridGain employees
// with salaries less than or equal to 1000.
qry.queryArguments("GridGain", 0, 1000).execute(grid);

// Query only remote nodes to find all remotely cached GridGain employees
// with salaries greater than 1000 and up to 2000.
qry.queryArguments("GridGain", 1000, 2000).execute(grid.remoteProjection());

// Query local node only to find all locally cached GridGain employees
// with salaries greater than 2000.
qry.queryArguments("GridGain", 2000, Integer.MAX_VALUE).execute(grid.localNode());
Here is a possible query that will use Lucene text search to scan all resumes to check if employees have a Master’s degree:
GridCacheQuery<Long, Person> mastersQry = cache.createQuery(LUCENE, Person.class, "Master");

// Query all cache nodes.
mastersQry.execute(grid);
12.1.9. Eviction Policies
Selecting proper cache eviction strategy is one of the main parts of cache configuration. Generally, eviction controls the maximum number of elements that can be stored in cache, just so cache does not grow indefinitely. However, every eviction strategy will evict elements in different order and selecting a wrong strategy can have a significant impact on cache hit ratio and performance.
It is also important to note that the eviction policy is pluggable in GridGain, and users can plug in their own eviction policy whenever none of the ones provided out of the box is adequate. The following eviction policies are available out of the box: LIRS, LRU, LFU, FIFO, and RANDOM.
Note that if GridCacheConfiguration.isSwapEnabled() is set to true, then evicted entries will overflow to swap storage, which is, by default, a file-based disk storage defined by GridFileSwapSpaceSpi.
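To see how the choice of strategy decides which element gets evicted, here is a minimal LRU (least recently used) cache in plain Java. It is a sketch built on java.util.LinkedHashMap, not one of GridGain’s eviction policy classes, and the class name LruSketch is our own.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU cache built on LinkedHashMap; not a GridGain eviction policy class. */
final class LruSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    LruSketch(int maxSize) {
        super(16, 0.75f, true); // true = iterate in access order, least recently used first
        this.maxSize = maxSize;
    }

    /** Called by LinkedHashMap after every insertion; returning true evicts the eldest entry. */
    @Override protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxSize;
    }

    public static void main(String[] args) {
        LruSketch<String, Integer> cache = new LruSketch<>(2);

        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // "a" becomes the most recently used entry
        cache.put("c", 3); // exceeds the limit, so "b" is evicted

        System.out.println(cache.keySet()); // [a, c]
    }
}
```

A FIFO policy would have evicted "a" in the same situation, since it ignores the intervening read; this is the kind of difference that shows up in the cache hit ratio.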
12.1.10. Preloading
Preloading newly started cache nodes is important whenever it is necessary to have a common data set in memory on all nodes (you may not need it if the cache can always read through a missing value from a persistent store). When preloading is enabled (i.e. has a value other than GridCachePreloadMode.NONE), distributed caches will attempt to preload all necessary values from other grid nodes. GridGain supports the following preloading modes defined in GridCachePreloadMode:
-
GridCachePreloadMode.SYNC mode is a synchronous preload mode. Distributed caches will not start until all necessary data is loaded from other available grid nodes.
-
GridCachePreloadMode.ASYNC mode is asynchronous preload mode (this mode is configured by default). Distributed caches will start immediately and will load all necessary data from other available grid nodes in the background.
-
GridCachePreloadMode.NONE mode is there to disable preloading. In this mode no preloading will take place which means that caches will be either loaded on demand from persistent store whenever data is accessed, or will be populated explicitly.
Note that REPLICATED caches will try to load the full set of cache entries from other nodes (or as defined by the pluggable GridCacheAffinity), while PARTITIONED caches will only load the entries for which the current node is primary or backup.
Also note that preload mode does not make sense for LOCAL caches as they are local by definition and, therefore, cannot preload any values from neighboring nodes.
12.1.11. Cache Store
Persistent storage in GridGain is defined by the GridCacheStore API. Providing a proper cache store implementation is important whenever read-through or write-through behavior is desired. Read-through means that data will be read from the persistent store whenever it’s not available in cache, and write-through means that data will be automatically persisted whenever it is updated in cache.
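The two behaviors can be sketched in a few lines of plain Java. The class below is hypothetical and does not implement the GridCacheStore API; a plain Map stands in for the database, and the storeReads counter exists only to make the effect of caching visible.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical read-through / write-through cache; a plain Map stands in for the database. */
final class StoreBackedCacheSketch {
    private final Map<String, String> cache = new HashMap<>();
    private final Map<String, String> store;

    /** Counts how often the persistent store had to be consulted. */
    int storeReads;

    StoreBackedCacheSketch(Map<String, String> store) { this.store = store; }

    /** Read-through: on a miss the value is loaded from the store and cached. */
    String get(String key) {
        String val = cache.get(key);

        if (val == null) {
            storeReads++;
            val = store.get(key);

            if (val != null)
                cache.put(key, val);
        }

        return val;
    }

    /** Write-through: the store is updated together with the cache. */
    void put(String key, String val) {
        store.put(key, val);
        cache.put(key, val);
    }

    public static void main(String[] args) {
        Map<String, String> db = new HashMap<>();
        db.put("k1", "from-db");

        StoreBackedCacheSketch c = new StoreBackedCacheSketch(db);

        c.get("k1");
        c.get("k1");
        System.out.println(c.storeReads); // 1: the second read is served from cache

        c.put("k2", "v2");
        System.out.println(db.get("k2")); // v2
    }
}
```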
Note that there is also a refresh-ahead mode specified by the GridCacheConfiguration.getRefreshAheadRatio() configuration parameter. If the value is other than zero, an entry will be reloaded in the background whenever it is accessed after the refresh ratio of its total time-to-live has passed. This feature ensures that actively accessed entries are automatically re-cached as they near expiration.
Example: If the refresh ratio is set to 0.75 and an entry’s time-to-live is 1 minute, then if this entry is accessed any time after 45 seconds since the last update (which is 0.75 of a minute), the cached value will be immediately returned, but the entry will be asynchronously reloaded from the persistent store in the background.
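The arithmetic of the refresh-ahead example can be sketched as a small predicate. This is an illustration only: the class and method names are made up, and treating an already expired entry as "no refresh" is an assumption of this sketch, not documented GridGain behavior.

```java
final class RefreshAhead {
    /**
     * Returns true if an access at the given entry age should trigger a background reload.
     * A ratio of 0 disables refresh-ahead. Entries older than their time-to-live are
     * treated as expired and therefore never refreshed ahead (assumption of this sketch).
     */
    static boolean shouldRefresh(long ttlMillis, double refreshRatio, long ageMillis) {
        if (refreshRatio <= 0 || ttlMillis <= 0)
            return false;
        long threshold = (long)(ttlMillis * refreshRatio);
        return ageMillis >= threshold && ageMillis < ttlMillis;
    }

    public static void main(String[] args) {
        long ttl = 60_000; // 1 minute time-to-live.
        System.out.println(shouldRefresh(ttl, 0.75, 30_000)); // false
        System.out.println(shouldRefresh(ttl, 0.75, 50_000)); // true
    }
}
```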
Example implementations of cache stores (one is backed by JDBC and another one by Hibernate) can be found under GRIDGAIN_HOME/examples/java/org/gridgain/examples/cache/store folder.
12.1.12. Write-Behind Cache
In a simple write-through mode each cache put and remove operation will involve a corresponding request to the storage and therefore the overall duration of the cache update might be relatively long. Additionally, an intensive cache update rate can cause an extremely high storage load.
For such cases GridGain offers an option to perform asynchronous storage updates, also known as write-behind. The key concept of this approach is to add a persist request to a queue and postpone data persistence to a certain point in the future. The actual data persistence can be triggered by time-based events (the maximum time that a data entry can reside in the queue is limited), by queue-size events (the queue is flushed when its size reaches some particular point), or by both in combination, in which case either event will trigger the flush.
What benefits does a write-behind cache store provide? In addition to the obvious performance benefit of faster cache writes, this approach scales a lot better as long as your application can tolerate delayed persistence updates. When the number of nodes in the data grid grows and every node performs frequent updates, it is very easy to overload the underlying system of record, such as a database. The write-behind approach allows the system to maintain high write throughput without bottlenecking at the persistence layer. Moreover, the cache can continue operating even if your database crashes or goes down. In this case the persistence queue will keep storing all the updates until the database comes back up.
Example: With the write-behind approach only the last update to an entry will be written to the underlying storage. If the cache entry with key key1 is sequentially updated with values value1, value2 and value3, then only a single store request for the (key1, value3) pair will be propagated to the persistent storage.
Example: Batch store operations are usually more efficient than a sequence of single store operations, so one can exploit this feature by enabling batch operations in write-behind mode. Update sequences of similar types (put or remove) can be grouped into a single batch. For example, sequential cache puts of (key1, value1), (key2, value2), (key3, value3) will be batched into a single storeAll operation.
Note that a GridCacheStore implementation should take into account possible side effects of write-behind if you want to use this feature. As described in the first example, only the last update to an entry is written to the database. In some cases cache updates may also be reordered. Both effects may violate referential constraints in the persistent storage, so the GridCacheStore implementation should either have these constraints disabled or provide some way to resolve possible conflicts.
Write-behind can be enabled and configured with GridCacheConfigurationAdapter.
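The coalescing and flush triggers described in this section can be modeled with a few lines of plain Java. This is a simplified sketch, not GridGain's implementation: `WriteBehindBuffer` is a hypothetical class, the batch store is represented by a `Consumer`, only puts are modeled, time is passed in explicitly to keep the example deterministic, and the time-based trigger is checked on each put rather than by a background timer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

final class WriteBehindBuffer<K, V> {
    private final Map<K, V> pending = new LinkedHashMap<>();
    private final int flushSize;
    private final long flushFrequencyMillis;
    private final Consumer<Map<K, V>> batchStore; // Stand-in for a batch store call.
    private long lastFlushMillis;

    WriteBehindBuffer(int flushSize, long flushFrequencyMillis, Consumer<Map<K, V>> batchStore) {
        this.flushSize = flushSize;
        this.flushFrequencyMillis = flushFrequencyMillis;
        this.batchStore = batchStore;
    }

    /** Queues an update; a later update to the same key replaces the earlier one. */
    synchronized void put(K key, V val, long nowMillis) {
        pending.put(key, val);
        boolean sizeReached = pending.size() >= flushSize;
        boolean timeReached = flushFrequencyMillis > 0
            && nowMillis - lastFlushMillis >= flushFrequencyMillis;
        if (sizeReached || timeReached)
            flush(nowMillis);
    }

    /** Writes all pending updates to the store as one batch. */
    synchronized void flush(long nowMillis) {
        lastFlushMillis = nowMillis;
        if (pending.isEmpty())
            return;
        batchStore.accept(new LinkedHashMap<>(pending));
        pending.clear();
    }

    public static void main(String[] args) {
        List<Map<String, String>> batches = new ArrayList<>();
        WriteBehindBuffer<String, String> buf = new WriteBehindBuffer<>(10, 5000, batches::add);
        buf.put("key1", "value1", 0);
        buf.put("key1", "value2", 1);
        buf.put("key1", "value3", 2);
        buf.flush(3);
        System.out.println(batches); // [{key1=value3}]
    }
}
```

Because pending updates are keyed by cache key, the store never sees value1 or value2, which is exactly why a store with referential constraints needs extra care in this mode.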
12.2. Distributed Data Structures
Did you ever wish you could take a data structure you are familiar with and distribute it over the grid? For example, why not take java.util.concurrent.BlockingDeque and add something to it on one node and poll it from another node? Or why not have a distributed primary key generator which would guarantee uniqueness on all nodes? Or how about a distributed java.util.concurrent.atomic.AtomicLong which can be updated and read from any node on the grid? GridGain gives you this capability. GridGain takes most of the data structures from the java.util.concurrent framework and makes sure they can be used in a distributed fashion.
Currently you can find the following distributed data structures in GridGain:
- Distributed blocking and non-blocking queues with FIFO, LIFO, or Priority policies
- Distributed atomic sequences (or primary key generators)
- Distributed AtomicLong
- Distributed CountDownLatch
12.2.1. Distributed Queues
Distributed queues are provided by the GridCacheQueue API. They are created directly from the GridCache API and support different modes of operation based on your application requirements. Cache queues implement all methods from the java.util.Collection API and support adding and removing elements from either side, head or tail. Additionally you can get elements from any position within the queue without having to iterate through it; as a matter of fact, most cache queue methods have O(1) complexity.
Here is an example of how a simple unbounded collocated FIFO queue can be created in GridGain:
GridCacheQueue<String> fifoUnboundedCollocatedQueue = grid.cache().queue("myqueue");
Collocated vs. Non-Collocated Queues
If you plan to create just a few queues containing lots of data, then you would create a non-collocated queue. This makes sure that a roughly equal portion of each queue is stored on each grid node. On the other hand, if you plan to have many queues that are relatively small in size (compared to the whole cache), then you would most likely create collocated queues. In this mode all queue elements are stored on the same grid node, but a roughly equal number of queues is assigned to every node.
Both collocated and non-collocated modes have their advantages and disadvantages. As you have probably guessed, all elements from a collocated queue are stored on the same grid node, hence the name, so you are bounded by whatever can fit in the memory of a single node (or, if you use swap storage, queue elements will overflow to disk if memory runs out). Collocated queues are usually used in bounded mode as well to make sure there is an upper limit on the size; however, it is not a requirement. Iteration over collocated queues is extremely fast as all of it happens on the same node. Getting elements at certain positions of the queue or finding the position of a certain element is very fast as well, as all these operations happen locally on the node responsible for caching the queue. Generally all operations on collocated queues have O(1) complexity.
Non-collocated queues, on the other hand, are distributed across all participating grid nodes and essentially have no memory limitations (or rather are limited by the overall memory available on the whole grid). They can effectively store a lot more data than collocated queues, but at the expense of certain operations becoming slower or unsupported altogether. For example, iteration over non-collocated queues is a distributed operation and requires querying every participating cache node. Methods like GridCacheQueue.position(T item) or GridCacheQueue.items(Integer… positions) are not supported in non-collocated mode due to the poor performance and excessive distribution these operations would require (all other methods on the GridCacheQueue API are supported).
Here is an example of how an unbounded collocated LIFO queue could be created:
GridCacheQueue<String> lifoUnboundedCollocatedQueue = grid.cache().queue(
    "myqueue", // Queue name.
    GridCacheQueueType.LIFO, // Queue type.
    0, // Maximum capacity, 0 for unlimited.
    true // Collocation flag.
);
Bounded Queues
Bounded queues let you specify a maximum size for a queue. Bounded queues can be either collocated or non-collocated. Cache queues have two sets of methods: blocking and non-blocking. If a blocking method is used, like put(T item), and a bounded queue reaches its maximum size, then all attempts to put additional elements into it will block until an element is taken from the queue. There are also fail-fast methods, like boolean add(T item), which return false if the queue is full.
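If you have not worked with bounded queues before, the same blocking versus fail-fast distinction exists in the JDK. The sketch below uses java.util.concurrent.ArrayBlockingQueue purely as a local analogy; it is not the GridCacheQueue API, and the helper names are made up. Note that in the JDK the fail-fast call that returns false is offer(), while the timed offer() waits for free space.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

final class BoundedQueueDemo {
    /** Fail-fast insert: returns false immediately when the queue is full. */
    static <T> boolean failFastAdd(BlockingQueue<T> queue, T item) {
        return queue.offer(item);
    }

    /** Blocking insert: waits up to the timeout for free space, false if none appeared. */
    static <T> boolean blockingAdd(BlockingQueue<T> queue, T item, long timeoutMillis) {
        try {
            return queue.offer(item, timeoutMillis, TimeUnit.MILLISECONDS);
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(2); // Maximum size is 2.
        System.out.println(failFastAdd(queue, "a")); // true
        System.out.println(failFastAdd(queue, "b")); // true
        System.out.println(failFastAdd(queue, "c")); // false
        queue.poll();                                // Taking an element frees one slot.
        System.out.println(blockingAdd(queue, "c", 100)); // true
    }
}
```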
Bounded queues allow users to have many queues with a maximum size, which gives better control over overall cache capacity. As mentioned above, when bounded queues are relatively small and can be used in collocated mode, all queue operations become extremely fast. Moreover, when used in combination with compute grid, users can collocate their compute jobs with the grid nodes on which queues are located to make sure that all operations are local and there is no (or minimal) data distribution.
Here is an example of how a job could be sent directly to the node on which a queue resides:
grid.run(GridClosureCallMode.BALANCE, new Runnable() {
    // Specifies key used to determine queue affinity - must be the same as queue name.
    @GridCacheAffinityMapped
    public String queueAffinity() {
        return "myqueue";
    }

    @Override public void run() {
        try {
            // Element type is Integer to match the values added below.
            GridCacheQueue<Integer> q = grid.cache().queue("myqueue");

            // Add queue elements (local operation due to collocation).
            for (int i = 0; i < 20; i++)
                q.add(i);

            // Remove queue elements (local operation due to collocation).
            for (int i = 0; i < 20; i++) {
                Integer item = q.poll();
                assert item != null;
            }

            // Make sure that queue is empty.
            assert q.isEmpty();
            assert q.poll() == null;
        }
        catch (GridException e) {
            // Runnable.run() cannot declare checked exceptions, so rethrow unchecked.
            throw new RuntimeException(e);
        }
    }
});
Queue Types
Queue types are specified in the GridCacheQueueType enumeration. GridGain supports three queue types out of the box: FIFO, LIFO, and PRIORITY.
- FIFO queue provides first-in-first-out order of queue elements and is the most common way of thinking about queues. Elements are added at the tail of the queue and are polled from the head of the queue.
- LIFO queue provides last-in-first-out order of queue elements and generally works as a stack. Elements are added to and retrieved from the tail of the queue.
- PRIORITY queue orders elements using a priority-based order specified by the user. The priority of a queue item is specified using the @GridCacheQueuePriority annotation. If the priority attribute is not found, then a priority of 0 is assigned by default. Here is an example of how a priority queue can be created and used.
public void priorityQueueExample() throws GridException {
    Random rand = new Random();
    Grid grid = G.grid();

    // Initialize new unbounded collocated priority queue.
    GridCacheQueue<PriorityItem> queue = grid.cache().queue("myqueue", GridCacheQueueType.PRIORITY);

    // Store 20 elements in queue with random priority.
    for (int i = 0; i < 20; i++) {
        int priority = rand.nextInt(20);
        queue.put(new PriorityItem(priority, "somedata-" + i));
    }

    PriorityItem item;
    int lastPriority = 0;

    // Poll until the queue is empty (poll() returns null for an empty queue).
    while ((item = queue.poll()) != null) {
        // Ensure the elements are correctly ordered based on priority.
        assert lastPriority <= item.priority();
        lastPriority = item.priority();
    }
}
...
// Class defining sample queue element with its priority specified via
// @GridCacheQueuePriority annotation attached to priority field.
private static class PriorityItem implements Serializable {
    // Priority of queue item.
    @GridCacheQueuePriority
    private final int priority;

    private final String data;

    private PriorityItem(int priority, String data) {
        this.priority = priority;
        this.data = data;
    }

    public int priority() {
        return priority;
    }
}
Cache Queues and Load Balancing
Given that elements will remain in the queue until someone takes them, and that no two nodes should ever receive the same element from the queue, cache queues can be used as an alternate work distribution and load balancing approach within GridGain. For example, you could simply put computations, such as instances of Runnable or GridAbsClosure, into the queue and have threads on remote nodes call the GridCacheQueue.take() method, which will block if the queue is empty. Once the take() method returns a job, a thread will process it and call take() again to get the next job. With this approach, threads on remote nodes only start working on the next job when they have completed the previous one, creating an ideally balanced system where every node takes only the number of jobs it can process, and not more.
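The pull-based pattern described above can be tried out on a single JVM. In the sketch below local threads play the role of remote nodes and a java.util.concurrent.LinkedBlockingQueue stands in for the distributed queue; `PullWorkers` and its stop marker are hypothetical names, and a real grid deployment would use GridCacheQueue.take() instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

final class PullWorkers {
    /** Marker job that tells a worker to stop pulling. */
    private static final Runnable STOP = () -> { };

    /** Runs all jobs on the given number of pulling workers; returns jobs done per worker. */
    static int[] runAll(List<Runnable> jobs, int workerCount) {
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>(jobs);
        for (int i = 0; i < workerCount; i++)
            queue.add(STOP); // One stop marker per worker, queued after all real jobs.

        int[] processed = new int[workerCount];
        Thread[] workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                try {
                    while (true) {
                        Runnable job = queue.take(); // Blocks while the queue is empty.
                        if (job == STOP)
                            return;
                        job.run();
                        processed[id]++; // Next job is taken only after this one is done.
                    }
                }
                catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }

        for (Thread worker : workers) {
            try {
                worker.join();
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        AtomicInteger done = new AtomicInteger();
        List<Runnable> jobs = new ArrayList<>();
        for (int i = 0; i < 20; i++)
            jobs.add(done::incrementAndGet);
        runAll(jobs, 3);
        System.out.println(done.get()); // 20
    }
}
```

How the 20 jobs are split between the three workers varies from run to run, since each worker simply takes the next job whenever it becomes free.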
12.2.2. Distributed Sequences
12.2.3. Distributed AtomicLong
12.2.4. Distributed CountDownLatch
12.3. Cache Configuration
12.3.1. Overview
The GridCacheConfiguration interface defines cache runtime configuration. This configuration is passed to the GridConfigurationAdapter.setCacheConfiguration(GridCacheConfiguration…) method. It defines all configuration parameters required to start a cache instance.
Note that every configuration property in GridCacheConfiguration is optional.
The following configuration parameters can be used to configure cache with GridCacheConfigurationAdapter:
| Setter Method | Description | Optional | Default |
|---|---|---|---|
| setName(String) | Cache name. | Yes | null |
| setCacheMode(GridCacheMode) | Caching mode. | Yes | REPLICATED |
| setStartSize(int) | Initial size for internal hash map. | Yes | 1024 |
| setDefaultLockTimeout(long) | Default lock timeout in milliseconds. | Yes | 0 |
| setDefaultTimeToLive(long) | Time to live for all objects in cache. | Yes | 0 |
| setDefaultTxConcurrency(GridCacheTxConcurrency) | Default transaction concurrency. | Yes | OPTIMISTIC |
| setDefaultTxIsolation(GridCacheTxIsolation) | Default transaction isolation. | Yes | REPEATABLE_READ |
| setDefaultTxTimeout(long) | Default transaction timeout in milliseconds. | Yes | 0 |
| setAffinity(GridCacheAffinity) | Affinity for cache keys. | Yes | null |
| setAffinityMapper(GridCacheAffinityMapper) | Custom affinity mapper. | Yes | GridCacheDefaultAffinityMapper |
| setEvictionPolicy(GridCacheEvictionPolicy) | Cache eviction policy. | Yes | GridCacheLirsEvictionPolicy |
| setEvictionKeyBufferSize(int) | Eviction key buffer size. | Yes | 10240 |
| setEvictSynchronized(boolean) | Flag indicating whether entries should be evicted from both primary and backup nodes for partitioned caches, or the rest of the nodes for replicated caches. | Yes | false |
| setEvictNearSynchronized(boolean) | Flag indicating whether entries should be evicted from near caches when they are evicted on primary nodes. | Yes | true |
| setMaxEvictionOverflowRatio(float) | Maximum eviction overflow ratio. | Yes | 10 |
| setDgcFrequency(int) | Frequency in milliseconds for internal distributed garbage collector. Pass 0 to disable. | Yes | 10000ms |
| setDgcRemoveLocks(boolean) | Flag indicating whether DGC should clear obsolete flags or not. | Yes | true |
| setDgcSuspectLockTimeout(int) | Suspect lock timeout in milliseconds for internal distributed garbage collector. | Yes | 10000ms |
| setIndexAnalyzeFrequency(long) | Frequency of running H2 "ANALYZE" command. | Yes | 600000ms |
| setIndexAnalyzeSampleSize(long) | Number of samples used to run H2 "ANALYZE" command. | Yes | 10000 |
| setIndexCleanup(boolean) | Flag indicating whether indexes should be deleted on system shutdown or startup. | Yes | true |
| setIndexFixedTyping(boolean) | Fixed typing flag. | Yes | true |
| setIndexFullClassName(boolean) | Flag indicating whether full or simple class names should be used for querying. | Yes | false |
| setIndexH2Options(String) | Any additional options for the underlying H2 database used for querying. | Yes | null |
| setIndexMaxOperationMemory(int) | Maximum memory used for delete and insert in bytes. 0 means no limit. | Yes | 100000 |
| setIndexMemoryOnly(boolean) | Flag indicating whether query indexes should be kept only in memory or offloaded on disk as well. | Yes | false |
| setIndexPath(String) | File path (absolute or relative to GRIDGAIN_HOME) to store cache indexes. | Yes | GRIDGAIN_HOME/work/cache/indexes |
| setIndexUsername(String) | Username to login to index database. | Yes | null |
| setIndexPassword(String) | Password to login to index database. | Yes | null |
| setStore(GridCacheStore) | Persistent storage for cache data. | Yes | null |
| setStoreEnabled(boolean) | Flag indicating whether store is enabled. | Yes | true |
| setStoreValueBytes(boolean) | Flag indicating if cached values should be additionally stored in serialized form. | Yes | true |
| setWriteFromBehindEnabled(boolean) | Flag indicating whether write-behind store is enabled. | Yes | false |
| setWriteFromBehindBatchSize(int) | Maximum size of batch update in write-behind mode. | Yes | 512 |
| setWriteFromBehindFlushFrequency(long) | Frequency of write queue flush events, in milliseconds (0 to disable time-based flush). | Yes | 5000 |
| setWriteFromBehindFlushSize(int) | Size of write queue at which flush event will be triggered. | Yes | 10240 |
| setWriteFromBehindFlushThreadCount(int) | Number of threads that will flush the write queue when flush event is triggered. | Yes | 1 |
| setPreloadMode(GridCachePreloadMode) | Cache preload mode. | Yes | ASYNC |
| setPreloadBatchSize(int) | Preload batch size. | Yes | 102400 |
| setPreloadThreadPoolSize(int) | Size of preloading thread pool. | Yes | 2 |
| setNearEnabled(boolean) | Flag indicating whether near cache is enabled in case of PARTITIONED mode. | Yes | true |
| setNearEvictionPolicy(GridCacheEvictionPolicy) | Eviction policy for near cache. | Yes | GridCacheLirsEvictionPolicy |
| setNearStartSize(int) | Start size for near cache. | Yes | 256 |
| setAtomicSequenceReserveSize(int) | Default number of sequence values reserved for GridCacheAtomicSequence instances. | Yes | 1000 |
| setAutoIndexQueryTypes(Collection<GridCacheQueryType>) | Query types to use to auto index values of primitive types. | Yes | null |
| setBatchUpdateOnCommit(boolean) | Flag indicating if persistent store should be updated after every cache operation or once at commit time. | Yes | true |
| setCloner(GridCacheCloner) | Cloner to be used if CLONE flag is set on projection. | Yes | null |
| setInvalidate(boolean) | Invalidation flag for this transaction. | Yes | false |
| setRefreshAheadRatio(double) | Refresh-ahead ratio for cache entries. Values other than zero specify how soon entries will be auto-reloaded from persistent store prior to expiration. | Yes | 0 |
| setSwapEnabled(boolean) | Flag indicating whether swap storage is enabled or not. | Yes | false |
| setSynchronousCommit(boolean) | Flag indicating whether nodes on which user transaction completed should wait for the same transaction on remote nodes to complete. | Yes | false |
| setSynchronousRollback(boolean) | Flag indicating whether nodes on which user transaction was rolled back should wait for the same transaction on remote nodes to complete. | Yes | false |
| setTransactionManagerLookup(GridCacheTmLookup) | Look up mechanism for available TransactionManager implementation, if any. | Yes | null |
Some of the most commonly used configuration properties are explained in more detail below.
Cache Mode
The following cache modes are supported:
- LOCAL - specifies local-only cache behavior. In this mode caches residing on different grid nodes will not know about each other.
- REPLICATED - specifies fully replicated cache behavior. In this mode all the keys are distributed to all participating nodes. The user still has affinity control over the subset of nodes for any given key via GridCacheAffinity configuration.
- PARTITIONED - specifies partitioned cache behavior. In this mode the overall key set will be divided into partitions and all partitions will be split equally between participating nodes. The user has affinity control over key assignment via GridCacheAffinity configuration.
Affinity
Cache key affinity maps keys to nodes. The GridCacheAffinity interface is used for both replicated and partitioned caches.
Whenever a key is given to cache, it is first passed to a pluggable GridCacheAffinityMapper which may map this key to an alternate key that should be used for affinity. The key returned from the affinityKey(Object) method is then passed to the partition(Object) method to find the partition for the key. This partition, together with all participating nodes, is then passed to the nodes(int, Collection) method, which returns a collection of nodes. This collection of nodes is used for node affinity. In REPLICATED cache mode the key will be cached on all returned nodes; generally, all caching nodes participate in caching every key in replicated mode. In PARTITIONED mode, only primary and backup nodes are returned, with the primary node always in the first position. So if there is 1 backup node, then the returned collection will have 2 nodes in it: the primary node in the first position and the backup node in the second.
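The three-step mapping described above (key to affinity key, affinity key to partition, partition to nodes) can be illustrated with a deliberately naive sketch. The hashing and round-robin node selection below are assumptions made for the example and are not GridGain's actual affinity function; the class name and string node IDs are hypothetical as well.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class AffinitySketch {
    /** Step 1: map a cache key to the key used for affinity (identity in this sketch). */
    static Object affinityKey(Object key) {
        return key;
    }

    /** Step 2: map the affinity key to a partition number in [0, partitions). */
    static int partition(Object affinityKey, int partitions) {
        return Math.floorMod(affinityKey.hashCode(), partitions);
    }

    /** Step 3: pick the primary node (first position) followed by the backup nodes. */
    static List<String> nodes(int partition, List<String> allNodes, int backups) {
        int copies = Math.min(backups + 1, allNodes.size());
        List<String> res = new ArrayList<>(copies);
        for (int i = 0; i < copies; i++)
            res.add(allNodes.get((partition + i) % allNodes.size()));
        return res;
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList("nodeA", "nodeB", "nodeC");
        int part = partition(affinityKey(7), 10); // Integer 7 hashes to 7, so partition 7.
        System.out.println(nodes(part, all, 1));  // [nodeB, nodeC]
    }
}
```

With one backup the returned list has two entries, primary first, which mirrors the ordering contract described for PARTITIONED mode.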
12.3.2. Examples
GridCacheConfiguration may be defined in code:
GridConfigurationAdapter cfg = new GridConfigurationAdapter();
GridCacheConfigurationAdapter cacheCfg = new GridCacheConfigurationAdapter();
cacheCfg.setName("mycache");
cacheCfg.setCacheMode(GridCacheMode.LOCAL);
cfg.setCacheConfiguration(cacheCfg);
...
// Start grid.
GridFactory.start(cfg);
or from Spring configuration file:
<bean id="grid.cfg" class="org.gridgain.grid.GridConfigurationAdapter" scope="singleton">
    <property name="cacheConfiguration">
        <list>
            <bean class="org.gridgain.grid.cache.GridCacheConfigurationAdapter">
                <property name="name" value="mycache"/>
                <property name="cacheMode" value="LOCAL"/>
            </bean>
        </list>
    </property>
    ...
</bean>
13. GridGain Scalar - Scala DSL
TODO
14. GridGain Grover - Groovy++ DSL
TODO
15. REST APIs
The GridGain REST API supports external connectivity to GridGain via REST. It comes in handy whenever the GridGain Java API is not available directly, but you still need to execute GridGain tasks or retrieve cached data. For example, you can conveniently use the GridGain REST API from non-JVM languages, such as Ruby or Python, whenever a local instance of GridGain is not available.
15.1. Overview
Currently there are two ways to utilize GridGain REST API:
- over HTTP protocol.
- over Memcache binary protocol.
Memcache binary protocol will become available in 3.5.1 release.
15.1.1. HTTP protocol
All REST HTTP commands have the following format:
http://localhost:8080/gridgain?cmd=exe&...
where cmd is the name of the command followed by other command parameters. Every command may have different parameters, some of which may be mandatory and some optional. Command parameters may be passed either via HTTP GET or POST, whichever is preferred.
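When calling the REST API from code, it is worth building the URL programmatically so that keys and values are properly URL-encoded. Below is a minimal sketch using only the JDK; the `RestUrl` helper is a hypothetical name, and the base URL is the one used in the examples of this chapter.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class RestUrl {
    /** Builds a command URL of the form base?cmd=name&param=value&... */
    static String build(String baseUrl, String cmd, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(baseUrl).append("?cmd=").append(enc(cmd));
        for (Map.Entry<String, String> e : params.entrySet())
            sb.append('&').append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        return sb.toString();
    }

    /** URL-encodes a single parameter name or value. */
    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // Keeps parameter order stable.
        params.put("cacheName", "mycache");
        params.put("key", "mykey");
        // Prints: http://localhost:8080/gridgain?cmd=get&cacheName=mycache&key=mykey
        System.out.println(build("http://localhost:8080/gridgain", "get", params));
    }
}
```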
All commands return a response in JSON format with the following fields (note that some commands may return additional fields as well):
| Field | Description |
|---|---|
| success | Boolean flag to indicate whether or not command completed successfully |
| response | Command response serialized as JSON - it is a requirement that responses comply with Java Bean standard (i.e. have getters and setters for fields) |
| error | Description of error associated with failed command execution. It is only provided if success flag is false |
15.1.2. Memcache binary protocol
GridGain implements Memcache binary protocol. This allows you to execute most cache commands using any of the available Memcache clients. Note that the client you choose must support the binary protocol.
Memcache binary protocol will become available in 3.5.1 release.
15.2. Cache Commands
All cache commands in GridGain have one additional field in responses, affinityNodeId, which holds the node ID of the primary node responsible for caching the requested data. Users can use this ID to send future requests for the same data to the primary affinity node for better performance. Otherwise, whenever a request for data arrives on some node, that node will have to figure out the primary affinity node responsible for caching the requested data and then send the request there. This involves an extra network round trip which could have been avoided if the request had come to the primary node directly.
The following commands are available to access GridGain cache.
GET command is used to get a value stored in cache. It is analogous to invoking the GridCacheProjection.get(someKey) method. GET command supports the following parameters:
| Parameter | Description |
|---|---|
| cmd | get |
| key | Mandatory parameter to specify the cache key for the value to be retrieved. |
| cacheName | Optional cache name, if omitted, default cache will be used. |
Example Request
http://localhost:8080/gridgain?cmd=get&cacheName=mycache&key=mykey
Example Response
{"affinityNodeId":"d2ee8ea4-a2f0-4f41-9edd-ea25d68de6f8","error":"","response":"some-value","success":true}
GET_ALL command is used to get several values stored in cache. It is analogous to invoking the GridCacheProjection.getAll(keys) method. GET_ALL command supports the following parameters:
| Parameter | Description |
|---|---|
| cmd | getall |
| k1…kN | Keys for the values to be retrieved. At least one must be specified. |
| cacheName | Optional cache name, if omitted, default cache will be used. |
Example Request
http://localhost:8080/gridgain?cmd=getall&cacheName=mycache&k1=mykey1&k2=mykey2
Example Response
{"affinityNodeId":"","error":"","response":{"mykey2":"myval2","mykey1":"myval1"},"success":true}
GET_ALL command will become available in 3.5.1 release.
PUT command is used to store a value in cache. It is analogous to invoking the GridCacheProjection.putx(someKey, someValue) method. PUT command supports the following parameters:
| Parameter | Description |
|---|---|
| cmd | put |
| key | Mandatory parameter to specify the cache key for the value to be stored. |
| val | Mandatory parameter to specify the value to cache, cannot be null or empty. |
| cacheName | Optional cache name, if omitted, default cache will be used. |
Example Request
http://localhost:8080/gridgain?cmd=put&cacheName=mycache&key=mykey&val=myval
Example Response
{"affinityNodeId":"d2ee8ea4-a2f0-4f41-9edd-ea25d68de6f8","error":"","response":true,"success":true}
PUT_ALL command is used to store several values in cache. It is analogous to invoking the GridCacheProjection.putAll(map) method. PUT_ALL command supports the following parameters:
| Parameter | Description |
|---|---|
| cmd | putall |
| k1…kN | Keys for the values to be stored. At least one must be specified. |
| v1…vN | Values to be stored in cache, cannot be null or empty. Number of values must be equal to number of keys. |
| cacheName | Optional cache name, if omitted, default cache will be used. |
Example Request
http://localhost:8080/gridgain?cmd=putall&cacheName=mycache&k1=mykey1&v1=myval1&k2=mykey2&v2=myval2
Example Response
{"affinityNodeId":"","error":"","response":true,"success":true}
PUT_ALL command will become available in 3.5.1 release.
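The numbered k1…kN and v1…vN parameters are easy to get out of step when assembled by hand. The following sketch derives them from a map so that every key is always paired with its value; `PutAllQuery` is a hypothetical helper, and the validation simply mirrors the rules stated in the parameter descriptions above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class PutAllQuery {
    /** Builds the query string for a putall command from key-value pairs. */
    static String build(String cacheName, Map<String, String> entries) {
        if (entries.isEmpty())
            throw new IllegalArgumentException("At least one key must be specified.");
        StringBuilder sb = new StringBuilder("cmd=putall");
        if (cacheName != null)
            sb.append("&cacheName=").append(enc(cacheName)); // Omitted for the default cache.
        int i = 1;
        for (Map.Entry<String, String> e : entries.entrySet()) {
            if (e.getValue() == null || e.getValue().isEmpty())
                throw new IllegalArgumentException("Value cannot be null or empty: " + e.getKey());
            sb.append("&k").append(i).append('=').append(enc(e.getKey()));
            sb.append("&v").append(i).append('=').append(enc(e.getValue()));
            i++;
        }
        return sb.toString();
    }

    /** URL-encodes a single parameter value. */
    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, String> entries = new LinkedHashMap<>();
        entries.put("mykey1", "myval1");
        entries.put("mykey2", "myval2");
        // Prints: cmd=putall&cacheName=mycache&k1=mykey1&v1=myval1&k2=mykey2&v2=myval2
        System.out.println(build("mycache", entries));
    }
}
```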
REMOVE command is used to remove a mapping stored in cache. It is analogous to invoking the GridCacheProjection.removex(someKey) method. REMOVE command supports the following parameters:
| Parameter | Description |
|---|---|
| cmd | rmv |
| key | Mandatory parameter to specify the cache key for the value to be removed. |
| cacheName | Optional cache name, if omitted, default cache will be used. |
Example Request
http://localhost:8080/gridgain?cmd=rmv&cacheName=mycache&key=mykey
Example Response
{"affinityNodeId":"d2ee8ea4-a2f0-4f41-9edd-ea25d68de6f8","error":"","response":true,"success":true}
REMOVE_ALL command is used to remove several mappings stored in cache. It is analogous to invoking the GridCacheProjection.removeAll(keys) method. REMOVE_ALL command supports the following parameters:
| Parameter | Description |
|---|---|
| cmd | rmvall |
| k1…kN | Keys for the values to be removed. At least one must be specified. |
| cacheName | Optional cache name, if omitted, default cache will be used. |
Example Request
http://localhost:8080/gridgain?cmd=rmvall&cacheName=mycache&k1=mykey1&k2=mykey2
Example Response
{"affinityNodeId":"","error":"","response":true,"success":true}
REMOVE_ALL command will become available in 3.5.1 release.
REPLACE command is used to replace a value in cache only if there is already an existing mapping for the specified key. It is analogous to invoking the GridCacheProjection.replacex(someKey, someValue) method. REPLACE command supports the following parameters:
| Parameter | Description |
|---|---|
| cmd | rep |
| key | Mandatory parameter to specify the cache key for the value to be replaced. |
| val | Mandatory parameter to specify the value to cache, cannot be null or empty. |
| cacheName | Optional cache name, if omitted, default cache will be used. |
Example Request
http://localhost:8080/gridgain?cmd=rep&cacheName=mycache&key=mykey&val=myval
Example Response
{"affinityNodeId":"d2ee8ea4-a2f0-4f41-9edd-ea25d68de6f8","error":"","response":false,"success":true}
CAS command stands for compare-and-set and is used to replace a value in cache only if it matches the provided value. Its behavior depends on the values passed in:
- If both val1 and val2 are null or empty, then this command is analogous to the REMOVE command.
- If val1 is not null or empty, but val2 is, then this command will store a value in cache only if there is no existing mapping for the provided key. It is analogous to invoking the GridCacheProjection.putxIfAbsent(someKey, someValue) method.
- If val1 is null or empty, but val2 is not, then this command will remove the mapping for the provided key only if the current value is equal to val2. It is analogous to invoking the GridCacheProjection.remove(someKey, someValue) method.
- If both val1 and val2 are not null or empty, then this command will replace the mapping for the provided key only if the current value is equal to val2. It is analogous to invoking the GridCacheProjection.replace(someKey, oldValue, newValue) method.
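The four cases in the list above can be modeled locally with a java.util.concurrent.ConcurrentMap. This sketch only mirrors the decision logic as it is described in the list (val1 as the value to store, val2 as the value to compare against); it is not the server-side implementation, and `CasModel` is a hypothetical name.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class CasModel {
    /** Applies CAS semantics to a local map; returns true if the cache was changed. */
    static boolean cas(ConcurrentMap<String, String> cache, String key, String val1, String val2) {
        boolean hasVal1 = val1 != null && !val1.isEmpty();
        boolean hasVal2 = val2 != null && !val2.isEmpty();

        if (!hasVal1 && !hasVal2)
            return cache.remove(key) != null;            // Same as REMOVE.
        if (!hasVal2)
            return cache.putIfAbsent(key, val1) == null; // Store only if there is no mapping.
        if (!hasVal1)
            return cache.remove(key, val2);              // Remove only if current equals val2.
        return cache.replace(key, val2, val1);           // Replace only if current equals val2.
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
        System.out.println(cas(cache, "mykey", "a", null)); // true
        System.out.println(cas(cache, "mykey", "b", "a"));  // true
        System.out.println(cas(cache, "mykey", "c", "a"));  // false
        System.out.println(cache.get("mykey"));             // b
    }
}
```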
CAS command supports the following parameters:
| Parameter | Description |
|---|---|
| cmd | cas |
| key | Mandatory parameter to specify the cache key for the value to be set. |
| val1 | New value to store in cache. If val2 is also provided, it is stored only if the current value is equal to val2. |
| val2 | Expected value currently stored in cache, used for the compare operation. |
| cacheName | Optional cache name, if omitted, default cache will be used. |
Example Request
http://localhost:8080/gridgain?cmd=cas&cacheName=mycache&key=mykey&val1=newVal&val2=oldVal
Example Response
{"affinityNodeId":"d2ee8ea4-a2f0-4f41-9edd-ea25d68de6f8","error":"","response":false,"success":true}
AFFINITY command is used to retrieve the primary affinity node responsible for storing a cache key. It is analogous to invoking the GridCacheProjection.mapKeyToNode(someKey) method. AFFINITY command supports the following parameters:
| Parameter | Description |
|---|---|
| cmd | aff |
| key | Mandatory parameter to specify the cache key to get affinity node ID for. |
| cacheName | Optional cache name, if omitted, default cache will be used. |
Example Request
http://localhost:8080/gridgain?cmd=aff&cacheName=mycache&key=mykey
Example Response
{"affinityNodeId":"d2ee8ea4-a2f0-4f41-9edd-ea25d68de6f8","error":"","response":true,"success":true}
METRICS command is used to retrieve cache metrics or cache entry metrics.
METRICS command supports the following parameters:
| Parameter | Description |
|---|---|
| cmd | cache |
| key | Optional parameter to specify the cache entry to get metrics for. If omitted, cache metrics will be returned. |
| cacheName | Optional cache name, if omitted, default cache metrics will be returned. |
Example Request
http://localhost:8080/gridgain?cmd=cache&cacheName=mycache
Example Response
{"affinityNodeId":"","error":"","response":{"createTime":1298362596532,"hits":1,"misses":1,
"readTime":1298363347487,"reads":2,"writeTime":1298362597375,"writes":7},"success":true}
INCREMENT command is used to increment an integer value stored in cache. It supports the following parameters:
| Parameter | Description |
|---|---|
| cmd | incr |
| key | Mandatory parameter to specify the cache key for the value to be incremented. |
| init | Parameter to specify initial (default) value. It will be set if value for provided key is not in cache. |
| delta | Parameter to specify value that will be added to value stored in cache. |
| cacheName | Optional cache name, if omitted, default cache will be used. |
Example Request
http://localhost:8080/gridgain?cmd=incr&cacheName=mycache&key=key&init=0&delta=3
Example response
{"affinityNodeId":"d6f2d18d-22ee-4e10-9986-af71e75fc066","error":"","response":3,"success":true}
The INCREMENT command will become available in the 3.5.1 release.
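The interplay of `init` and `delta` can be modelled locally. The sketch below is an in-memory illustration under the assumption (consistent with the example responses, where `init=0` and `delta=3` yield `3` and `-3`) that a missing key is first seeded with `init` and the delta is then applied. `CounterSemanticsSketch` is a hypothetical class and does not call GridGain.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Local, in-memory model of INCREMENT/DECREMENT semantics (hypothetical, not GridGain code). */
final class CounterSemanticsSketch {
    private final ConcurrentMap<String, Long> cache = new ConcurrentHashMap<String, Long>();

    /** Adds delta to the stored value, starting from init when the key is absent. */
    long increment(String key, long init, long delta) {
        while (true) {
            Long cur = cache.get(key);
            if (cur == null) {
                // Key is missing: seed with init, then apply delta.
                long next = init + delta;
                if (cache.putIfAbsent(key, next) == null)
                    return next;
            } else {
                long next = cur + delta;
                // Retry if another thread changed the value in between.
                if (cache.replace(key, cur, next))
                    return next;
            }
        }
    }

    /** Decrement is modelled as an increment by the negated delta. */
    long decrement(String key, long init, long delta) {
        return increment(key, init, -delta);
    }
}
```

Note that `init` only matters for the first call on a key; later calls ignore it and operate on the stored value.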
The DECREMENT command decrements an integer value stored in cache. It supports the following parameters:
| Parameter | Description |
|---|---|
| cmd | decr |
| key | Mandatory parameter to specify the cache key for the value to be decremented. |
| init | Parameter to specify the initial (default) value. It will be set if there is no value for the provided key in cache. |
| delta | Parameter to specify the value that will be subtracted from the value stored in cache. |
| cacheName | Optional cache name. If omitted, the default cache will be used. |
Example Request
http://localhost:8080/gridgain?cmd=decr&cacheName=mycache&key=key&init=0&delta=3
Example response
{"affinityNodeId":"d6f2d18d-22ee-4e10-9986-af71e75fc066","error":"","response":-3,"success":true}
The DECREMENT command will become available in the 3.5.1 release.
The APPEND command appends the provided string to a string value stored in cache. It supports the following parameters:
| Parameter | Description |
|---|---|
| cmd | append |
| key | Mandatory parameter to specify the cache key for the value to be updated. |
| val | Parameter to specify the string that will be appended to the stored value. |
| cacheName | Optional cache name. If omitted, the default cache will be used. |
Example Request
http://localhost:8080/gridgain?cmd=append&cacheName=mycache&key=key&val=_suffix
Example response
{"affinityNodeId":"d6f2d18d-22ee-4e10-9986-af71e75fc066","error":"","response":true,"success":true}
The APPEND command will become available in the 3.5.1 release.
The PREPEND command prepends the provided string to a string value stored in cache. It supports the following parameters:
| Parameter | Description |
|---|---|
| cmd | prepend |
| key | Mandatory parameter to specify the cache key for the value to be updated. |
| val | Parameter to specify the string that will be prepended to the stored value. |
| cacheName | Optional cache name. If omitted, the default cache will be used. |
Example Request
http://localhost:8080/gridgain?cmd=prepend&cacheName=mycache&key=key&val=prefix_
Example response
{"affinityNodeId":"d6f2d18d-22ee-4e10-9986-af71e75fc066","error":"","response":true,"success":true}
The PREPEND command will become available in the 3.5.1 release.
15.3. Topology Commands
Topology commands are used to retrieve various grid topology information from GridGain. The following commands are available to access GridGain topology:
The TOPOLOGY command retrieves the list of available GridGain nodes in the grid topology.
The TOPOLOGY command supports the following parameters:
| Parameter | Description |
|---|---|
| cmd | top |
| mtr | true or false. Optional parameter to specify whether node metrics should be included in the response. If omitted, metrics will not be included. |
| attr | true or false. Optional parameter to specify whether node attributes should be included in the response. If omitted, attributes will not be included. |
Example Request
http://localhost:8080/gridgain?cmd=top&mtr=false&attr=false
Example Response
{"error":"","response":[{"attributes":null,"externalAddresses":[],"internalAddresses":["localhost"],
"metrics":null,"nodeId":"4ffa1248-0d4f-4e4a-bf79-e8b586b0dc31"}],"success":true}
The NODE command retrieves information about a single GridGain node based on either the node ID or any of the node’s available IP addresses.
The NODE command supports the following parameters:
| Parameter | Description |
|---|---|
| cmd | node |
| id | ID of the node to retrieve information about. If omitted, ip should be provided. If id and ip are provided, both are used. |
| ip | IP (external or internal) of the node to retrieve information about. If omitted, id should be provided. If id and ip are provided, both are used. Note: if multiple nodes have the same IP, then there are no guarantees on which node is returned. |
| mtr | true or false. Optional parameter to specify whether node metrics should be included in the response. If omitted, metrics will not be included. |
| attr | true or false. Optional parameter to specify whether node attributes should be included in the response. If omitted, attributes will not be included. |
Example Request
http://localhost:8080/gridgain?cmd=node&ip=1.2.3.4&id=4ffa1248-0d4f-4e4a-bf79-e8b586b0dc31
Example Response
{"error":"","response":{"attributes":null,"externalAddresses":[],"internalAddresses":["1.2.3.4"],
"metrics":null,"nodeId":"4ffa1248-0d4f-4e4a-bf79-e8b586b0dc31"},"success":true}
15.4. Task Execution Commands
Task execution commands provide a way to execute GridGain tasks over HTTP.
Task execution commands respond with a special entity that has the following fields:
| Field | Description |
|---|---|
| error | Description of the error that occurred during task execution. Not to be confused with the error field of the enclosing response. |
| finished | Boolean flag indicating whether task execution is finished or not. |
| id | ID of the task, used to query results in case of asynchronous execution. |
| result | Task execution result serialized as JSON. |
The following commands are available for GridGain task execution:
The EXE command executes a GridGain task remotely with the specified parameters and returns the task execution result (or the task ID to query results later in case of asynchronous execution).
The EXE command supports the following parameters:
| Parameter | Description |
|---|---|
| cmd | exe |
| name | Mandatory parameter. Task name or task class name. |
| timeout | Optional parameter. Task execution timeout in milliseconds. If provided, it must be greater than or equal to 0. If it is not provided or equals 0, the system will wait indefinitely for execution to complete. |
| p1,..,pN | Optional task parameters. Any number of parameters is possible. If only one parameter is provided, it is passed as is; if two or more are provided, they are passed as an array. |
| async | true or false. Optional sync/async execution flag. If omitted, the task will be executed synchronously. If the value is true, then the result may be queried later via the RESULT command (the task ID will be returned in the response). |
Example Request
http://localhost:8080/gridgain?cmd=exe&name=org.gridgain.grid.kernal.processors.rest.TestTask2
Example Response
{"error":"","response":{"error":"","finished":true,"id":"0e731fb3-77ba-4932-b625-a2e197bc444c~16cd1450-fa4f-4bb0-8029-6048b905a5dc",
"result":"Task 2 result."},"success":true}
Example Request
http://localhost:8080/gridgain?cmd=exe&name=org.gridgain.grid.kernal.processors.rest.TestTask2&timeout=1&async=true
Example Response
{"error":"","response":{"error":"","finished":false,"id":"7b3d682e-759c-4310-aaa2-ddfba54fb0b8~16cd1450-fa4f-4bb0-8029-6048b905a5dc",
"result":null},"success":true}
The RESULT command retrieves the results of a GridGain task execution (initiated by the EXE command).
The RESULT command supports the following parameters:
| Parameter | Description |
|---|---|
| cmd | res |
| id | Mandatory parameter. ID of the task (returned in response to the EXE command). |
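The EXE and RESULT parameters described above map directly onto query strings. The sketch below shows one way to assemble them, numbering task arguments as `p1..pN` in the order given. `TaskRestSketch` is a hypothetical helper, not a GridGain class, and the base URL is the one from the example requests.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper (not part of GridGain) for EXE and RESULT request URLs. */
final class TaskRestSketch {
    static final String BASE = "http://localhost:8080/gridgain";

    /** URL-encodes a single parameter value. */
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** EXE request: task arguments become p1, p2, ..., pN in the order given. */
    static String exe(String taskName, long timeoutMs, boolean async, String... args) {
        StringBuilder sb = new StringBuilder(BASE).append("?cmd=exe&name=").append(enc(taskName));
        // A timeout of 0 means "wait indefinitely", so it is simply omitted.
        if (timeoutMs > 0)
            sb.append("&timeout=").append(timeoutMs);
        if (async)
            sb.append("&async=true");
        for (int i = 0; i < args.length; i++)
            sb.append("&p").append(i + 1).append('=').append(enc(args[i]));
        return sb.toString();
    }

    /** RESULT request for a task ID returned by an asynchronous EXE call. */
    static String result(String taskId) {
        return BASE + "?cmd=res&id=" + enc(taskId);
    }
}
```

A typical asynchronous flow would call the `exe` URL with `async=true`, read the `id` field from the response, and then poll the `result` URL until `finished` is true. Task IDs contain a `~` character, which `URLEncoder` percent-encodes; this is harmless as long as the server decodes query parameters in the usual way.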
Example Request
http://localhost:8080/gridgain?cmd=res&id=80ae2a49-029e-439a-bee5-8bad67381173~4186cc96-0f62-45dc-976f-979bfea08a90
Example Response
{"error":"","response":{"error":"","finished":true,"id":"80ae2a49-029e-439a-bee5-8bad67381173~4186cc96-0f62-45dc-976f-979bfea08a90",
"result":"Task 2 result."},"success":true}
15.5. REST Authentication
To control access to the REST API, you may require authentication by providing a REST secret key (GridConfiguration.getRestSecretKey()).
If a secret key is provided, then all requests should contain an authentication token.
For REST over http(s), the token is sent via the X-Signature header.
The token is built using the following algorithm:
- The client makes up a string out of the timestamp (in milliseconds) and the secret key separated by a colon: timestamp:secretKey.
- The client calculates the SHA-1 hash of the string.
- Finally, the client makes up a token out of the timestamp value and the BASE64-encoded hash calculated during the previous step: timestamp:hash_base64.
Protocol implementations split the token to fetch the timestamp, perform the same operations, and then compare the resulting hash with the provided one. If the hashes are equal, the request is authenticated.
For more security it is recommended to access the REST API via https instead of http.
Example
| Item | Value |
|---|---|
| secretKey | secret-key |
| timestamp | 1298966938803 |
| hash of 1298966938803:secret-key (base64) | emcRg3ZcVuce4AwDGXn4e4n2kqA= |
| X-Signature | 1298966938803:emcRg3ZcVuce4AwDGXn4e4n2kqA= |
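The token algorithm can be sketched with the JDK alone. The following is an illustration, not GridGain's implementation: `RestSignatureSketch` and its methods are hypothetical names, and encoding the string as UTF-8 before hashing is an assumption, since the text does not name a character set. The `verify` method mirrors what a protocol implementation is described as doing on the receiving side.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

/** Hypothetical sketch of building and checking an X-Signature token. */
final class RestSignatureSketch {
    /** BASE64(SHA-1("timestamp:secretKey")); UTF-8 encoding is an assumption. */
    static String hash(long timestamp, String secretKey) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha1.digest((timestamp + ":" + secretKey).getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is available on every Java platform", e);
        }
    }

    /** Token format: timestamp:hash_base64. */
    static String token(long timestamp, String secretKey) {
        return timestamp + ":" + hash(timestamp, secretKey);
    }

    /** Splits the token, recomputes the hash from its timestamp, and compares the two. */
    static boolean verify(String token, String secretKey) {
        int idx = token.indexOf(':');
        if (idx < 0)
            return false;
        try {
            long ts = Long.parseLong(token.substring(0, idx));
            byte[] provided = token.substring(idx + 1).getBytes(StandardCharsets.UTF_8);
            byte[] expected = hash(ts, secretKey).getBytes(StandardCharsets.UTF_8);
            return MessageDigest.isEqual(provided, expected);
        } catch (NumberFormatException e) {
            return false;
        }
    }
}
```

A client would send `RestSignatureSketch.token(System.currentTimeMillis(), secretKey)` as the value of the X-Signature header. Because the secret key never travels over the wire, only parties that know it can produce a hash that passes `verify`.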
16. Grid Enabling JUnits
16.1. Overview
The ability to distribute JUnit tests allows you to get test results from your build server 2, 3, or 5 times faster, depending on the number of nodes you allocate to run your tests. You can also run your distributed JUnits directly from an IDE, and all IDE-native JUnit integration semantics will be preserved.
Distributed JUnit support was added in the GridGain 1.6.0 release. In a nutshell, it simply takes your regular JUnit TestSuite and runs it in parallel on remote nodes. Even if you don’t have remote nodes, the tests within your TestSuite will run in parallel on the local node. Both individual tests and test suites are supported. If you have nested test suites inside of your distributed test suite, then the whole nested suite will be executed in parallel on a remote node (note that tests within a nested suite will still execute sequentially).
GridGain distributed JUnit support gives you the following benefits:
| Feature | Description |
|---|---|
| Minimal to Zero Code Change | Simply switch to using GridJunit3TestSuite or attach the @GridifyTest annotation to your existing static suite() method or JUnit4 suite and you are good to go. |
| Peer Class Loading | You don’t need to explicitly deploy your tests or your code on every grid node; the deployment happens automatically. |
| Nested Test Grouping | With GridGain you have full control over how tests are grouped for parallel or remote execution by combining tests within nested test suites. |
| Customizable Test Routing | With GridGain you have full control over how every test gets routed to a remote node for execution by providing your own GridTestRouter implementation. By default GridTestRouterAdapter is used, which routes tests in round-robin fashion between nodes. |
| Local Test Suites | Support for tests that can only be executed locally (usually due to environment issues), but can still benefit from parallel execution. |
| Configurable Test Scheduling | With GridGain you can configure how many tests can run in parallel on local or remote nodes via the parallelJobsNumber configuration parameter on the GridCollisionSpi SPI. |
| Native IDE integration | You can run your JUnit tests directly from any IDE, be that IDEA, Eclipse, NetBeans, etc., and your distributed tests will execute as if it was a local execution: all logging and failures will be preserved. |
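To make the default routing policy concrete, here is a stand-alone illustration of round-robin selection. It deliberately does not implement the GridTestRouter interface, whose signature is not shown in this chapter; `RoundRobinRouterSketch` and its `route` method are hypothetical, and nodes are represented by plain strings.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Stand-alone illustration of round-robin routing; this is NOT the GridTestRouter interface. */
final class RoundRobinRouterSketch {
    private final AtomicInteger next = new AtomicInteger();

    /** Picks the node for the next test by cycling through the node list. */
    String route(List<String> nodes) {
        if (nodes.isEmpty())
            throw new IllegalArgumentException("No nodes to route to");
        // floorMod keeps the index non-negative even after the counter overflows.
        return nodes.get(Math.floorMod(next.getAndIncrement(), nodes.size()));
    }
}
```

With three nodes, four consecutive tests would go to the first, second, third and then the first node again, which spreads tests evenly when they take similar time to run.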
16.2. Supported Implementations
You can distribute your JUnits in 2 different ways. One way is to use distributed test suites for distributed JUnit3 and distributed JUnit4 directly, and another is to use the @GridifyTest annotation with AOP.
16.2.1. Distributed JUnit3
Distributed JUnit support was added in the GridGain 1.6.0 release.
You can distribute your JUnit3 test suites in 2 different ways. One way is to use GridJunit3TestSuite directly instead of the usual JUnit3 TestSuite. This is perhaps the easiest and most straightforward way to grid-enable your JUnit3 test suites.
Another way is to attach the @GridifyTest annotation to the static suite() methods of your JUnit3 test suites. The advantage of this approach is that you can provide test configuration parameters, such as a timeout or a custom test router, right as annotation parameters in code, without having to change your existing test suites or pass extra VM arguments.
GridJunit3TestSuite is the test suite that handles distributing JUnit tests automatically. Simply add tests to this suite just like you would for regular JUnit3 suites, and these tests will be executed in parallel on the grid. Note that if there are no other grid nodes, this suite will still ensure parallel test execution within a single JVM.
Below is an example of a distributed JUnit3 test suite:

    public class GridJunit3ExampleTestSuite {
        // Standard JUnit3 static suite method.
        public static TestSuite suite() {
            TestSuite suite = new GridJunit3TestSuite("Example Grid Test Suite");

            // Add tests.
            suite.addTestSuite(TestA.class);
            suite.addTestSuite(TestB.class);
            suite.addTestSuite(TestC.class);

            return suite;
        }
    }

If you have four tests A, B, C, and D, and if you need to run A and B sequentially, then you should create a nested test suite with test A and B as follows:

    public class GridJunit3ExampleTestSuite {
        // Standard JUnit3 static suite method.
        public static TestSuite suite() {
            TestSuite suite = new GridJunit3TestSuite("Example Grid Test Suite");

            // Nested test suite to run tests A and B sequentially.
            TestSuite nested = new TestSuite("Example Nested Sequential Suite");

            nested.addTestSuite(TestA.class);
            nested.addTestSuite(TestB.class);

            // Add tests A and B.
            suite.addTest(nested);

            // Add other tests.
            suite.addTestSuite(TestC.class);

            return suite;
        }
    }

GridJunit3LocalTestSuite
Some tests can only be executed locally, mostly due to environment issues. However, they can still benefit from parallel execution with other tests. GridGain supports this via GridJunit3LocalTestSuite suites, which can be nested within a GridJunit3TestSuite test suite.
To use local test suite within distributed test suite, simply add it to distributed test suite as follows:

    public class GridJunit3ExampleTestSuite {
        // Local test suite example.
        public static TestSuite suite() {
            TestSuite suite = new GridJunit3TestSuite("Example Grid Test Suite");

            // Local nested test suite to always run tests A and B
            // on the local node.
            TestSuite nested = new GridJunit3LocalTestSuite("Example Nested Sequential Suite");

            nested.addTestSuite(TestA.class);
            nested.addTestSuite(TestB.class);

            // Add local tests A and B.
            suite.addTest(nested);

            // Add other tests.
            suite.addTestSuite(TestC.class);
            suite.addTestSuite(TestD.class);

            return suite;
        }
    }

Logging
When running distributed JUnit, all the logging that is done to System.out or System.err is preserved. GridGain will accumulate all logging that is done on remote nodes, send it back to the originating node, and associate all log statements with their corresponding tests. This way, for example, if you are running tests from IDEA or Eclipse (or any other IDE), you will still see the logs as if it was a local run. However, since remote nodes keep all log statements made within a single individual test case in memory, you must make sure that enough memory is allocated on every node and that individual test cases do not produce gigabytes of log statements. Also note that logs are sent back to the originating node upon completion of every test, so don’t be alarmed if you don’t see any log statements for a while and then all of them appear at once.
GridGain achieves such log transparency by reassigning System.out and System.err to an internal PrintStream implementation. However, when using Log4J (or any other logging framework) within your tests, you must make sure that it is configured with ConsoleAppender and that the ConsoleAppender.setFollow(boolean) attribute is set to true. Logging to files is not supported yet and is planned for future releases.
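The general technique of reassigning System.out to capture output per test can be sketched as follows. This is an illustration of the idea only, not GridGain's internal PrintStream implementation; `StdoutCaptureSketch` is a hypothetical class, and a real implementation would also need to handle System.err and concurrent tests.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

/** Hypothetical sketch: capture everything a test body prints to System.out. */
final class StdoutCaptureSketch {
    /** Runs the test body with System.out redirected and returns what it printed. */
    static String capture(Runnable testBody) {
        PrintStream original = System.out;
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        PrintStream redirected = new PrintStream(buf, true);
        System.setOut(redirected);
        try {
            testBody.run();
        } finally {
            // Always restore the original stream, even if the test throws.
            redirected.flush();
            System.setOut(original);
        }
        return buf.toString();
    }
}
```

This also shows why the follow setting matters for logging frameworks: an appender that cached the original System.out at startup would keep writing to it and bypass the redirected stream, so its output would not be captured.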
Test Nesting
GridJunit3TestSuite instances can be nested within each other as deep as needed. However, all nested distributed test suites will be treated just like regular JUnit test suites and not as distributed test suites. This approach is convenient when you have several distributed test suites that you would like to execute separately in distributed fashion, while also being able to execute them as part of larger distributed suites.
To enable JUnit3 tests using the @GridifyTest annotation, simply attach this annotation to the static suite() method of the test suite you would like to grid-enable.

    public class GridifyJunit3ExampleTestSuite {
        // Standard JUnit3 suite method. Note we attach @GridifyTest
        // annotation to it, so it will be grid-enabled.
        @GridifyTest
        public static TestSuite suite() {
            TestSuite suite = new TestSuite("Example Test Suite");

            // Nested test suite to run tests A and B sequentially.
            TestSuite nested = new TestSuite("Example Nested Sequential Suite");

            nested.addTestSuite(TestA.class);
            nested.addTestSuite(TestB.class);

            // Add tests A and B.
            suite.addTest(nested);

            // Add other tests.
            suite.addTestSuite(TestC.class);
            suite.addTestSuite(TestD.class);

            return suite;
        }
    }

To run distributed JUnit tests you need to start other instances of GridGain. You can do so by running GRIDGAIN_HOME/bin/ggjunit.{sh|bat} script, which will start default configuration. If configuration other than default is required, then use regular GRIDGAIN_HOME/bin/ggstart.{sh|bat} script and pass your own Spring configuration file as a parameter to the script.
You can use the following configuration parameters to configure a distributed test suite locally. Note that many parameters can be overridden by setting the corresponding JVM parameters defined in GridTestVmParameters at VM startup.
| Configuration Method | Default Value | Description |
|---|---|---|
| setDisabled(boolean) | false | If true then GridGain will be turned off and the suite will run locally. This value can be overridden by setting the GridTestVmParameters.GRIDGAIN_DISABLED JVM parameter to true. This parameter comes in handy when you would like to turn off GridGain without changing the actual code. |
| setConfigurationPath(String) | config/junit/junit-spring.xml | Optional path to the GridGain Spring configuration file for running JUnit tests. This property can be overridden by setting the GridTestVmParameters.GRIDGAIN_CONFIG VM parameter. Note that the value can be either an absolute path or a path relative to the GRIDGAIN_HOME installation folder. |
| setRouterClassName(String) | GridTestRouterAdapter class name. | Optional name of a test router class that implements the GridTestRouter interface. If not provided, then tests will be routed in round-robin fashion using the default GridTestRouterAdapter. The value of this parameter can be overridden by setting the GridTestVmParameters.GRIDGAIN_TEST_ROUTER VM parameter to the name of your own custom router class. |
| GRIDGAIN_ROUTER_PREFER_REMOTE | false | This value can only be set as a VM parameter. Set it to true, e.g. -DGRIDGAIN_ROUTER_PREFER_REMOTE=true, if you would like the test router not to route tests to the local node when remote nodes are present. Note that this property works only with the default test router. |
| setRouterClass(Class) | null | Same as setRouterClassName(String), but sets the actual class instead of the name. |
| setTimeout(long) | 0, which means that tests will never time out. | Maximum timeout value in milliseconds after which the test suite will return without waiting for the remaining tests to complete. This value can be overridden by setting the GridTestVmParameters.GRIDGAIN_TEST_TIMEOUT JVM parameter to the timeout value for the tests. |
Test Scheduling
With GridGain you can configure how many tests can run in parallel by specifying the parallelJobsNumber configuration parameter on GridCollisionSpi. Simply uncomment the following section in the GRIDGAIN_HOME/config/junit/junit-spring.xml file:

    <property name="collisionSpi">
        <bean class="org.gridgain.grid.spi.collision.fifoqueue.GridFifoQueueCollisionSpi">
            <property name="parallelJobsNumber" value="1"/>
        </bean>
    </property>

The XML configuration above will guarantee that only 1 test can run at a time on local or remote nodes. You can ensure this way that although your tests run in parallel on different nodes, within a single node only one test can be running and all other ones are waiting.
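The effect of limiting parallel jobs on a single node can be illustrated with a plain JDK thread pool. This sketch does not use GridCollisionSpi; `ParallelJobsSketch` is a hypothetical class in which a fixed-size pool stands in for the node, and the queued tasks stand in for waiting tests.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of a node that runs at most parallelJobs tests at once. */
final class ParallelJobsSketch {
    /** Runs jobCount dummy jobs and returns the highest number seen running together. */
    static int peakConcurrency(int parallelJobs, int jobCount) {
        final AtomicInteger running = new AtomicInteger();
        final AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(parallelJobs);
        for (int i = 0; i < jobCount; i++) {
            pool.execute(new Runnable() {
                @Override public void run() {
                    int now = running.incrementAndGet();
                    // Record the highest concurrency observed so far.
                    int prev;
                    do {
                        prev = peak.get();
                    } while (now > prev && !peak.compareAndSet(prev, now));
                    try {
                        Thread.sleep(5); // Simulated test work.
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    running.decrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS))
                throw new IllegalStateException("Jobs did not finish in time");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for jobs", e);
        }
        return peak.get();
    }
}
```

With a limit of 1, jobs beyond the first wait in the queue and run one after another, which is the behaviour the configuration above describes for tests on each node.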
To start a remote node for JUnit tests, open a terminal window on Linux/Mac OS X or a Command Prompt on Windows, change directory to GRIDGAIN_HOME/bin and run the ggstart.{sh|bat} script. However, distributed JUnits have to use GridTestExecutorService, which is pre-configured in the GRIDGAIN_HOME/config/junit/junit-spring.xml Spring configuration file. You need to pass the path to this file to the GridGain startup script as follows:
ggstart.bat config/junit/junit-spring.xml
or starting from GridGain 1.6.1, simply execute ggjunit.{sh|bat} script:
ggjunit.bat
It takes 2-3 seconds for grid node to start and if everything worked fine you should see starting log ending with successful start acknowledgment.
This example demonstrates how GridGain can distribute your long-running JUnit3 tests or test suites across the grid and hence dramatically speed up the overall execution of all tests.
To try this example you will need to open GridJunit3ExampleTestSuite.java in IDEA, Eclipse or any other IDE and run this JUnit3 suite using standard IDE JUnit integration. You will observe how execution of the tests is offloaded to remote nodes and then the results are seen in the IDE just as if it was a local run.
To run this example you need to start one or more additional grid nodes. For simplicity, you can start these nodes on the same box on which you are running the example.
Create GridJunit3TestSuite Suite
The only difference from standard JUnit3 suites is that instead of creating a new TestSuite we create a new GridJunit3TestSuite suite.

    TestSuite suite = new GridJunit3TestSuite("Example Grid Test Suite");

Running Tests Sequentially
Sometimes it is desired that certain tests run in sequence, yet parallel with other tests. For that you simply need to create a nested suite, then the whole suite will be executed remotely. For example, the following lines of code will guarantee that TestA and TestB always run in sequence.

    // Nested test suite to run tests A and B sequentially.
    TestSuite nested = new TestSuite("Example Nested Sequential Suite");

    nested.addTestSuite(TestA.class);
    nested.addTestSuite(TestB.class);

    // Add tests A and B.
    suite.addTest(nested);

Running Tests Locally
Certain tests must run locally no matter what, often due to environmental issues. Yet these tests can benefit from parallel execution with other tests. GridGain supports this via the GridJunit3LocalTestSuite suite. For example, the code below guarantees that TestC will always run locally.

    // Add TestC to execute always on the local node but still in
    // parallel with other tests.
    suite.addTest(new GridJunit3LocalTestSuite(TestC.class, "Local suite"));

Full Source Code

    public final class GridJunit3ExampleTestSuite {
        /**
         * Standard JUnit3 static suite method.
         *
         * @return JUnit3 suite.
         */
        public static TestSuite suite() {
            TestSuite suite = new GridJunit3TestSuite("Example Grid Test Suite");

            // Nested test suite to run tests A and B sequentially.
            TestSuite nested = new TestSuite("Example Nested Sequential Suite");

            nested.addTestSuite(TestA.class);
            nested.addTestSuite(TestB.class);

            // Add tests A and B.
            suite.addTest(nested);

            // Add TestC to execute always on the local node but still in
            // parallel with other tests.
            suite.addTest(new GridJunit3LocalTestSuite(TestC.class, "Local suite"));

            // Add other tests.
            suite.addTestSuite(TestD.class);

            return suite;
        }
    }

This example demonstrates how GridGain can distribute your long-running JUnit3 tests or test suites across the grid using the @GridifyTest annotation.
To try this example you will need to open GridifyJunit3ExampleTestSuite in IDEA, Eclipse or any other IDE and run this JUnit3 suite using standard IDE JUnit integration. You will observe how execution of the tests is offloaded to remote nodes and then the results are seen in the IDE just as if it was a local run. All you have to do is attach the @GridifyTest annotation to your standard static suite() method for JUnit3 test suites.
Configuration
In order to enable @GridifyTest you must enable either AspectJ or JBoss AOP.
JBoss AOP
Note that GridGain is not shipped with JBoss and doesn’t include the necessary JBoss libraries. We assume that if you choose to use JBoss AOP you already have these libraries. The following configuration needs to be applied to enable JBoss byte code weaving:
- The following JVM configuration must be present:
  - -javaagent:[path to jboss-aop-jdk50-4.x.x.jar]
  - -Djboss.aop.class.path=[path to gridgain.jar]
  - -Djboss.aop.exclude=org,com -Djboss.aop.include=org.gridgain.examples
- The following JARs should be in a classpath:
  - javassist-4.x.x.jar
  - jboss-aop-jdk50-4.x.x.jar
  - jboss-aspect-library-jdk50-4.x.x.jar
  - jboss-common-4.x.x.jar
  - trove-1.0.x.jar
AspectJ AOP
The following configuration needs to be applied to enable AspectJ byte code weaving:
- JVM configuration should include: -javaagent:GRIDGAIN_HOME/libs/aspectjweaver-1.5.3.jar
- Classpath should contain the GRIDGAIN_HOME/config/aop/aspectj folder.
Attach @GridifyTest Annotation
The only difference from standard JUnit3 suites is that we need to attach the @GridifyTest annotation to the standard static suite() method as follows:

    @GridifyTest
    public static TestSuite suite() {
        ...
    }

You can pass configuration parameters into the @GridifyTest annotation. Refer to the @GridifyTest documentation for more information.
Running Tests Sequentially
Sometimes it is desired that certain tests run in sequence, yet parallel with other tests. For that you simply need to create a nested suite, then the whole suite will be executed remotely. For example, the following lines of code will guarantee that TestA and TestB always run in sequence.

    // Nested test suite to run tests A and B sequentially.
    TestSuite nested = new TestSuite("Example Nested Sequential Suite");

    nested.addTestSuite(TestA.class);
    nested.addTestSuite(TestB.class);

    // Add tests A and B.
    suite.addTest(nested);

Full Source Code

    /**
     * Regular JUnit3 suite. Note that because of {@link GridifyTest} annotation,
     * all tests will execute in parallel on the grid.
     * <p>
     * Note that since {@link TestA} and {@link TestB} are added to this
     * suite from within another nested suite and not directly, they will
     * always execute sequentially, however still in parallel with other
     * tests.
     */
    public final class GridifyJunit3ExampleTestSuite {
        /**
         * Standard JUnit3 static <tt>suite()</tt> method. Note we attach {@link GridifyTest}
         * annotation to it, so it will be grid-enabled.
         *
         * @return JUnit3 suite.
         */
        @GridifyTest
        public static TestSuite suite() {
            TestSuite suite = new TestSuite("Example Test Suite");

            // Nested test suite to run tests A and B sequentially.
            TestSuite nested = new TestSuite("Example Nested Sequential Suite");

            nested.addTestSuite(TestA.class);
            nested.addTestSuite(TestB.class);

            // Add tests A and B.
            suite.addTest(nested);

            // Add other tests.
            suite.addTestSuite(TestC.class);
            suite.addTestSuite(TestD.class);

            return suite;
        }
    }

16.2.2. Distributed JUnit 4
Distributed JUnit support was added in the GridGain 1.6.0 release.
You can distribute your JUnit4 test suites in 2 different ways. One way is to use GridJunit4Suite directly instead of the usual JUnit4 Suite class. This is perhaps the easiest and most straightforward way to grid-enable your JUnit4 test suites.
Another way is with AOP, by attaching the @GridifyTest annotation to the same class to which you attach the @RunWith(Suite.class) annotation. The advantage of this approach is that you can provide test configuration parameters, such as a timeout or a custom test router, right as annotation parameters in code, without having to pass extra VM arguments.
GridJunit4Suite is a standard JUnit4 test suite runner for distributing JUnit4 tests. Simply add tests to this suite runner just like you would for regular JUnit4 suites, and these tests will be executed in parallel on the grid. Note that if there are no other grid nodes, this suite runner will still ensure parallel test execution within a single VM.
Below is an example of a distributed JUnit4 test suite:

    @RunWith(GridJunit4Suite.class)
    @SuiteClasses({
        TestA.class, // TestA will run in parallel on the grid.
        TestB.class, // TestB will run in parallel on the grid.
        TestC.class, // TestC will run in parallel on the grid.
        TestD.class // TestD will run in parallel on the grid.
    })
    public class GridJunit4ExampleSuite {
        // No-op.
    }

If you have four tests A, B, C, and D, and if you need to run A and B sequentially, then you should create a nested test suite with test A and B as follows:

    @RunWith(GridJunit4Suite.class)
    @SuiteClasses({
        GridJunit4ExampleNestedSuite.class, // Nested suite that will execute tests A and B added to it sequentially.
        TestC.class, // TestC will run in parallel on the grid.
        TestD.class // TestD will run in parallel on the grid.
    })
    public class GridJunit4ExampleSuite {
        // No-op.
    }


    @RunWith(Suite.class) // Regular JUnit4 suite runner, so tests A and B run sequentially.
    @SuiteClasses({
        TestA.class,
        TestB.class
    })
    public class GridJunit4ExampleNestedSuite {
        // No-op.
    }

Note that you can also grid-enable existing JUnit4 tests using the @GridifyTest annotation, which you can attach to the same class to which you attach the @RunWith annotation.
GridJunit4LocalSuite
Some tests can only be executed locally, mostly due to environment issues. However, they can still benefit from parallel execution with other tests. GridGain supports this via GridJunit4LocalSuite suites, which can be nested within GridJunit4Suite test suites.
To use local test suite within distributed test suite, simply add it to distributed test suite as follows:

    @RunWith(GridJunit4Suite.class)
    @SuiteClasses({
        TestA.class,
        TestB.class,
        GridJunit4ExampleNestedLocalSuite.class // Local suite that will execute its tests C and D locally.
    })
    public class GridJunit4ExampleSuite {
        // No-op.
    }


    @RunWith(GridJunit4LocalSuite.class) // Specify local suite to run tests.
    @SuiteClasses({
        TestC.class,
        TestD.class
    })
    public class GridJunit4ExampleNestedLocalSuite {
        // No-op.
    }

Logging
When running distributed JUnit, all the logging that is done to System.out or System.err is preserved. GridGain will accumulate all logging that is done on remote nodes, send it back to the originating node, and associate all log statements with their corresponding tests. This way, for example, if you are running tests from IDEA or Eclipse (or any other IDE), you will still see the logs as if it was a local run. However, since remote nodes keep all log statements made within a single individual test case in memory, you must make sure that enough memory is allocated on every node and that individual test cases do not produce gigabytes of log statements. Also note that logs are sent back to the originating node upon completion of every test, so don’t be alarmed if you don’t see any log statements for a while and then all of them appear at once.
GridGain achieves such log transparency by reassigning System.out and System.err to an internal PrintStream implementation. However, when using Log4J (or any other logging framework) within your tests, you must make sure that it is configured with ConsoleAppender and that the ConsoleAppender.setFollow(boolean) attribute is set to true. Logging to files is not supported yet and is planned for future releases.
Test Nesting
GridJunit4Suite instances can be nested within each other as deep as needed. However, all nested distributed test suites will be treated just like regular JUnit test suites and not as distributed test suites. This approach is convenient when you have several distributed test suites that you would like to execute separately in distributed fashion, while also being able to execute them as part of larger distributed suites.
To enable JUnit4 tests using the @GridifyTest annotation, you need to attach this annotation to the same class that has the @RunWith(Suite.class) annotation (only Suite runners can be grid-enabled in JUnit4).
@RunWith(Suite.class)
@SuiteClasses({
    GridJunit4ExampleNestedSuite.class, // Nested suite that will execute tests A and B added to it sequentially.
    TestC.class, // Test C will run in parallel with other tests.
    TestD.class // TestD will run in parallel with other tests.
})
@GridifyTest // Run this suite on the grid.
public class GridifyJunit4ExampleSuite {
    // No-op.
}

@RunWith(Suite.class)
@SuiteClasses({
    TestA.class,
    TestB.class
})
public class GridJunit4ExampleNestedSuite {
    // No-op.
}
To run distributed JUnit tests you need to start other instances of GridGain. You can do so by running the GRIDGAIN_HOME/bin/ggjunit.{sh|bat} script, which starts a node with the default configuration. If a configuration other than the default is required, use the regular GRIDGAIN_HOME/bin/ggstart.{sh|bat} script and pass your own Spring configuration file as a parameter to the script.
You can use the following configuration parameters to configure a distributed test suite locally. These parameters are set via the @GridifyTest annotation. Note that GridGain will check these parameters even if AOP is not enabled. Also note that many parameters can be overridden by setting the corresponding VM parameters defined in GridTestVmParameters at VM startup.
| Configuration Method | Default Value | Description |
|---|---|---|
| @GridifyTest.disabled() | false | If true then GridGain will be turned off and the suite will run locally. This value can be overridden by setting the GridTestVmParameters.GRIDGAIN_DISABLED JVM parameter to true. This parameter comes in handy when you would like to turn off GridGain without changing the actual code. |
| @GridifyTest.configPath() | config/junit/junit-spring.xml | Optional path to the GridGain Spring configuration file for running JUnit tests. This property can be overridden by setting the GridTestVmParameters.GRIDGAIN_CONFIG VM parameter. Note that the value can be either an absolute path or a path relative to the GRIDGAIN_HOME installation folder. |
| @GridifyTest.routerClass() | GridTestRouterAdapter class. | Optional router class that implements the GridTestRouter interface. If not provided, tests will be routed in round-robin fashion using the default GridTestRouterAdapter. The value of this parameter can be overridden by setting the GridTestVmParameters.GRIDGAIN_TEST_ROUTER VM parameter to the name of your own custom router class. |
| GRIDGAIN_ROUTER_PREFER_REMOTE | false | This value can only be set as a VM parameter. Set it to true, e.g. -DGRIDGAIN_ROUTER_PREFER_REMOTE=true, if you would like the test router not to route tests to the local node when remote nodes are present. Note that this property works only with the default test router. |
| @GridifyTest.timeout() | 0, which means that tests will never time out. | Maximum timeout value in milliseconds after which the test suite will return without waiting for the remaining tests to complete. This value can be overridden by setting the GridTestVmParameters.GRIDGAIN_TEST_TIMEOUT JVM parameter to the timeout value for the tests. |
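The round-robin routing and the prefer-remote behavior described in the table can be sketched in plain Java. This is an illustration under stated assumptions, not the GridTestRouterAdapter source: the class RoundRobinRouterSketch, its route method, and the use of plain strings as node identifiers are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of round-robin test routing (not GridGain code).
final class RoundRobinRouterSketch {
    private final AtomicInteger counter = new AtomicInteger();

    /** Picks the next node in turn; skips the local node only if preferRemote is set and remote nodes exist. */
    String route(List<String> nodes, String localNode, boolean preferRemote) {
        if (nodes.isEmpty())
            throw new IllegalArgumentException("No nodes to route to.");
        List<String> remote = new ArrayList<>(nodes);
        remote.remove(localNode);
        List<String> candidates = (preferRemote && !remote.isEmpty()) ? remote : nodes;
        // floorMod keeps the index valid even after the counter overflows.
        int idx = Math.floorMod(counter.getAndIncrement(), candidates.size());
        return candidates.get(idx);
    }

    public static void main(String[] args) {
        RoundRobinRouterSketch router = new RoundRobinRouterSketch();
        List<String> nodes = Arrays.asList("local", "remote-1", "remote-2");
        for (int i = 0; i < 4; i++)
            System.out.println(router.route(nodes, "local", true));
    }
}
```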
Test Scheduling
With GridGain you can configure how many tests run in parallel by specifying the parallelJobsNumber configuration parameter on GridCollisionSpi. Simply uncomment the following section in the GRIDGAIN_HOME/config/junit/junit-spring.xml file:
<property name="collisionSpi">
    <bean class="org.gridgain.grid.spi.collision.fifoqueue.GridFifoQueueCollisionSpi">
        <property name="parallelJobsNumber" value="1"/>
    </bean>
</property>
The XML configuration above guarantees that only one test runs at a time on each local or remote node. This way, although your tests run in parallel on different nodes, within a single node only one test is running while all others wait.
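The effect of limiting the number of parallel jobs on one node can be approximated with a fixed-size thread pool. The sketch below is only an analogy for the behavior described above and does not use the GridGain collision SPI; the class name ParallelJobsSketch and the method peakConcurrency are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical analogy: a pool of N threads lets at most N jobs run at once, the rest wait in a FIFO queue.
final class ParallelJobsSketch {
    /** Runs the given number of short jobs and returns the highest number observed running at the same time. */
    static int peakConcurrency(int parallelJobsNumber, int jobs) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelJobsNumber);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < jobs; i++) {
            pool.execute(() -> {
                peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                try {
                    Thread.sleep(20); // Simulated test body.
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                running.decrementAndGet();
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS))
                throw new IllegalStateException("Jobs did not finish in time.");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("Peak with limit 1: " + peakConcurrency(1, 5));
    }
}
```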
To start a remote node for JUnit tests, open a terminal window on Linux/Mac OS X or a Command Prompt on Windows, change directory to GRIDGAIN_HOME/bin and run the ggstart.{sh|bat} script. However, distributed JUnit tests have to use GridTestExecutorService, which is pre-configured in the GRIDGAIN_HOME/config/junit/junit-spring.xml Spring configuration file. You need to pass the path to this file to the startup script as follows:
ggstart.bat config/junit/junit-spring.xml
or, starting from GridGain 1.6.1, simply execute the ggjunit.{sh|bat} script:
ggjunit.bat
It takes 2-3 seconds for a grid node to start, and if everything worked fine you should see a startup log ending with a successful start acknowledgment.
This example demonstrates how GridGain can distribute your long running JUnit4 tests or test suites across the grid, dramatically speeding up the overall execution of all tests.
To try this example, open GridJunit4ExampleSuite.java in IDEA, Eclipse or any other IDE and run this JUnit4 suite using the standard IDE JUnit integration. You will observe how execution of the tests is offloaded to remote nodes and how the results then appear in the IDE just as if it were a local run.
To run this example you need to start one or more additional grid nodes. For simplicity, you can start these nodes on the same box on which you are running the example.
Specify GridJunit4Suite Runner
The only difference from standard JUnit4 suites is that instead of specifying the Suite runner, we must specify the GridJunit4Suite runner as follows.
@RunWith(GridJunit4Suite.class)
Running Tests Sequentially
Sometimes certain tests must run in sequence, yet in parallel with other tests. To achieve this, simply create a nested suite; the whole nested suite will then be executed remotely. For example, the following lines of code guarantee that TestA and TestB always run in sequence.
@RunWith(Suite.class)
@SuiteClasses({
    TestA.class,
    TestB.class
})
public class GridJunit4ExampleNestedSuite {
    // No-op.
}
Running Tests Locally
Certain tests must run locally no matter what, often due to some environmental issues. Yet these tests can benefit from parallel execution with other tests. GridGain supports this via the GridJunit4LocalSuite suite runner.
@RunWith(GridJunit4LocalSuite.class) // Specify local suite to run tests.
@SuiteClasses(TestC.class)
public class GridJunit4ExampleNestedLocalSuite {
    // No-op.
}
Full Source Code
@RunWith(GridJunit4Suite.class)
@SuiteClasses({
    // Nested suite that will execute tests A and B added to it sequentially.
    GridJunit4ExampleNestedSuite.class,
    // Local suite that will execute its test C locally.
    GridJunit4ExampleNestedLocalSuite.class,
    // TestD will run in parallel with (A and B) and C tests.
    TestD.class
})
public class GridJunit4ExampleSuite {
    // No-op.
}
This example demonstrates how GridGain can distribute your long running JUnit4 tests or test suites across the grid using the @GridifyTest annotation.
Configuration
In order to enable @GridifyTest you must enable either AspectJ or JBoss AOP.
JBoss AOP
Note that GridGain is not shipped with JBoss and doesn’t include the necessary JBoss libraries. We assume that if you choose to use JBoss AOP you already have these libraries. The following configuration needs to be applied to enable JBoss byte code weaving:
- The following JVM configuration must be present:
  - -javaagent:[path to jboss-aop-jdk50-4.x.x.jar]
  - -Djboss.aop.class.path=[path to gridgain.jar]
  - -Djboss.aop.exclude=org,com -Djboss.aop.include=org.gridgain.examples
- The following JARs should be in a classpath:
  - javassist-4.x.x.jar
  - jboss-aop-jdk50-4.x.x.jar
  - jboss-aspect-library-jdk50-4.x.x.jar
  - jboss-common-4.x.x.jar
  - trove-1.0.x.jar
AspectJ AOP
The following configuration needs to be applied to enable AspectJ byte code weaving:
- JVM configuration should include: -javaagent:GRIDGAIN_HOME/libs/aspectjweaver-1.5.3.jar
- Classpath should contain the GRIDGAIN_HOME/config/aop/aspectj folder.
Attach @GridifyTest Annotation
The only difference from standard JUnit4 suites is that we need to attach the @GridifyTest annotation to the same class that has the @RunWith(Suite.class) annotation.
Running Tests Sequentially
Sometimes certain tests must run in sequence, yet in parallel with other tests. To achieve this, simply create a nested suite; the whole nested suite will then be executed remotely. For example, the following lines of code guarantee that TestA and TestB always run in sequence:
@RunWith(Suite.class)
@SuiteClasses({
    TestA.class,
    TestB.class
})
public class GridifyJunit4ExampleNestedSuite {
    // No-op.
}
Full Source Code
@RunWith(Suite.class)
@SuiteClasses({
    // Nested suite that will execute tests A and B added to it sequentially.
    GridJunit4ExampleNestedSuite.class,
    // Test C will run in parallel with other tests.
    TestC.class,
    // TestD will run in parallel with other tests.
    TestD.class
})
@GridifyTest // Run this suite on the grid.
public class GridifyJunit4ExampleSuite {
    // No-op.
}
16.3. Bamboo Integration
When plugging JUnit3 tests into a Bamboo continuous build, Bamboo may start displaying a wrong test count even though all tests do execute. This happens because Bamboo cannot properly process test class names for classes augmented by Javassist, which it finds in the JUnit XML files produced by Ant.
To fix it, add the following Ant task to your Ant script after all JUnit targets. It goes through all JUnit test XML results and removes Javassist suffixes from test class names.
<replaceregexp byline="true">
    <regexp pattern='(&lt;testcase.*classname=\".*)(_\$\$_javassist_\d+)'/>
    <substitution expression='\1'/>
    <fileset dir="/foo/bar/test-results">
        <include name="TEST-*.xml"/>
    </fileset>
</replaceregexp>
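To see what the regular expression above does to a single line of a JUnit XML report, the same pattern can be tried in plain Java. This is a small illustrative sketch; the class name JavassistSuffixStripper and the sample report line are made up for the example.

```java
import java.util.regex.Pattern;

// Illustrative only: applies the same pattern as the Ant task to one report line.
final class JavassistSuffixStripper {
    private static final Pattern SUFFIX =
        Pattern.compile("(<testcase.*classname=\".*)(_\\$\\$_javassist_\\d+)");

    /** Removes the Javassist suffix from the classname attribute; other lines are returned unchanged. */
    static String strip(String line) {
        return SUFFIX.matcher(line).replaceAll("$1");
    }

    public static void main(String[] args) {
        String line = "<testcase classname=\"org.example.MyTest_$$_javassist_12\" name=\"testA\"/>";
        System.out.println(strip(line));
    }
}
```

The first group captures everything up to the suffix and is kept, while the second group (the suffix itself) is dropped by the substitution.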
16.4. Log4j Integration
Currently only console appenders are supported. So if you need an output file to be generated, say on Bamboo, we recommend using the following Ant feature. The Ant JUnit test tag has a formatter property that allows you to redirect test output to a file or files. Use it as shown below:
<junit>
    <test todir="your_path">
        <formatter type="plain"/>
    </test>
</junit>
This example will save the entire test output to the your_path directory.
See http://ant.apache.org/manual/Tasks/junit.html for additional information.
17. Concurrency Unification and Virtual JVM
TODO
18. GridGain CloudBoot - Dynamic VM Image Update
CloudBoot allows you to download, and then optionally re-download (override), the GridGain installation folder from two different URLs (using the s3, ftp or file protocols) and to start the node from the installation directory constructed this way.
Note: CloudBoot feature is available only in Enterprise Edition.
18.1. Usage
To run CloudBoot you have to use cloudboot.{sh|bat} start script located in cloudboot/bin folder. It accepts following arguments:
| Short form | Long form | Description | Required | Default |
|---|---|---|---|---|
| -m | --download-uri | URL to download GridGain from | Yes | |
| -o | --override-uri | URL to override GridGain from | No | None |
| -g | --gridgain-home | GridGain home folder where to put | No | Taken from GRIDGAIN_HOME environment variable |
| -c | --start-script | Node start script path | No | bin/ggstart.sh or bin/ggstart.bat |
| -p | --start-params | Node start script parameters | No | None |
| -a | --s3-access-key | S3 access key ID | Yes, if S3 protocol is used | |
| -s | --s3-secret-key | S3 secret access key | Yes, if S3 protocol is used | |
| -u | --ftp-user | FTP user name | Yes, if FTP server needs authentication | |
| -w | --ftp-passwd | FTP password | Yes, if FTP server needs authentication | |
| -f | --no-excludes | Download all files and folders (by default javadoc and examples are excluded) | No | false |
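The conditional requirements in the table (the download URL is always required, and the S3 keys are required only when the S3 protocol is used) can be expressed as a small validation routine. The sketch below is hypothetical and is not how cloudboot.{sh|bat} is implemented: the class CloudBootArgsSketch is invented, and it assumes that S3 locations are recognized by an "s3:" prefix.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical check of required CloudBoot arguments, keyed by short form.
final class CloudBootArgsSketch {
    /** Returns the short forms of required arguments that are absent from the given map. */
    static List<String> missingArgs(Map<String, String> args) {
        List<String> missing = new ArrayList<>();
        if (!args.containsKey("-m"))
            missing.add("-m"); // Download URL is always required.
        // Assumption: an "s3:" prefix on either URL means the S3 protocol is used.
        if (isS3(args.get("-m")) || isS3(args.get("-o"))) {
            if (!args.containsKey("-a"))
                missing.add("-a");
            if (!args.containsKey("-s"))
                missing.add("-s");
        }
        return missing;
    }

    private static boolean isS3(String uri) {
        return uri != null && uri.startsWith("s3:");
    }

    public static void main(String[] args) {
        Map<String, String> parsed = new HashMap<>();
        parsed.put("-m", "s3:gg-3.0-bucket");
        System.out.println("Missing: " + missingArgs(parsed));
    }
}
```

FTP credentials are left out of the check because, per the table, they are only needed when the server requires authentication, which cannot be determined from the arguments alone.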
18.2. Examples
Below are examples of using cloudboot.{sh|bat} script with different protocols.
cloudboot -m s3:gg-3.0-bucket -a <s3_key_id> -s <s3_secret_key> -g /custom/gg-home
Downloads GridGain from S3 bucket to custom location and starts node with default configuration.
cloudboot -m ftp://ftp.org/gg-3.0 -u <ftp_user> -w <ftp_passwd> -c /custom/gg.sh
Downloads GridGain from FTP server and starts node using custom start script.
cloudboot -m file:///path/gg-3.0 -p custom/gg.xml -f
Copies all GridGain files and folders including javadoc and examples and starts node with custom configuration.
19. Management and Monitoring
TODO
19.1. JMX Instrumentation
TODO
19.2. GridGain Visor - Scriptable Monitoring
19.2.1. Overview
Visor provides scriptable monitoring capabilities for GridGain. What emacs does for code editing - Visor does for GridGain monitoring.
Visor is a library that can be used on its own, but it is most often used within the Scala REPL as an interactive monitoring environment. This is the preferred way to use Visor for basic monitoring.
19.2.2. Usage
GridGain ships with GRIDGAIN_HOME/bin/ggvisor.{sh|bat} script that starts Scala REPL with automatically loaded Visor.
Another alternative is to load Visor manually. GridGain ships with GRIDGAIN_HOME/bin/ggscala.{sh|bat} script that starts Scala REPL (that should be on PATH) with GridGain on classpath. You can use this script to conveniently start Scala REPL with GridGain. Note that currently GridGain supports Scala 2.8 and higher only.
Once started you can pre-load Visor via the following command (assuming you are in GridGain installation folder):
:load bin/visor.scala
Script GRIDGAIN_HOME/bin/visor.scala contains Scala code that pre-loads Visor. You can modify this script freely to pre-load any other necessary code. By default, this script pre-loads all necessary imports and runs the visor status command. Note also that Visor starts as a daemon node and is therefore not visible in a normal topology.
To get help and get started, just type:
visor ?
19.2.3. Commands
Following commands are available in Visor:
| Command | Alias | Description |
|---|---|---|
| ack | | Acks arguments on all remote nodes. |
| alert | | Email alerts for user-defined events. |
| cache | | Prints cache statistics. |
| close | | Disconnects visor from the grid. |
| config | | Prints node configuration. |
| dash | | Opens Visor UI dashboard. |
| deploy | | Copies file or directory to remote host. |
| disco | | Prints topology change log. |
| events | | Print events from a node. |
| gc | | Runs GC on remote nodes. |
| help | ? | Prints visor help. |
| kill | | Kills or restarts node. |
| license | | Shows information about licenses and updates them. |
| log | | Starts or stops grid-wide events logging. |
| mclear | | Clears visor memory variables. |
| mget | | Gets visor memory variable. |
| mlist | | Prints visor memory variables. |
| node | | Prints node statistics. |
| open | | Connects visor to the grid. |
| ping | | Pings node. |
| start | | Starts or restarts nodes on remote hosts. |
| status | ! | Prints visor status. |
| tasks | | Prints tasks execution statistics. |
| top | | Prints current topology. |
| vvm | | Opens VisualVM for nodes in topology. |
To get full information on the cmd command, type:
visor ? "cmd"
open
open command connects visor to the grid.
Note: P2P class loading should be enabled on all nodes.
visor open "{-cpath=<path>|-curl=<url>} {-g=<gridName>} {-dl}"
visor open "{-d} {-g=<gridName>} {-dl}"
visor open "{-e} {-g=<gridName>} {-dl}"
visor open
Following arguments can be provided:
| Argument | Description | Optional | Default |
|---|---|---|---|
| cpath | Spring configuration path. Either -cpath or -curl can be specified - but not both. | Yes | Asked interactively |
| curl | Spring configuration URL. Either -cpath or -curl can be specified - but not both. | Yes | Asked interactively |
| g | Grid name. | Yes | Default grid |
| d | Flag forces the command to connect to the default grid without interactive mode. | Yes | |
| e | Flag forces the command to connect to the existing grid without interactive mode. If there is no existing grid command will fail. | Yes | |
| dl | Flag disables remote log collection. | Yes | |
- Prompts user to select XML Spring configuration file in interactive mode:
  visor open
- Connects visor using default XML configuration:
  visor open "-d"
- Connects visor to mygrid grid using default configuration:
  visor open "-g=mygrid"
- Connects visor to mygrid grid using configuration from provided Spring file:
  visor open "-cpath=/gg/config/mycfg.xml -g=mygrid"
close
close command disconnects visor from the grid.
visor close
No arguments can be provided.
Disconnects visor from the grid:
visor close
status
status command prints visor status.
visor status {"-q"}
Following arguments can be provided:
| Argument | Description | Optional | Default |
|---|---|---|---|
| q | Quiet output without ASCII logo. | Yes | |
- Prints visor status:
  visor !
- Prints visor status in quiet mode:
  visor status "-q"
- Disconnected status:
- Connected status:
ack
ack command acks arguments on all remote nodes.
When using this command from Scala code (not from the REPL), make sure to import all necessary types and implicit conversions:
import org.gridgain.visor._
import commands.ack.VisorAckCommand._
Note that VisorAckCommand object contains necessary implicit conversions so that this command would be available via visor keyword.
visor ack {"s"}
visor ack ("s", f)
Following arguments can be provided:
| Argument | Description | Optional | Default |
|---|---|---|---|
| s | String to print on each remote node. | Yes | Local node ID. |
| f | Scala predicate on ScalarRichNodePimp filtering nodes in the topology. | Yes | |
- Prints Howdy! on all nodes in the topology:
  visor ack "Howdy!"
- Prints Howdy! on all nodes satisfying this predicate:
  visor ack("Howdy!", _.id8.startsWith("123"))
- Prints local node ID on all nodes in the topology:
  visor ack
alert
alert command generates email alerts for user-defined events. Node events and grid-wide events are defined via mnemonics.
When using this command from Scala code (not from the REPL), make sure to import all necessary types and implicit conversions:
import org.gridgain.visor._
import commands.alert.VisorAlertCommand._
Note that VisorAlertCommand object contains necessary implicit conversions so that this command would be available via visor keyword.
visor alert
visor alert "-u {-id=<alert-id>|-a}"
visor alert "-r {-t=<sec>} -c1=e1<num> -c2=e2<num> ... -ck=ek<num>"
Following arguments can be provided:
| Argument | Description | Optional | Default |
|---|---|---|---|
| u | Unregisters alert(s). Either the -a flag or the -id parameter is required. Note that only one of -u or -r is allowed. If neither -u nor -r is provided - all alerts will be printed. | Yes | |
| a | When provided with -u - all alerts will be unregistered. | Yes | |
| id | When provided with -u - alert with matching ID will be unregistered. | Yes | |
| r | Registers a new alert with mnemonic predicate(s). Note that only one of -u or -r is allowed. If neither -u nor -r is provided - all alerts will be printed. | Yes | |
| t | Defines notification frequency in seconds. This parameter can only appear with -r. | Yes | 900 (15 minutes) |
| ck | Defines a mnemonic for the metric that will be measured. Mnemonics exist for grid-wide metrics (not node specific) and for per-node current metrics; each is combined with a comparison part to form the mnemonic predicate. | Yes | |
- Prints all currently registered alerts:
  visor alert
- Unregisters all currently registered alerts:
  visor alert "-u -a"
- Unregisters alert with provided ID:
  visor alert "-u -id=12345678"
- Notifies every 10 min if grid has >= 4 CPUs and > 50% CPU load:
  visor alert "-r -t=600 -cc=gte4 -cl=gt50"
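The comparison part of a mnemonic predicate, such as gte4 or gt50 in the last example, pairs an operator with a threshold. A minimal Java sketch of evaluating such a predicate is given below. It is not Visor’s parser: the class name is hypothetical, only gt and gte are taken from the example above, and the lt and lte operators are assumptions added for symmetry.

```java
// Hypothetical evaluator for the comparison part of an alert predicate (e.g. "gte4", "gt50").
final class AlertPredicateSketch {
    /** Returns true if the value satisfies the predicate, e.g. matches("gte4", 4) is true. */
    static boolean matches(String predicate, double value) {
        int i = 0;
        // The leading letters are the operator, the rest is the numeric threshold.
        while (i < predicate.length() && Character.isLetter(predicate.charAt(i)))
            i++;
        String op = predicate.substring(0, i);
        double threshold = Double.parseDouble(predicate.substring(i));
        switch (op) {
            case "gt":  return value > threshold;
            case "gte": return value >= threshold;
            case "lt":  return value < threshold;  // Assumed operator, not shown in the text.
            case "lte": return value <= threshold; // Assumed operator, not shown in the text.
            default:    throw new IllegalArgumentException("Unknown operator: " + op);
        }
    }

    public static void main(String[] args) {
        // Mirrors "-cc=gte4 -cl=gt50": at least 4 CPUs and more than 50% CPU load.
        boolean fire = matches("gte4", 8) && matches("gt50", 72.5);
        System.out.println("Alert fires: " + fire);
    }
}
```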
cache
cache command prints statistics about caches from the specified node or from the entire grid. Output sorting can be specified in arguments.
When using this command from Scala code (not from the REPL), make sure to import all necessary types and implicit conversions:
import org.gridgain.visor._
import commands.cache.VisorCacheCommand._
Note that VisorCacheCommand object contains necessary implicit conversions so that this command would be available via visor keyword.
Output abbreviations:
| Abbreviation | Description |
|---|---|
| # | Number of nodes. |
| H/h | Number of cache hits. |
| M/m | Number of cache misses. |
| R/r | Number of cache reads. |
| W/w | Number of cache writes. |
visor cache
visor cache "-i {-n=<name>}"
visor cache "{-n=<name>} {-id=<node-id>|-id8=<node-id8>} {-s=lr|lw|hi|mi|re|wr} {-a} {-r}"
Following arguments can be provided:
| Argument | Description | Optional | Default |
|---|---|---|---|
| id | Full ID of the node to get cache statistics from. Either -id8 or -id can be specified. If neither is specified statistics will be gathered from all nodes. | Yes | |
| id8 | ID8 of the node to get cache statistics from. Either -id8 or -id can be specified. If neither is specified statistics will be gathered from all nodes. | Yes | |
| n | Name of the cache. By default - statistics for all caches will be printed. | Yes | |
| s | Defines sorting type (one of lr, lw, hi, mi, re, wr). | Yes | lr |
| i | Interactive mode. User can interactively select node for cache statistics. | Yes | |
| r | Defines if sorting should be reversed. Can be specified only with -s argument. | Yes | Sorting is not reversed |
| a | Prints detailed statistics about each cache. | Yes | Only aggregated summary is printed. |
- Prints summary statistics about caches from node with specified ID8 sorted by number of hits in reverse order:
  visor cache "-id8=12345678 -s=hi -r"
- Prints cache statistics for interactively selected node:
  visor cache "-i"
- Prints detailed statistics about all caches sorted by number of hits in reverse order:
  visor cache "-s=hi -r -a"
- Prints summary statistics about all caches:
  visor cache
- Prints detailed statistics about specified cache:
  visor cache "-n=partitioned -a"
config
config command prints node configuration.
When using this command from Scala code (not from the REPL), make sure to import all necessary types and implicit conversions:
import org.gridgain.visor._
import commands.config.VisorConfigurationCommand._
Note that VisorConfigurationCommand object contains necessary implicit conversions so that this command would be available via visor keyword.
visor config
visor config "{-id=<node-id>|-id8=<node-id8>}"
Following arguments can be provided:
| Argument | Description | Optional | Default |
|---|---|---|---|
| id | Full node ID. Either -id8 or -id can be specified. If neither is specified - command starts in interactive mode. | Yes | |
| id8 | Node ID8. Either -id8 or -id can be specified. If neither is specified - command starts in interactive mode. | Yes | |
- Prints configuration for node with specified ID8:
  visor config "-id8=12345678"
- Starts command in interactive mode:
  visor config
dash
dash command opens UI Visor dashboard.
When using this command from Scala code (not from the REPL), make sure to import all necessary types and implicit conversions:
import org.gridgain.visor._
import commands.dash.VisorDashboardCommand._
Note that VisorDashboardCommand object contains necessary implicit conversions so that this command would be available via visor keyword.
visor dash
No arguments can be provided.
- Opens UI Visor dashboard:
  visor dash
deploy
deploy command copies file or directory to remote host. Relies on SFTP protocol.
When using this command from Scala code (not from the REPL), make sure to import all necessary types and implicit conversions:
import org.gridgain.visor._
import commands.deploy.VisorDeployCommand._
Note that VisorDeployCommand object contains necessary implicit conversions so that this command would be available via visor keyword.
visor deploy "-h={<username>{:<password>}@}<host>{:<port>} {-u=<username>}
{-p=<password>} {-k=<path>} -s=<path> {-d=<path>}"
Following arguments can be provided:
| Argument | Description | Optional | Default |
|---|---|---|---|
| h | Host specification. <host> can be a hostname, IP or range of IPs. Example of range is 192.168.1.100~150, which means all IPs from 192.168.1.100 to 192.168.1.150 inclusively. Default port number is 22. This option can be provided multiple times. | No | |
| u | Default username. Used if specification doesn’t contain username. If default is not provided as well, current local username will be used. | Yes | |
| p | Default password. Used if specification doesn’t contain password. If default is not provided as well, it will be asked interactively. | Yes | |
| k | Path to private key file. If provided, it will be used for all specifications that don’t contain password. | Yes | |
| s | Source path. | No | |
| d | Destination path (relative to GRIDGAIN_HOME). | Yes | Root of GRIDGAIN_HOME |
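The IP range notation accepted by the -h argument (for example 192.168.1.100~150) can be expanded as sketched below. This is an illustrative reading of the notation described in the table, not Visor’s own parser; the class name HostRangeSketch and the method expand are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical expansion of a host specification such as "192.168.1.100~150".
final class HostRangeSketch {
    /** Expands a range into individual addresses; a plain hostname or IP is returned as a single entry. */
    static List<String> expand(String spec) {
        int tilde = spec.indexOf('~');
        if (tilde < 0)
            return Collections.singletonList(spec);
        // Everything up to the last dot before '~' is shared by all addresses in the range.
        int lastDot = spec.lastIndexOf('.', tilde);
        String prefix = spec.substring(0, lastDot + 1);
        int from = Integer.parseInt(spec.substring(lastDot + 1, tilde));
        int to = Integer.parseInt(spec.substring(tilde + 1));
        List<String> hosts = new ArrayList<>();
        for (int i = from; i <= to; i++) // Both ends are inclusive.
            hosts.add(prefix + i);
        return hosts;
    }

    public static void main(String[] args) {
        System.out.println(expand("192.168.1.100~150").size() + " hosts");
    }
}
```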
- Copies file or directory to remote host (password authentication):
  visor deploy "-h=uname:passwd@host -s=/local/path -d=remote/path"
- Copies file or directory to remote host (private key authentication):
  visor deploy "-h=uname@host -k=ssh-key.pem -s=/local/path -d=remote/path"
disco
disco command prints topology change log as seen from the oldest node. Timeframe for querying events can be specified in arguments.
Note: This command depends on GridGain events. GridGain events can be individually enabled and disabled, and disabled events can affect the results produced by this command. Note also that the configuration of the Event Storage SPI, which is responsible for temporary storage of generated events on each node, can also affect the functionality of this command. By default - all events are enabled and GridGain stores the last 10,000 local events on each node. Both of these defaults can be changed in configuration.
When using this command from Scala code (not from the REPL), make sure to import all necessary types and implicit conversions:
import org.gridgain.visor._
import commands.disco.VisorDiscoveryCommand._
Note that VisorDiscoveryCommand object contains necessary implicit conversions so that this command would be available via visor keyword.
visor disco
visor disco "{-t=<num>s|m|h|d} {-r} {-c=<n>}"
Following arguments can be provided:
| Argument | Description | Optional | Default |
|---|---|---|---|
| t | Defines timeframe for querying events, given as a number followed by a unit suffix (s, m, h or d). | Yes | All events |
| r | Defines whether sorting should be reversed. | Yes | Sorting is not reversed |
| c | Defines the maximum events count that can be shown. | Yes | All events |
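The timeframe argument combines a number with a one-letter unit, as in -t=1m for the last minute. A minimal Java sketch of converting such a value to milliseconds follows. The class name is hypothetical, and reading the suffixes s, m, h and d as seconds, minutes, hours and days is an assumption (only m is confirmed by the examples in this section).

```java
import java.util.concurrent.TimeUnit;

// Hypothetical parser for timeframe values such as "30s", "1m", "2h" or "7d".
final class TimeframeSketch {
    /** Converts a timeframe like "1m" to milliseconds. */
    static long toMillis(String spec) {
        if (spec == null || spec.length() < 2)
            throw new IllegalArgumentException("Expected <num> followed by s, m, h or d: " + spec);
        long num = Long.parseLong(spec.substring(0, spec.length() - 1));
        char unit = spec.charAt(spec.length() - 1);
        switch (unit) {
            case 's': return TimeUnit.SECONDS.toMillis(num);
            case 'm': return TimeUnit.MINUTES.toMillis(num);
            case 'h': return TimeUnit.HOURS.toMillis(num);
            case 'd': return TimeUnit.DAYS.toMillis(num);
            default:  throw new IllegalArgumentException("Unknown unit: " + unit);
        }
    }

    public static void main(String[] args) {
        System.out.println(toMillis("1m") + " ms");
    }
}
```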
- Prints all discovery events sorted chronologically (oldest first):
  visor disco
- Prints all discovery events sorted chronologically in reversed order (newest first):
  visor disco "-r"
- Prints discovery events fired during last minute sorted chronologically:
  visor disco "-t=1m"
events
events command prints events from a node.
Note: This command depends on GridGain events. GridGain events can be individually enabled and disabled, and disabled events can affect the results produced by this command. Note also that the configuration of the Event Storage SPI, which is responsible for temporary storage of generated events on each node, can also affect the functionality of this command. By default - all events are enabled and GridGain stores the last 10,000 local events on each node. Both of these defaults can be changed in configuration.
When using this command from Scala code (not from the REPL), make sure to import all necessary types and implicit conversions:
import org.gridgain.visor._
import commands.events.VisorEventsCommand._
Note that VisorEventsCommand object contains necessary implicit conversions so that this command would be available via visor keyword.
visor events
visor events "{-id=<node-id>|-id8=<node-id8>} {-e=<ch,cp,de,di,jo,ta,cl,ca,sw>}
{-t=<num>s|m|h|d} {-s=e|t} {-r} {-c=<n>}"
Following arguments can be provided:
| Argument | Description | Optional | Default |
|---|---|---|---|
| id | Full node ID. Either -id or -id8 can be specified. If called without the arguments - starts in interactive mode. | Yes | |
| id8 | Node ID8. Either -id or -id8 can be specified. If called without the arguments - starts in interactive mode. | Yes | |
| e | Comma separated list of event types that should be queried (any of ch, cp, de, di, jo, ta, cl, ca, sw). | Yes | All events |
| t | Defines timeframe for querying events, given as a number followed by a unit suffix (s, m, h or d). | Yes | All events |
| s | Defines sorting of queried events. Only one of =e or =t can be specified. | Yes | |
| r | Defines if sorting should be reversed. Can be specified only with -s argument. | Yes | Sorting is not reversed |
| c | Defines the maximum events count that can be shown. Values in summary tables are calculated over the whole list of events. | Yes | All events |
- Queries all events from specified node:
  visor events "-id8=@n0"
- Queries discovery events from specified node:
  visor events "-id8=@n0 -e=di"
- Starts command in interactive mode:
  visor events
gc
gc command runs garbage collector on remote nodes.
When using this command from Scala code (not from the REPL), make sure to import all necessary types and implicit conversions:
import org.gridgain.visor._
import commands.gc.VisorGcCommand._
Note that VisorGcCommand object contains necessary implicit conversions so that this command would be available via visor keyword.
visor gc
visor gc "{-id8=<node-id8>|-id=<node-id>} {-c}"
Following arguments can be provided:
| Argument | Description | Optional | Default |
|---|---|---|---|
| id | Full node ID. Either -id or -id8 can be specified. | Yes | |
| id8 | Node ID8. Either -id or -id8 can be specified. | Yes | |
| c | Run DGC procedure on all caches. | Yes | Don’t run DGC |
- Runs garbage collector on all nodes in topology:
  visor gc
- Runs garbage collector on specified node:
  visor gc "-id8=12345678"
- Runs garbage collector and DGC procedure on all caches:
  visor gc "-id8=12345678 -c"
kill
kill command kills or restarts node.
When using this command from Scala code (not from the REPL), make sure to import all necessary types and implicit conversions:
import org.gridgain.visor._
import commands.kill.VisorKillCommand._
Note that VisorKillCommand object contains necessary implicit conversions so that this command would be available via visor keyword.
visor kill
visor kill "-in|-ih"
visor kill "{-r|-k} {-id8=<node-id8>|-id=<node-id>}"
Following arguments can be provided:
| Argument | Description | Optional | Default |
|---|---|---|---|
| in | Run command in interactive mode with ability to choose a node to kill or restart. Note that either -in or -ih can be specified. This mode is used by default. | Yes | |
| ih | Run command in interactive mode with ability to choose a host where to kill or restart nodes. Note that either -in or -ih can be specified. | Yes | |
| r | Restart node mode. Note that either -r or -k can be specified. If none provided - command starts in interactive mode. | Yes | |
| k | Kill (stop) node mode. Note that either -r or -k can be specified. If none provided - command starts in interactive mode. | Yes | |
| id8 | ID8 of the node to kill or restart. Note that either -id8 or -id can be specified. If none provided - command starts in interactive mode. | Yes | |
| id | ID of the node to kill or restart. Note that either -id8 or -id can be specified. If none provided - command starts in interactive mode. | Yes | |
- Starts command in interactive mode:
  visor kill
- Restarts node with specified ID8:
  visor kill "-id8=12345678 -r"
- Kills (stops) all nodes:
  visor kill "-k"
license
license command shows information about all licenses that are used on the grid. Also can be used to update one of the licenses.
When using this command from Scala code (not from the REPL), make sure to import all necessary types and implicit conversions:
import org.gridgain.visor._
import commands.license.VisorLicenseCommand._
Note that VisorLicenseCommand object contains necessary implicit conversions so that this command would be available via visor keyword.
visor license
visor license "-f=<path> -id=<license-id>"
Following arguments can be provided:
| Argument | Description | Optional | Default |
|---|---|---|---|
| f | Path to new license XML file. | Yes | |
| id | ID of the license to be updated. | Yes | |
- Shows all licenses that are used on the grid:
  visor license
- Copies new license file to all nodes that use license with provided ID:
  visor license "-f=/path/to/new/license.xml -id=fbdea781-90e6-4d1b-b8b3-5b8c14aa2df7"
log
log command starts or stops grid-wide logging of discovery and failure events. Logging starts by default when Visor starts.
Events are logged to a file. If path is not provided, it will log into GRIDGAIN_HOME/work/visor/visor-log.
File is always opened in append mode. If file doesn’t exist, it will be created.
It is often convenient to tail -f the log file in a separate console window.
Log command prints periodic topology snapshots in the following format:
H/N/C |1 |1 |4 |=^========..........|
where:
H - Hosts
N - Nodes
C - CPUs
= - 5%-based marker of average CPU load across the topology
^ - 5%-based marker of average heap memory used across the topology
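The load bar in the snapshot line can be read as 20 cells of 5% each. The following is a minimal sketch of one rendering that reproduces the sample line above. It is an assumption about the layout, not Visor's actual code: the class name, the 20-cell width and the rule that the heap marker overwrites the CPU cell at its position are all hypothetical.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Visor: renders a 20-cell load bar. */
final class TopologyBarSketch {
    private static final int CELLS = 20; // assumed: 100% / 5% per cell

    private TopologyBarSketch() {}

    /** Fills '=' up to the CPU load and places '^' at the heap usage cell. */
    static String render(double cpuLoadPct, double heapUsedPct) {
        char[] bar = new char[CELLS];
        Arrays.fill(bar, '.');
        int cpuCells = Math.min(CELLS, (int) (cpuLoadPct / 5));
        for (int i = 0; i < cpuCells; i++)
            bar[i] = '=';
        // Assumption: the heap marker replaces whatever is in its cell.
        int heapIdx = Math.min(CELLS - 1, (int) (heapUsedPct / 5));
        bar[heapIdx] = '^';
        return new String(bar);
    }
}
```

Under this reading, the sample line corresponds to roughly 50% average CPU load and between 5% and 10% average heap usage.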
visor log
visor log "-l {-f=<path>} {-p=<num>} {-t=<num>}"
visor log "-s"
Following arguments can be provided:
| Argument | Description | Optional | Default |
|---|---|---|---|
| l | Starts logging. If logging is already started - it’s no-op. | Yes | |
| f | Provides path to the file. Path can be absolute or relative to GRIDGAIN_HOME. | Yes | GRIDGAIN_HOME/work/visor/visor-log |
| p | Provides period of querying events (in seconds). | Yes | 10 |
| t | Provides period of logging topology snapshot (in seconds). | Yes | 20 |
| s | Stops logging. If logging is already stopped - it’s no-op. | Yes | |
-
Prints log status:
visor log
-
Starts logging to file located at /home/user/visor-log:
visor log "-l -f=/home/user/visor-log"
-
Starts logging to file located at GRIDGAIN_HOME/log/visor-log:
visor log "-l -f=log/visor-log"
-
Starts logging with querying events period of 20 seconds:
visor log "-l -p=20"
-
Starts logging with topology snapshot logging period of 30 seconds:
visor log "-l -t=30"
-
Stops logging:
visor log "-s"
mclear
mclear command clears visor memory variables.
visor mclear
visor mclear "<name>|-ev|-al|-ca|-no|-tn|-ex"
Following arguments can be provided:
| Argument | Description | Optional | Default |
|---|---|---|---|
| <name> | Variable name to clear. Note that name doesn’t include @ symbol used to reference variable. | Yes | |
| ev | Clears all event variables. | Yes | |
| al | Clears all alert variables. | Yes | |
| ca | Clears all cache variables. | Yes | |
| no | Clears all node variables. | Yes | |
| tn | Clears all task name variables. | Yes | |
| ex | Clears all task execution variables. | Yes | |
-
Clears all visor variables:
visor mclear
-
Clears all visor cache variables:
visor mclear "-ca"
-
Clears n2 visor variable:
visor mclear "n2"
mget
mget command gets visor memory variable. Variable can be referenced with @ prefix.
visor mget "n"
Following arguments can be provided:
| Argument | Description | Optional | Default |
|---|---|---|---|
| <name> | Variable name. | Yes | |
-
Gets visor variable var:
visor mget "var"
-
Gets visor variable whose name is referenced by variable v:
visor mget "@v"
mlist
mlist command prints visor memory variables.
visor mlist {"arg"}
Following arguments can be provided:
| Argument | Description | Optional | Default |
|---|---|---|---|
| arg | String that contains start characters of variable names. | Yes | |
-
Prints out all visor memory variables:
visor mlist
Output:
-
Lists variables that start with n from visor memory:
visor mlist "n"
Output:
node
node command prints node statistics.
When using this command from Scala code (not from REPL) you need to make sure to properly import all necessary types and implicit conversions:
import org.gridgain.visor._
import commands.node.VisorNodeCommand._
Note that VisorNodeCommand object contains necessary implicit conversions so that this command would be available via visor keyword.
visor node "{-id8=<node-id8>|-id=<node-id>} {-a}"
visor node
Following arguments can be provided:
| Argument | Description | Optional | Default |
|---|---|---|---|
| id8 | ID8 of the node to print statistics for. Note that either -id8 or -id can be specified. If none provided - command starts in interactive mode. | Yes | |
| id | ID of the node to print statistics for. Note that either -id8 or -id can be specified. If none provided - command starts in interactive mode. | Yes | |
| a | Print extended information. | Yes | Only abbreviated statistics are printed. |
-
Starts command in interactive mode:
visor node
-
Prints statistics for specified node:
visor node "-id8=c0023e3e"
Output:
-
Prints full statistics for specified node:
visor node "-id8=c0023e3e -a"
Output:
ping
ping command pings node.
When using this command from Scala code (not from REPL) you need to make sure to properly import all necessary types and implicit conversions:
import org.gridgain.visor._
import commands.ping.VisorPingCommand._
Note that VisorPingCommand object contains necessary implicit conversions so that this command would be available via visor keyword.
visor ping {"id81 id82 ... id8k"}
Following arguments can be provided:
| Argument | Description | Optional | Default |
|---|---|---|---|
| id8k | ID8 of the node to ping. | Yes | All nodes are pinged. |
-
Pings node with specified ID8:
visor ping "@n0"
Output:
-
Pings all nodes in the topology:
visor ping
Output:
start
start command starts one or more nodes on remote host(s). Uses SSH protocol to execute commands.
|
|
SSH remote execution requires that all environment properties be set globally on the remote node. Standard GridGain ggstart.{sh|bat} script needs both GRIDGAIN_HOME and JAVA_HOME environment variables set globally for SSH-based execution to work. On Linux - you can use /etc/environment file to set global environment variables at the login time. Mac OSX currently doesn’t support automatic setting of global variable and you need to provide custom start script in this case. On Windows use standard way to set environment properti |
When using this command from Scala code (not from REPL) you need to make sure to properly import all necessary types and implicit conversions:
import org.gridgain.visor._
import commands.start.VisorStartCommand._
Note that VisorStartCommand object contains necessary implicit conversions so that this command would be available via visor keyword.
visor start "-f=<path> {-u=<username>} {-p=<password>} {-k=<path>} {-n=<num>}
{-s=<path>} {-c=<path>} {-m=<num>} {-r} {-l=<path>}"
visor start "-h={<username>{:<password>}@}<host>{:<port>}{#<num>}
{-u=<username>} {-p=<password>} {-k=<path>} {-n=<num>} {-s=<path>}
{-c=<path>} {-m=<num>} {-r} {-l=<path>}"
Following arguments can be provided:
| Argument | Description | Optional | Default |
|---|---|---|---|
| f | Path to file that contains topology specification. Each line of the file represents one host. Format is the following: {<uname>{:<passwd>}@}<host>{:<port>}{<number of nodes>}. Lines starting with will be ignored. <host> can be a hostname, IP or range of IPs. Example of range is 192.168.1.100~150, which means all IPs from 192.168.1.100 to 192.168.1.150 inclusively. Default port is 22. Default number of nodes is 1. | Yes | |
| h | Topology specification for one host. <host> can be a hostname, IP or range of IPs. Example of range is 192.168.1.100~150, which means all IPs from 192.168.1.100 to 192.168.1.150 inclusively. Default port number is 22. Default number of nodes is 1. This option can be provided multiple times. If used with -f, it will override specifications taken from file. | Yes | |
| u | Default username. Used if specification doesn’t contain username. | Yes | Current local username |
| p | Default password. Used if specification doesn’t contain password. | Yes | Asked interactively |
| k | Path to private key file. If provided, it will be used for all specifications that don’t contain password. | Yes | |
| n | Default number of starting nodes. Used if specification doesn’t contain number of nodes. | Yes | 1 |
| g | GridGain home path. | Yes | Taken from GRIDGAIN_HOME environment variable. |
| s | Path to start script (relative to GridGain home). | Yes | Default is bin/ggstart.sh for Unix or bin\ggstart.bat for Windows. |
| c | Path to configuration file. | Yes | Default GridGain configuration |
| m | Maximum number of nodes that can be started in parallel on one host. | Yes | 5 |
| r | Indicates that existing nodes on the host will be restarted. | Yes | Don’t restart |
| l | Prefix for the log file path (relative to GridGain home). Each node will write log in separate file, appending node number to provided path. | Yes | work/log/gridgain.log |
-
Starts three nodes with default configuration (password authentication):
visor start "-h=uname:passwd@10.1.1.10#3"
-
Starts 3 nodes on 5 hosts with default configuration (key-based authentication):
visor start "-h=uname@192.168.1.100~104#3 -k=ssh-key.pem"
-
Reads hosts.txt file and starts nodes with provided configuration:
visor start "-f=hosts.txt -c=config/spring.xml"
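The IP range notation used in -h and in the topology file (for example 192.168.1.100~104) can be expanded into individual hosts along the following lines. This is a minimal sketch of the notation only, not Visor's actual parser: the class and method names are hypothetical, and it assumes the range applies to the last octet of an IPv4 address, as in the examples above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of the Visor API. */
final class HostRangeSketch {
    private HostRangeSketch() {}

    /**
     * Expands "192.168.1.100~104" into individual IPv4 addresses.
     * A value without '~' (hostname or single IP) is returned as is.
     */
    static List<String> expand(String host) {
        List<String> out = new ArrayList<>();
        int tilde = host.indexOf('~');
        if (tilde < 0) {
            out.add(host);
            return out;
        }
        String first = host.substring(0, tilde);           // e.g. 192.168.1.100
        int lastDot = first.lastIndexOf('.');
        if (lastDot < 0)
            throw new IllegalArgumentException("Not an IPv4 address: " + first);
        String prefix = first.substring(0, lastDot + 1);    // e.g. 192.168.1.
        int from = Integer.parseInt(first.substring(lastDot + 1));
        int to = Integer.parseInt(host.substring(tilde + 1));
        if (from > to || to > 255)
            throw new IllegalArgumentException("Invalid range: " + host);
        // Both ends of the range are inclusive.
        for (int i = from; i <= to; i++)
            out.add(prefix + i);
        return out;
    }
}
```

With this reading, HostRangeSketch.expand("192.168.1.100~104") yields five addresses, which matches the "3 nodes on 5 hosts" example above.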
tasks
tasks command prints statistics about tasks and executions.
|
|
This command depends on GridGain events. GridGain events can be individually enabled and disabled and disabled events can affect the results produced by this command. Note also that configuration of Event Storage SPI that is responsible for temporary storage of generated events on each node can also affect the functionality of this command. By default - all events are enabled and GridGain stores last 10,000 local events on each node. Both of these defaults can be changed in configuration. |
When using this command from Scala code (not from REPL) you need to make sure to properly import all necessary types and implicit conversions:
import org.gridgain.visor._
import commands.tasks.VisorTasksCommand._
Note that VisorTasksCommand object contains necessary implicit conversions so that this command would be available via visor keyword.
visor tasks
visor tasks "-l {-t=<num>s|m|h|d} {-r}"
visor tasks "-s=<substring> {-t=<num>s|m|h|d} {-r}"
visor tasks "-g {-t=<num>s|m|h|d} {-r}"
visor tasks "-h {-t=<num>s|m|h|d} {-r}"
visor tasks "-n=<task-name> {-r}"
visor tasks "-e=<exec-id>"
Following arguments can be provided:
| Argument | Description | Optional | Default |
|---|---|---|---|
| l | List all tasks and executions. Executions sorted chronologically (see -r), and tasks alphabetically. This is the default mode when command is called without parameters. | Yes | |
| s | List all tasks and executions for a given task name substring. Executions sorted chronologically (see -r), and tasks alphabetically. | Yes | |
| g | List all tasks grouped by nodes for a given time period. Tasks sorted alphabetically. | Yes | |
| h | List all tasks grouped by hosts for a given time period. Tasks sorted alphabetically. | Yes | |
| t | Defines time frame: | Yes | 1 hour |
| r | Reverse sorting of executions. | Yes | Sorting is not reversed. |
| n | Defines task name to print aggregated statistics. | Yes | |
| e | Defines execution ID to print aggregated statistics. | Yes | |
-
Prints list of all tasks and executions for the last hour (default):
visor tasks
Output:
-
Prints list of tasks and executions that started during last 2 minutes:
visor tasks "-l -t=2m"
Output:
-
Prints list of all tasks and executions that have HelloWorld in task name.
visor tasks "-s=HelloWorld"
Output:
-
Prints list of tasks grouped by nodes:
visor tasks "-g"
Output:
-
Prints list of tasks that started during last 6 minutes grouped by nodes:
visor tasks "-g -t=6m"
Output:
-
Prints summary for task named GridTask:
visor tasks "-n=GridTask"
Output:
-
Traces task execution with ID taken from e1 memory variable:
visor tasks "-e=@e1"
Output:
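The -t=<num>s|m|h|d time frame can be converted to milliseconds as sketched below. This is an illustration only, not Visor's own parsing code: the class name is hypothetical, and reading s and d as seconds and days is an assumption (the examples above only confirm m as minutes and the default of 1 hour).

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper, not part of the Visor API. */
final class TimeFrameSketch {
    private TimeFrameSketch() {}

    /** Converts a value such as "2m" or "1h" to milliseconds. */
    static long toMillis(String arg) {
        if (arg == null || arg.length() < 2)
            throw new IllegalArgumentException("Expected <num>s|m|h|d: " + arg);
        long num = Long.parseLong(arg.substring(0, arg.length() - 1));
        char unit = arg.charAt(arg.length() - 1);
        switch (unit) {
            case 's': return TimeUnit.SECONDS.toMillis(num); // assumed: seconds
            case 'm': return TimeUnit.MINUTES.toMillis(num);
            case 'h': return TimeUnit.HOURS.toMillis(num);
            case 'd': return TimeUnit.DAYS.toMillis(num);    // assumed: days
            default:
                throw new IllegalArgumentException("Unknown unit: " + unit);
        }
    }
}
```

For example, TimeFrameSketch.toMillis("2m") covers the "last 2 minutes" case shown above.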
top
top command prints current topology.
When using this command from Scala code (not from REPL) you need to make sure to properly import all necessary types and implicit conversions:
import org.gridgain.visor._
import commands.top.VisorTopologyCommand._
Note that VisorTopologyCommand object contains necessary implicit conversions so that this command would be available via visor keyword.
visor top "{-c1=e1<num> -c2=e2<num> ... -ck=ek<num>}
{-h=<host1> ... -h=<hostk>} {-a}"
Following arguments can be provided:
| Argument | Description | Optional | Default |
|---|---|---|---|
| ck | Defines a mnemonic for node filter: Comparison part of the mnemonic predicate: | Yes | |
| h | This defines a host to show nodes from. Can be provided multiple times. | Yes | |
| a | Defines whether to show a separate table of nodes with detailed per-node information. | Yes | |
-
Prints topology for all nodes with CPU load greater than 20%:
visor top "-cl=gt20"
Output:
-
Prints full information for all nodes with CPU load less than 20%:
visor top "-cl=lt20 -a"
Output:
-
Prints topology for provided host:
visor top "-h=192.168.1.100"
Output:
-
Prints full topology:
visor top
Output:
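The comparison part of a node filter such as -cl=gt20 or -cl=lt20 can be thought of as a small predicate over a metric value. The sketch below illustrates that idea only. It is not Visor's implementation: the class name is hypothetical, and only the gt and lt comparisons that appear in the examples above are handled.

```java
import java.util.function.DoublePredicate;

/** Hypothetical helper, not part of the Visor API. */
final class TopFilterSketch {
    private TopFilterSketch() {}

    /** Parses "gt20" or "lt20" into a predicate over the metric value. */
    static DoublePredicate parse(String expr) {
        if (expr == null || expr.length() < 3)
            throw new IllegalArgumentException("Expected e.g. gt20: " + expr);
        String op = expr.substring(0, 2);                    // comparison part
        double bound = Double.parseDouble(expr.substring(2)); // numeric part
        switch (op) {
            case "gt": return v -> v > bound;
            case "lt": return v -> v < bound;
            default:
                // Other comparisons are not shown in the examples above.
                throw new IllegalArgumentException("Unsupported comparison: " + op);
        }
    }
}
```

A node with 35% CPU load would pass TopFilterSketch.parse("gt20") and be rejected by TopFilterSketch.parse("lt20").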
vvm
vvm command opens VisualVM.
When using this command from Scala code (not from REPL) you need to make sure to properly import all necessary types and implicit conversions:
import org.gridgain.visor._
import commands.vvm.VisorVvmCommand._
Note that VisorVvmCommand object contains necessary implicit conversions so that this command would be available via visor keyword.
visor vvm "{-home=dir} {-id8=<node-id8>} {-id=<node-id>}"
Following arguments can be provided:
| Argument | Description | Optional | Default |
|---|---|---|---|
| home | VisualVM home directory. | Yes | PATH and JAVA_HOME are searched |
| id8 | ID8 of node. Either -id8 or -id can be specified. | Yes | |
| id | Full ID of node. Either -id8 or -id can be specified. | Yes | |
-
Opens VisualVM connected to JVM for node with specified ID8:
visor vvm "-id8=12345678"
-
Opens VisualVM connected to JVM for node with given full node ID:
visor vvm "-id=5B923966-85ED-4C90-A14C-96068470E94D"
-
Opens VisualVM installed in C:\VisualVM directory for specified node:
visor vvm "-home=C:\VisualVM -id8=12345678"
-
Opens VisualVM connected to all nodes:
visor vvm
20. Appendix A - SPIs
20.1. Discovery SPI
20.1.1. Overview
GridDiscoverySpi doc provides a mechanism by which every node can discover other nodes on the grid.
To discover remote nodes and get remote node attributes, the following public methods are available:
and others.
20.1.2. Built-in Implementations
GridMulticastDiscoverySpi
Discovery SPI implementation using IP-multicast.
The following configuration parameters can be used to configure GridMulticastDiscoverySpidoc:
| Setter Method | Description | Optional | Default |
|---|---|---|---|
| setMulticastGroup(String) doc | Multicast IP address. | Yes | 228.1.2.4 |
| setMulticastPort(int) doc | Port number which multicast messages are sent to. | Yes | 47200 |
| setTcpPort(int) doc | Local port number that is used by discovery SPI. | Yes | 47300 |
| setHeartbeatFrequency(long) doc | Delay in milliseconds between heartbeat requests. SPI sends Multicast messages in configurable time interval to other nodes to notify them about its state. | Yes | 3000 |
| setMaxMissedHeartbeats(int) doc | Number of heartbeat requests that could be missed before remote node is considered to be failed. | Yes | 3 |
| setLeaveAttempts(int) doc | Number of attempts to notify other nodes that this one is leaving grid. Multiple leave requests are sent to increase the chance of successful delivery to every node, since IP Multicast protocol is unreliable. Note that on most networks loss of IP Multicast packets is generally negligible. | Yes | 3 |
| setLocalAddress(String) doc | Local host IP address that discovery SPI uses. | Yes | Local host address. Preference will be given to non-loopback address if one can be detected. Otherwise, loopback address will be assigned. |
| setTimeToLive(int) doc | Multicast messages time-to-live in router hops. | Yes | 8 |
| setLocalPortRange(int) doc | Local port range for TCP and Multicast ports (value must be greater than or equal to 0). If provided local port (see GridMulticastDiscoverySpi.setMulticastPort(int) doc or GridMulticastDiscoverySpi.setTcpPort(int) doc) is occupied, implementation will try to increment the port number for as long as it is less than initial value plus this range. If port range value is 0, then implementation will try to bind only to the port provided by GridMulticastDiscoverySpi.setMulticastPort(int) doc or GridMulticastDiscoverySpi.setTcpPort(int) doc methods and fail if binding to these ports did not succeed. Local port range is very useful during development when more than one grid node needs to run on the same physical machine. | Yes | 10 |
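To see how the heartbeat frequency and the maximum number of missed heartbeats interact, consider the following minimal sketch. It is not the SPI's actual implementation: the class is hypothetical, and it assumes a node is treated as failed once no heartbeat has been seen for more than heartbeatFrequency multiplied by maxMissedHeartbeats milliseconds (9000 ms with the defaults above).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration, not part of the GridGain API. */
final class HeartbeatTrackerSketch {
    private final long heartbeatFreqMs;
    private final int maxMissed;
    /** Time of the last heartbeat seen from each node, by node ID. */
    private final Map<String, Long> lastSeen = new HashMap<>();

    HeartbeatTrackerSketch(long heartbeatFreqMs, int maxMissed) {
        this.heartbeatFreqMs = heartbeatFreqMs;
        this.maxMissed = maxMissed;
    }

    /** Records a heartbeat received from the given node. */
    void onHeartbeat(String nodeId, long nowMs) {
        lastSeen.put(nodeId, nowMs);
    }

    /** Assumed rule: failed once the silence exceeds freq * maxMissed. */
    boolean isFailed(String nodeId, long nowMs) {
        Long last = lastSeen.get(nodeId);
        if (last == null)
            return false; // never seen: unknown rather than failed
        return nowMs - last > heartbeatFreqMs * maxMissed;
    }
}
```

Lowering either value shortens the detection window at the cost of more false positives on a lossy network.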
|
|
IP-multicast: IP-multicast should be enabled for this SPI to function properly. We advise you to search the internet (for example, with Google) for Java IP-multicast troubleshooting, as many IP-multicast intensive systems have good configuration and optimization documentation. |
GridMulticastDiscoverySpi doc is used by default and should be explicitly configured only if some SPI configuration parameters need to be overridden. The examples below set a custom multicast group value that differs from the default 228.1.2.4.
From code:
GridMulticastDiscoverySpi spi = new GridMulticastDiscoverySpi();

// Put another multicast group.
spi.setMulticastGroup("228.10.10.157");

GridConfigurationAdapter cfg = new GridConfigurationAdapter();

// Override default discovery SPI.
cfg.setDiscoverySpi(spi);

// Start grid.
GridFactory.start(cfg);
or from Spring configuration file:
<bean id="grid.custom.cfg" class="org.gridgain.grid.GridConfigurationAdapter" singleton="true">
    ...
    <property name="discoverySpi">
        <bean class="org.gridgain.grid.spi.discovery.multicast.GridMulticastDiscoverySpi">
            <property name="multicastGroup" value="228.10.10.157"/>
        </bean>
    </property>
    ...
</bean>
GridTcpDiscoverySpi
|
|
Note that in release 3.0.5 and earlier this SPI was named org.gridgain.grid.spi.discovery.tcplite.GridTcpLiteDiscoverySpi. |
This discovery SPI preserves the order in which nodes are added: if two nodes A and B are added to the topology in some order, then the other nodes in the topology receive NODE_JOINED events in exactly the same order. All nodes in the topology are organized in a ring. The topology has a coordinator node (an ordinary node, no extra configuration required) that is responsible for issuing heartbeat messages, adding new nodes to the topology, and cleaning the IP finder (in case it is shared) and the metrics store (if one is used). The coordinator may leave or fail, in which case one of the remaining nodes takes over the role.
The following configuration parameters can be used to configure GridTcpDiscoverySpi doc:
| Setter Method | Description | Optional | Default |
|---|---|---|---|
| setIpFinder(GridTcpDiscoveryIpFinder) doc | IP finder that is used to share info about nodes IP addresses. | No | |
| setMetricsStore(GridTcpDiscoveryMetricsStore) doc | When metrics store is provided metrics are not sent via heartbeat messages, they are stored in the store and are requested by nodes on demand. Each node updates its metrics in the store once a heartbeat period. Under certain conditions using the metrics store may save network bandwidth. | Yes | |
| setLocalAddress(String) doc | Sets local host IP address that discovery SPI uses. | Yes | If not provided, by default a first found non-loopback address will be used. If there is no non-loopback address available, then java.net.InetAddress.getLocalHost() will be used. |
| setLocalPort(int) doc | Port the SPI listens to. | Yes | 47500 |
| setLocalPortRange(int) doc | Local port range. Local node will try to bind on first available port starting from local port up until local port + local port range. | Yes | 100 |
| setHeartbeatFrequency(long) doc | Delay in milliseconds between issuing of heartbeat messages. SPI sends messages in configurable time interval to other nodes to notify them about its state. | Yes | 2000 |
| setMaxMissedHeartbeats(int) doc | Number of heartbeat requests that could be missed before local node initiates status check. | Yes | 1 |
| setReconnectCount(int) doc | Number of times node tries to (re)establish connection to another node. | Yes | 2 |
| setNetworkTimeout(long) doc | Sets maximum network timeout in milliseconds to use for network operations. | Yes | 5000 |
| setSocketTimeout(long) doc | Sets socket operations timeout. This timeout is used to limit connection time and write-to-socket time. | Yes | 2000 |
| setAckTimeout(long) doc | Sets timeout for receiving acknowledgement for sent message. If acknowledgement is not received within this timeout, sending is considered as failed and SPI tries to repeat message sending. | Yes | 2000 |
| setJoinTimeout(long) doc | Sets join timeout. If non-shared IP finder is used and node fails to connect to any address from IP finder, node keeps trying to join within this timeout. If all addresses are still unresponsive, exception is thrown and node startup fails. 0 means wait forever. | Yes | 0 |
| setThreadPriority(int) doc | Thread priority for threads started by SPI. | Yes | 7 |
| setStoresCleanFrequency(int) doc | IP finder and Metrics Store clean frequency in milliseconds. Coordinator will clean IP finder and metrics store once a period. | Yes | 60000 |
| setStatisticsPrintFrequency(int) doc | Statistics print frequency in milliseconds. 0 indicates that no print is required. If value is greater than 0 and log is not quiet then stats are printed out with INFO level once a period. This may be very helpful for tracing topology problems. | Yes | 0 |
| setFastForwardFailureDetection(boolean) doc | Sets fast forward failure detection flag. If this flag is set to true and connection to some node times out, then the host will be considered unreachable and all other nodes on the same host will be considered failed. If multiple nodes are launched on the same machine, setting this property to true increases failure detection speed in case network goes down on that host. | Yes | true |
Using a metrics store can increase network performance (especially in large topologies) and save network bandwidth, since metrics are not sent via heartbeat messages; they are kept in the store and requested by nodes on demand. Each node updates its metrics in the store once per heartbeat period.
Without a metrics store, heartbeat messages may grow too large to be transferred quickly and efficiently across all nodes in the topology, so for better performance we recommend using a metrics store.
Provided implementations can be used (for configuration details refer to Javadocs):
When you are going to launch a significant number of nodes (100 or more) in your grid, it is recommended to configure the SPI with a metrics store.
Please refer to the following table for SPI configuration:
| Property | Value |
|---|---|
| networkTimeout | 30000 |
| heartbeatFrequency | 10000 |
| maxMissedHeartbeats | 3 |
GridTcpDiscoverySpidoc can be configured directly from code:
GridTcpDiscoverySpi spi = new GridTcpDiscoverySpi();

GridTcpDiscoveryVmIpFinder ipFinder = new GridTcpDiscoveryVmIpFinder();

ipFinder.setAddresses(Arrays.asList("127.0.0.1", "1.2.3.4:47520"));

// IP finder is required.
spi.setIpFinder(ipFinder);

GridConfigurationAdapter cfg = new GridConfigurationAdapter();

// Override default discovery SPI.
cfg.setDiscoverySpi(spi);

// Start grid.
GridFactory.start(cfg);
from Spring configuration file:
<bean id="grid.custom.cfg" class="org.gridgain.grid.GridConfigurationAdapter" singleton="true">
    ...
    <property name="discoverySpi">
        <bean class="org.gridgain.grid.spi.discovery.tcp.GridTcpDiscoverySpi">
            <property name="ipFinder">
                <bean class="org.gridgain.grid.spi.discovery.tcp.ipfinder.vm.GridTcpDiscoveryVmIpFinder">
                    <property name="addresses">
                        <list>
                            <value>1.2.3.4:47500</value>
                        </list>
                    </property>
                    <property name="segmentCheckAddrs">
                        <list>
                            <bean class="java.net.InetAddress" factory-method="getByName">
                                <constructor-arg value="2.3.4.5"/>
                            </bean>
                        </list>
                    </property>
                </bean>
            </property>
        </bean>
    </property>
    ...
</bean>
or from Spring config file using JSON configuration:
<bean id="grid.custom.cfg" class="org.gridgain.grid.GridConfigurationAdapter" singleton="true">
    ...
    <property name="discoverySpi">
        <bean class="org.gridgain.grid.spi.discovery.tcp.GridTcpDiscoverySpi">
            <property name="json" value="{heartbeatFrequency: 5000; networkTimeout: 4000;
                ipFinder: {addresses: ['1.2.3.4:47500'];
                @class:'org.gridgain.grid.spi.discovery.tcp.ipfinder.vm.GridTcpDiscoveryVmIpFinder'}}"/>
        </bean>
    </property>
    ...
</bean>
Default Implementation
If no discovery SPI is provided in configuration, GridMulticastDiscoverySpi is used by default.
20.1.3. Configuration
GridDiscoverySpi is provided in grid configuration passed into GridFactorydoc at startup. You can configure discovery SPI implementation as follows:
GridConfigurationAdapter cfg = new GridConfigurationAdapter();

// Configure grid to use multicast discovery layer.
cfg.setDiscoverySpi(new GridMulticastDiscoverySpi());

GridFactory.start(cfg);
Note that GridConfigurationdoc interface is just a bean and can also be configured using spring XML configuration.
20.2. Communication SPI
20.2.1. Overview
GridCommunicationSpi doc enables communication between different nodes within grid. It provides basic plumbing to send and receive grid messages and is utilized for all distributed grid operations, such as task execution, monitoring data exchange, distributed event querying and others.
To send and receive messages from public API, the following public methods are available:
-
GridProjection.send(Object, GridPredicate…) doc
-
GridProjection.send(Collection, GridPredicate…) doc
-
GridProjection.listen(GridPredicate2…) doc
-
GridProjection.remoteListenAsync(Collection, GridPredicate2…) doc
-
GridProjection.remoteListenAsync(GridNode, GridPredicate2…) doc
-
GridProjection.remoteListenAsync(GridPredicate, GridPredicate2…) doc
|
|
Note that messages can be received asynchronously by registering listeners with listen(…) and remoteListenAsync(…) methods. GridGain also provides convenient actor-based adapter for them: GridListenActordoc. |
20.2.2. Built-in Implementations
GridGain comes with the following communication SPIs supported out of the box.
GridTcpCommunicationSpi
GridTcpCommunicationSpidoc is default communication SPI which uses TCP/IP protocol to communicate with other nodes.
To enable communication with other nodes, this SPI adds GridTcpCommunicationSpi.ATTR_ADDR doc and GridTcpCommunicationSpi.ATTR_PORT doc local node attributes.
At startup, this SPI tries to start listening to the local port specified by the GridTcpCommunicationSpi.setLocalPort(int) doc method. If the local port is occupied, the SPI will automatically increment the port number until it can successfully bind for listening. The GridTcpCommunicationSpi.setLocalPortRange(int) doc configuration parameter controls the maximum number of ports that the SPI will try before it fails. Port range comes in very handy when starting multiple grid nodes on the same machine or even in the same VM. In this case all nodes can be brought up without a single change in configuration.
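The port increment behavior can be sketched as follows. This is a simplified illustration, not the SPI's code: the class and its methods are hypothetical, and the exact upper bound (ports from localPort up to, but not including, localPort + portRange, with a range of 0 meaning only localPort itself) is an assumption. A real implementation would also keep the bound socket open instead of only probing it.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.function.IntPredicate;

/** Hypothetical illustration, not part of the GridGain API. */
final class PortRangeSketch {
    private PortRangeSketch() {}

    /**
     * Returns the first port accepted by canBind, starting at localPort,
     * or -1 if no port in the range could be used.
     */
    static int selectPort(int localPort, int portRange, IntPredicate canBind) {
        int port = localPort;
        do {
            if (canBind.test(port))
                return port;
            port++; // occupied: try the next port
        } while (port < localPort + portRange && port <= 65535);
        return -1;
    }

    /** Probes a port by binding a server socket and closing it again. */
    static boolean isFree(int port) {
        try (ServerSocket ignored = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

For example, PortRangeSketch.selectPort(47100, 100, PortRangeSketch::isFree) mirrors the default local port and port range listed in the table below.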
The following configuration parameters can be used to configure GridTcpCommunicationSpi:
| Setter Method | Description | Optional | Default |
|---|---|---|---|
| setLocalAddress(String) doc | Sets local host address for socket binding. | Yes | Any available local IP address. |
| setLocalPort(int) doc | Sets local port for socket binding. | Yes | 47100 (specified in GridTcpCommunicationSpi.DFLT_PORT doc) |
| setLocalPortRange(int) doc | Controls maximum number of local ports tried if all previously tried ports are occupied. | Yes | 100 (specified in GridTcpCommunicationSpi.DFLT_PORT_RANGE doc) |
| setTcpNoDelay(boolean) doc | Sets value for TCP_NODELAY socket option. Each socket will be opened using provided value. | Yes | true (specified in GridTcpCommunicationSpi.DFLT_TCP_NODELAY doc) |
| setConnectTimeout(long) doc | Sets connect timeout used when establishing connection with remote nodes. | Yes | 1000 (specified in GridTcpCommunicationSpi.DFLT_CONN_TIMEOUT doc) |
| setIdleConnectionTimeout(long) doc | Sets maximum idle connection timeout upon which a connection to client will be closed. | Yes | 30000 (specified in GridTcpCommunicationSpi.DFLT_IDLE_CONN_TIMEOUT doc) |
| setMaxOpenClients(int) doc | Sets the maximum count of simultaneously open clients per remote node. | Yes | 1 (specified in GridTcpCommunicationSpi.DFLT_MAX_OPEN_CLIENTS doc) |
| setSelectorsCount(int) doc | Sets the count of selectors to be used in TCP server. | Yes | Default count of selectors equals the count of processors in system (specified in GridTcpCommunicationSpi.DFLT_SELECTORS_CNT doc) |
| setDirectBuffer(boolean) doc | Switches between using NIO direct and NIO heap allocation buffers. Although direct buffers perform better, in some cases (especially on Windows) they may cause JVM crashes. If that happens in your environment, set this property to false. | Yes | true |
| setMessageThreads(int) doc | Number of threads responsible for receiving messages. | Yes | 20 (specified in GridTcpCommunicationSpi.DFLT_MSG_THREADS doc) |
GridTcpCommunicationSpi is used by default and should be explicitly configured only if some SPI configuration parameters need to be overridden.
From code:
GridTcpCommunicationSpi commSpi = new GridTcpCommunicationSpi();

// Override local port.
commSpi.setLocalPort(4321);

GridConfigurationAdapter cfg = new GridConfigurationAdapter();

// Override default communication SPI.
cfg.setCommunicationSpi(commSpi);

// Start grid.
GridFactory.start(cfg);
or from Spring configuration file:
<bean id="grid.custom.cfg" class="org.gridgain.grid.GridConfigurationAdapter" singleton="true">
    ...
    <property name="communicationSpi">
        <bean class="org.gridgain.grid.spi.communication.tcp.GridTcpCommunicationSpi">
            <!-- Override local port. -->
            <property name="localPort" value="4321"/>
        </bean>
    </property>
    ...
</bean>
Default Implementation
If no communication SPI is provided in configuration, GridTcpCommunicationSpi is used by default.
20.2.3. Usage
Here is an example of how to send and receive messages using public Griddoc API:
Grid grid = GridFactory.getGrid();

grid.addMessageListener(new GridMessageListener() {
    /**
     * @see GridMessageListener#onMessage(UUID,Serializable)
     */
    public void onMessage(UUID nodeId, Serializable msg) {
        System.out.println("Received message: " + msg);
    }
});

// Send message to itself.
grid.sendMessage(grid.getLocalNode(), "TEST");
20.2.4. Configuration
GridCommunicationSpi is provided in Grid configuration and passed into GridFactorydoc at startup. By default GridTcpCommunicationSpi is used. You can configure a different communication SPI implementation as follows:
GridConfigurationAdapter cfg = new GridConfigurationAdapter();

// Configure grid to use TCP communication layer.
cfg.setCommunicationSpi(new GridTcpCommunicationSpi());

GridFactory.start(cfg);
Note that GridConfigurationdoc interface is just a bean and can also be configured using spring XML configuration.
20.3. Collision SPI
20.3.1. Overview
GridCollisionSpi doc lets you regulate how grid jobs get executed when they arrive on a destination node for execution. In general, a grid node will have multiple jobs arriving for execution and potentially multiple jobs that are already executing or waiting for execution on it. There are multiple possible strategies for dealing with this situation:
-
All jobs can proceed in parallel.
-
Jobs can be sequenced i.e., only one job can execute in any given point of time.
-
Only certain number or types of grid jobs can proceed in parallel.
-
Job may proceed based on some time based events.
Every time a new job arrives, it is placed on the waiting queue, and it is up to the collision SPI to either reject or activate a waiting job, cancel an active job, or do nothing. Generally, the collision SPI is invoked in the following cases:
-
A new job has arrived.
-
An existing job has finished.
-
A node metrics update has been received.
-
Collision SPI implementation has called GridCollisionExternalListener.onExternalCollision() doc to force collision resolution.
To summarize, collision SPI provides developer with ability to use custom logic in determining how grid jobs should be scheduled and executed on a destination grid node.
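The waiting-queue model described above can be illustrated with a small plain-Java sketch. It does not use any GridGain classes: `CollisionPolicy`, `FifoLimitPolicy` and `WorkerNodeSketch` are hypothetical names introduced here only to show a policy being re-run on the "job arrived" and "job finished" events.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical stand-in for a collision policy; not the GridGain SPI interface. */
interface CollisionPolicy {
    /** Inspects both collections and may move jobs from waiting to active. */
    void onCollision(Deque<String> waiting, List<String> active);
}

/** Activates waiting jobs in arrival order while fewer than maxParallel are active. */
final class FifoLimitPolicy implements CollisionPolicy {
    private final int maxParallel;

    FifoLimitPolicy(int maxParallel) {
        this.maxParallel = maxParallel;
    }

    @Override
    public void onCollision(Deque<String> waiting, List<String> active) {
        while (active.size() < maxParallel && !waiting.isEmpty()) {
            active.add(waiting.pollFirst());
        }
    }
}

/** Simulates one worker node that re-runs the policy on every job event. */
final class WorkerNodeSketch {
    private final Deque<String> waiting = new ArrayDeque<>();
    private final List<String> active = new ArrayList<>();
    private final CollisionPolicy policy;

    WorkerNodeSketch(CollisionPolicy policy) {
        this.policy = policy;
    }

    /** A new job has arrived: queue it, then resolve collisions. */
    void jobArrived(String job) {
        waiting.addLast(job);
        policy.onCollision(waiting, active);
    }

    /** An existing job has finished: free its slot, then resolve collisions. */
    void jobFinished(String job) {
        active.remove(job);
        policy.onCollision(waiting, active);
    }

    List<String> activeJobs() {
        return new ArrayList<>(active);
    }

    List<String> waitingJobs() {
        return new ArrayList<>(waiting);
    }
}
```

With `new WorkerNodeSketch(new FifoLimitPolicy(1))`, a second job stays on the waiting list until the first one reports completion, which corresponds to the "jobs can be sequenced" strategy.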
Note that the collision SPI only controls job execution; it does not control task execution. So if you have a case where a node only emits tasks but does not execute jobs, the collision SPI will never be invoked on that node.
Job Rejection
If a job is canceled while waiting for execution (the job is on the waiting list and execution has not started yet), it will be rejected, and the GridJobResult passed into the GridTask.result(GridJobResult, List<GridJobResult>) method will contain a GridExecutionRejectedException. You can access this exception by calling the GridJobResult.getException() method.
Automatic Failover: Note that if you use any of the GridTask adapters, rejected jobs will automatically be failed over to another node. By default, jobs are automatically failed over only in case of job rejection or node failure.
Job Cancellation
If a job is canceled after it was already scheduled to execute, the GridJob.cancel() method will be called on it (this method will also call Thread.interrupt() on the executing thread automatically). In this case, cancellation serves as a notification to the job that it should stop executing. Just like with Java thread interruption, it is ultimately up to the job to finish executing and return a result to the caller. Your GridTask.result(GridJobResult, List<GridJobResult>) implementation should decide whether the job result is acceptable and whether the job should be failed over to another node.
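Since cancellation is compared to Java thread interruption, a minimal plain-Java sketch of cooperative cancellation may help. No GridGain types are used here; `InterruptibleWork` is a hypothetical class, and the loop body stands in for real job logic.

```java
import java.util.concurrent.CountDownLatch;

/** Plain-Java illustration of cooperative cancellation; no GridGain types are used. */
final class InterruptibleWork implements Runnable {
    private final CountDownLatch started = new CountDownLatch(1);
    private volatile boolean stoppedEarly;

    @Override
    public void run() {
        started.countDown();
        // Keep working until someone interrupts the executing thread.
        while (!Thread.currentThread().isInterrupted()) {
            Thread.yield(); // Placeholder for one small unit of real work.
        }
        // The job itself decides how to wind down once it notices the interruption.
        stoppedEarly = true;
    }

    /** Starts the work, interrupts it, and reports whether it stopped cooperatively. */
    static boolean runAndCancel() {
        InterruptibleWork work = new InterruptibleWork();
        Thread t = new Thread(work, "job-thread");
        t.start();
        try {
            work.started.await(); // Make sure the job is running before cancelling it.
            t.interrupt();        // The notification; it does not forcibly stop anything.
            t.join();             // Wait for the job to finish on its own terms.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("caller was interrupted", e);
        }
        return work.stoppedEarly;
    }
}
```

The point of the sketch is that interruption only sets a flag: a job that never checks it (or never blocks in an interruptible call) would keep running.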
GridCollisionExternalListener
This listener is set on collision SPI for notification of external collision events (e.g. job stealing). Once grid receives such notification, it will immediately invoke collision resolution.
GridGain uses this listener to enable job stealing from overloaded to underloaded nodes in GridJobStealingCollisionSpi. However, you can also utilize it, for instance, to provide time based collision resolution. To achieve this, you most likely would mark some job by setting a certain attribute in job context for a job that requires time-based scheduling and set some timer in your collision SPI implementation that would wake up after a certain period of time. Once this period is reached, you would notify this listener that a collision resolution should take place. Then inside of your collision resolution logic, you would find the marked waiting job and activate it.
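The time-based recipe above could be sketched roughly as follows. This is a simplified model under stated assumptions, not GridGain code: `TimedCollisionSketch` is a hypothetical class, the "marker" is modelled as a not-before timestamp, and the timer is replaced by an explicit call that passes in the current time so the behaviour stays deterministic.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical time-based resolver: jobs are marked with the earliest time they may start. */
final class TimedCollisionSketch {
    // Waiting jobs mapped to their "not before" timestamp in milliseconds (the marker attribute).
    private final Map<String, Long> waiting = new LinkedHashMap<>();
    private final List<String> active = new ArrayList<>();

    /** A marked job arrives and is parked on the waiting list. */
    void jobArrived(String job, long notBeforeMillis) {
        waiting.put(job, notBeforeMillis);
    }

    /** Called when the timer fires, i.e. where a real SPI would notify its external listener. */
    void onExternalCollision(long nowMillis) {
        for (Iterator<Map.Entry<String, Long>> it = waiting.entrySet().iterator(); it.hasNext();) {
            Map.Entry<String, Long> e = it.next();
            if (e.getValue() <= nowMillis) {
                active.add(e.getKey()); // Activate the marked job whose time has come.
                it.remove();
            }
        }
    }

    List<String> activeJobs() {
        return new ArrayList<>(active);
    }
}
```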
Note that most collision SPIs may not have external or time-based collisions. In that case, they should simply ignore this method and do nothing when the listener is set.
20.3.2. Built-in Implementations
GridFifoQueueCollisionSpi
GridFifoQueueCollisionSpi allows a certain number of jobs to proceed without interruptions, in first-in first-out order. All other jobs will be put on the waiting list until their turn.
Note that if the parallelJobsNumber configuration parameter is not set, this SPI will allow all concurrent jobs to proceed without interruptions. Make sure to set the parallelJobsNumber parameter to enforce an upper limit on the number of concurrent jobs that can proceed without interruptions. For example, to have only one job proceed at a time, set parallelJobsNumber to 1.
The following configuration parameters can be used to configure GridFifoQueueCollisionSpi:
| Setter Method | Description | Optional | Default |
|---|---|---|---|
| setParallelJobsNumber(int) | Sets the upper limit for the number of jobs that will proceed without interruptions. | Yes | 95 |
As with any GridGain SPI, GridFifoQueueCollisionSpi can be configured either directly from code or from a Spring configuration file. Here is an example of GridFifoQueueCollisionSpi configuration from code:

    GridFifoQueueCollisionSpi colSpi = new GridFifoQueueCollisionSpi();

    // Execute all jobs sequentially by setting parallel job number to 1.
    colSpi.setParallelJobsNumber(1);

    GridConfigurationAdapter cfg = new GridConfigurationAdapter();

    // Override default collision SPI.
    cfg.setCollisionSpi(colSpi);

    // Start grid.
    GridFactory.start(cfg);

or from Spring configuration file:

    <bean id="grid.custom.cfg" class="org.gridgain.grid.GridConfigurationAdapter" singleton="true">
        ...
        <property name="collisionSpi">
            <bean class="org.gridgain.grid.spi.collision.fifoqueue.GridFifoQueueCollisionSpi">
                <property name="parallelJobsNumber" value="1"/>
            </bean>
        </property>
        ...
    </bean>

GridJobStealingCollisionSpi
GridJobStealingCollisionSpi supports job stealing from over-utilized nodes to under-utilized nodes. This SPI is especially useful when some jobs within a task complete quickly while others sit in the waiting queue on slower nodes. In such a case, the waiting jobs will be stolen from the slower node and moved to the faster, under-utilized node.
The design and ideas for this SPI are significantly influenced by the Java Fork/Join Framework authored by Doug Lea and planned for Java 7. GridJobStealingCollisionSpi took similar concepts and applied them to the grid (as opposed to the within-VM support planned for Java 7).
Quite often grids are deployed across many computers some of which will always be more powerful than others. This SPI helps you avoid jobs being stuck at a slower node, as they will be stolen by a faster node. In the following picture when Node3 becomes free, it steals Job13 and Job23 from Node1 and Node2 respectively.
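The scenario just described (a free node taking Job13 and Job23 from two busy nodes) can be modelled with a toy plain-Java sketch. It is not the SPI's actual algorithm: `JobStealingSketch` is a hypothetical class, taking the last job of each queue is an assumption made for illustration, and the per-job counter only mimics the idea of a maximum number of stealing attempts.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of job stealing; names and stealing order are illustrative assumptions. */
final class JobStealingSketch {
    private final int maxStealingAttempts;
    // How many times each job has already been stolen (i.e. travelled to another node).
    private final Map<String, Integer> stealCount = new HashMap<>();

    JobStealingSketch(int maxStealingAttempts) {
        this.maxStealingAttempts = maxStealingAttempts;
    }

    /** Convenience factory for a waiting queue, head first. */
    static Deque<String> queueOf(String... jobs) {
        return new ArrayDeque<>(Arrays.asList(jobs));
    }

    /**
     * Lets an idle node take one waiting job from each busy node's queue.
     * Jobs that were already stolen maxStealingAttempts times stay where they are.
     */
    List<String> steal(List<Deque<String>> busyQueues) {
        List<String> stolen = new ArrayList<>();
        for (Deque<String> queue : busyQueues) {
            String candidate = queue.peekLast(); // The job at the back would be served last.
            if (candidate == null) {
                continue; // Nothing is waiting on this node.
            }
            int count = stealCount.getOrDefault(candidate, 0);
            if (count >= maxStealingAttempts) {
                continue; // Threshold reached: this job may not travel again.
            }
            queue.pollLast();
            stealCount.put(candidate, count + 1);
            stolen.add(candidate);
        }
        return stolen;
    }
}
```

For example, with Node1 waiting on `Job12, Job13` and Node2 waiting on `Job22, Job23`, a call to `steal` hands `Job13` and `Job23` to the free node.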
Usage
Note that this SPI must always be used in conjunction with GridJobStealingFailoverSpi. The responsibility of the Job Stealing Failover SPI is to properly route stolen jobs to the nodes that initially requested (stole) these jobs. GridJobStealingCollisionSpi maintains a counter of how many times a job was stolen, and hence traveled to another node, and it will not allow a job to be stolen if this counter exceeds a certain threshold. The threshold value is configured in GridJobStealingCollisionSpi. Keep in mind that collision resolution happens on job-executing nodes (workers), and failover happens on the task-initiating node (master). So, if you have a case where a group of nodes is responsible only for sending tasks (masters) and another group is responsible for executing jobs (workers), it should be sufficient to configure GridJobStealingFailoverSpi on master nodes only and GridJobStealingCollisionSpi on worker nodes only. You should also take a look at the setStealingEnabled(boolean) and setStealingAttributes(Map) configuration properties, as they also allow you to control which nodes participate in job stealing.
Disable Job Stealing: Use the GridJobStealingDisabled annotation to disable job stealing and make sure that jobs are executed exactly on the node they were mapped to. If a job fails on the selected node, it will be failed over as usual according to the failover policy configured in the Failover SPI.
The following configuration parameters can be used to configure GridJobStealingCollisionSpi:
| Setter Method | Description | Optional | Default |
|---|---|---|---|
| setActiveJobsThreshold(int) | Sets the number of jobs that are allowed to be executed in parallel on this node. Note that this attribute may be different for different grid nodes, as stronger nodes may be able to execute more jobs in parallel. | Yes | 95 |
| setWaitJobsThreshold(int) | Sets the wait jobs threshold. If the number of jobs in the waiting queue goes below this threshold, the implementation will attempt to steal jobs from other, more overloaded nodes. Note that this value may be different (but does not have to be) for different nodes in the grid. You may wish to give stronger nodes a smaller waiting threshold so they can start stealing jobs from other nodes sooner. | Yes | 0 |
| setMessageExpireTime(long) | Message expire time configuration parameter. If no response is received from a busy node to a job stealing request, the implementation will assume that the message never got there, or that the remote node does not have this node included in the topology of any of the jobs it has. In any case, the job steal request will be resent (potentially to another node). | Yes | 1,000 ms (1 second) |
| setMaximumStealingAttempts(int) | Sets the maximum number of attempts for a single job to be stolen. Once a job reaches this threshold, no more attempts will be made by other nodes to steal it. Note that this attribute should be the same on all nodes. | Yes | 5 |
| setStealingEnabled(boolean) | Enables or disables job stealing on this node. | Yes | true |
| setStealingAttributes(Map) | Enables stealing to and from only those nodes that have the given attributes set. | Yes | empty map |
As with any GridGain SPI, GridJobStealingCollisionSpi can be configured either directly from code:

    GridJobStealingCollisionSpi spi = new GridJobStealingCollisionSpi();

    // Configure number of waiting jobs
    // in the queue for job stealing.
    spi.setWaitJobsThreshold(10);

    // Configure message expire time (in milliseconds).
    spi.setMessageExpireTime(500);

    // Configure number of active jobs that are allowed to execute
    // in parallel. This number should usually be equal to the number
    // of threads in the pool (default is 100).
    spi.setActiveJobsThreshold(50);

    // Configure maximum stealing attempts.
    spi.setMaximumStealingAttempts(10);

    // Enable stealing.
    spi.setStealingEnabled(true);

    // Set stealing attribute to steal from/to nodes that have it.
    spi.setStealingAttributes(Collections.singletonMap("node.segment", "foobar"));

    GridConfigurationAdapter cfg = new GridConfigurationAdapter();

    // Override default Collision SPI.
    cfg.setCollisionSpi(spi);

or from Spring configuration file:

    <bean id="grid.custom.cfg" class="org.gridgain.grid.GridConfigurationAdapter" singleton="true">
        ...
        <property name="collisionSpi">
            <bean class="org.gridgain.grid.spi.collision.jobstealing.GridJobStealingCollisionSpi">
                <property name="activeJobsThreshold" value="100"/>
                <property name="waitJobsThreshold" value="0"/>
                <property name="messageExpireTime" value="1000"/>
                <property name="maximumStealingAttempts" value="10"/>
                <property name="stealingEnabled" value="true"/>
                <property name="stealingAttributes">
                    <map>
                        <entry key="node.segment" value="foobar"/>
                    </map>
                </property>
            </bean>
        </property>
        ...
    </bean>

GridPriorityQueueCollisionSpi
GridPriorityQueueCollisionSpi allows a certain number of jobs with the highest priority to proceed without interruptions. All other jobs will be put on the waiting list until their turn. Job priority is retrieved from the job priority attribute. If no priority has been assigned to a job (the job priority attribute was not found), the default priority of 0 is used.
Note that if the parallelJobsNumber configuration parameter is not set, this SPI will allow all concurrent jobs to proceed without interruptions. Make sure to set the parallelJobsNumber parameter to enforce an upper limit on the number of concurrent jobs that can proceed without interruptions. For example, to have only one job with the highest priority execute at a time, set parallelJobsNumber to 1.
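The ordering rule described above (highest priority first, a default priority when none is assigned, and a cap on parallel jobs) can be sketched in plain Java. `PriorityCollisionSketch` is a hypothetical class written for illustration; breaking ties by arrival order is an assumption of this sketch, not something the text specifies.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Toy priority-ordered waiting list; not the GridGain implementation. */
final class PriorityCollisionSketch {
    /** A waiting job with its priority and arrival sequence number. */
    static final class Waiting {
        final String name;
        final int priority;
        final long seq;

        Waiting(String name, int priority, long seq) {
            this.name = name;
            this.priority = priority;
            this.seq = seq;
        }
    }

    private final int parallelJobsNumber;
    private final int defaultPriority;
    private long nextSeq;
    private int activeCount;

    // Highest priority first; ties are broken by arrival order (an assumption of this sketch).
    private final PriorityQueue<Waiting> waiting = new PriorityQueue<>(
        Comparator.comparingInt((Waiting w) -> w.priority).reversed().thenComparingLong(w -> w.seq));

    PriorityCollisionSketch(int parallelJobsNumber, int defaultPriority) {
        this.parallelJobsNumber = parallelJobsNumber;
        this.defaultPriority = defaultPriority;
    }

    /** Queues a job; a null priority means the attribute was not found. */
    void jobArrived(String name, Integer priority) {
        int p = priority != null ? priority : defaultPriority;
        waiting.add(new Waiting(name, p, nextSeq++));
    }

    /** Activates waiting jobs, highest priority first, until the parallel limit is reached. */
    List<String> activateNext() {
        List<String> activated = new ArrayList<>();
        while (activeCount < parallelJobsNumber && !waiting.isEmpty()) {
            activated.add(waiting.poll().name);
            activeCount++;
        }
        return activated;
    }

    /** Frees one execution slot. */
    void jobFinished() {
        if (activeCount > 0) {
            activeCount--;
        }
    }
}
```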
Here is an example of grid tasks that rely on the priority collision SPI being configured. Note that priority collision resolution is completely transparent to the user and is simply a matter of proper grid configuration. Also, priority can be defined only per task (it is set within the task, not at the job level). All split jobs will be started with the priority declared in their owner task.
This example demonstrates how an urgent task can be declared with a higher priority value. With the priority SPI configured so that the number of parallel jobs is 1 (see its configuration in the example below), jobs from MyGridUrgentTask will most likely be activated first (one by one), while jobs from MyGridUsualTask, which has the lower priority, will wait. Once the higher-priority jobs complete, the lower-priority jobs will be scheduled.

    public class MyGridUrgentTask extends GridTaskSplitAdapter<Object, Object> {
        public static final int SPLIT_COUNT = 5;

        @GridTaskSessionResource
        private GridTaskSession taskSes = null;

        @Override
        protected Collection<? extends GridJob> split(int gridSize, Object arg) throws GridException {
            ...
            // Set high task priority (note that attribute name is used by the SPI
            // and should not be changed).
            taskSes.setAttribute("grid.task.priority", 10);

            Collection<GridJob> jobs = new ArrayList<GridJob>(SPLIT_COUNT);

            for (int i = 1; i <= SPLIT_COUNT; i++) {
                jobs.add(new GridJobAdapter<Integer>(i) {
                    ...
                });
            }
            ...
        }
    }

and

    public class MyGridUsualTask extends GridTaskSplitAdapter<Object, Object> {
        public static final int SPLIT_COUNT = 20;

        @GridTaskSessionResource
        private GridTaskSession taskSes = null;

        @Override
        protected Collection<? extends GridJob> split(int gridSize, Object arg) throws GridException {
            ...
            // Set low task priority (note that attribute name is used by the SPI
            // and should not be changed).
            taskSes.setAttribute("grid.task.priority", 5);

            Collection<GridJob> jobs = new ArrayList<GridJob>(SPLIT_COUNT);

            for (int i = 1; i <= SPLIT_COUNT; i++) {
                jobs.add(new GridJobAdapter<Integer>(i) {
                    ...
                });
            }
            ...
        }
    }

The following configuration parameters can be used to configure GridPriorityQueueCollisionSpi:
| Setter Method | Description | Optional | Default |
|---|---|---|---|
| setDefaultPriority(int) | Sets the default priority used if a job does not have the job priority attribute set in its context. | Yes | 0 |
| setParallelJobsNumber(int) | Sets the upper limit for the number of jobs that will proceed without interruptions. | Yes | 95 |
| setPriorityAttributeKey(String) | This key will be used to look up job priorities from the job context (GridJobContext.getAttribute(String) method). | Yes | grid.job.priority |
As with any GridGain SPI, GridPriorityQueueCollisionSpi can be configured either directly from code:

    GridPriorityQueueCollisionSpi colSpi = new GridPriorityQueueCollisionSpi();

    // Allow at most 5 jobs to execute in parallel.
    colSpi.setParallelJobsNumber(5);

    GridConfigurationAdapter cfg = new GridConfigurationAdapter();

    // Override default collision SPI.
    cfg.setCollisionSpi(colSpi);

    // Start grid.
    GridFactory.start(cfg);

or from Spring configuration file:

    <bean id="grid.custom.cfg" class="org.gridgain.grid.GridConfigurationAdapter" singleton="true">
        ...
        <property name="collisionSpi">
            <bean class="org.gridgain.grid.spi.collision.priorityqueue.GridPriorityQueueCollisionSpi">
                <property name="parallelJobsNumber" value="5"/>
            </bean>
        </property>
        ...
    </bean>

Default Implementation
If no collision SPI is provided in the configuration, GridFifoQueueCollisionSpi is used by default.
20.3.3. Configuration
GridCollisionSpi is provided in the grid configuration passed into GridFactory at startup. You can configure a different collision SPI implementation as follows:

    GridConfigurationAdapter cfg = new GridConfigurationAdapter();

    GridFifoQueueCollisionSpi colSpi = new GridFifoQueueCollisionSpi();

    // Limit number of parallel jobs.
    colSpi.setParallelJobsNumber(10);

    // Configure your own collision SPI.
    cfg.setCollisionSpi(colSpi);

    GridFactory.start(cfg);

Note that the GridConfiguration interface is just a bean and can also be configured using Spring XML configuration.
20.4. Failover SPI
20.4.1. Overview
Starting with GridGain 2.1, you can provide multiple instances of Failover SPIs and then specify which one to use on a per-task level via the @GridTaskSpis annotation attached to your GridTask implementation.
The GridFailoverSpi SPI lets developers supply custom logic for handling failed execution of a grid job. Failover is triggered when the GridTask.result(GridJobResult, List) method returns the GridJobResultPolicy.FAILOVER policy, indicating that the job must be failed over. Job execution can fail for a number of reasons:
- Job execution threw an exception (this condition has to be handled by the user explicitly).
- Job returned a bad result (this condition has to be handled by the user explicitly).
- The node on which the job was executing left the topology, crashed, or stopped (failover is handled by default in GridTaskAdapter).
- Job was rejected before it got a chance to execute, while still on the waiting list (failover is handled by default in GridTaskAdapter).
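The first two cases in the list have to be handled in user code. A rough plain-Java sketch of such a decision is shown below. `ResultPolicySketch` and `ResultDecisionSketch` are hypothetical names; only the FAILOVER outcome is taken from the text, while WAIT and REDUCE are assumed names for "keep waiting for more results" and "all results are in", and treating a null value as a bad result is purely an example.

```java
/** Hypothetical mirror of the outcomes a result callback can choose from. */
enum ResultPolicySketch { WAIT, REDUCE, FAILOVER }

/** Illustrative decision logic for one job result; not a GridGain class. */
final class ResultDecisionSketch {
    /**
     * Decides what to do with one job result.
     *
     * @param error    exception thrown by the job, or null if it completed normally
     * @param data     value returned by the job
     * @param received number of results received so far, including this one
     * @param expected total number of jobs in the task
     */
    static ResultPolicySketch decide(Exception error, Object data, int received, int expected) {
        if (error != null) {
            return ResultPolicySketch.FAILOVER; // Job threw: user code asks for a retry elsewhere.
        }
        if (data == null) {
            return ResultPolicySketch.FAILOVER; // "Bad result" check; what counts as bad is application-specific.
        }
        // Result is acceptable: either wait for the remaining jobs or finish the task.
        return received < expected ? ResultPolicySketch.WAIT : ResultPolicySketch.REDUCE;
    }
}
```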
In all cases, the failover SPI takes the failed job (as failover context) and a list of all grid nodes and produces another node on which the job execution will be retried. It is up to the failover SPI to make sure that the job is not mapped to the node it failed on. The failed node can be retrieved via GridFailoverContext.getJobResult().getNode().
Note that for any job spawned by a task, the failover SPI will be invoked only on the node that initiated the task (obviously, it cannot be invoked on the failed node).
20.4.2. Built-in Implementations
GridAlwaysFailoverSpi
GridAlwaysFailoverSpi always reroutes a failed job to another node. At first, an attempt is made to reroute the failed job to a node that was not part of the initial split, for a better chance of success. If no such node is available, an attempt is made to reroute the failed job to one of the nodes in the initial split, excluding the node the job failed on. If neither attempt succeeds, the job is not failed over and null is returned.
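The two-step node selection described above can be sketched in plain Java as follows. `AlwaysFailoverSketch` is a hypothetical class, nodes are represented by plain string IDs, and picking the first matching node in topology order is an assumption; the text does not say how a node is chosen within each group.

```java
import java.util.Collection;
import java.util.List;

/** Illustrative two-step failover node selection; not the GridGain implementation. */
final class AlwaysFailoverSketch {
    /**
     * @param failedNode   node the job failed on
     * @param initialSplit nodes that took part in the initial split
     * @param topology     all nodes currently in the grid
     * @return node to retry the job on, or null if no other node is available
     */
    static String failover(String failedNode, Collection<String> initialSplit, List<String> topology) {
        // First choice: a node that was not part of the initial split.
        for (String node : topology) {
            if (!node.equals(failedNode) && !initialSplit.contains(node)) {
                return node;
            }
        }
        // Second choice: a node of the initial split other than the failed one.
        for (String node : topology) {
            if (!node.equals(failedNode) && initialSplit.contains(node)) {
                return node;
            }
        }
        // Nothing left to try: the job is not failed over.
        return null;
    }
}
```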
The following configuration parameters can be used to configure GridAlwaysFailoverSpi:
| Setter Method | Description | Optional | Default |
|---|---|---|---|
| setMaximumFailoverAttempts(int) | Sets the maximum number of attempts to execute a failed job on another node. This parameter is available starting with GridGain 2.0. | Yes | 5 |
As with any GridGain SPI, GridAlwaysFailoverSpi can be configured either directly from code:

    GridAlwaysFailoverSpi failSpi = new GridAlwaysFailoverSpi();

    GridConfigurationAdapter cfg = new GridConfigurationAdapter();

    // Override maximum failover attempts.
    failSpi.setMaximumFailoverAttempts(5);

    // Override default failover SPI.
    cfg.setFailoverSpi(failSpi);

    // Start grid.
    GridFactory.start(cfg);

or from Spring configuration file:

    <bean id="grid.custom.cfg" class="org.gridgain.grid.GridConfigurationAdapter" singleton="true">
        ...
        <property name="failoverSpi">
            <bean class="org.gridgain.grid.spi.failover.always.GridAlwaysFailoverSpi">
                <property name="maximumFailoverAttempts" value="5"/>
            </bean>
        </property>
        ...
    </bean>

GridJobStealingFailoverSpi
GridJobStealingFailoverSpi must always be used in conjunction with GridJobStealingCollisionSpi. When GridJobStealingCollisionSpi receives a steal request and rejects jobs so they can be routed to the appropriate node, it is the responsibility of GridJobStealingFailoverSpi to make sure that the job is indeed rerouted to the node that sent the initial request to steal it.
GridJobStealingFailoverSpi knows where to route a job based on the GridJobStealingCollisionSpi.THIEF_NODE_ATTR job context attribute (see GridJobContext). Prior to rejecting a job, GridJobStealingCollisionSpi populates this attribute with the ID of the node that wants to steal the job. GridJobStealingFailoverSpi then reads the value of this attribute and routes the job to the specified node.
If the failure is caused by a node crash, and not by a steal request, this SPI behaves identically to GridAlwaysFailoverSpi and tries to find the next balanced node to fail the job over to.
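The routing rule above can be sketched in plain Java as follows. `StealingFailoverSketch` is a hypothetical class; the job context is modelled as a simple map, the attribute key `"thief.node"` is a placeholder rather than the real value of THIEF_NODE_ATTR, and the crash fallback simply picks the first other node instead of doing any load balancing.

```java
import java.util.List;
import java.util.Map;

/** Illustrative routing of rejected (stolen) jobs; not the GridGain implementation. */
final class StealingFailoverSketch {
    /** Placeholder key; the real attribute name is defined by the collision SPI. */
    static final String THIEF_NODE_ATTR = "thief.node";

    /**
     * @param jobContext attributes attached to the failed job
     * @param failedNode node the job was rejected by or failed on
     * @param topology   all nodes currently in the grid
     * @return node to send the job to, or null if none is available
     */
    static String route(Map<String, String> jobContext, String failedNode, List<String> topology) {
        String thief = jobContext.get(THIEF_NODE_ATTR);
        if (thief != null && topology.contains(thief)) {
            return thief; // Steal request: send the job to the node that asked for it.
        }
        // No steal request (e.g. a node crash): fall back to any other node.
        for (String node : topology) {
            if (!node.equals(failedNode)) {
                return node;
            }
        }
        return null;
    }
}
```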
Usage
GridJobStealingFailoverSpi must always be used in conjunction with GridJobStealingCollisionSpi. Please refer to the GridJobStealingCollisionSpi documentation for more information. Keep in mind that collision resolution happens on job-executing nodes (workers), and failover happens on the task-initiating node (master). So, if you have a case where a group of nodes is responsible only for sending tasks (masters) and another group is responsible for executing jobs (workers), it should be sufficient to configure GridJobStealingFailoverSpi on master nodes only and GridJobStealingCollisionSpi on worker nodes only.
The following configuration parameters can be used to configure GridJobStealingFailoverSpi:
| Setter Method | Description | Optional | Default |
|---|---|---|---|
| setMaximumFailoverAttempts(int) | Sets the maximum number of attempts to execute a failed job on another node. | Yes | 5 |
As with any GridGain SPI, GridJobStealingFailoverSpi can be configured either directly from code:

    GridJobStealingFailoverSpi failSpi = new GridJobStealingFailoverSpi();

    GridConfigurationAdapter cfg = new GridConfigurationAdapter();

    // Override maximum failover attempts.
    failSpi.setMaximumFailoverAttempts(5);

    // Override default failover SPI.
    cfg.setFailoverSpi(failSpi);

    // Start grid.
    GridFactory.start(cfg);

or from Spring configuration file:

    <bean id="grid.custom.cfg" class="org.gridgain.grid.GridConfigurationAdapter" singleton="true">
        ...
        <property name="failoverSpi">
            <bean class="org.gridgain.grid.spi.failover.jobstealing.GridJobStealingFailoverSpi">
                <property name="maximumFailoverAttempts" value="5"/>
            </bean>
        </property>
        ...
    </bean>

GridNeverFailoverSpi
GridNeverFailoverSpi never fails over a failed job: it always returns null from the GridFailoverSpi.failover(GridFailoverContext, List) method.
This SPI has no configuration parameters.
As with any GridGain SPI, GridNeverFailoverSpi can be configured either directly from code:

    GridNeverFailoverSpi failSpi = new GridNeverFailoverSpi();

    GridConfigurationAdapter cfg = new GridConfigurationAdapter();

    // Override default failover SPI.
    cfg.setFailoverSpi(failSpi);

    // Start grid.
    GridFactory.start(cfg);

or from Spring configuration file:

    <bean id="grid.custom.cfg" class="org.gridgain.grid.GridConfigurationAdapter" singleton="true">
        ...
        <property name="failoverSpi">
            <bean class="org.gridgain.grid.spi.failover.never.GridNeverFailoverSpi"/>
        </property>
        ...
    </bean>

Default Implementation
If no failover SPI is provided in the configuration, GridAlwaysFailoverSpi is used by default.
20.4.3. Configuration
GridFailoverSpi is provided in the grid configuration passed into GridFactory at startup. You can configure a different failover SPI implementation as follows:

    GridConfigurationAdapter cfg = new GridConfigurationAdapter();

    GridNeverFailoverSpi failSpi = new GridNeverFailoverSpi();

    // Configure your own failover SPI.
    cfg.setFailoverSpi(failSpi);

    GridFactory.start(cfg);

Note that the GridConfiguration interface is just a bean and can also be configured using Spring XML configuration.