
GridGain is middleware software that enables the development of high-performance, compute- and data-intensive distributed applications for real-time Big Data processing.

This book provides in-depth knowledge of how to use GridGain software.

About this Book

This book is a work in progress by the GridGain team. Chapters are often not written in logical order, and we apologize for the inconvenience until the book is finished.

1. Introduction

http://www.gridgain.com/images/faces.gif

First of all, the whole GridGain team thanks you for picking up this book and devoting your time to learning more about the GridGain project. Although this book is being written primarily by Nikita Ivanov and Dmitriy Setrakyan, all of us here at GridGain are pitching in with reviews, tests, proofreading, examples and creative ideas.

The GridGain project has been an amazing journey for all of us to this point. As you read these lines we are continuing our work: adding new features, improving existing ones, fixing bugs (it happens to the best of us) and thinking about how to make GridGain more enjoyable and productive to use.

The GridGain open source project started in the spring of 2005 with just Nikita and Dmitriy working on it in their spare time. We managed to get our first official release out in the summer of 2007. Three years later, in late 2010 (when this introduction was written), our software is started every 10 seconds somewhere around the globe and is undoubtedly one of the most popular distributed programming frameworks in the JVM ecosystem - so we must be doing something right.

This is even more exciting for us since GridGain Systems is an engineering company first and foremost. We remain small because we believe in small "surgical" teams, and every member of our company still writes code (some of us less so, as we need to travel, speak and write books like this one - which we enjoy greatly). We’ve visited more than 50 conferences and Java User Groups around the globe in the last 3 years to talk about GridGain, and we are grateful to each and every one of you who came to our talks. That was the only "marketing" we could afford, but an hour and a half was usually enough to convince folks to try GridGain out. In fact, where else could you see a full-fledged MapReduce application running on multiple nodes, written from scratch in front of your eyes in less than 5 minutes?

We want to thank you again, and we hope that you’ll find this book a useful and effective guide for discovering GridGain.

1.1. What is this book about?

This book is about how to use GridGain software to develop innovative distributed compute and data intensive applications that run on any managed infrastructure - from the simple laptop on which this book was written to large grids and all types of clouds. When developing with GridGain and reading this book you can use the Java, Scala or Groovy programming languages. At the time of this writing, GridGain was the only distributed computing middleware with native Scala support.

This book does not replace the API reference manuals, and you will still need to consult them from time to time for up-to-date method signatures, parameter descriptions, etc. As we always say in our project, documentation is code too, and we pay a great deal of attention to the API references (Javadoc, Scaladoc and Groovydoc) - in fact, we believe we have some of the best organized and maintained code-level documentation of any comparable project.

This book is short considering its subject - and that is on purpose. We strongly believe that one of the main reasons for the slow adoption of grid and cloud computing in the last decade was the over-complication and unnecessary "dramatization" of the entire subject. In fact, the original idea behind GridGain came from a rather unproductive experience developing an application with the Globus Toolkit - an innovative piece of software for the 90s, but awfully over-engineered and out of place by the turn of the century, given the rapid advancements in server-side JVM-based programming.

Note We’ve designed this book in a way that you can take a first pass over it during the weekend and have a pretty good grasp on all major APIs and concepts. As you continue to work with GridGain you’ll be coming back to specific chapters for more details and to refresh on some of the less obvious parts. This is a perfectly normal way to read this book and we highly encourage it.

As you will discover in this book, distributed computing is not necessarily as complex or unwieldy as it may seem from the outside - in fact, most of its concepts are readily familiar to most of you. Where it usually got complicated in the past was the tooling and framework support. That was a sorry state of affairs for a long time - most of the engineering community just accepted that distributed programming (grid and cloud computing included) has to be "involved" because… it is so, right?

In reality, that is largely inaccurate. With the right tooling and framework support, distributed computing can be relatively simple and very productive. One of the main goals of this book is to convince you that this is so. Every month or so we at GridGain receive an email or a forum post in which someone relates his or her experience of downloading GridGain during the day and having the first MapReduce application running on Amazon EC2 by midnight. By the time you finish this book that won’t seem like an exaggeration, and you will have your first application running on EC2 much, much quicker.

This book covers GridGain starting with version 3.0 and provides an in-depth manual for all three main technologies that are tightly integrated into the GridGain cloud application platform (as of October 2010 GridGain is the only distributed middleware that provides all three technologies in the same platform - let alone for the Java, Scala and Groovy languages):

  • computational grids (a.k.a. MapReduce)

  • data grid (a.k.a. distributed caching)

  • zero deployment with auto-scaling (a.k.a. elasticity on clouds)

Each of these main topics is covered in depth with plenty of examples in Java, Scala and Groovy.

All in all, this book’s character perfectly reflects what we think modern distributed programming should be - simple, effective and amazingly productive.

That is if you use GridGain…

1.2. About Authors…

http://www.gridgain.com/images/nivanov_dsetrakyan.png

This book is written primarily by Nikita Ivanov and Dmitriy Setrakyan. As always, we encourage you to contact us directly should you have any questions about this book or about GridGain. You can reach Nikita at nivanov@gridgain.com and Dmitriy at dsetrakyan@gridgain.com.

Together, Nikita and Dmitriy have over 30 years of combined experience in distributed programming, mostly in Java (some MPI C/C++ back in the 90s - ouch) and now Scala. Nikita and Dmitriy have been with the project from the beginning, and to this day they coordinate and lead most of the development work.

They both write code almost every day for GridGain despite heavy travel and frequent speaking engagements. And when they aren’t writing code, you’ll find them rooting for the San Jose Sharks - the favorite hockey team of GridGain.

There are many ways to stay in touch with the GridGain project:

2. Overview

In this chapter we’ll lay out some of the basic ideas about grid and cloud computing and how GridGain fits into them. The goal of this chapter is to make sure we are on the same page with you, the reader, as far as the fundamentals of grid and cloud computing are concerned (and you won’t believe how far apart we can be on these…).

2.1. What is GridGain?

In a nutshell, GridGain is JVM-based middleware software that enables the development of compute and data intensive High Performance Distributed Applications. Applications developed with GridGain can scale up on any infrastructure - from a single Android device to a large cloud.

GridGain provides two major areas of functionality:

  • Compute Grids

  • In-Memory Data Grids

What is GridGain?

High Performance Cloud Computing

or

GridGain = (Java + Scala + Groovy) * (Compute Grid + In-Memory Data Grid)

On top of that it provides a multitude of surrounding technologies, many of which are frequently used by our clients on their own.

With GridGain your applications can:

  • Work in a zero-deployment mode.

  • Scale up or down based on demand.

  • Cache distributed data in data grid.

  • Co-locate data and computations.

  • Run SQL queries against cached data.

  • Store and query JSON objects.

  • Speed up tasks using MapReduce processing.

  • Use distributed thread pools.

  • Distribute the workload on the grid.

  • Use distributed queues and atomics.

  • Effectively exchange messages.

  • Auto-discover all grid resources.

  • Execute closures on the grid.

  • Grid-enable Java, Groovy and Scala code.

  • … and much more

2.2. Why Compute and In-Memory Data Grid?

Compute and In-Memory Data Grids are the two main axiomatic technologies of modern distributed programming. They are fundamental because they solve two underlying problems faced by any distributed system:

  • distribution of computations

  • distribution of data

I always like to provide this analogy: every computing device - from the Turing machine to the latest iPod - contains memory and a processing unit. Think about it… memory and processing form the foundation of our computing capabilities. In the same way, the ability to distribute computations and data forms the foundation of distributed programming.

Axiomatic Technologies

Compute and In-Memory Data Grid technologies are fundamental to any distributed system as they address the two underlying principles we use to gain scalability in the distributed context:

  • parallelization of computations

  • parallelization of data storage & access

And just as the first "system on a chip" in the late 1960s finally integrated memory and processing units on the same chip, providing cheaper, more energy-efficient and much faster overall systems, GridGain has pioneered integrated middleware that combines compute and in-memory data grids in one cohesive distributed middleware product. This has resulted in similar benefits: a simplified programming model, easier applicability, and unified configuration and management.

Compute and in-memory data grids are the key topics in this book and we’ll talk a lot more about these two in the following chapters.
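To make the first of these two problems, distribution of computations, a little more concrete, here is a minimal Java sketch. It is only an illustration under stated assumptions: it uses the standard JDK and none of GridGain’s APIs, a local thread pool stands in for the grid nodes, and the names SplitAndReduceSketch and countNonSpaceChars are hypothetical. The shape is the familiar one: split a job into independent sub-tasks, execute them in parallel, and reduce the partial results into one answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative only: plain JDK, not GridGain's API. Threads stand in for grid nodes.
public class SplitAndReduceSketch {

    /** Counts the non-whitespace characters of a phrase with split / execute / reduce. */
    public static int countNonSpaceChars(String phrase, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Split: one independent sub-task per word.
            List<Callable<Integer>> jobs = new ArrayList<>();
            for (String word : phrase.trim().split("\\s+")) {
                jobs.add(word::length);
            }
            // Execute the sub-tasks in parallel, then reduce by summing the partial results.
            int total = 0;
            for (Future<Integer> partial : pool.invokeAll(jobs)) {
                total += partial.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(countNonSpaceChars("Hello grid world", 4)); // prints 14
    }
}
```

On a real grid the sub-tasks would travel to remote nodes rather than to local threads, but the split, execute and reduce steps keep the same shape.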

2.3. Why High Performance and Cloud Computing?

You noticed earlier that we call GridGain middleware software for developing High Performance Cloud Computing applications. But why do we focus on High Performance, and what do we mean by that? And what does it have to do with Cloud Computing?

Important
GridGain Philosophy
The term High Performance Cloud Computing really emerged in 2010, after almost 5 years of GridGain development. We believe it perfectly reflects the design goals that we originally had. Many, if not all, of GridGain’s features, designs and approaches stem from these goals.

Let’s talk about Cloud Computing first.

2.3.1. Cloud Computing

Despite all the buzz about cloud computing, we strongly believe that from the software development perspective cloud computing is almost synonymous with traditional distributed programming.

In fact, that “almost” above simply accounts for the fact that, unlike the traditional data centers, grids and clusters of the last decade, clouds offer more fine-grained resource virtualization and more management options. In a nutshell, your application development principles remain largely the same, but you have more options and more choices in how your application is deployed and how it utilizes available computing resources. The rest of the parallel distributed programming challenges of the last 25 years remain fully intact.

Note
10 Years Ago…
While ten years ago the most you had to account for was a new server coming up in your local grid, today you need to be prepared for not just a new server but extra CPUs, extra disk storage or extra RAM appearing for your application, which may itself be snapshotted and migrated potentially halfway across the globe (just look at RackSpace Flavors, for example).

Still, the absolute majority of the problems and challenges you face today while developing distributed software systems coalesce around parallelization of computing and high availability of data in the distributed context - both of which are absolutely critical for any scalable distributed software system.

So, when we say Cloud Computing we mean Distributed Computing plus a few new important details.

Important
When We Say "Cloud Computing".
Cloud Computing = Distributed Computing + Data Center Virtualization

2.3.2. High Performance (HP)

The High Performance aspect is equally interesting. When we present GridGain at conferences we inevitably get asked about this… why High Performance and Cloud in the same sentence?

The answer is very simple: not every cloud application needs to be high performance. In fact, most of today’s cloud applications (i.e. the applications that are deployed on the clouds) are not high performance.

Note
Cloud Applications
Most of today’s cloud applications (i.e. applications deployed in the clouds) are not high performance.

We use the term High Performance to specifically categorize applications that use distribution as a means of parallelizing processing, i.e. to achieve scalability and/or performance that is theoretically unattainable on a single processing unit.

On the other hand, distributed applications that are not High Performance use cloud deployment as a more convenient or economical deployment option, without much need for improved scalability or performance.

Note
High Performance vs. Not High Performance

The distinction is very important:

  • HP applications use (cloud) distribution to achieve the scalability and/or performance that is theoretically unattainable on a single processing unit.

  • Non-HP applications use (cloud) distribution as a more convenient or economical deployment option.

Naturally, some HP applications use cloud deployment because of convenience and economy as well.

So, tying it all together, GridGain is a High Performance Cloud Application platform because:

  1. It is designed to dramatically simplify the development of distributed software systems, including those that are deployed in the clouds

  2. It is designed to provide extreme scalability and performance for the software systems in the distributed context

2.3.3. Real-Time Cloud Applications

For another take on High Performance Cloud Application look at this blog entry I wrote in the middle of 2011:

Real-Time - A New Era of Cloud Applications

There’s a significant shift that has been happening in the last 12 months for many, if not all, BigData and BigCompute cloud applications - a shift to real-time processing.

Note This shift is nothing short of tectonic change and it is disrupting many software design approaches that are utilized today.

Now, when we talk about real-time processing we, of course, mean near real-time (nRT), since nothing can be truly real-time in the JVM world. Essentially, anything that can be processed within a reasonable user response time expectation (typically no longer than a couple of seconds) can be considered real-time for enterprise applications.

…Many analysts first got a hunch of this change when Google decided to drop a batch-oriented MapReduce design in favor of a more real-time approach in their search implementations, with what they call Streaming MapReduce. Facebook followed earlier this year by dumping Hadoop-like processing in favor of a different design that would finally allow them to tackle real-time performance.

Now, why all the fuss?

Fundamentally, the answer is pretty simple. First, just look around at the devices and services you use every day: your TV, your iPhone or Android, Google or Bing, Facebook and Twitter, eBay and Amazon… Apart from slow internet connections, when was the last time you needed to wait 10 or even 5 seconds to get your result?

Your TV switches programs instantly, Google and Bing return search results within a few seconds at most, almost all of the apps on iPhone and Android work in real-time (or so it seems), eBay processes your bids seemingly in real-time, and Amazon can instantly suggest items for you to purchase. So as everyday users of these devices and services we are accustomed to instant response or… the real-time capabilities of these services.

However, when we apply the same expectation to today’s enterprise and business applications the picture is very different. And while delays in consumer devices and services mostly lead to frustration, delays in business applications often lead to broken business processes and significant revenue loss. Here are just a few real-life examples we at GridGain have witnessed:

  • In the insurance industry many complex products cannot currently be priced or quoted on the spot (i.e. while the customer is on the phone) because pricing them requires compute and data intensive processing that is usually done overnight. Sales reps have to hang up on the customer and promise to call back with the numbers the next day (or worse - send a letter).

    Up to 30% of customers are lost due to this awkward process.

  • In investment banks and hedge funds automated or algorithmic trading is often done on models that are regenerated overnight or even less frequently - typically as part of pre-trade activity. Options and futures are prime examples… If market conditions change beyond the model’s pre-trade parameters, auto-trading may be stopped altogether since the models are no longer valid - hence the loss of revenue. What’s even worse is that less-than-critical deviations in the market are not accounted for in rigid models, so revenue is still lost even if trading continues.

Note Quite simply - the inability to maintain complex quantitative financial models live in real-time is the main reason for this obvious hole in the otherwise highly effective financial world.

But how do you implement complex business algorithms in real-time?

The answer is the ability to massively parallelize the business algorithm in such a way that its processing happens entirely in memory and can linearly scale up (and down) on demand. I call it the PDC theorem, as in "Processing, Data, and Co-Location".

The idea behind PDC is pretty simple. Real-time response in a highly distributed system is not achievable unless the following 3 rules are followed:

  1. Processing must be distributable for in-memory computation

  2. Data storage must be distributable (i.e. partitioned) for in-memory storage

  3. Co-location must be ensured between processing and data units to provide locality of remote operations

A few important notes:

  • We are, of course, talking about business or perceptual real-time (a.k.a. Near Real-Time or nRT) and not about hardware real-time. Perceptual real-time response is not well defined, but you can conceptually visualize it as the time the user of the system is willing to wait for a response that he or she expects right away… In most cases it means a few seconds or less. In rarer cases, like FOREX trading for example, real-time would mean microseconds.

  • It is critically important that your processing supports algorithmic parallelization. Not all tasks can be parallelized and therefore not all tasks can be optimized for real-time processing. However, many of the typical business and social graph tasks can be split into multiple sub-tasks executing in parallel – and therefore are trivially parallelizable.

  • Data has to be partitioned and stored in-memory. Any outside calls to get data from NoSQL storage, file systems like HDFS, or traditional SQL storage render any real-time attempts useless in most cases. This is one of the most critical design elements and it is often overlooked. In other words – at no time should the remote processing escape the boundaries of the local JVM it is executing on.

  • Co-location of processing and data (a.k.a. affinity-based routing, referring to the fact that there should be an affinity between the computation and the data this computation needs) is the main mechanism to ensure that there is no "noise" data transfer between the remote nodes where a task is being processed. Such unnecessary data transfer violates the locality principle of remote operations, often making real-time processing unachievable.
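The partitioning and co-location notes above can be pictured with a small single-JVM simulation. This is only a sketch under stated assumptions: it uses plain JDK classes rather than GridGain’s API, each simulated "node" is an in-memory map owned by a single worker thread, and the names CoLocationSketch, nodeFor, put, compute and runOn are hypothetical. A hash of the key plays the role of the affinity function, so a computation is always sent to the node that already holds the data it needs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Illustrative only: plain JDK, not GridGain's API.
public class CoLocationSketch {

    /** A simulated node: one in-memory data partition plus one worker thread that owns it. */
    private static final class Node {
        final Map<String, Integer> partition = new HashMap<>();
        final ExecutorService worker = Executors.newSingleThreadExecutor();
    }

    private final List<Node> nodes = new ArrayList<>();

    public CoLocationSketch(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new Node());
        }
    }

    /** Affinity function: a given key always maps to the same node. */
    private Node nodeFor(String key) {
        return nodes.get(Math.floorMod(key.hashCode(), nodes.size()));
    }

    /** Stores the value in the partition of the node that owns the key. */
    public void put(String key, int value) {
        Node owner = nodeFor(key);
        runOn(owner, () -> owner.partition.put(key, value));
    }

    /** Routes the job to the node that owns the key, so the data itself never moves. */
    public <R> R compute(String key, Function<Integer, R> job) {
        Node owner = nodeFor(key);
        return runOn(owner, () -> job.apply(owner.partition.get(key)));
    }

    /** Runs a task on the node's own worker thread and waits for its result. */
    private static <R> R runOn(Node node, Callable<R> task) {
        try {
            return node.worker.submit(task).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public void shutdown() {
        for (Node node : nodes) {
            node.worker.shutdown();
        }
    }

    public static void main(String[] args) {
        CoLocationSketch grid = new CoLocationSketch(3);
        grid.put("policy-42", 1000);
        Integer doubled = grid.compute("policy-42", premium -> premium * 2);
        System.out.println(doubled); // prints 2000
        grid.shutdown();
    }
}
```

Only the small job and its result cross the "node" boundary here; the stored value stays inside the partition that owns it, which is the locality that the third PDC rule asks for.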

It’s also quite obvious that the PDC theorem doesn’t guarantee real-time processing – it merely states that these three rules are necessary but not sufficient on their own for a real-time response. Latencies of the atomic remote operations will often dictate whether or not real-time response is achievable in practice.

It is interesting to note that the combination of the PDC and CAP theorems really defines the fundamentals of high performance distributed programming today.

We at GridGain have been working on real-time BigData and BigCompute processing for several years now. These ideas led us to develop the first middleware that natively combines both Compute Grid and In-Memory Data Grid into one product - making it an ideal middleware software for building real-time cloud applications.

Note
GridGain
Using GridGain you can easily build systems that span 100s and 1000s of nodes while maintaining all necessary data cached in-memory and all computational processing fully parallelized and co-located.

2.4. Grid and Cloud Computing

To start off this sub-chapter I’ll tell you a story that happened to me a few years ago.

I was presenting GridGain at the Java User Group in Dayton, Ohio. For those of you who don’t know - Dayton, OH is essentially the "sleeping quarters" for Wright-Patterson Air Force Base, one of the largest military research facilities in the world. It employs almost 30,000 people, is home to the massive Air Force Institute of Technology and Air Force Research Laboratory, and is known to have one of the largest super-computing centers in the world to boot. So, I had a few people from the base at my presentation…

Note Wright-Patterson Air Force Base is rumored to be the center of UFO research, or so many people believe, after Nevada’s Area 51 was closed a few decades ago…

After the presentation I chatted with some folks and the conversation drifted to the general topic of grid computing and what we all understand by that (and by the then relatively new concept of cloud computing). It was a real surprise to me (to say the least) that I got three very different answers from three guys working on the base - essentially working in one big company.

One answer was that grid is really nothing new, alluding to the parallel Fortran of almost 50 years ago; another was more in line with the common understanding of grid computing as just a re-tooling of traditional parallel processing; and yet another was that the whole thing is just hype and multi-core CPUs will displace it altogether in 5-10 years max.

I remember being a bit perplexed on my flight back by how far apart those guys were while working side by side in an (albeit large) organization - and not just on technicalities, but on their fundamental view of distributed programming (grid or cloud - doesn’t matter). From my next presentation onward I’ve made it a rule to always state what I believe about grids and clouds, to at least make sure we have a common frame of reference. Whether or not you agree with it is another question…

So, here’s my take. I don’t particularly like the terms grid and cloud computing. There’s nothing that resembles a "grid" in grid computing and obviously there’s nothing that is performed in the "cloud" when it comes to cloud computing. Both marketing terms, "grid computing" and "cloud computing", represent slight variations of traditional distributed programming.

Note Put in more canonical form, if you have more than one computing resource working on the same problem in parallel - you have a grid and you are engaged in grid computing. If these computing resources (all or in part) are virtualized and available to you on demand - they represent a cloud and you are doing cloud computing.

In a nutshell - that’s it.

As you can see, the difference between grid computing and cloud computing is only in the representation of computing resources, which in most cases should be irrelevant. The lines between grids and clouds are getting blurrier by the day and we use both terms interchangeably throughout this book (as long as the context is clear).

It is, however, important to know the difference and the new challenges that cloud infrastructure brings to the table for you as a software developer.

2.5. IaaS, PaaS, and SaaS

Only at the peak of the hype around cloud computing can you get a chapter named like that…

Nowadays these terms are often thrown around without much regard or understanding, and while IaaS and SaaS are somewhat well defined, PaaS is poorly defined, if at all. Let us try to define these terms and see where GridGain fits in the picture.

Note Understanding this nomenclature is not essential for everyday usage of GridGain, and most of the concepts in GridGain steer clear of these high-level marketing terms. Yet a cursory look is worthwhile, and it will help you navigate the plethora of marketing literature that surrounds cloud computing today.

The picture below provides a basic overview of how GridGain relates to IaaS and PaaS:

http://www.gridgain.com/images/iaas_paas.png

2.5.1. IaaS - Infrastructure As a Service

IaaS stands for "Infrastructure As a Service". IaaS is often (wrongly) treated as synonymous with cloud computing. It essentially means providing virtualized computing resources as a service. Think of Amazon EC2, for example.

Note And who would have ever thought that a nascent online bookseller and one of the few dotcom survivors would spearhead a revolution that is so much bigger than just online retailing - a revolution that is radically changing the way we think about information systems!

Amazon had its own data center and some time ago decided to earn extra money by renting out its often unused computing capacity. So, they put hardware virtualization (like VMs from VMWare or Citrix) on their servers and exposed the management of these VMs via a Web browser so that anyone could create an account and start managing the VM instances.

In a nutshell - that’s all there is to it.

It is important to note that IaaS (or clouds) can be public, private or hybrid. Public clouds are based on infrastructure that is publicly available, i.e. the IaaS provider generally gives access to its data center to anyone. Private clouds are built by individual organizations for their internal use. And hybrid clouds exhibit both types of behavior.

While public and private clouds are usually physical infrastructures, with the difference being who gets access to them, hybrid clouds are almost always virtual clouds. Similar to Virtual Private Networks (VPN), virtual clouds are created on top of one or more physical public and private clouds and provide their end users with seamless cloud/IaaS transparency. These hybrid clouds are often created for business applications by either PaaS or software middleware like GridGain.

Note Clouds can be public, private, and hybrid. While public and private clouds are usually physical infrastructures, hybrid clouds are almost always virtualized clouds built via software on top of one or more physical public or private clouds. The analogy between hybrid clouds and VPN is almost one-to-one.

There are plenty of IaaS providers following in Amazon’s footsteps, all providing different sets of functionality and different twists. A quick look at the Amazon AWS offerings will show how complex and diverse they have become in recent years.

It is also a strong sign that hardware virtualization and the services around it will keep advancing rapidly. We are just a few years away from being able to add an extra core to our application or acquire extra network bandwidth on demand for our system’s peak time, and scale it down the moment it is no longer needed.

These capabilities will undoubtedly change the way we develop our applications.

2.5.2. PaaS - Platform As a Service

PaaS stands for "Platform As a Service". PaaS essentially provides an abstraction over various IaaS providers and adds additional services. These additional services mostly consist of a set of deployment and provisioning services aimed at supporting application multi-tenancy, i.e. the ability to host multiple applications in secure isolation on the same VM or set of VMs (not to be confused with the Java VM).

Note
VMs vs. Java VMs

Throughout this book we’ll use the term VM to denote a hardware virtualization virtual machine. To denote the Java Virtual Machine we’ll use the term JVM.

The problem with PaaS is that no one has a precise definition of what PaaS really is… Its definition is largely based on specific vendor capabilities. There is, however, one clear trait of PaaS: it frees its users from worrying about the specifics of various IaaS providers and the differences in their operations and functionality.

PaaS also gave rise to the notion of DevOps - a symbiosis of application development and traditional IT functions. It is often said that PaaS provides abstraction over IaaS and DevOps services.

Note PaaS provides abstraction over IaaS and DevOps services.

Most PaaS vendors today (early 2011) concentrate mainly on providing deployment and application provisioning services. PaaS offerings from VMWare/Spring, Google AppEngine, CloudBees and RedHat/JBoss, for example, do exactly that. They all allow you to take your whole application and, through a series of manual steps, move or deploy it onto IaaS infrastructure with some limited, if any, scale-out functionality.

PaaS as a technology is in its very early stages today. It is clear that PaaS as a concept and technology will likely see the most change in the coming years - and by the time you read this book some of these changes may significantly affect your understanding of what PaaS can do for you.

Note IaaS abstracts out data center and exposes it as a service.
PaaS abstracts out IaaS providers and adds DevOps.

2.5.3. SaaS - Software As a Service

SaaS stands for "Software As a Service". Surprisingly for most casual observers, SaaS has relatively little to do with cloud computing, or with IaaS and PaaS specifically. Essentially, if you run your "application" in the browser - it is a SaaS application.

That’s it.

Historically, SaaS came from ASP (Application Service Provider) businesses and it shares almost everything with ASP (except for a catchier name). Interestingly enough, SaaS was the first "as a service" abbreviation, appearing long before IaaS and PaaS came to light. But when hardware virtualization and the surrounding services became popular, the "as a service" moniker was a logical progression for providing computing Infrastructure and Platforms "as a service".

2.5.4. How Does GridGain Fit?

Looking at the picture a few paragraphs above, you can see that GridGain can easily work directly with IaaS (like Amazon AWS) or through PaaS. In fact, GridGain is completely independent of both PaaS and IaaS - it can work without any specific cloud, grid or cluster infrastructure.

This ability, this lightweight approach, is one of the key design advantages of GridGain. It provides you, the developer, with exactly the same services whether you run GridGain on a simple Android device, a laptop, a few servers, a small grid or a large cloud.

So, you can select the desired DevOps approach for your application and GridGain will happily support it!

2.6. License

GridGain is dual-licensed:

  • GridGain Community Edition is free, open source and licensed under GPL version 3.

  • GridGain Enterprise Edition is closed source and commercially licensed.

EULA (end-used license agreement) for Enterprise Edition is available at GridGain root installation folder after you install GridGain and it is also available on download page on the http://www.gridgain.com website.

GridGain System also provides OEM, Enterprise an Academic licenses with further details available upon request at sales@gridgain.com

Important
GridGain 1.x-2.x
Note also that previous versions of GridGain (1.x-2.x) were licensed under LGPL.

Note that throughout the book we don’t directly distinguish between features available in the Enterprise and Community Editions. In cases where it is important we make a note.

2.7. Support

The best way to get free support for GridGain software is to dip into our active community and the wealth of information on our free support forum: http://jive.gridgain.org. This forum is closely monitored by GridGain Systems’ engineers and we try our best to provide free support there when applicable.

GridGain Systems, Inc., as the company behind the GridGain project, also provides a full spectrum of commercial services around GridGain software including:

  • Commercial subscription for Community and Enterprise editions

  • Consulting and professional services

  • OEM licensing

  • "Bronze", "Silver" and "Gold" levels of support

  • GridGain Training seminars

When it comes to training and support - we at GridGain have a very simple philosophy: we do our own heavy lifting. We believe that we are the best people to support our own software - and who better to learn about GridGain from than the people who develop it daily?

All information about services provided by GridGain Systems can be found at http://www.gridgain.com/services.html

3. Taste of GridGain

Before we start digging into the nitty-gritty details of GridGain functionality, let’s quickly look at what we can accomplish in 10-15 minutes. We’ll create one application in Java and one in Scala, utilizing the Compute and In-Memory Data Grids.

Let’s see how quickly we can do both.

Note
Installation

We have a whole chapter dedicated to installation. However, installation of GridGain is rather trivial - and if you haven’t done it already, here are three quick steps:

  • Download the GridGain ZIP archive from http://www.gridgain.com/downloads.shtml

  • Unzip the ZIP archive into an installation folder on your system

  • Set the GRIDGAIN_HOME environment variable to point to the installation folder

You are done!

Note
Unix

For the rest of this chapter (and most of this book) we will assume JetBrains IntelliJ IDEA 10 and a Unix-like environment such as Linux or Mac OS X. Most of the steps and instructions apply almost verbatim to Eclipse, Emacs or NetBeans, including when running on Windows, with obvious changes to paths and certain project management capabilities of the IDEs.

Now - the first step when developing a distributed application is to have a… grid. With GridGain you can have it anywhere, but for the purpose of this example we will create one right on the same computer where you will be running the main example.

Note
Multiple GridGain Nodes

One of the coolest capabilities of GridGain is its ability to run multiple GridGain nodes on the same computer or even… inside the same JVM. Think about it - you can launch the entire cloud in a single JVM and enjoy local debugging of your application while it is running in the local virtualized cloud. Now - that’s pretty powerful, and that’s exactly how we at GridGain debug and test most of our complex internal distributed logic.

To have a local grid we are going to have two GridGain nodes running standalone, and the third node will be embedded into the applications that we will develop. When our application starts - it will join the grid (i.e. join the topology), making a grid of three nodes.

Open a command shell and, assuming you are in the GRIDGAIN_HOME folder, just type this:

$ bin/ggstart.sh

If everything is fine (you set the GRIDGAIN_HOME environment variable properly and you have Java installed) you will see output similar to this:

[11:52:50]   _____     _     _______      _         ____   ____
[11:52:50]  / ___/____(_)___/ / ___/___ _(_)___    |_  /  / __/
[11:52:50] / (_ // __/ // _  / (_ // _ `/ // _ \  _/_ <_ /__ \
[11:52:50] \___//_/ /_/ \_,_/\___/ \_,_/_//_//_/ /____(_)____/
[11:52:50]
[11:52:50]  ---==++ HIGH PERFORMANCE CLOUD COMPUTING ++==---
[11:52:50]                ver. x.x.x-DDMMYYYY
[11:52:50]  Copyright (C) 2005-2011 GridGain Systems, Inc.
[11:52:50]
[11:52:50] Quiet mode.
[11:52:50]   ^-- To disable add -DGRIDGAIN_QUIET=false or "-v" to ggstart.{sh|bat}
[11:52:50] << Enterprise Edition >>
[11:52:50] Daemon mode: off
[11:52:50] Language runtime: Java Platform API Specification ver. 1.6
[11:52:50] Remote Management [restart: on, REST: on, JMX (remote: on, port: 49113, auth: off, ssl: off)]
[11:52:50] GRIDGAIN_HOME=/Users/nivanov/svnroot/gg-trunk
[11:52:50] (!) SMTP is not configured - email notifications are off.
[11:52:50] (!) Cache is not configured - data grid is off.
[11:52:53] Topology snapshot [nodes=1, CPUs=4, hash=0xB12A5F18]
[11:52:54] License info:
[11:52:54]     Licensed to 'GridGain Systems, Internal Development Only' on Feb 3, 2011
[11:52:54]     License [ID=7D5CB773-225C-4165-8162-3BB67337894B, type=ENT]
[11:52:54]       ^--License limits [<none>]
[11:52:54] System info:
[11:52:54]     JVM: Apple Inc., Java(TM) SE Runtime Environment ver. 1.6.0_26-b03-383-11A511
[11:52:54]     OS: Mac OS X 10.7.1 x86_64, nivanov
[11:52:54]     VM name: 4837@NIKITA-IVANOVs-MacBook-Pro.local
[11:52:54] Local ports used [TCP:8080 TCP:47100 UDP:47200 TCP:47300]
[11:52:54] GridGain started OK
[11:52:54]   ^-- [grid=default, nodeId8=a26743ce, order=1315939970778, CPUs=4, addrs=[192.168.1.103]]
[11:52:54] ZZZzz zz z...

Start another command shell and type the same command again:

$ bin/ggstart.sh

This time you get almost identical output with a few important changes:

[12:34:09]   _____     _     _______      _         ____   ____
[12:34:09]  / ___/____(_)___/ / ___/___ _(_)___    |_  /  / __/
[12:34:09] / (_ // __/ // _  / (_ // _ `/ // _ \  _/_ <_ /__ \
[12:34:09] \___//_/ /_/ \_,_/\___/ \_,_/_//_//_/ /____(_)____/
[12:34:09]
[12:34:09]  ---==++ HIGH PERFORMANCE CLOUD COMPUTING ++==---
[12:34:09]                ver. x.x.x-DDMMYYYY
[12:34:09]  Copyright (C) 2005-2011 GridGain Systems, Inc.
[12:34:09]
[12:34:09] Quiet mode.
[12:34:09]   ^-- To disable add -DGRIDGAIN_QUIET=false or "-v" to ggstart.{sh|bat}
[12:34:09] << Enterprise Edition >>
[12:34:09] Daemon mode: off
[12:34:09] Language runtime: Java Platform API Specification ver. 1.6
[12:34:09] Remote Management [restart: on, REST: on, JMX (remote: on, port: 49114, auth: off, ssl: off)]
[12:34:09] GRIDGAIN_HOME=/Users/nivanov/svnroot/gg-trunk
[12:34:09] (!) SMTP is not configured - email notifications are off.
[12:34:09] (!) Cache is not configured - data grid is off.
[12:34:10] Node JOINED [nodeId8=a26743ce, addr=[192.168.1.103], CPUs=4] 1
[12:34:12] Topology snapshot [nodes=2, CPUs=4, hash=0xC287D25B] 2
[12:34:14] (!) Jetty failed to start (retrying every 3000 ms). Another node on this host?
[12:34:14] License info:
[12:34:14]     Licensed to 'GridGain Systems, Internal Development Only' on Feb 3, 2011
[12:34:14]     License [ID=7D5CB773-225C-4165-8162-3BB67337894B, type=ENT]
[12:34:14]       ^--License limits [<none>]
[12:34:14] System info:
[12:34:14]     JVM: Apple Inc., Java(TM) SE Runtime Environment ver. 1.6.0_26-b03-383-11A511
[12:34:14]     OS: Mac OS X 10.7.1 x86_64, nivanov
[12:34:14]     VM name: 5227@NIKITA-IVANOVs-MacBook-Pro.local
[12:34:14] Local ports used [TCP:47101 UDP:47200 TCP:47301]
[12:34:14] GridGain started OK
[12:34:14]   ^-- [grid=default, nodeId8=b8aac044, order=1315942449848, CPUs=4, addrs=[192.168.1.103]]
[12:34:14] ZZZzz zz z...

This output is a bit more interesting as it shows that both nodes discovered each other:

1 Event of node joining the topology
2 Snapshot of the topology showing total number of nodes and CPUs

So at this point we have two nodes running (they are simply idling since we are not processing anything). Notice how we didn’t have to specify any configuration properties or configure anything at all. Everything works out-of-the-box as expected.

Note
SPIs

As you will learn later in the book, GridGain is composed of almost a dozen different SPIs, each providing pluggable kernel-level functionality. Two of these are the discovery and communication SPIs, which are responsible for maintaining the distributed topology and exchanging data between nodes.

When you start GridGain with default configuration (like we just did) it starts with default SPI implementations (IP-multicast discovery and TCP/IP-based communication respectively) - and they work perfectly in our case.

3.1. First GridGain Java App

Now that we have the topology set up, we are going to switch to writing the actual code of our first application. The code examples below don’t have any dependencies on an IDE - and you can follow along using any text editor of your choice.

The first app we are going to write is a computational MapReduce. It will calculate the number of non-space characters in a given string by splitting the string into individual words, calculating each word’s length on the remote nodes and aggregating the results back.

We’ll use the FP-based approach that GridGain natively supports (even in Java):

import org.gridgain.grid.*;
import org.gridgain.grid.typedef.*;
import java.util.*;
import static org.gridgain.grid.GridClosureCallMode.*;

public class GridFunctionalMapReduceExample {
    public static void main(final String[] args) throws GridException {
        if (args.length == 1 && args[0].length() > 0)
            GridFactory.in(new GridInClosureX<Grid>() {
                @Override public void applyx(Grid g) throws GridException {
                    System.out.println("Length of input argument is " + g.reduce(
                        SPREAD,
                        GridFunc.<String, Integer>cInvoke("length"),
                        Arrays.asList(args[0].split(" ")),
                        GridFunc.sumIntReducer()
                    ));
                }
            });
    }
}

Note that there are many ways you can write this particular program in GridGain:

  • We can use the direct Grid Task and Grid Jobs approach

  • We can use AOP-based grid enablement

  • We can use GridGain 3.0 imperative APIs

  • Even using GridGain FP APIs there are different ways to code this program

The FP-based approach above, however, probably yields the shortest program, but it can be initially confusing since Java doesn’t really support FP natively. So let me explain step by step how this works.

Lines 1-4
We start by importing all necessary classes and constants into the scope.

Line 7
We define a main(…) method that will take the input string.

Line 9
Method GridFactory.in(…) simply takes a closure and:

  • Starts the default GridGain node (custom configuration can be passed in as a parameter)

  • Executes the passed-in closure

  • Stops the GridGain node

So, essentially, it allows for quick execution of a piece of code within the context of a running GridGain node.
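The start / execute / stop sequence can be pictured in plain Java, without GridGain on the classpath. The sketch below only illustrates the pattern: DemoNode, inNode and the event list are hypothetical names invented for this example and are not part of the GridGain API. It assumes Java 8 or later for the lambda syntax.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Illustration only: mimics the start / execute / stop pattern described above. */
class NodeLifecycleSketch {
    /** Hypothetical stand-in for a grid node; it just records what happened to it. */
    static final class DemoNode {
        final List<String> events = new ArrayList<>();

        void start() { events.add("started"); }

        void stop() { events.add("stopped"); }
    }

    /**
     * Starts a node, runs the given closure against it and always stops the
     * node afterwards, even if the closure throws.
     */
    static List<String> inNode(Consumer<DemoNode> body) {
        DemoNode node = new DemoNode();

        node.start();

        try {
            body.accept(node);
        }
        finally {
            node.stop();
        }

        return node.events;
    }

    public static void main(String[] args) {
        List<String> events = inNode(node -> node.events.add("closure executed"));

        System.out.println(events);
    }
}
```

The try/finally block is the point of the sketch: the caller never has to remember to stop the node, which is what makes this style convenient for short programs and examples.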

Lines 9, 10
As a parameter to the GridFactory.in(…) method call we are passing a newly created closure of type GridInClosureX<Grid>. The body of this closure will be executed within the context of a running GridGain node.

Lines 11-15
GridInClosureX<Grid> has one method, applyx(Grid), which is passed a Grid interface instance. Inside applyx(Grid) we call the reduce(…) method, which performs the MapReduce operation. It accepts four parameters:

  • Distribution mode. In our case we use GridClosureCallMode.SPREAD to spread the processing to all available nodes

  • A closure to execute on every remote node. We use the utility method GridFunc.cInvoke(…) that creates a closure via reflection based on the method name

  • A list of arguments that will be passed to each closure on the remote nodes

  • A reducing closure that takes results from the remote nodes and aggregates them into one final result. We, again, use a pre-defined integer accumulator returned by the GridFunc.sumIntReducer() method.

The logic of this computational MapReduce should be clear by now. We split the input string by spaces into individual words, we then send every word to a remote node where the method length will be called on that word, and the results of these calls will be returned to the reducer, which will simply sum them up.
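To see the same split / map / reduce flow without any grid involved, here is a minimal local sketch in which worker threads stand in for remote nodes. It deliberately does not use the GridGain API; the class and method names are made up for this illustration, and it assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Illustration only: the split / map / reduce flow run on local threads. */
class LocalMapReduceSketch {
    /** Counts non-space characters by mapping each word to its length and summing. */
    static int countNonSpaceChars(String input, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        try {
            // Split and map: one asynchronous job per word. Each worker
            // thread plays the role of a remote node computing a length.
            List<CompletableFuture<Integer>> partials = new ArrayList<>();

            for (String word : input.split(" "))
                partials.add(CompletableFuture.supplyAsync(() -> word.length(), pool));

            // Reduce: sum the partial results on the calling thread.
            int total = 0;

            for (CompletableFuture<Integer> partial : partials)
                total += partial.join();

            return total;
        }
        finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // "to" + "be" + "or" + "not" = 2 + 2 + 2 + 3 = 9
        System.out.println(countNonSpaceChars("to be or not", 3));
    }
}
```

What the grid version adds on top of this is everything that is hard to write by hand: shipping the closure and its classes to other machines, balancing the jobs and failing over when a node disappears.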

Now that we have a basic understanding of what is happening inside this code, let’s run the example. Whether you are using an IDE, Maven or an Ant build, you simply need to add the gridgain.jar file located in the GRIDGAIN_HOME directory and all JARs under the GRIDGAIN_HOME/libs folder to your classpath. Also, if you use an IDE - make sure that either the GRIDGAIN_HOME environment variable is inherited by the IDE process or a system property with the same name, GRIDGAIN_HOME, is set up in your runtime configuration.

When it was all set and done, I passed the "GridGain is awesome" string to my IDEA 10 runtime configuration and got the following output in my IDEA output window:
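If you are unsure whether your IDE process actually sees GRIDGAIN_HOME, a small pre-flight check along the following lines may help. This is a hypothetical helper, not something shipped with GridGain; in particular, consulting the system property before the environment variable is an assumption of this sketch rather than documented behavior.

```java
import java.io.File;
import java.util.Map;
import java.util.Properties;

/** Illustration only: a pre-flight check for the GRIDGAIN_HOME setting. */
class GridGainHomeCheck {
    /**
     * Returns the configured home folder, or null when neither source has it.
     * The system property is consulted before the environment variable; that
     * order is an assumption of this sketch.
     */
    static String resolveHome(Properties sysProps, Map<String, String> env) {
        String home = sysProps.getProperty("GRIDGAIN_HOME");

        if (home == null || home.isEmpty())
            home = env.get("GRIDGAIN_HOME");

        return (home == null || home.isEmpty()) ? null : home;
    }

    /** True if the folder contains gridgain.jar and a libs sub-folder. */
    static boolean looksLikeInstallation(String home) {
        File dir = new File(home);

        return new File(dir, "gridgain.jar").isFile() && new File(dir, "libs").isDirectory();
    }

    public static void main(String[] args) {
        String home = resolveHome(System.getProperties(), System.getenv());

        if (home == null)
            System.out.println("GRIDGAIN_HOME is not set");
        else
            System.out.println(home + " looks like an installation: " + looksLikeInstallation(home));
    }
}
```

Running this with the same runtime configuration as your application shows quickly whether the setting reached the JVM.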

/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/bin/java -ea -DGRIDGAIN_HOME=...
[12:28:11]   _____     _     _______      _         ____   ____
[12:28:11]  / ___/____(_)___/ / ___/___ _(_)___    |_  /  / __/
[12:28:11] / (_ // __/ // _  / (_ // _ `/ // _ \  _/_ <_ /__ \
[12:28:11] \___//_/ /_/ \_,_/\___/ \_,_/_//_//_/ /____(_)____/
[12:28:11]
[12:28:11]  ---==++ HIGH PERFORMANCE CLOUD COMPUTING ++==---
[12:28:11]                ver. x.x.x-DDMMYYYY
[12:28:11]  Copyright (C) 2005-2011 GridGain Systems, Inc.
[12:28:11]
[12:28:11] Quiet mode.
[12:28:11]   ^-- To disable add -DGRIDGAIN_QUIET=false or "-v" to ggstart.{sh|bat}
[12:28:11] << Enterprise Edition >>
[12:28:11] (!) SMTP is not configured - email notifications are off.
[12:28:11] (!) Cache is not configured - data grid is off.
[12:28:11] Daemon mode: off
[12:28:11] Language runtime: Java Platform API Specification ver. 1.6
[12:28:11] Remote Management [restart: off, REST: on, JMX (remote: off)]
[12:28:11] GRIDGAIN_HOME=/Users/nivanov/svnroot/gg-trunk
[12:28:13] Node JOINED [nodeId8=b324d6c3, addr=[192.168.1.103], CPUs=4]
[12:28:15] Topology snapshot [nodes=2, CPUs=4, hash=0xCFFF5AA0]
[12:28:15] Node JOINED [nodeId8=e854d435, addr=[192.168.1.103], CPUs=4]
[12:28:15] Topology snapshot [nodes=3, CPUs=4, hash=0xF7C10287]
[12:28:15] License info:
[12:28:15]     Licensed to 'GridGain Systems, Internal Development Only' on Feb 3, 2011
[12:28:15]     License [ID=7D5CB773-225C-4165-8162-3BB67337894B, type=ENT]
[12:28:15]       ^--License limits [<none>]
[12:28:15] New version is available at www.gridgain.com: 3.2.1c.05082011
[12:28:15] System info:
[12:28:15]     JVM: Apple Inc., Java(TM) SE Runtime Environment ver. 1.6.0_26-b03-383-11A511
[12:28:15]     OS: Mac OS X 10.7.1 x86_64, nivanov
[12:28:15]     VM name: 9791@NIKITA-IVANOVs-MacBook-Pro.local
[12:28:15] Local ports used [TCP:8080 TCP:47102 UDP:47200 TCP:47302]
[12:28:15] GridGain started OK
[12:28:15]   ^-- [grid=default, nodeId8=1cdf9436, order=1316028492427, CPUs=4, addrs=[192.168.1.103]]
[12:28:15] ZZZzz zz z...
Length of input argument is 16
[12:28:17] GridGain stopped OK [uptime=00:00:01:363]

As you can see, the output is very similar to that of the standalone nodes we started a few minutes ago. But at the end we have the output of our MapReduce task, which says:

Length of input argument is 16

reporting the number of non-space characters computed for the input string "GridGain is awesome". Notice also that we got a topology snapshot with three nodes (as expected). If you check the standalone nodes you will see output similar to this:

[12:28:13] Node JOINED [nodeId8=1cdf9436, addr=[192.168.1.103], CPUs=4]
[12:28:13] Topology snapshot [nodes=3, CPUs=4, hash=0xF7C10287]
[12:28:17] Node LEFT [nodeId8=1cdf9436, addr=[192.168.1.103], CPUs=4]
[12:28:17] Topology snapshot [nodes=2, CPUs=4, hash=0x71DE65CC]

indicating that when our application started we had three nodes in the topology, and when our application completed and stopped we were back to two nodes (i.e. the two standalone nodes).
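The bookkeeping behind these JOINED / LEFT messages can be pictured with a toy model of one node's view of the topology. This is only a sketch: the class, its methods and the node names are hypothetical, and real discovery (heartbeats, failure detection and so on) is handled by the discovery SPI.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustration only: one node's view of the topology as nodes join and leave. */
class TopologySketch {
    private final Set<String> nodeIds = new LinkedHashSet<>();
    private final List<String> log = new ArrayList<>();

    /** Records a join event; duplicate joins are ignored. */
    void nodeJoined(String id) {
        if (nodeIds.add(id))
            log.add("Node JOINED [" + id + "], snapshot [nodes=" + nodeIds.size() + "]");
    }

    /** Records a leave event; unknown nodes are ignored. */
    void nodeLeft(String id) {
        if (nodeIds.remove(id))
            log.add("Node LEFT [" + id + "], snapshot [nodes=" + nodeIds.size() + "]");
    }

    int size() { return nodeIds.size(); }

    List<String> log() { return log; }

    public static void main(String[] args) {
        TopologySketch topology = new TopologySketch();

        topology.nodeJoined("standalone-1");
        topology.nodeJoined("standalone-2");
        topology.nodeJoined("app");  // the node embedded in our application starts
        topology.nodeLeft("app");    // the application finishes and its node stops

        topology.log().forEach(System.out::println);
    }
}
```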

Note
Zero Deployment

Did you notice any deployment steps? Any Ant or Maven build? Did we create any JAR files to copy to remote nodes?

As you probably guessed - the answer is no. We didn’t need to do any of these awkward and expensive steps because GridGain sports a pretty unique technology that allows it to deploy necessary classes on demand, in a distributed fashion, completely transparently to the developer. In fact - you just write the code as if it is completely local and GridGain will take care of proper distribution, versioning, class loading, etc.

Now, let’s advance our example. As you probably noticed, we don’t see any evidence on the remote nodes that any processing is happening there. That is because by default GridGain starts in QUIET mode and most of the output is suppressed (if you need to start in normal mode - use the -v flag for the ggstart.sh script).

Let’s modify our example so that we’ll see what is being processed and where:

import org.gridgain.grid.*;
import org.gridgain.grid.typedef.*;
import java.util.*;
import static org.gridgain.grid.GridClosureCallMode.*;

public class GridFunctionalMapReduceExample {
    public static void main(final String[] args) throws GridException {
        if (args.length == 1 && args[0].length() > 0)
            GridFactory.in(new GridInClosureX<Grid>() {
                @Override public void applyx(Grid g) throws GridException {
                    System.out.println("Length of input argument is " + g.reduce(
                        SPREAD,
                        new GridClosure<String, Integer>() {
                            @Override public Integer apply(String s) {
                                System.out.println("Calculating for: " + s);

                                return s.length();
                            }
                        },
                        //GridFunc.<String, Integer>cInvoke("length"),
                        Arrays.asList(args[0].split(" ")),
                        GridFunc.sumIntReducer()
                    ));
                }
            });
    }
}

We’ve commented out the reflection-based closure and added direct closure creation that prints out what string it is working on and returns its length. If we re-run our application we’ll now get the following output on three nodes:

Remote Node 1

[14:14:27] Node JOINED [nodeId8=6e77be3c, addr=[192.168.1.103], CPUs=4]
[14:14:27] Topology snapshot [nodes=3, CPUs=4, hash=0x89DEB0E3]
Calculating for: is 1
[14:14:31] Node LEFT [nodeId8=6e77be3c, addr=[192.168.1.103], CPUs=4]
[14:14:31] Topology snapshot [nodes=2, CPUs=4, hash=0x71DE65CC]

Remote Node 2

[14:14:27] Node JOINED [nodeId8=6e77be3c, addr=[192.168.1.103], CPUs=4]
[14:14:27] Topology snapshot [nodes=3, CPUs=4, hash=0x89DEB0E3]
Calculating for: awesome 1
[14:14:31] Node LEFT [nodeId8=6e77be3c, addr=[192.168.1.103], CPUs=4]
[14:14:31] Topology snapshot [nodes=2, CPUs=4, hash=0x71DE65CC]

Local node in IDE

[14:14:29] GridGain started OK
[14:14:29]   ^-- [grid=default, nodeId8=6e77be3c, order=1316034866664, CPUs=4, addrs=[192.168.1.103]]
[14:14:29] ZZZzz zz z...
Calculating for: GridGain 1
Length of input argument is 16 2
[14:14:31] GridGain stopped OK [uptime=00:00:01:237]
1 - That’s the output from our closure executing on remote nodes.
2 - That’s the output from the reduction step executing on the local (initiating) node.

Note that the local node (i.e. the node running in the IDE) performs calculation as well as the final reduction step. If we don’t want it to participate in the actual calculation and only want it to perform the final reduction - we can simply change this line:

System.out.println("Length of input argument is " + g.reduce(

to this

System.out.println("Length of input argument is " + g.remoteProjection().reduce(
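The effect of switching to a remote-only projection can be sketched in plain Java. The round-robin assignment below is an assumed, simplified policy chosen for illustration (actual job placement is decided by GridGain's pluggable load balancing), and all class, method and node names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustration only: spreading jobs over all nodes vs. over remote nodes only. */
class ProjectionSketch {
    /** Returns every node except the local one. */
    static List<String> remoteOnly(List<String> nodes, String localNode) {
        List<String> remote = new ArrayList<>(nodes);

        remote.remove(localNode);

        return remote;
    }

    /** Assigns jobs to nodes in round-robin order (an assumed, simplified policy). */
    static Map<String, List<String>> spread(List<String> jobs, List<String> nodes) {
        if (nodes.isEmpty())
            throw new IllegalArgumentException("No nodes to run on");

        Map<String, List<String>> plan = new LinkedHashMap<>();

        for (String node : nodes)
            plan.put(node, new ArrayList<>());

        for (int i = 0; i < jobs.size(); i++)
            plan.get(nodes.get(i % nodes.size())).add(jobs.get(i));

        return plan;
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("local", "remote-1", "remote-2");
        List<String> words = Arrays.asList("GridGain is awesome".split(" "));

        // All three nodes take part: one word each.
        System.out.println(spread(words, nodes));

        // Local node excluded: the two remote nodes share the three words.
        System.out.println(spread(words, remoteOnly(nodes, "local")));
    }
}
```

In both cases the reduction still happens on the initiating node; only the set of nodes eligible to run the mapped closures changes.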
Note
Consider this…

Look at these 20 lines of code and consider that this application includes:

  • auto topology discovery

  • auto load balancing

  • distributed fail over

  • collision resolution

  • zero code deployment & provisioning

  • pluggable marshalling & communication

Pretty neat, right? And so in just about two dozen lines of code and 10 minutes we’ve got our first MapReduce application running.

3.2. First GridGain Scala App

Now - let’s move to an In-Memory Data Grid application, and we’ll use Scala for that - more specifically, Scalar, our Scala-based DSL for GridGain.

Note
Scalar DSL

The idea behind Scalar is simply to adapt the Java-side APIs for usage in Scala. Scalar by design does not add any new functionality to GridGain; it only adapts the Java APIs to Scala. This is a very important point to understand: there’s no additional or left-out functionality when you are switching between Java and Scala - 100% of GridGain is available in both languages (and natively so).

Note that GridGain also comes with Grover - Groovy++ DSL.

Since we are going to use the In-Memory Data Grid in this example, we need to restart our standalone nodes with the data grid enabled. Note that by default there are no caches configured (for obvious performance reasons). GridGain comes with a handy example configuration, examples/config/spring-cache.xml, that has three caches configured.

You can stop the existing nodes by simply pressing Ctrl-C and then start them again using:

bin/ggstart.sh examples/config/spring-cache.xml

The output from the nodes is almost the same as in the previous example, with one notable change:

...
[13:28:05] Configured caches ['partitioned', 'replicated', 'local']
...

indicating that we now have three configured caches that are named based on their type.

Now that we have standalone nodes running with the necessary configuration, let’s turn to writing our application that will utilize the In-Memory Data Grid (we’ll use the shorter term data grid going forward) side of GridGain. We are going to create an application that will populate the data grid with a set of key-value pairs and then execute a set of closures where each closure will have an affinity with a specific key in the data grid - and therefore it will be co-located with the data for that key instead of being executed on some random node in the grid.

Important
Affinity Co-Location

Affinity co-location is an extremely important use case in real-time processing as it underpins system designs that can scale linearly regardless of the size of the data set.

The Scalar-based application looks pretty simple:

import org.gridgain.scalar.scalar
import scalar._
import org.gridgain.grid.cache.GridCache

object ScalarCacheAffinitySimpleExample {
    /** Number of keys. */
    private val KEY_CNT = 20

    def main(args: Array[String]) {
        scalar("examples/config/spring-cache.xml") {
            val c = grid$.cache[Int, String]("partitioned")

            populate(c) // Comment out on subsequent runs.
            colocate(c)
        }
    }

    private def populate(c: GridCache[Int, String]) {
        (0 until KEY_CNT).foreach(i => c += (i -> i.toString))
    }

    private def colocate(c: GridCache[Int, String]) {
        (0 until KEY_CNT).foreach(i =>
            grid$.affinityRun("partitioned", i,
                () => println("Co-located [key= " + i + ", value=" + c.peek(i) + ']'))
        )
    }
}

Even if you are not familiar with Scala - the code looks pretty self-explanatory. Let’s go line by line like we did for the Java example above.

Lines 1-3
Necessary imports, including the import for Scalar.

Line 7
Defines the number of keys we’ll be storing in the data grid and the number of closures we’ll be executing later.

Line 10
Initializes Scalar with the same configuration file that we used for the standalone nodes: examples/config/spring-cache.xml. Note that initializing Scalar essentially means starting up the local node.

Line 11
We are getting an instance of the cache named partitioned (a partitioned cache, named so for clarity). The cache is typed for Int keys and String values.

Lines 13, 14
Calls functions populate() and colocate() that are defined later.

Lines 18-20
Function populate() simply puts key/value pairs into the data grid. Note that the partitioned cache will store a particular key/value pair on one of the nodes (potentially including the local one) as well as on one backup node. Since we have three nodes in the topology (remember - one local node and two standalone nodes) - each key/value pair will be stored on two nodes.

Note
Running Multiple Times

Note that if you run this example multiple times you need to comment out line 13 since we don’t need to overwrite values that are already in the data grid. Don’t get confused here: even though we stop the local node when our application finishes, and all data that was stored on this node is lost - the key/value pairs are duplicated on backup nodes (i.e. stored twice in the data grid).

When we start our application again, the pre-loading process will optimally reshuffle the data from the two existing nodes to the new three-node topology.

Note that the number of backup nodes and the details of the pre-loading process are fully configurable.
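Why no data is lost when the local node stops can be shown with a toy placement model. The sketch assumes a very simple scheme (primary owner chosen by hash modulo the node count, backup on the next node in the list); the real partitioned cache uses its own configurable affinity logic, and the class, method and node names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustration only: one primary and one backup owner per key. */
class BackupPlacementSketch {
    /**
     * Picks two owners for a key: a primary chosen by hash and the next node
     * in the list as backup. The owners are distinct as long as there are at
     * least two nodes. This modulo scheme is an assumption of the sketch.
     */
    static List<String> owners(int key, List<String> nodes) {
        int primary = Math.floorMod(Integer.hashCode(key), nodes.size());
        int backup = (primary + 1) % nodes.size();

        return Arrays.asList(nodes.get(primary), nodes.get(backup));
    }

    /** True if at least one owner of the key remains after the given node stops. */
    static boolean survives(int key, List<String> nodes, String stoppedNode) {
        List<String> remaining = new ArrayList<>(owners(key, nodes));

        remaining.remove(stoppedNode);

        return !remaining.isEmpty();
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("local", "remote-1", "remote-2");

        for (int key = 0; key < 5; key++)
            System.out.println("key=" + key + ", owners=" + owners(key, nodes) +
                ", survives local stop=" + survives(key, nodes, "local"));
    }
}
```

With one backup per key, stopping any single node always leaves one copy of every pair in the grid, which is why the second run of the example can skip populate().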

Lines 22-26
Function colocate() executes a number of closures where each closure gets affinity co-located with some key in the data grid. Note that the closure itself simply prints a trace message and uses the function peek(), which gets the value only if it’s locally available - which it should be, since we are co-locating the closure with the node where the data is stored (the so-called master node).

Note
Affinity Co-Location

The colocate() function is the key functionality here. Look how simple it is to co-locate the computational logic (a closure) with the data this logic needs to process (data in cache).

Let’s go ahead and start our application. Starting a Scala application is no different from starting a Java application (at least if you use an IDE). The local node running in IDEA prints out the following log (abbreviated):
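The idea of routing a closure to the data can be modeled in a few lines of plain Java. In the sketch below each hypothetical Node keeps its own local map, keys are mapped to nodes with a simple modulo (an assumed stand-in for the cache's real affinity function), and peek() only ever looks at the local map. None of these names come from the GridGain API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustration only: routing a closure to the node that holds the key. */
class AffinitySketch {
    /** Hypothetical node with its own local slice of the data. */
    static final class Node {
        final String name;
        final Map<Integer, String> local = new HashMap<>();

        Node(String name) { this.name = name; }

        /** Returns the value only if it is stored on this node, otherwise null. */
        String peek(int key) { return local.get(key); }
    }

    final List<Node> nodes = new ArrayList<>();

    AffinitySketch(String... names) {
        for (String name : names)
            nodes.add(new Node(name));
    }

    /** Maps a key to its owning node (simple modulo; an assumed affinity function). */
    Node ownerOf(int key) { return nodes.get(Math.floorMod(key, nodes.size())); }

    /** Stores the pair on the owning node only. */
    void put(int key, String value) { ownerOf(key).local.put(key, value); }

    /** Runs the "closure" on the owner of the key, where peek() sees the value. */
    String affinityRun(int key) {
        Node owner = ownerOf(key);

        return "Co-located [node=" + owner.name + ", key=" + key + ", value=" + owner.peek(key) + "]";
    }

    public static void main(String[] args) {
        AffinitySketch grid = new AffinitySketch("node-a", "node-b", "node-c");

        for (int i = 0; i < 20; i++)
            grid.put(i, Integer.toString(i));

        for (int i = 0; i < 20; i++)
            System.out.println(grid.affinityRun(i));
    }
}
```

Because the closure always runs where its key lives, peek() never returns null here; calling peek() for the same key on any other node in this model would.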

[22:57:01]   _____     _     _______      _         ____   ____
[22:57:01]  / ___/____(_)___/ / ___/___ _(_)___    |_  /  / __/
[22:57:01] / (_ // __/ // _  / (_ // _ `/ // _ \  _/_ <_ /__ \
[22:57:01] \___//_/ /_/ \_,_/\___/ \_,_/_//_//_/ /____(_)____/
[22:57:01]
[22:57:01]  ---==++ HIGH PERFORMANCE CLOUD COMPUTING ++==---
[22:57:01]                ver. x.x.x-DDMMYYYY
[22:57:01]  Copyright (C) 2005-2011 GridGain Systems, Inc.
[22:57:01]
[22:57:01] Quiet mode.
[22:57:01]   ^-- To disable add -DGRIDGAIN_QUIET=false or "-v" to ggstart.{sh|bat}
[22:57:01] << Enterprise Edition >>
...
[22:57:02] Topology snapshot [nodes=3, CPUs=4, hash=0xAB10A0C]
...
[22:57:05] ZZZzz zz z...
Co-located [key= 0, value=0]
Co-located [key= 1, value=1]
Co-located [key= 7, value=7]
Co-located [key= 10, value=10]
Co-located [key= 18, value=18]
[22:57:09] GridGain stopped OK [uptime=00:00:03:171]

and the remote nodes print:

Remote Node 1

[22:56:40]   _____     _     _______      _         ____   ____
[22:56:40]  / ___/____(_)___/ / ___/___ _(_)___    |_  /  / __/
[22:56:40] / (_ // __/ // _  / (_ // _ `/ // _ \  _/_ <_ /__ \
[22:56:40] \___//_/ /_/ \_,_/\___/ \_,_/_//_//_/ /____(_)____/
[22:56:40]
[22:56:40]  ---==++ HIGH PERFORMANCE CLOUD COMPUTING ++==---
[22:56:40]                ver. x.x.x-DDMMYYYY
[22:56:40]  Copyright (C) 2005-2011 GridGain Systems, Inc.
[22:56:40]
[22:56:40] Quiet mode.
[22:56:40]   ^-- To disable add -DGRIDGAIN_QUIET=false or "-v" to ggstart.{sh|bat}
[22:56:40] << Enterprise Edition >>
 ...
[22:56:42] Topology snapshot [nodes=2, CPUs=4, hash=0x99D66AF5]
 ...
[22:57:02] Node JOINED [nodeId8=3a890c51, addr=[127.0.0.1], CPUs=4]
[22:57:02] Topology snapshot [nodes=3, CPUs=4, hash=0xAB10A0C]
Co-located [key= 2, value=2]
Co-located [key= 3, value=3]
Co-located [key= 4, value=4]
Co-located [key= 5, value=5]
Co-located [key= 11, value=11]
Co-located [key= 12, value=12]
Co-located [key= 13, value=13]
Co-located [key= 16, value=16]
[22:57:08] Node LEFT [nodeId8=3a890c51, addr=[127.0.0.1], CPUs=4]
[22:57:08] Topology snapshot [nodes=2, CPUs=4, hash=0x99D66AF5]

Remote Node 2

[22:56:40]   _____     _     _______      _         ____   ____
[22:56:40]  / ___/____(_)___/ / ___/___ _(_)___    |_  /  / __/
[22:56:40] / (_ // __/ // _  / (_ // _ `/ // _ \  _/_ <_ /__ \
[22:56:40] \___//_/ /_/ \_,_/\___/ \_,_/_//_//_/ /____(_)____/
[22:56:40]
[22:56:40]  ---==++ HIGH PERFORMANCE CLOUD COMPUTING ++==---
[22:56:40]                ver. x.x.x-DDMMYYYY
[22:56:40]  Copyright (C) 2005-2011 GridGain Systems, Inc.
[22:56:40]
[22:56:40] Quiet mode.
[22:56:40]   ^-- To disable add -DGRIDGAIN_QUIET=false or "-v" to ggstart.{sh|bat}
[22:56:40] << Enterprise Edition >>
 ...
[22:56:39] Topology snapshot [nodes=1, CPUs=4, hash=0xF15A46A8]
 ...
[22:56:42] Node JOINED [nodeId8=de5c9bb0, addr=[127.0.0.1], CPUs=4]
[22:56:42] Topology snapshot [nodes=2, CPUs=4, hash=0x99D66AF5]
[22:57:02] Node JOINED [nodeId8=3a890c51, addr=[127.0.0.1], CPUs=4]
[22:57:02] Topology snapshot [nodes=3, CPUs=4, hash=0xAB10A0C]
Co-located [key= 6, value=6]
Co-located [key= 8, value=8]
Co-located [key= 9, value=9]
Co-located [key= 14, value=14]
Co-located [key= 15, value=15]
Co-located [key= 17, value=17]
Co-located [key= 19, value=19]
[22:57:08] Node LEFT [nodeId8=3a890c51, addr=[127.0.0.1], CPUs=4]
[22:57:08] Topology snapshot [nodes=2, CPUs=4, hash=0x99D66AF5]

As you can see, the key/value pairs got distributed roughly equally (the more keys, the better the distribution will be, obviously). What’s also important to note is that we didn’t get any null values, proving that co-location works (remember: we used the function peek(), which only returns a locally stored value, or null if the value for the given key is not stored locally).

All in all - these were two quick examples of Java and Scala based applications that demonstrate some of the basics of GridGain functionality. The following chapters in the book will explain why these just scratch the surface…

4. Getting and Installing

Now that we’ve looked briefly at what GridGain can do let’s start… from the beginning: how to get GridGain software and how to install it.

Note
Screenshots
Note that most of the website screenshots may have changed by the time you read this book. However, you should still be able to navigate the website easily, as its main parts remain largely the same.

4.1. Download

There are three ways to get GridGain:

We highly recommend using the first method: simply download the ZIP file from the http://www.gridgain.com website. To do so - open http://www.gridgain.com in your favorite browser and locate the download link, which is usually on the right side:

http://www.gridgain.com/book/images/screenshot-31.png

Once you click on the download link you’ll be taken to the download page, where you’ll need to enter your name and email:

http://www.gridgain.com/book/images/screenshot-30.png

Keep in mind several things:

  • There are two editions available for download - Enterprise and Community

  • Community Edition is licensed under GPLv3 and Enterprise Edition comes with an evaluation license

  • There is a link at the top of the page for past downloads that contains selected previous releases of GridGain

  • In the download table (see above) you can see the date of the build, its version, and the link to the Release Notes

There are six downloads (as of version 3.0.2):

  • Enterprise and Community for Windows

  • Enterprise and Community for Linux/Unix/Mac OS.

  • Amazon AMI images for Enterprise and Community editions

All downloads are simple ZIP files. ZIP files are versioned and clearly named to indicate which OS family they are intended for.

4.1.1. Maven2

The Maven repository is available only for version 3.0.0c-RC1 and up.

If you decide to use Maven please keep in mind:

  • Only Maven2 repository currently available (as of version 3.0.2)

    • Maven3 is not supported yet.

  • Only community edition is available in our public repository

    • Enterprise edition can only be downloaded directly from http://www.gridgain.com website

    • Maven2 POM file is included with distribution

  • Only the main GridGain JAR file is available in Maven repository

    • Depending on your usage of GridGain you may need configuration files, a working directory, etc. that won’t be created when using Maven to get GridGain.

To use our Maven repository you’ll need to make the following changes. First, add the GridGain dependency to your POM file:

POM File
<dependencies>
    .
    .
    .
    <dependency>
        <groupId>org.gridgain</groupId>
        <artifactId>gridgain</artifactId>
        <version>3.0.0c-rc1</version> <!-- CHANGE IT! -->
    </dependency>
</dependencies>

Make sure to change the version to the GridGain release you are using.

You will need to add GridGain repository to your POM file as well:

POM File
<repositories>
    .
    .
    .
    <repository>
        <id>gridgain</id>
        <url>http://www.gridgainsystems.com/maven2/</url>
    </repository>
</repositories>

Once this is done, you are ready for Maven-based usage of GridGain.

Note
Internal Repository
We recommend using an internal Maven repository for your projects if Maven is something you like to use. You can download GridGain as usual through the www.gridgain.com website and deploy the necessary files to your local repository for the rest of the team to use. This way you have full control over how GridGain is made available via Maven for your particular project.

4.1.2. Versions

GridGain follows traditional versioning rules. The meaning of each part of the version number is as follows:

| Version | Description |
|---|---|
| X.X.1…9 | Point release. Usually contains bug fixes, documentation and example improvements. It is backward compatible unless specified otherwise. |
| X.1…9.X | Mid-point release. Usually contains bug fixes, documentation and example improvements. Backward compatible only if specified. |
| 1…9.X.X | Major release. Usually contains bug fixes, documentation and example improvements. Backward compatible only if specified. |

In general, we target one major release every 12 months and a mid-point release every 6 months. Point releases are cut whenever we see a need to patch issues or provide hot bug fixes.
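As a rough illustration of the numbering scheme above, the following sketch classifies the step between two plain X.Y.Z version strings by the leftmost component that differs. The class and method names are hypothetical and not part of GridGain, and suffixed versions such as 3.0.0c-RC1 are deliberately not handled.

```java
// Hypothetical helper, not part of GridGain: classifies a version step
// according to the X.Y.Z rules described above.
class ReleaseKind {
    private static final String[] KINDS = {"major", "mid-point", "point"};

    /** Returns "major", "mid-point", "point", or "same" for two X.Y.Z versions. */
    static String classify(String from, String to) {
        int[] a = parse(from);
        int[] b = parse(to);
        for (int i = 0; i < 3; i++) {
            if (a[i] != b[i]) {
                return KINDS[i]; // the leftmost differing component decides
            }
        }
        return "same";
    }

    /** Parses a purely numeric X.Y.Z string; anything else is rejected. */
    private static int[] parse(String version) {
        String[] parts = version.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected X.Y.Z but got: " + version);
        }
        int[] nums = new int[3];
        for (int i = 0; i < 3; i++) {
            nums[i] = Integer.parseInt(parts[i]);
        }
        return nums;
    }

    public static void main(String[] args) {
        System.out.println(classify("3.0.1", "3.0.2")); // point
        System.out.println(classify("3.0.9", "3.1.1")); // mid-point
        System.out.println(classify("2.1.0", "3.0.0")); // major
    }
}
```

A check like this could be used, for example, to decide how carefully to review release notes for compatibility before moving to a newer version.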

4.1.3. Supported Operating Systems

GridGain is actively developed and tested on three major operating systems:

http://www.gridgain.com/images/macos_logo_24x24.gif

Mac OS X

http://www.gridgain.com/images/win_logo_24x24.gif

Windows 7

http://www.gridgain.com/images/linux_logo_24x24.gif

Linux (Ubuntu & Fedora)

Being JVM-based software, GridGain has minimal dependency on any particular operating system (as long as Java is available for it). Most of the dependencies are in scripts. With every release of GridGain we thoroughly test the software against the following versions of operating systems:

  • Mac OS X 10.x

  • Linux Ubuntu (current active release)

  • Linux Fedora (current active release)

  • Windows XP/Vista/7 (as of 3.0.2 version)

Note that we do not actively test against the following operating systems, but we have verified independently that GridGain 3.0 or later works stably and correctly on them:

  • Solaris (current release)

  • HP-UX (current release)

  • Windows 2003, Windows 2000

Important
Less Tested
GridGain is less tested on: Solaris, HP-UX, Windows 2003, and Windows 2000.

In general, with extremely rare exceptions, GridGain will work out-of-the-box on any Windows or Linux/Unix-based system as long as Java 6 (and Scala 2.9 or later, if Scala is used) is available on it.

4.1.4. Java, Scala and Groovy

As of version 3.0 GridGain requires Java 6. Note that GridGain 3.0 has not been tested with the upcoming Java 7 as of May 2011.

Starting with version 3.0.9 GridGain requires Scala 2.9 or later (only if Scala is used, which is optional). Note that the original release of GridGain 3.0.0 came out before Scala 2.8 GA was released and was compatible only with Scala 2.7.

Starting with version 3.1.1 GridGain requires Groovy 1.8 and the corresponding version of Groovy++ (only if Groovy is used, which is optional).

Keep in mind that you can develop with Java, Scala, Groovy or any combination thereof. Specifically, Scala is not required to develop with GridGain, but some of the tools, like GridGain Visor - a monitoring and interpreting tool in the Enterprise Edition - use the Scala REPL and therefore require Scala.

Note also that as of GridGain 3.0.2 none of the functionality in the community edition explicitly requires Scala or Groovy.

Note
Java, Scala and Groovy
As of version 3.1.1 GridGain requires Java 6, Scala 2.9 and Groovy 1.8.

As of November 2010 you can download Java, Scala, and Groovy from:

Important
Java on Mac OS X
Note that the Java download for Mac OS X may change its location, as its development is shifting from Apple to Oracle as of November 2010.

4.2. Installation

Once you have downloaded the ZIP file you selected, the installation process is rather trivial:

  • Unzip it to any location you prefer.

  • Set up GRIDGAIN_HOME environment variable pointing to installation folder

Note that installation does not perform any new-line translation, so text files may have the wrong line endings depending on the OS on which the installation is performed.

The Unix/Linux/Mac OS X ZIP file has the executable flag set on all shell scripts so that they can be called directly.

Note
GRIDGAIN_HOME
Note that strictly speaking GRIDGAIN_HOME is not required for GridGain operation, and if you know that your setup won’t require it (explained later in the book) you can skip it. If you are new to GridGain, it is highly advisable to set GRIDGAIN_HOME right after unzipping the downloaded file.

Important
Trailing Spaces
Make sure there is no trailing \ in GRIDGAIN_HOME path.
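
A small pre-flight check along these lines can catch a bad GRIDGAIN_HOME value before a node is started. This is only a sketch: the class and method names are hypothetical, and giving the system property precedence over the environment variable is an assumption rather than documented GridGain behavior.

```java
import java.io.File;

// Hypothetical pre-flight check, not part of GridGain.
class GridGainHomeCheck {
    /** Picks the system property if set, otherwise the environment variable (assumed order). */
    static String resolve(String sysProp, String envVar) {
        return sysProp != null ? sysProp : envVar;
    }

    /** Returns null if the value looks usable, otherwise a description of the problem. */
    static String problem(String home) {
        if (home == null || home.trim().isEmpty()) {
            return "GRIDGAIN_HOME is not set";
        }
        if (home.endsWith("\\")) {
            return "GRIDGAIN_HOME must not end with a trailing \\";
        }
        return null;
    }

    public static void main(String[] args) {
        String home = resolve(System.getProperty("GRIDGAIN_HOME"), System.getenv("GRIDGAIN_HOME"));
        String msg = problem(home);
        if (msg != null) {
            System.out.println("Problem: " + msg);
        } else {
            // The folder check is informational only.
            System.out.println(home + " is a directory: " + new File(home).isDirectory());
        }
    }
}
```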

4.2.1. Installing On Shared Location

One good practice for testing, staging or production setups is to install GridGain into a shared location such as a network share or shared hard drive. This way multiple grid nodes can share a single configuration, libraries and working directory. This significantly simplifies management of a GridGain installation in a distributed environment.

4.3. Uninstallation

Uninstalling GridGain is even simpler than installing it: you simply remove the GRIDGAIN_HOME folder where GridGain was installed. If GridGain was configured to use paths outside of GRIDGAIN_HOME, you will need to delete them too (if necessary).

4.4. Upgrading

Due to the complexity of GridGain (mostly due to its distributed nature) we have decided not to provide incremental upgrade (or patching) capabilities. We recommend upgrading GridGain by cleanly uninstalling the old version and installing the new one.

5. Configuration

5.1. Overview

The GridConfiguration interface defines the grid runtime configuration. This configuration is passed to the GridFactory.start(GridConfiguration) method. It defines all configuration parameters required to start a grid instance. Usually, a special class called a "loader" will create an instance of this interface and call the GridFactory.start(GridConfiguration) method to initialize a GridGain instance.

Note that absolutely every configuration property in GridConfiguration is optional. For example, you can simply create a new instance of GridConfigurationAdapter and pass it to GridFactory.start(GridConfiguration) as is to start a grid with the default configuration. See the GridFactory documentation for information about the default configuration properties and for more information on how to start a grid.

The following configuration parameters can be used to configure a grid node with GridConfigurationAdapter:

| Setter Method | Description | Optional | Default |
|---|---|---|---|
| setGridName(String) | Grid name. | Yes | null |
| setGridGainHome(String) | GridGain installation folder. | Yes | GRIDGAIN_HOME system property or environment variable. |
| setLocalHost(String) | System-wide local address or host for all GridGain components to bind to. | Yes | null |
| setNodeId(UUID) | Unique identifier for local node. | Yes | Random UUID. |
| setNetworkTimeout(long) | Maximum timeout in milliseconds for network requests. | Yes | 5000ms |
| setLicenseUrl(String) | License URL different from the default location of the license file. | Yes | GRIDGAIN_HOME/gridgain-license.xml |
| setUserAttributes(Map<String,? extends Serializable>) | User specific attributes to attach to this node. Available via the GridNode.getAttribute(String) method. Very useful for segmenting grid nodes into subgroups or identifying nodes based on a certain property. | Yes | All System Properties and Environment Variables are set as node attributes automatically by GridGain. |
| setDaemon(boolean) | Daemon flag. | Yes | false |
| setIncludeProperties(String…) | Array of system or environment property names to include into node attributes. | Yes | All properties are included by default. |
| setIncludeEventTypes(int…) | Array of event types that will be recorded by GridEventStorageManager. Note that either the include event types or the exclude event types can be set, but not both. | Yes | All events are recorded by default. |
| setExcludeEventTypes(int…) | Array of event types that will not be recorded by GridEventStorageManager. Note that either the include event types or the exclude event types can be set, but not both. | Yes | All events are recorded by default. |
| setLifecycleBeans(GridLifecycleBean…) | Collection of lifecycle beans. | Yes | null |
| setLifeCycleEmailNotification(boolean) | Whether or not to enable lifecycle email notifications. | Yes | false |
| setDiscoveryStartupDelay(long) | Time in milliseconds used to expire messages from the waiting list when node discovery discrepancies happen during startup. | Yes | 1 minute |
| setGridLogger(GridLogger) | Logger to use within grid. | Yes | GridLog4jLogger |
| setMarshaller(GridMarshaller) | Marshaller to use for serialization/deserialization of objects (available from ver. 2.1). | Yes | GridOptimizedMarshaller |
| setDeploymentMode(GridDeploymentMode) | Deployment mode for task/query requests initiated from this node (available from ver. 2.1). | Yes | SHARED |
| setPeerClassLoadingEnabled(boolean) | Enables/disables peer class loading. | Yes | true |
| setPeerClassLoadingMissedResourcesCacheSize(int) | Specifies internal cache size for missed resources. If an attempt to load a resource fails, the failure is cached and subsequent attempts will not make remote calls (available from ver. 2.1). | Yes | 100 |
| setP2PLocalClassPathExclude(List<String>) | List of packages in the system class path that should be P2P loaded even if they exist locally. | Yes | null |
| setMetricsExpireTime(long) | Time in milliseconds after which a certain metric value is considered expired. | Yes | 600000ms |
| setMetricsHistorySize(int) | Number of metrics kept in history to compute totals and averages. | Yes | 10000 |
| setMetricsLogFrequency(int) | Frequency of metrics log print out. | Yes | 0, which means that metrics print out is disabled. |
| setExecutorService(ExecutorService) | Thread pool to use mainly for task and job execution. | Yes | GridThreadPoolExecutor with 100 threads. |
| setExecutorServiceShutdown(boolean) | Executor service shutdown flag. | Yes | true |
| setSystemExecutorService(ExecutorService) | Thread pool to use for processing job and task session asynchronous responses (available from ver. 2.1). | Yes | GridThreadPoolExecutor with 100 threads. |
| setSystemExecutorServiceShutdown(boolean) | System executor service shutdown flag. | Yes | true |
| setPeerClassLoadingExecutorService(ExecutorService) | Thread pool to use for processing peer class loading requests and responses (available from ver. 2.1). | Yes | GridThreadPoolExecutor with 20 threads. |
| setPeerClassLoadingExecutorServiceShutdown(boolean) | Peer class loading executor service shutdown flag. | Yes | true |
| setMBeanServer(MBeanServer) | MBean server for exposing GridGain MBeans. | Yes | The default MBean Server provided by JDK. |
| setSegmentationPolicy(GridSegmentationPolicy) | Segmentation policy. | Yes | STOP |
| setSegmentationResolvers(GridSegmentationResolver…) | Segmentation resolvers. | Yes | null |
| setSegmentCheckFrequency(long) | Network segment check frequency. | Yes | 10000ms |
| setWaitForSegmentOnStart(boolean) | Wait for segment on start flag. | Yes | true |
| setAllSegmentationResolversPassRequired(boolean) | All segmentation resolvers pass required flag. | Yes | true |
| setRestEnabled(boolean) | Flag indicating whether external REST access is enabled or not. | Yes | true |
| setRestJettyPath(String) | Path, either absolute or relative to GRIDGAIN_HOME, to JETTY XML configuration file. | Yes | null |
| setRestSecretKey(String) | Secret key to authenticate REST requests. | Yes | null, which means that authentication is disabled. |
| setSmtpHost(String) | SMTP host. | Yes | null, which disables sending emails. |
| setSmtpPort(int) | SMTP port. | Yes | 25 |
| setSmtpUsername(String) | SMTP username. | Yes | null |
| setSmtpPassword(String) | SMTP password. | Yes | null |
| setAdminEmails(String[]) | Set of admin emails to which email notifications will be sent. | Yes | null |
| setSmtpFromEmail(String) | FROM email address for email notifications. | Yes | info@gridgain.com |
| setSmtpSsl(boolean) | Whether or not SMTP uses SSL. | Yes | false |
| setSmtpStartTls(boolean) | Whether or not SMTP uses STARTTLS. | Yes | false |
| setLocalEventListeners(Map<GridLocalEventListener, int[]>) | Pre-configured local event listeners. | Yes | null |
| setLoadBalancingSpi(GridLoadBalancingSpi…) | Fully configured instances of GridLoadBalancingSpi. Starting with GridGain 2.1 you can provide multiple instances of Load Balancing SPIs and then specify which one to use on a per-task level via the @GridTaskSpis annotation attached to your GridTask implementation. | Yes | GridRoundRobinLoadBalancingSpi |
| setCheckpointSpi(GridCheckpointSpi…) | Fully configured instances of GridCheckpointSpi. Starting with GridGain 2.1 you can provide multiple instances of Checkpoint SPIs and then specify which one to use on a per-task level via the @GridTaskSpis annotation attached to your GridTask implementation. | Yes | GridSharedFsCheckpointSpi |
| setCollisionSpi(GridCollisionSpi) | Fully configured instance of GridCollisionSpi. | Yes | GridFifoQueueCollisionSpi |
| setCommunicationSpi(GridCommunicationSpi) | Fully configured instance of GridCommunicationSpi. | Yes | GridTcpCommunicationSpi |
| setDeploymentSpi(GridDeploymentSpi) | Fully configured instance of GridDeploymentSpi. | Yes | GridLocalDeploymentSpi |
| setDiscoverySpi(GridDiscoverySpi) | Fully configured instance of GridDiscoverySpi. | Yes | GridMulticastDiscoverySpi |
| setEventStorageSpi(GridEventStorageSpi) | Fully configured instance of GridEventStorageSpi. | Yes | GridMemoryEventStorageSpi |
| setFailoverSpi(GridFailoverSpi…) | Fully configured instances of GridFailoverSpi. Starting with GridGain 2.1 you can provide multiple instances of Failover SPIs and then specify which one to use on a per-task level via the @GridTaskSpis annotation attached to your GridTask implementation. | Yes | GridAlwaysFailoverSpi |
| setTopologySpi(GridTopologySpi…) | Fully configured instances of GridTopologySpi. Starting with GridGain 2.1 you can provide multiple instances of Topology SPIs and then specify which one to use on a per-task level via the @GridTaskSpis annotation attached to your GridTask implementation. | Yes | GridBasicTopologySpi |
| setMetricsSpi(GridLocalMetricsSpi) | Fully configured instance of GridLocalMetricsSpi. | Yes | GridJdkLocalMetricsSpi |

Some of the most commonly used configuration properties are explained in more detail below.

5.1.1. Grid Name

Use the grid name configuration property whenever you would like to identify your grid by name. Usually, if you have only one grid node within your VM, you don’t have to configure the grid name explicitly and can use the default no-name grid node. However, if you start multiple grid node instances in the same VM, say for unit testing or debugging, then configuring a grid name for every grid node instance will allow you to access multiple grid nodes by name via the GridFactory.getGrid(String gridName) method.

5.1.2. User Attributes

User attributes allow you to attach various custom attributes to your nodes. These attributes can then be used to identify node topology for your task execution or load balancing, to segment your grid into multiple sub-grids, etc. By default, GridGain will automatically attach all System and Environment properties to your node.

You can query node attributes from practically anywhere in your code, be that your task or job logic, or an implementation of a topology or load-balancing SPI. Simply get a handle on GridNode and check its attributes via the GridNode.getAttribute(String) method.
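
To make the idea of segmenting by attribute concrete, here is a minimal, library-free sketch. A plain java.util.Map stands in for the attributes of each node (the role played by GridNode.getAttribute(String) in GridGain); the class name, the node IDs, and the "segment" attribute are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Library-free stand-in: node ID -> node attributes. Not GridGain API.
class AttributeSegments {
    /** Returns the IDs of all nodes whose attribute 'key' equals 'value'. */
    static List<String> select(Map<String, Map<String, String>> nodes, String key, String value) {
        List<String> ids = new ArrayList<String>();
        for (Map.Entry<String, Map<String, String>> node : nodes.entrySet()) {
            // Map.get plays the role of an attribute lookup on the node.
            if (value.equals(node.getValue().get(key))) {
                ids.add(node.getKey());
            }
        }
        return ids;
    }

    /** Convenience factory for a single-attribute map. */
    static Map<String, String> attrs(String key, String value) {
        Map<String, String> m = new LinkedHashMap<String, String>();
        m.put(key, value);
        return m;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> nodes = new LinkedHashMap<String, Map<String, String>>();
        nodes.put("node1", attrs("segment", "A"));
        nodes.put("node2", attrs("segment", "B"));
        nodes.put("node3", attrs("segment", "A"));
        System.out.println(select(nodes, "segment", "A")); // [node1, node3]
    }
}
```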

5.1.3. Grid Logger

Configuring a proper grid logger allows you to integrate your logging with any environment. By default, GridLog4jLogger is used, which gets its logging configuration from GRIDGAIN_HOME/config/default-log4j.xml.

Below is the list of supported loggers:

  • GridLog4jLogger - Log4j-based implementation for logging. This logger should be used by loaders that prefer log4j-based logging. By default, GridGain will use this logger with configuration from GRIDGAIN_HOME/config/default-log4j.xml.

  • GridJavaLogger - Logger to use with Java logging. Implementation simply delegates to Java Logging.

  • GridJbossLogger - Logger to use in JBoss loaders. Implementation simply delegates to JBoss logging.

  • GridJclLogger - This logger wraps any JCL (Jakarta Commons Logging) loggers. Implementation simply delegates to the underlying JCL logger. This logger should be used by loaders that have JCL-based internal logging (e.g., Websphere).

5.1.4. Grid Marshaller

Starting with the GridGain 2.1 release you can configure different marshallers and, if needed, provide your own. GridMarshaller marshals and unmarshals objects. It provides the serialization/deserialization mechanism for all instances that are sent across the network or are otherwise serialized.

GridGain provides the following GridMarshaller implementations:

  • GridJBossMarshaller - this marshaller uses the JBoss implementation of java.io.ObjectOutputStream for object serialization. All marshalled instances must implement java.io.Serializable.

  • GridJdkMarshaller - this marshaller uses the standard JDK java.io.ObjectOutputStream for object serialization. All marshalled instances must implement java.io.Serializable.

  • GridXstreamMarshaller - this marshaller uses Codehaus XStream for serialization of objects into XML. It does not require that marshalled instances implement java.io.Serializable; however, it is slower than the other marshaller implementations because XML is a verbose format.

  • GridOptimizedMarshaller - this is the default marshaller (see the configuration table above). Unlike GridJdkMarshaller, which is based on the standard ObjectOutputStream, this marshaller does not enforce that all serialized objects implement java.io.Serializable. It is also generally much faster, as it removes much of the serialization overhead that exists in the default JDK implementation.

5.1.5. Executor Services

Starting with version 2.1, GridGain exposes configuration for 3 thread pools:

  • ExecutorService - Implementation of java.util.concurrent.ExecutorService to be used for task and job executions. By default, a standard ThreadPoolExecutor thread pool is provided and is configured to use 100 threads. Change this configuration parameter whenever you need to change the number of threads participating in GridTask/GridJob execution.

  • SystemExecutorService - Implementation of java.util.concurrent.ExecutorService to be used for processing of asynchronous job and task session responses. By default, a standard ThreadPoolExecutor thread pool is provided and is configured to use 100 threads. Change this configuration parameter whenever you set task session attributes frequently or feel that responses are not processed fast enough.

  • PeerClassLoadingExecutorService - Implementation of java.util.concurrent.ExecutorService to be used for processing of all Peer Class Loading requests. By default, a standard ThreadPoolExecutor thread pool is provided and is configured to use 20 threads. Change this configuration parameter whenever you feel that class-loading requests don’t get processed fast enough.

Tip Do not confuse executor services provided in configuration for thread pooling with grid-enabled executor service provided by GridGain.
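
For orientation, this is roughly what a fixed-size pool like the ones described above looks like when built with plain JDK classes. The sketch uses only java.util.concurrent; the class name and the fixedPool helper are hypothetical, and handing the pool to GridGain (through setExecutorService(ExecutorService) from the configuration table) is mentioned only in a comment because it requires the GridGain library.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper built on the JDK only.
class PoolSketch {
    /** Creates a pool with exactly 'threads' worker threads and an unbounded queue. */
    static ThreadPoolExecutor fixedPool(int threads) {
        return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
            new LinkedBlockingQueue<Runnable>());
    }

    public static void main(String[] args) throws Exception {
        ThreadPoolExecutor pool = fixedPool(100);
        // With GridGain on the classpath, the pool would be passed to
        // GridConfigurationAdapter.setExecutorService(pool) before starting the grid.
        Future<Integer> result = pool.submit(new Callable<Integer>() {
            public Integer call() {
                return 6 * 7;
            }
        });
        System.out.println("result=" + result.get() + ", threads=" + pool.getMaximumPoolSize());
        pool.shutdown(); // shut the pool down so the JVM can exit
    }
}
```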

5.1.6. Grid Lifecycle Beans

See Grid Lifecycle Beans documentation for information on how to specify lifecycle beans and examples.

5.1.7. SPIs - Service Provider Interfaces

Service Provider Interfaces allow you to configure virtually every aspect of GridGain, such as communication, discovery, topology, failover, load balancing, etc., in LEGO-like fashion. For information on available SPIs and their configuration refer to the SPI documentation.

5.2. Specifying Different SPIs Per GridTask

Starting with GridGain 2.1 you can start multiple instances of Topology SPI, Load Balancing SPI, Failover SPI and Checkpoint SPI. If you do that, you need to tell a task which SPI to use (by default it will use the first SPI in the list).

Add the @GridTaskSpis annotation to your task to specify which SPIs it should use. If this annotation is omitted, GridGain will by default pick the first corresponding SPI implementation from the array of SPIs provided in the configuration.

This example shows how to configure different SPIs for different tasks. Let’s assume that you have two worker nodes, Node1 and Node2. Let’s also assume that you configure Node1 to belong to SegmentA and Node2 to belong to SegmentB. Here is a sample configuration for Node1:

<bean id="grid.cfg" class="org.gridgain.grid.GridConfigurationAdapter" scope="singleton">
    <property name="userAttributes">
        <map>
            <entry key="segment" value="A"/>
        </map>
    </property>
</bean>

Node2 configuration looks similar to Node1 with segment attribute set to B:

<bean id="grid.cfg" class="org.gridgain.grid.GridConfigurationAdapter" scope="singleton">
    <property name="userAttributes">
        <map>
            <entry key="segment" value="B"/>
        </map>
    </property>
</bean>

Now, if you have Task1 and Task2 starting from some master node NodeM, you can easily configure Task1 to only run on SegmentA and Task2 to only run on SegmentB. Here is what the configuration on master node NodeM would look like:

<bean id="grid.cfg" class="org.gridgain.grid.GridConfigurationAdapter" scope="singleton">
    <!--
        Topology SPIs. We have two named SPIs: One picks up nodes
        that have attribute "segment" set to "A" and another one sees
        nodes that have attribute "segment" set to "B".
    -->
    <property name="topologySpi">
        <list>
            <bean class="org.gridgain.grid.spi.topology.nodefilter.GridNodeFilterTopologySpi">
                <property name="name" value="topologyA"/>
                <property name="filter">
                    <bean class="org.gridgain.grid.GridJexlNodeFilter">
                        <property name="expression" value="node.attributes['segment'] == 'A'"/>
                    </bean>
                </property>
            </bean>
            <bean class="org.gridgain.grid.spi.topology.nodefilter.GridNodeFilterTopologySpi">
                <property name="name" value="topologyB"/>
                <property name="filter">
                    <bean class="org.gridgain.grid.GridJexlNodeFilter">
                        <property name="expression" value="node.attributes['segment'] == 'B'"/>
                    </bean>
                </property>
            </bean>
        </list>
    </property>
</bean>

Then your Task1 and Task2 would look as follows (note the @GridTaskSpis annotation):

@GridTaskSpis(topologySpi="topologyA")
public class GridSegmentATask extends GridTaskSplitAdapter<String, Integer> {
    ...
}

and

@GridTaskSpis(topologySpi="topologyB")
public class GridSegmentBTask extends GridTaskSplitAdapter<String, Integer> {
    ...
}
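
The selection rule described in this section (use the SPI named in the annotation, otherwise fall back to the first configured one) can be sketched without any GridGain classes. Everything here is illustrative: SPIs are represented by their names only, and throwing an exception for an unknown name is an assumption, not documented behavior.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: models SPI selection by name, not GridGain internals.
class SpiSelection {
    /**
     * Picks the SPI a task should use.
     * 'requested' is the name given in the task annotation, or null/empty if omitted.
     */
    static String choose(List<String> configured, String requested) {
        if (configured.isEmpty()) {
            throw new IllegalArgumentException("No SPIs configured");
        }
        if (requested == null || requested.isEmpty()) {
            return configured.get(0); // default: first SPI in the configured list
        }
        if (!configured.contains(requested)) {
            // Assumed behavior for this sketch.
            throw new IllegalArgumentException("Unknown SPI name: " + requested);
        }
        return requested;
    }

    public static void main(String[] args) {
        List<String> topologySpis = Arrays.asList("topologyA", "topologyB");
        System.out.println(choose(topologySpis, "topologyB")); // topologyB
        System.out.println(choose(topologySpis, null));        // topologyA
    }
}
```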

5.3. GridSpringBean

GridSpringBean allows you to bypass GridFactory methods. In other words, this bean class lets you inject a new grid instance directly from a Spring configuration file without invoking static GridFactory methods. This class can be wired directly from Spring and can be referenced from within other Spring beans. By virtue of implementing the org.springframework.beans.factory.DisposableBean and org.springframework.beans.factory.InitializingBean interfaces, GridSpringBean automatically starts and stops the underlying grid instance.

The following configuration parameters are optional:

  • Grid configuration (see setConfiguration(GridConfiguration))

5.3.1. Spring Configuration Example

<bean id="mySpringBean" class="org.gridgain.grid.GridSpringBean" scope="singleton">
    <property name="configuration">
        <bean id="grid.cfg" class="org.gridgain.grid.GridConfigurationAdapter" scope="singleton">
            <property name="gridName" value="mySpringGrid"/>
        </bean>
    </property>
</bean>

Or use default configuration:

<bean id="mySpringBean" class="org.gridgain.grid.GridSpringBean" scope="singleton"/>

5.3.2. Java Example

Here is how you may access this bean from code:

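import org.gridgain.grid.Grid;
import org.springframework.context.support.AbstractApplicationContext;
import org.springframework.context.support.FileSystemXmlApplicationContext;

// Load the Spring configuration file that declares the bean.
AbstractApplicationContext ctx = new FileSystemXmlApplicationContext("/path/to/spring/file");

// Register Spring hook to destroy bean automatically.
ctx.registerShutdownHook();

Grid grid = (Grid)ctx.getBean("mySpringBean");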

5.4. Examples

GridConfiguration may be defined in code:

GridConfigurationAdapter cfg = new GridConfigurationAdapter();

// Override default values for grid node.
cfg.setGridName("mygrid");

...

// Start grid.
GridFactory.start(cfg);

or from Spring configuration file (default Spring configuration file can be found in GRIDGAIN_HOME/config/default-spring.xml file):

<bean id="grid.cfg" class="org.gridgain.grid.GridConfigurationAdapter" scope="singleton">
    ...
    <property name="gridName" value="mygrid"/>
    ...
</bean>

5.5. AOP Configuration

In order to use annotation-based grid-enabling with @Gridify, the following AOP configuration needs to be in place, depending on which AOP implementation you choose to use. Note that you only need to pick one AOP implementation.

5.5.1. JBoss AOP

Standalone application

Note that GridGain is not shipped with JBoss and doesn’t include the necessary JBoss libraries. We assume that if you choose to use JBoss AOP you already have these libraries. The following configuration needs to be applied to enable JBoss byte code weaving:

  • The following JVM configuration must be present (make sure to replace com.foo.bar with your domain package):

    • -javaagent:[path to jboss-aop-jdk50-4.x.x.jar]

    • -Djboss.aop.class.path=[path to gridgain.jar]

    • -Djboss.aop.exclude=org,com -Djboss.aop.include=com.foo.bar

  • The following JARs should be in a classpath:

    • javassist-4.x.x.jar

    • jboss-aop-jdk50-4.x.x.jar

    • jboss-aspect-library-jdk50-4.x.x.jar

    • jboss-common-4.x.x.jar

    • trove-1.0.x.jar

JBoss AOP with JBoss AS

  • Install JBoss AOP deployer.

    • Remove the jboss-aop-jdk50.deployer directory from "server/your_server_name/deploy" in your JBoss AS

    • Download the latest stable version of JBoss AOP (1.5.5 GA)

    • Unzip it and make sure that all directories were unzipped with their case preserved

    • Copy the appropriate jboss-aop-jdk50.deployer directory from your JBoss AOP installation to your "server/your_server_name/deploy"

    • Edit jboss-aop-jdk50.deployer/jboss-service.xml, setting "EnableLoadtimeWeaving" to true, as follows:

      <attribute name="EnableLoadtimeWeaving">true</attribute>
      <attribute name="Exclude">java,javax,org,com,net,sun,oracle,EDU,antlr</attribute>
      <attribute name="Include">com.foo.bar</attribute>
      

      Make sure to replace com.foo.bar with your domain package. Also make sure to edit the exclude list if it does not have some packages that you would not like to weave.

    • Follow the instructions of the jboss-aop-jdk50.deployer/ReadMe.txt file

    • Copy pluggable-instrumentor.jar file (located in the lib-50 directory of your JBoss AOP installation) to the bin directory of your server

    • Edit your run.sh or run.bat to include -javaagent:pluggable-instrumentor.jar in the JAVA_OPTS

  • Deploy GridGain as SAR.

    • Copy gridgain.sar directory from GRIDGAIN_HOME/config/jboss folder into your "server/your_server_name/deploy" folder.

    • Make sure to update classpath in gridgain.sar/META-INF/jboss-service.xml to point to all libs under GRIDGAIN_HOME and GRIDGAIN_HOME/libs.

Note
JBoss AOP and JSP
JBoss AOP CFLOW pointcut does not work properly with JSP-compiled classes (it does not properly handle JSP classes on the stack). The workaround is to include pre-compiled JSP classes in your WAR file. Tomcat provides instructions on how to do that with JSPC in its Web Application Compilation documentation.

5.5.2. AspectJ AOP

The following configuration needs to be applied to enable AspectJ byte code weaving:

  • JVM configuration should include: -javaagent:[GRIDGAIN_HOME]/libs/aspectjweaver-1.5.3.jar

  • Classpath should contain the [GRIDGAIN_HOME]/config/aop/aspectj folder.

5.5.3. Spring AOP

The Spring AOP framework is based on a dynamic proxy implementation and doesn’t require any specific runtime parameters for online weaving. All weaving is on-demand and is performed by calling the GridifySpringEnhancer.enhance(Object) method on the object that has a method with the @Gridify annotation.

Note that since this method of weaving requires manual enhancing of participating classes, it is rather inconvenient in most cases, and AspectJ or JBoss AOP are recommended over it. Spring AOP can be used in situations where byte code augmentation is undesired or cannot be used. It also allows for very fine-grained control of what gets woven.
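
The need for an explicit enhance step follows from how dynamic proxies work in general: only calls made through the proxy object are intercepted. The sketch below shows this with a plain JDK proxy from java.lang.reflect; it is not the GridifySpringEnhancer implementation, and the Calculator interface and the enhance helper are hypothetical names used for illustration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// JDK-only illustration of proxy-based interception. Not GridGain or Spring code.
class ProxySketch {
    interface Calculator {
        int square(int x);
    }

    /** Names of methods that were called through a proxy. */
    static final List<String> intercepted = new ArrayList<String>();

    /** Wraps 'target' in a proxy that records each call before delegating to it. */
    static Calculator enhance(final Calculator target) {
        InvocationHandler handler = new InvocationHandler() {
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                intercepted.add(method.getName()); // an aspect would hook in here
                return method.invoke(target, args);
            }
        };
        return (Calculator) Proxy.newProxyInstance(
            Calculator.class.getClassLoader(), new Class<?>[] {Calculator.class}, handler);
    }

    public static void main(String[] args) {
        Calculator plain = new Calculator() {
            public int square(int x) {
                return x * x;
            }
        };
        Calculator enhanced = enhance(plain);
        plain.square(3);    // direct call: not intercepted
        enhanced.square(4); // call through the proxy: intercepted
        System.out.println(intercepted); // [square]
    }
}
```

The original object is left untouched, which is why code that keeps calling it directly never reaches the interception logic.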

BEA WebLogic AS

The WebLogic application server does not officially support AspectJ or JBoss AOP, so the only way to use AOP is Spring AOP. You need to enhance classes with Spring AOP as described above. See http://springide.org/blog/2006/05/24/implementing-jee-with-spring-and-weblogic for details.

6. Main Abstractions

This chapter lists some of the main concepts in GridGain that you need to understand to move forward. Most of them will be discussed in greater depth later in the book, but it’s helpful to lay them out upfront so that you can follow the examples. This also gives you a bird’s-eye view of the GridGain architecture and API design.

Note Keep in mind that we don’t expect you to fully understand each topic below just yet - all of them will be discussed in depth much later in the book. In fact, you can skim this chapter quickly - but we recommend doing at least that.

GridGain has several key abstractions that are essential for understanding pretty much everything else in GridGain. We’ll begin with them first.

6.1. GridNode Interface

The GridNode interface defines a logical grid node in the network topology. Note that a physical node (like a computer on the network) can have multiple logical grid nodes running on it. In fact, a single JVM can run multiple logical grid nodes - a capability that, to our knowledge, is unique to GridGain.

The GridNode interface has a very concise API and deals only with the notion of a logical network endpoint, a node, in the topology: it has a globally unique ID, node metrics, a set of static attributes provided by the user and a few other parameters.

Note
GridNode
GridNode has a globally unique ID, a set of static attributes provided by the user, node metrics and a few other parameters.

The unique characteristic of GridGain is that it uses a Peer-To-Peer (P2P) topology, meaning that all nodes in GridGain are equal. There are no master or server nodes, and there are no worker or client nodes either. All nodes are equal from GridGain’s point of view - yet all these and any other roles can be assigned logically to the nodes.

This unique design gives GridGain tremendous flexibility: not only are you not limited to the master-worker mold - you can define any application-specific roles and assign them to the nodes dynamically. Moreover, since these roles are logical, they can change and "migrate" from node to node as topology changes or based on your application logic.

GridNode interface is used primarily by internal kernel code and by discovery and communication SPI implementations and is rarely used directly. GridRichNode interface, its rich counterpart, is what is used instead for the majority of GridGain operations. More on this below.
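To make the idea of logical roles more tangible, here is a minimal sketch in plain Java. It does not use any GridGain classes: `RoleSketch.Node` and the `"role"` attribute name are hypothetical stand-ins for a node with a unique ID and user-provided static attributes, and an attribute is only one possible way an application might express a role.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

final class RoleSketch {
    // Hypothetical stand-in for a logical grid node: a unique ID plus
    // user-provided static attributes. Not a GridGain class.
    static final class Node {
        private final UUID id = UUID.randomUUID();
        private final Map<String, String> attrs;

        Node(Map<String, String> attrs) {
            this.attrs = Collections.unmodifiableMap(new HashMap<String, String>(attrs));
        }

        UUID id() { return id; }

        String attribute(String name) { return attrs.get(name); }
    }

    // Convenience factory for a node that carries a single "role" attribute.
    static Node node(String role) {
        Map<String, String> attrs = new HashMap<String, String>();
        attrs.put("role", role);
        return new Node(attrs);
    }

    // All nodes are equal peers; a "role" is just an attribute value
    // that the application chooses to interpret.
    static List<Node> withRole(List<Node> topology, String role) {
        List<Node> res = new ArrayList<Node>();
        for (Node n : topology)
            if (role.equals(n.attribute("role")))
                res.add(n);
        return res;
    }

    public static void main(String[] args) {
        List<Node> topology = new ArrayList<Node>();
        topology.add(node("worker"));
        topology.add(node("worker"));
        topology.add(node("coordinator"));

        System.out.println("workers: " + withRole(topology, "worker").size());
    }
}
```

Nothing in the sketch treats a "coordinator" differently from a "worker" at the node level; the distinction exists only in how the application filters and uses the nodes.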

6.2. Local Grid Node

Local grid node is an instance of GridNode interface that is instantiated in the local JVM runtime for a specific grid. In general, a JVM process that runs the GridGain runtime can have zero, one or more local grid nodes, but only one local node per specific grid’s topology.

6.3. Grid Topology

As the logical extension of the network endpoint - a grid node as defined above - we use the term topology throughout the GridGain documentation to reference a set of all logical grid nodes where each node "knows" every other node (in other words, topology is a fully connected graph of all grid nodes including the local node). We often refer to such a topology as simply a grid.

Note
Topology is Associative
Note that the key characteristic of the topology is its associative property, i.e. the fact that each node "knows" every other node.

Depending on the configured discovery SPI implementation (discussed later) the topology can be guaranteed to be consistent on all nodes at any point in time or be eventually consistent only. Eventual consistency means that all nodes will eventually arrive at a fully consistent view, but there is a short time window where nodes can have different views of the global topology. Guaranteed consistency is expensive to implement and it is optional in GridGain.

Note, however, that for data grid, for example, the configured SPI must provide consistent topology (i.e. support guaranteed discovery discussed later).

It is important to note that a single GridGain runtime can support any number of topologies or grids at the same time. Nodes from one topology have no knowledge about the nodes in other topologies. In such cases, a single GridGain runtime (JVM process) will have several local grid nodes where each local node belongs to a different grid. Again, there is an important distinction between GridGain runtime (a JVM process) and a logical grid node running inside of that runtime.

Important
GridGain Runtime vs. Grid Node
There is an important distinction between GridGain runtime (a JVM process) and a logical grid node running inside of that runtime.

Note that we often refer to virtual sub-grid or sub-topology which is essentially just a subset of grid nodes from one topology. More on all that later.

6.4. GridProjection Interface

One of the major additions in GridGain 3.0 was the introduction of a grid projection (and the corresponding GridProjection interface). The important observation that led to this addition was the fact that there is a large set of GridGain operations that can be defined uniformly on any arbitrary set of grid nodes.

Put differently, each such operation can be performed on zero, one, two or any other number of grid nodes. For example, you can send a message to one or more nodes in the grid, just as you can listen for messages from one, two or any other number of nodes. And so on. To use functional programming terminology - these are the monadic operations defined on a set of grid nodes (which correspondingly makes GridProjection a monad).

Note
GridProjection is a Monad
GridProjection exposes monadic set of operations defined on an arbitrary non-empty set of grid nodes.

To logically group such operations the GridProjection interface is introduced, and it defines all major operations in GridGain that can be performed on an arbitrary set of nodes. You can think about a grid projection as a specific view on topology. A projection can be static or dynamic, and there are many ways in which a projection can be defined.

As you will see throughout this book, the elements of functional programming or functional API design are central to GridGain. You’ll discover that even when working with Java APIs you are dealing with functional constructs most of the time - even though Java is not a functional language to begin with! This is one of the unique sides of GridGain and it leads to extremely elegant and simple to use APIs.

We are going to cover projections and the functional programming framework in Java in much greater detail in subsequent chapters.
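The notion of a projection as a "view on topology" can be sketched in a few lines of plain Java. This is an illustration under stated assumptions, not the GridProjection API: `View`, `NodeFilter` and the `send` operation are hypothetical names, and nodes are represented by plain strings.

```java
import java.util.ArrayList;
import java.util.List;

final class ProjectionSketch {
    // Hypothetical node filter; GridGain's own predicate types are not used here.
    interface NodeFilter { boolean accept(String nodeName); }

    // A dynamic view: the filter is re-applied to the live topology on every call.
    static final class View {
        private final List<String> topology;
        private final NodeFilter filter;

        View(List<String> topology, NodeFilter filter) {
            this.topology = topology;
            this.filter = filter;
        }

        List<String> nodes() {
            List<String> res = new ArrayList<String>();
            for (String n : topology)
                if (filter.accept(n))
                    res.add(n);
            return res;
        }

        // The same operation works for zero, one or many nodes.
        List<String> send(String msg) {
            List<String> delivered = new ArrayList<String>();
            for (String n : nodes())
                delivered.add(n + " <- " + msg);
            return delivered;
        }
    }

    public static void main(String[] args) {
        List<String> topology = new ArrayList<String>();
        topology.add("eu-1");
        topology.add("us-1");

        View eu = new View(topology, new NodeFilter() {
            @Override public boolean accept(String nodeName) {
                return nodeName.startsWith("eu-");
            }
        });

        System.out.println(eu.send("hello")); // one EU node so far
        topology.add("eu-2");                 // the topology changes...
        System.out.println(eu.send("hello")); // ...and the view follows
    }
}
```

A static projection would instead copy the matching nodes once at creation time; the dynamic variant shown here keeps following topology changes.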

6.5. Rich Interfaces

Once we introduced grid projection it is pretty logical to extend the definition of a grid node as a grid projection with just that one node in it. Similarly, one can extend the grid cloud definition as a grid projection that contains all nodes belonging to that specific physical cloud. And finally, it is only logical to provide a conveniently defined global projection that contains all the nodes in the topology.

This is exactly what the GridRichNode and Grid interfaces do. They all extend GridProjection interface and add all necessary additional operations that are specific to a grid node, grid cloud or a global topology.

The idea of rich interfaces is central in Scala and Ruby libraries, for example. By having both thin and rich interfaces (GridNode and GridRichNode) we can satisfy both types of interface usages:

  • thin interface that needs to be implemented by the end user (and therefore should be as simple as possible), and

  • rich interface that is actually used by the end user (and therefore should be as rich as possible).

Note
Rich vs. Thin Interfaces
A thin interface needs to be implemented by the end user and therefore should be as simple as possible. A rich interface is actually used by the end user and therefore should be as rich as possible.

Historically, Grid interface - being a global all-inclusive projection - has an additional special purpose. Grid interface acts as the main entry point for the entire GridGain functionality. In fact, most of the operations you perform on GridGain originate on Grid interface. That is where you can obtain an instance of the data grid, get an instance of the rich cloud interface for a specific cloud, create and manage grid projections, manually deploy grid tasks and perform a multitude of other operations.

To get an instance of Grid interface you need to use GridFactory.

6.6. GridFactory Class

GridFactory class is a life-cycle factory for Grid instances. Its purpose is to provide various ways to start and stop instances of Grid interface. Note that Grid interface - being the main entry point for GridGain APIs - has a strict life-cycle and state machine that is controlled by GridFactory. As noted before, a single GridGain runtime (JVM process) can have zero or more Grid instances, each providing a local view on a different grid.

The usual way to work with GridGain is to use GridFactory class to start a Grid instance using a specific (or default) configuration file (usually at the beginning of your application). Starting a Grid instance means, among other things, starting a local grid node and having it join the topology. Once you have started a Grid instance you can use any APIs provided by GridGain. When GridGain is no longer needed, you use GridFactory to stop the Grid instance and its node will leave the topology (usually at the end of your application).
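The start/use/stop life cycle described above can be modeled with a small, self-contained sketch. Everything here is hypothetical (`LifecycleSketch`, `GridHandle`, the grid names); it only illustrates the rule that one JVM may host several named grids but at most one local instance per grid, and it is not the GridFactory API.

```java
import java.util.HashMap;
import java.util.Map;

final class LifecycleSketch {
    // Hypothetical handle to a started grid instance.
    static final class GridHandle {
        final String name;
        private boolean running = true;

        GridHandle(String name) { this.name = name; }

        boolean isRunning() { return running; }
    }

    private static final Map<String, GridHandle> grids = new HashMap<String, GridHandle>();

    // Starting creates the local instance for that grid; a second start of
    // the same grid name in this JVM is rejected.
    static synchronized GridHandle start(String name) {
        if (grids.containsKey(name))
            throw new IllegalStateException("Grid already started: " + name);
        GridHandle g = new GridHandle(name);
        grids.put(name, g);
        return g;
    }

    // Stopping removes the instance; in a real grid its node would leave the topology.
    static synchronized boolean stop(String name) {
        GridHandle g = grids.remove(name);
        if (g == null)
            return false;
        g.running = false;
        return true;
    }

    static synchronized int startedCount() { return grids.size(); }

    public static void main(String[] args) {
        GridHandle a = start("grid-a");
        GridHandle b = start("grid-b"); // second, independent grid in the same JVM
        try {
            System.out.println("started: " + startedCount());
        }
        finally {
            stop(a.name);
            stop(b.name);
        }
        System.out.println("running after stop: " + a.isRunning());
    }
}
```

The try/finally shape mirrors the usual pattern of starting the grid at the beginning of an application and stopping it at the end.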

6.7. GridCache Interface

GridCache interface represents the main API entry point for data grid functionality. A GridCache instance always refers to a single named cache. You can configure as many named caches as you like. You receive the GridCache instance from a Grid instance (like anything else in GridGain). GridCache is a rich interface and represents the global cache projection (see below).

6.8. GridCacheProjection Interface

GridCacheProjection interface is analogous to GridProjection but it defines a cache projection over a specified set of key-value pairs. In fact, GridCache interface extends GridCacheProjection and simply represents a global cache projection, i.e. the projection over all key-value pairs in this cache.

Cache projections are an extremely powerful technique in GridGain’s data grid. They provide a monadic set of operations that is defined on any arbitrary set of:

  • keys,

  • values, or

  • key-value pairs

giving data grid a distinct functional flavor and providing consistent API design between compute and data grids.

Important
Compute and In-Memory Data Grid Design Unification
This logical and design unification between compute and data grids around functional monadic concept is one of the unique characteristics of GridGain architecture.
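As a rough illustration of a projection over key-value pairs, the following plain-Java sketch wraps a map in a filtered view. The names `View`, `EntryFilter` and `cheaperThan` are assumptions made for this example; they are not part of the GridCacheProjection API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CacheProjectionSketch {
    // Hypothetical entry predicate taking both the key and the value.
    interface EntryFilter<K, V> { boolean accept(K key, V val); }

    // A view over the entries of a backing map that pass the filter.
    static final class View<K, V> {
        private final Map<K, V> cache;
        private final EntryFilter<K, V> filter;

        View(Map<K, V> cache, EntryFilter<K, V> filter) {
            this.cache = cache;
            this.filter = filter;
        }

        // Reads only "see" entries that are inside the projection.
        V get(K key) {
            V val = cache.get(key);
            return val != null && filter.accept(key, val) ? val : null;
        }

        int size() {
            int cnt = 0;
            for (Map.Entry<K, V> e : cache.entrySet())
                if (filter.accept(e.getKey(), e.getValue()))
                    cnt++;
            return cnt;
        }
    }

    // Projection over all entries whose value is below 'limit'.
    static View<String, Integer> cheaperThan(Map<String, Integer> cache, final int limit) {
        return new View<String, Integer>(cache, new EntryFilter<String, Integer>() {
            @Override public boolean accept(String key, Integer val) {
                return val < limit;
            }
        });
    }

    public static void main(String[] args) {
        Map<String, Integer> prices = new LinkedHashMap<String, Integer>();
        prices.put("book", 20);
        prices.put("laptop", 900);

        View<String, Integer> cheap = cheaperThan(prices, 150);
        System.out.println(cheap.size() + " cheap item(s), laptop visible: "
            + (cheap.get("laptop") != null));
    }
}
```

The view has the same shape as the node projection sketched for compute grids: a predicate plus operations that apply to whatever subset the predicate selects.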

6.9. MapReduce

MapReduce is a relatively new name for a very old concept. Strictly speaking, the term MapReduce refers to the patented algorithm introduced by Google in their internal distributed data processing systems and closely mirrored by the Hadoop project developed by competing Yahoo!.

We (as well as the distributed programming community in general) tend to use the term MapReduce in a wider sense since it was extensively publicized, and we refer to any divide-and-conquer design strategy or traditional parallel computing as MapReduce. In fact, if you have a long running task, you can split this task into multiple sub-tasks, execute these sub-tasks in parallel, aggregate their results back and get your final result in a fraction of the time. That’s classic parallel programming, or compute grid - and to avoid myriads of names we call it MapReduce too.

Note
Google and MapReduce
It is important to note, however, that the specific implementation that we chose in GridGain has relatively little, if anything, to do with the algorithm used by Google (or Hadoop). Not only is Google’s algorithm patented, but it is also very specific to Google’s needs and rarely applicable outside of extreme big-data use cases.

Hadoop, an Apache project, provides one implementation of MapReduce that closely matches Google’s definition. Note that Google granted Hadoop a license to use the patented Google algorithm.
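The wider, divide-and-conquer sense of MapReduce used in this book (split, execute in parallel, aggregate) can be sketched on a single machine with standard `java.util.concurrent` classes. This is only an analogy: threads stand in for grid nodes, and `parallelSum` is a hypothetical helper, not a GridGain API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SplitAggregateSketch {
    // Sums 'data' by splitting it into 'parts' chunks that run in parallel.
    static long parallelSum(final int[] data, int parts) {
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            // "Map": split the long-running task into sub-tasks.
            List<Future<Long>> futs = new ArrayList<Future<Long>>();
            int chunk = (data.length + parts - 1) / parts;
            for (int i = 0; i < parts; i++) {
                final int from = Math.min(data.length, i * chunk);
                final int to = Math.min(data.length, from + chunk);
                futs.add(pool.submit(new Callable<Long>() {
                    @Override public Long call() {
                        long sum = 0;
                        for (int j = from; j < to; j++)
                            sum += data[j];
                        return sum;
                    }
                }));
            }

            // "Reduce": aggregate partial results into the final one.
            long total = 0;
            for (Future<Long> f : futs)
                total += f.get();
            return total;
        }
        catch (Exception e) {
            throw new IllegalStateException("Sub-task failed", e);
        }
        finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] data = new int[1000];
        for (int i = 0; i < data.length; i++)
            data[i] = i + 1;
        System.out.println(parallelSum(data, 4)); // 1 + 2 + ... + 1000 = 500500
    }
}
```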

6.10. Streaming MapReduce

Streaming MapReduce is a less-defined term but often refers to a type of processing similar to MapReduce (i.e. tasks get split and reduced) but where the input data is, in general, not finite. A typical example is a search in a live video feed: the obvious problem is that you can’t load the entire feed first and then partition it into small parts to be processed in parallel (like you would do in a traditional MapReduce); you need to somehow map and reduce the incoming data as it comes and at the same time keep providing failover, topology resolution, collision resolution, back pressure control, and all other necessary services.

GridGain provides several unique ways to implement this type of processing.
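The core difference from the finite case, folding results in as data arrives rather than after all of it is loaded, can be shown with a small sequential sketch. It assumes a source exposed as a `java.util.Iterator` and a hypothetical `runningMax` helper; it deliberately leaves out the parallelism, failover and back pressure concerns listed above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class StreamingSketch {
    // Consumes up to 'maxBatches' batches from a possibly endless source and
    // returns the running maximum observed after each batch.
    static List<Integer> runningMax(Iterator<Integer> src, int batchSize, int maxBatches) {
        List<Integer> snapshots = new ArrayList<Integer>();
        int max = Integer.MIN_VALUE;
        for (int b = 0; b < maxBatches && src.hasNext(); b++) {
            // "Map" step: process the next small batch as it arrives.
            int batchMax = Integer.MIN_VALUE;
            for (int i = 0; i < batchSize && src.hasNext(); i++)
                batchMax = Math.max(batchMax, src.next());

            // "Reduce" step: fold the batch result into the running result.
            max = Math.max(max, batchMax);
            snapshots.add(max);
        }
        return snapshots;
    }

    public static void main(String[] args) {
        Iterator<Integer> feed = Arrays.asList(3, 1, 7, 2, 9, 4).iterator();
        System.out.println(runningMax(feed, 2, 10)); // [3, 7, 9]
    }
}
```

A result is available after every batch, which is what makes this style usable when the input never ends.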

6.11. Real-Time Processing

GridGain is all about processing large data sets (a.k.a. BigData) in real-time. But what real-time are we talking about?

In GridGain - we are talking about perceptual real-time, or software real-time (the Java Virtual Machine doesn’t technically support hardware real-time processing). Perceptual real-time is defined by the maximum response time that a typical user will wait for a task he or she expects to be "instant" before cancelling the task. For example, when a typical user clicks the "Add to Basket" button on a website, anything beyond a couple of seconds will probably make that user click "Back" or otherwise cancel the task. For an online trading application a delay of a few seconds on submitting an order will indicate something is wrong with the system. A portfolio management application that takes 10 seconds to open a last minute moving average chart is practically unusable. And so on…

6.12. Closures & Predicates

We mention closures and predicates here only because we provide their full implementation on Java side. Unlike Scala or Groovy, where closures (functions) are part of the language, in Java they are not - and we had to develop an entire state of the art distributed functional framework for Java that is included with GridGain.

Closure is a block of code that encloses its body and any outside variables used inside of it as a function object. You can then pass such function object anywhere you can pass a variable and execute it. Predicate is a special type of closure that simply returns boolean value.
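In Java of this era a closure is typically written as an anonymous inner class that captures `final` variables from the enclosing scope. The sketch below uses two hypothetical minimal interfaces, `Closure` and `Predicate`, declared inline; GridGain’s own closure and predicate classes are richer and are not used here.

```java
final class ClosureSketch {
    // Hypothetical minimal function types declared only for this example.
    interface Closure<T, R> { R apply(T arg); }
    interface Predicate<T> { boolean apply(T arg); }

    // Returns a closure that captures 'greeting' from the enclosing scope.
    static Closure<String, String> greeter(final String greeting) {
        return new Closure<String, String>() {
            @Override public String apply(String name) {
                return greeting + ", " + name + "!";
            }
        };
    }

    // A predicate is a closure that returns a boolean.
    static Predicate<String> longerThan(final int len) {
        return new Predicate<String>() {
            @Override public boolean apply(String s) {
                return s.length() > len;
            }
        };
    }

    public static void main(String[] args) {
        Closure<String, String> howdy = greeter("Howdy");
        // The function object can be passed around and executed later.
        System.out.println(howdy.apply("grid"));         // Howdy, grid!
        System.out.println(longerThan(3).apply("grid")); // true
    }
}
```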

Scala as a hybrid OOP and FP language offers natural advantage over Java since it provides native in-language support for closures which enables much more concise and elegant APIs provided by GridGain’s Scalar DSL. However, with GridGain’s functional framework we brought Java functional usage as close to Scala as possible.

Below is a simple example that broadcasts and prints string on all nodes in the topology. Just compare how close Java, Groovy and Scala implementations are:

GridGain Scalar DSL:
...
object Test {
    // Broadcast "Howdy!" string to all nodes.
    def main(args: Array[String]) = scalar {
        grid$ *< (BROADCAST, () => println("Howdy!"))
    }
}
...

GridGain Grover DSL:
...
@Typed
@Use(GroverProjectionCategory)
class Test {
    // Broadcast "Howdy!" string to all nodes.
    static void main(String[] args) {
        grover {
            Grid g -> g.run(BROADCAST) { println("Howdy!") }
        }
    }
}
...

GridGain Java APIs:
...
public class Test {
    // Broadcast "Howdy!" string to all nodes.
    public static void main(String[] args) throws GridException {
        G.in(null, new CIX1<Grid>() {
            @Override public void applyx(Grid g) throws GridException {
                g.run(BROADCAST, F.println("Howdy!"));
            }
        });
    }
}
...

Closures and predicates are used extensively in GridGain APIs and are at the core of our design. You will discover plenty of examples later in the book of how closures and predicates - in Java, Groovy and Scala - are used throughout GridGain.

6.13. Type Aliases & Typedefs

One of the unusual features of our Java side APIs is the introduction of type aliases (also known as typedefs) in GridGain 3.0. With the introduction of functional design in GridGain 3.0 we quickly discovered that Java APIs are just way too "chatty" due to the lack of any type inference by the Java compiler. The only way to combat that problem in Java is to introduce a type alias - a subclass with a shorter name.

Obviously, it works best (or at all) for static, factory-type classes, and it also works for object instantiations. Unfortunately, you can’t use type aliases in method signatures since they are different types. Such is life in the Java world…

Note
Aliases Applicability
Due to Java-based implementation type aliases or typedefs are used only for static, factory-type classes.

We’ve introduced aliases only for key GridGain types and there’s only about a dozen of aliases in GridGain APIs. It’s pretty easy to memorize the key ones that are used most frequently. Usage of aliases is, of course, optional as you can always use full (original) names of the types. But we are pretty sure that you’ll find aliases on Java side useful to make your code more readable.

Here’s a good example demonstrating how aliases can make Java code slightly more readable:

Without Typedefs:
GridFunc.copy(res, goods,
    GridFunc.<Item>and(
        GridFunc.<Item>notNull(),
        GridFunc.<Item>or(
            new GridPredicate<Item>() {
                @Override public boolean apply(Item item) {
                    return item.novelty;
                }
            },
            new GridPredicate<Item>() {
                @Override public boolean apply(Item item) {
                    return item.price < 150;
                }
            }
        )
    )
);

GridFunc.forEach(res, GridFunc.<Item>println());

With Typedefs:
F.copy(res, goods,
    F.<Item>and(
        F.<Item>notNull(),
        F.<Item>or(
            new P1<Item>() {
                @Override public boolean apply(Item item) {
                    return item.novelty;
                }
            },
            new P1<Item>() {
                @Override public boolean apply(Item item) {
                    return item.price < 150;
                }
            }
        )
    )
);

F.forEach(res, F.<Item>println());

6.14. Grid Task and Job

Grid task and job are the main abstractions in the compute grid. As you recall, the compute grid is about parallelizing the processing, i.e. splitting a long running task into multiple sub-tasks, executing those sub-tasks in parallel and aggregating their results into one final result.

GridTask defines the descriptor for the overall task to be processed and grid job defines a sub-task, i.e. the piece of code that will travel to remote nodes for execution. Essentially a grid task defines mapping and reducing logic, while GridJob is very similar to java.util.Callable interface in that it defines a simple executable body.

Note
GridTask and GridJob
GridTask defines overall task descriptor as well as mapping and reducing logic. GridJob defines units of work that tasks get split into and that travel to remote nodes for execution. The result of their execution will be reduced by the task into the final result.

Note that anything that gets executed on the compute grid should be defined as a grid task. Closures and AOP-based executions get converted to grid task automatically when needed.
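The division of labor between a task (mapping and reducing logic) and a job (an executable body) can be sketched as follows. `MiniTask`, `CharCountTask` and `execute` are hypothetical names used only for this illustration; jobs are modeled with the standard `java.util.concurrent.Callable` and run in the calling thread, where a real grid would ship them to remote nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

final class TaskJobSketch {
    // Hypothetical task descriptor: mapping and reducing logic only.
    interface MiniTask<A, R> {
        List<Callable<R>> map(A arg); // split the argument into jobs
        R reduce(List<R> jobResults); // combine job results
    }

    // Counts the letters in a phrase, one job per word.
    static final class CharCountTask implements MiniTask<String, Integer> {
        @Override public List<Callable<Integer>> map(String phrase) {
            List<Callable<Integer>> jobs = new ArrayList<Callable<Integer>>();
            for (final String word : phrase.split(" "))
                jobs.add(new Callable<Integer>() {
                    @Override public Integer call() { return word.length(); }
                });
            return jobs;
        }

        @Override public Integer reduce(List<Integer> jobResults) {
            int sum = 0;
            for (int r : jobResults)
                sum += r;
            return sum;
        }
    }

    // Stand-in for the grid: runs every job locally, then reduces.
    static <A, R> R execute(MiniTask<A, R> task, A arg) {
        List<R> results = new ArrayList<R>();
        try {
            for (Callable<R> job : task.map(arg))
                results.add(job.call());
        }
        catch (Exception e) {
            throw new IllegalStateException("Job failed", e);
        }
        return task.reduce(results);
    }

    public static void main(String[] args) {
        System.out.println(execute(new CharCountTask(), "Hello Grid World")); // 14
    }
}
```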

6.15. Service Provider Interface (SPI)

Service Provider Interface (SPI) is at the core of GridGain architecture as it provides pluggable modularization to GridGain. The SPI concept is simply a component interface with multiple pluggable implementations that all share unified life cycle management. There are two key benefits in this approach:

  • Rest of GridGain (including all user application code) only uses the SPI interface and is totally independent from its specific implementation.

  • Users can provide their own pluggable implementations for SPIs and are therefore not limited to the ones provided by GridGain itself.

An example of an SPI is the communication subsystem. It has a simple interface, GridCommunicationSPI, and half a dozen implementations that GridGain provides out of the box. These implementations can be set in the grid configuration, GridConfiguration (and the TCP/IP-based implementation is always set by default so that you don’t have to set anything to get GridGain working right away).

Note
GridGain SPI - Service Provider Interface
An experienced reader will quickly notice that SPI is very similar to OSGi. Although we looked carefully at OSGi a number of years ago, we quickly concluded that we needed more custom functionality than OSGi provided at the time. At the same time our SPI architecture and OSGi share the same goals of componentization and modularity.

GridGain is "sliced" into a dozen different SPIs for all major subsystems and each such subsystem can be fully replaced by a user-specific implementation - an extremely powerful feature of GridGain.
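The SPI idea (one small interface, a shared life cycle, a default implementation and a configuration slot for a replacement) can be sketched in plain Java. All names below (`TransportSpi`, `LoopbackTransport`, `Config`) are hypothetical and chosen for this example; they are not the GridGain communication SPI or GridConfiguration.

```java
final class SpiSketch {
    // Hypothetical SPI: one small interface plus a shared life cycle.
    interface TransportSpi {
        void start();
        void stop();
        String send(String msg);
    }

    // Default implementation, used when the user configures nothing.
    static final class LoopbackTransport implements TransportSpi {
        private boolean started;

        @Override public void start() { started = true; }
        @Override public void stop() { started = false; }

        @Override public String send(String msg) {
            if (!started)
                throw new IllegalStateException("SPI not started");
            return "loopback:" + msg;
        }
    }

    // Hypothetical configuration holder with a pluggable SPI slot.
    static final class Config {
        private TransportSpi transport = new LoopbackTransport();

        void setTransport(TransportSpi spi) { transport = spi; }
        TransportSpi transport() { return transport; }
    }

    // The "rest of the system" depends only on the interface.
    static String roundTrip(Config cfg, String msg) {
        TransportSpi spi = cfg.transport();
        spi.start();
        try {
            return spi.send(msg);
        }
        finally {
            spi.stop();
        }
    }

    public static void main(String[] args) {
        Config cfg = new Config();
        System.out.println(roundTrip(cfg, "ping")); // loopback:ping

        // A user-provided implementation replaces the default.
        cfg.setTransport(new TransportSpi() {
            @Override public void start() { }
            @Override public void stop() { }
            @Override public String send(String msg) { return "custom:" + msg; }
        });
        System.out.println(roundTrip(cfg, "ping")); // custom:ping
    }
}
```

Note that `roundTrip` never changes when the implementation is swapped, which is the first of the two benefits listed above.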

7. Functional Programming Framework

Introduction of Functional Programming (FP) into Java-based GridGain has its own unique story in GridGain that is worth repeating here.

It was anything but a straightforward decision… In early 2008 when we released GridGain 2.0 we started looking for new ways to simplify the usability of GridGain. We had a pretty good story with AOP-based grid enabling and direct API support for MapReduce type of processing - in fact we had been way ahead of everyone else in this department. But we felt there was still lots of plumbing exposed for many use cases where such exposure was clearly unnecessary.

The obvious thought for us was to look at the Domain Specific Language (DSL) route. We quickly realized that a Java-based DSL is simply out of the question (it would be just another set of Java APIs not much different from what we had already). An XML-based DSL (or any type of external DSL) was considered a non-starter even in the heyday of DSLs in 2006-2007.

So, we started looking at other JVM-based languages that would be much more appropriate for DSL development yet let us reuse GridGain Java-based APIs. During a surprisingly short evaluation period (which we’ll chronicle in the Scala section later on) we quickly and decisively settled on Scala - a then relatively new language that so powerfully combined OOP and FP into one cohesive and expressive language.

As we dived into Scala-based DSL development with a renewed energy we quickly realized that in order to provide truly powerful DSL in Scala (utilizing Scala’s functional core including partial functions, closures, etc.) we essentially had to re-implement most of the main APIs in Scala, i.e. duplicate GridGain in Scala. That was a pretty rude awakening for us as it was simply a no-go to have GridGain implemented in two parallel tracks: one in Java and one in Scala.

And that’s where the functional story for Java begins. After some research on our part we noticed that if we could make Java side APIs functional in their design - we could largely reduce the Scala-based DSL to a collection of implicit conversions from functional Java parts to Scala parts (and back). That would also allow us to keep all implementation in Java (where it originally was) and keep Scala APIs as a layer on top without any duplication of code whatsoever.

We’ve set off to develop a first truly distributed Functional Programming framework for Java.

After a few false starts we’ve got a first working prototype (that didn’t suck) and started the refactoring process of our Java APIs into functional mold. What we started noticing along the way is that new APIs (newly added or refactored) were becoming much more powerful and elegant when used with functional constructs such as predicate-based node filters or closure-based executions. In about that time we also came up with grid and cache projections that truly revolutionized the GridGain usability. Many implementations became shorter and yet more expressive when we started using our GridFunc class as well as newly introduced typedefs. All these positive effects on our internal software development solidified our resolve to provide the same capabilities to the users of GridGain.

Note
Scala Leads to FP in Java

All in all, the introduction of Scala support and Scala DSL into GridGain led us to develop one of the most comprehensive Functional Programming frameworks in Java, which is at the core of most of the functionality in GridGain.

What is even more interesting (and exciting for us) is that FP focus on Java side made the development of Grover, our Groovy++ based DSL for GridGain, simple if not downright trivial. It took just a week to release first beta version of it and it was already mighty useful for large Groovy community.

More on that later…

7.1. Type Aliases and Typedefs

Type aliases or typedefs are used in programming languages to give a shorter name to an existing type and so make the code that uses it more readable and easier to understand.

Many languages provide built-in support for type aliases or typedefs.

C-based languages (C/C++/Objective-C) provide direct support for it:

Objective-C
typedef enum {A, B, C} myEnum;
C/C++
typedef int myInt, yoursInt;

Scala also provides excellent built-in support for type alias that goes beyond C-based capabilities. You can declare a type alias right during importing the original type:

import foo.bar.{Original => O} // 'O' becomes alias of 'Original'

And you can also declare the type alias in your code much the same as declaring method or a field (and similar to C/C++):

type Call[R] = () => R // A shorthand for function
type OneWayTicket = () => Unit // Another shorthand for function

You often use structural types in Scala with type aliases:

type T = { def foo: Unit }

declares T as an alias for any type that has method foo with return type Unit.

Java, unfortunately, does not provide any support for type aliases… Yet Java needs them more than any language above due to its syntactic bloat and lack of any reasonable type inference. In fact, look at this "typical" Java code:

private HashMap<Collection<String>, Set<Integer>> map =
    new HashMap<Collection<String>, Set<Integer>>();

Since there is no type inference you have to repeat the bloated HashMap definition twice in this declaration for no reason whatsoever - it only makes code look more busy and less readable. And if this type of HashMap is used frequently - and there’s no way to shorthand it - you have to repeat this over and over again in every place where it is used.

Note Java needs typedefs more than any language above due to its syntactic bloat and lack of any reasonable type inference.

Surprisingly enough, this shortcoming of Java is often cited as one of the major reasons Java code "feels" bloated and unnecessarily verbose. This gets even more pronounced when you start using Scala that, like Java, is fully statically typed but removes most, if not all, bloat and unnecessary repetition from the code.

With the introduction of Functional Programming in GridGain 3.0 (explained in the following chapter) we were faced with this very problem in Java APIs: we had plenty of new interfaces and classes that were used literally everywhere in our code and it was becoming unwieldy in many places since many of them require parametrization and code was becoming simply ridiculous in some places… Needless to say, users of GridGain APIs would have been faced with the same problems.

To solve this problem (somewhat) we have introduced typedefs to our Java APIs (Scala APIs naturally use Scala language type aliases).

Important
Typedef
Essentially, a typedef is simply a subclass with a shorter name - as there is no other way to do that in Java.
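The "subclass with a shorter name" technique can be demonstrated with a few lines of plain Java. The types below (`LongNamedPredicate`, `LongNamedFunctions`, and their aliases `P` and `Fn`) are hypothetical and exist only in this sketch; they mimic the pattern but are not the GridGain typedef classes.

```java
import java.util.Arrays;
import java.util.List;

final class TypedefSketch {
    // "Original" types with long names (hypothetical, not GridGain classes).
    static abstract class LongNamedPredicate<T> {
        abstract boolean apply(T t);
    }

    static class LongNamedFunctions {
        // Returns true if at least one item satisfies the predicate.
        static <T> boolean any(Iterable<T> items, LongNamedPredicate<T> p) {
            for (T item : items)
                if (p.apply(item))
                    return true;
            return false;
        }
    }

    // Typedefs: empty subclasses whose only purpose is a shorter name.
    static abstract class P<T> extends LongNamedPredicate<T> { }
    static class Fn extends LongNamedFunctions { }

    public static void main(String[] args) {
        List<Integer> prices = Arrays.asList(200, 120, 310);

        // Static members are inherited, so 'Fn.any' resolves to the original method.
        boolean hasCheap = Fn.any(prices, new P<Integer>() {
            @Override boolean apply(Integer price) { return price < 150; }
        });
        System.out.println(hasCheap); // true
    }
}
```

Both uses rely only on ordinary inheritance: `Fn` inherits the static factory-style methods, and `new P<Integer>() {...}` instantiates the alias where the original type is expected.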

In package org.gridgain.grid.typedef we have a few dozen typedefs defined as sub-classes with short one- or two-letter names for various frequently used types in GridGain. The following table shows all typedefs shipped with GridGain:

Typedef or Type Alias   Original Type
C1<E1,R>                org.gridgain.grid.lang.GridClosure<E1,R>
C2<E1,E2,R>             org.gridgain.grid.lang.GridClosure2<E1,E2,R>
C3<E1,E2,E3,R>          org.gridgain.grid.lang.GridClosure3<E1,E2,E3,R>
CA                      org.gridgain.grid.lang.GridAbsClosure
CAX                     org.gridgain.grid.lang.GridAbsClosureX
CI1<T>                  org.gridgain.grid.lang.GridInClosure<T>
CI2<E1,E2>              org.gridgain.grid.lang.GridInClosure2<E1,E2>
CI3<E1,E2,E3>           org.gridgain.grid.lang.GridInClosure3<E1,E2,E3>
CIX1<T>                 org.gridgain.grid.lang.GridInClosureX<T>
CIX2<E1,E2>             org.gridgain.grid.lang.GridInClosure2X<E1,E2>
CIX3<E1,E2,E3>          org.gridgain.grid.lang.GridInClosure3X<E1,E2,E3>
CO<T>                   org.gridgain.grid.lang.GridOutClosure<T>
COX<T>                  org.gridgain.grid.lang.GridOutClosureX<T>
CX1<E1,R>               org.gridgain.grid.lang.GridClosureX<E1,R>
CX2<E1,E2,R>            org.gridgain.grid.lang.GridClosure2X<E1,E2,R>
CX3<E1,E2,E3,R>         org.gridgain.grid.lang.GridClosure3X<E1,E2,E3,R>
F                       org.gridgain.grid.lang.GridFunc
G                       org.gridgain.grid.GridFactory
P1<E1>                  org.gridgain.grid.lang.GridPredicate<E1>
P2<T1,T2>               org.gridgain.grid.lang.GridPredicate2<T1,T2>
P3<T1,T2,T3>            org.gridgain.grid.lang.GridPredicate3<T1,T2,T3>
PA                      org.gridgain.grid.lang.GridAbsPredicate
PCE<K,V>                org.gridgain.grid.lang.GridPredicate<GridCacheEntry<K, V>>
PE                      org.gridgain.grid.lang.GridPredicate<GridEvent>
PKV<K,V>                org.gridgain.grid.lang.GridPredicate2<K, V>
PN                      org.gridgain.grid.lang.GridPredicate<GridRichNode>
PX1<E1>                 org.gridgain.grid.lang.GridPredicateX<E1>
PX2<T1,T2>              org.gridgain.grid.lang.GridPredicate2X
PX3<T1,T2,T3>           org.gridgain.grid.lang.GridPredicate3X
R1<E1,R>                org.gridgain.grid.lang.GridReducer
R2<E1,E2,R>             org.gridgain.grid.lang.GridReducer2
R3<E1,E2,E3,R>          org.gridgain.grid.lang.GridReducer3
RX1<E1,R>               org.gridgain.grid.lang.GridReducerX
RX2<E1,E2,R>            org.gridgain.grid.lang.GridReducer2X
RX3<E1,E2,E3,R>         org.gridgain.grid.lang.GridReducer3X

As you can see, typedefs are defined primarily for functional classes (tuples, closures, and predicates) as well as for a few factory classes like GridFactory and GridFunc. Here’s a short sub-list of the most frequently used typedefs in GridGain:

Typedef or Type Alias   Original Type
C1<E1,R>                org.gridgain.grid.lang.GridClosure<E1,R>
CA                      org.gridgain.grid.lang.GridAbsClosure
CO<T>                   org.gridgain.grid.lang.GridOutClosure<T>
F                       org.gridgain.grid.lang.GridFunc
P1<E1>                  org.gridgain.grid.lang.GridPredicate<E1>
PA                      org.gridgain.grid.lang.GridAbsPredicate
PN                      org.gridgain.grid.lang.GridPredicate<GridRichNode>

Here’s a code snippet from the GridFunctionalCopyExample example that is shipped with GridGain. The first version does not use typedefs and uses full names of the types:

Without Typedefs:
GridFunc.copy(res, goods,
    GridFunc.<Item>and(
        GridFunc.<Item>notNull(),
        GridFunc.<Item>or(
            new GridPredicate<Item>() {
                @Override public boolean apply(Item item) {
                    return item.novelty;
                }
            },
            new GridPredicate<Item>() {
                @Override public boolean apply(Item item) {
                    return item.price < 150;
                }
            }
        )
    )
);

GridFunc.forEach(res, GridFunc.<Item>println());

With Typedefs:
F.copy(res, goods,
    F.<Item>and(
        F.<Item>notNull(),
        F.<Item>or(
            new P1<Item>() {
                @Override public boolean apply(Item item) {
                    return item.novelty;
                }
            },
            new P1<Item>() {
                @Override public boolean apply(Item item) {
                    return item.price < 150;
                }
            }
        )
    )
);

F.forEach(res, F.<Item>println());

I would argue that the second version is more readable and easier to understand since we don’t have to repeat GridFunc and GridPredicate ad nauseam in every line.

Note also that we have a couple of typedefs that shorten parameterized types, which allows for greater brevity and more concise code:

Typedef or Type Alias   Original Type
PCE<K,V>                org.gridgain.grid.lang.GridPredicate<GridCacheEntry<K, V>>
PE                      org.gridgain.grid.lang.GridPredicate<GridEvent>
PKV<K,V>                org.gridgain.grid.lang.GridPredicate2<K, V>
PN                      org.gridgain.grid.lang.GridPredicate<GridRichNode>

Warning
Limitation

Now, this approach obviously has limitations:

  • Since typedefs are sub-classes - you can’t use them in signatures unless you expect to be passed a typedef itself - a rather bad approach.

  • Another limitation is that typedef is a new type and a different one from the original type - so any code that relies on exact types (AOP, IoC, etc.) may no longer work. So, essentially, in Java you can only use typedefs during instance creation - similar, to a certain degree, to factory methods.
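Both limitations follow directly from the fact that the alias is a distinct subclass, as the following sketch shows. The types are again hypothetical (`LongNamedPredicate` and its alias `P`), defined only for this example.

```java
final class TypedefLimitSketch {
    // Hypothetical "original" type.
    static abstract class LongNamedPredicate<T> {
        abstract boolean apply(T t);
    }

    // Typedef: a subclass with a shorter name.
    static abstract class P<T> extends LongNamedPredicate<T> { }

    // Declaring the parameter as the typedef is too narrow: callers holding
    // a plain LongNamedPredicate cannot call this method.
    static boolean testNarrow(P<String> p, String s) { return p.apply(s); }

    // Declaring the original type accepts both.
    static boolean testWide(LongNamedPredicate<String> p, String s) { return p.apply(s); }

    static LongNamedPredicate<String> original() {
        return new LongNamedPredicate<String>() {
            @Override boolean apply(String s) { return s.isEmpty(); }
        };
    }

    static P<String> alias() {
        return new P<String>() {
            @Override boolean apply(String s) { return s.isEmpty(); }
        };
    }

    public static void main(String[] args) {
        System.out.println(testWide(original(), ""));  // true
        System.out.println(testWide(alias(), ""));     // true
        System.out.println(testNarrow(alias(), ""));   // true
        // testNarrow(original(), "");  // does not compile: not a P<String>

        // Code that relies on exact types also sees two different types.
        System.out.println(alias() instanceof P);      // true
        System.out.println(original() instanceof P);   // false
    }
}
```

This is why the aliases are best kept to the instantiation side (`new P<...>() {...}`), while method signatures continue to use the original type.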

Note
Scala
You can freely use Java-based typedefs in Scala code - but we suggest using the native type alias support provided by Scala.

7.1.1. Typedefs vs. Factory Methods

Now, you may ask why not use factory methods, a standard Java idiom, instead? GridGain actually provides plenty of factory methods in GridFunc class (that itself has the F typedef). But factory methods often tend to be more verbose and sometimes hide the "creation of new instance" context.

Factory Method
Foobar v = FoobarFactory.newFoobar(...);

or

Typedefs
Foobar v = new T(...); // 'T' is a typedef for 'Foobar'

In our experience working with GridGain source code we’ve found that typedefs generally provide the most terse code without losing context or readability.

7.1.2. Where To Use Typedefs

The answer here is simple: everywhere, unless it hurts the readability of your code.

Important
Do Not Trade In Readability
We strongly believe that you should never trade a few saved characters for poorer code readability.

Once you get familiar with some of the most frequently used typedefs, you will start using them freely, and in most situations they will improve your code: they make it less bloated and focus the reader’s attention on the actual business logic rather than on unnecessary repetitive declarations.

8. GridGain Basics

In this pretty long chapter we’ll cover all basic functionality available in GridGain apart from the "big two" - compute grids and data grids - which will be covered in subsequent individual chapters. Both big subsystems are fundamentally based on the functionality explained in this chapter and therefore the following material is pretty important.

Note What’s more interesting is the fact that some GridGain-based applications don’t use either of our two main technologies at all, but rely instead on, for example, actor-based message passing, distributed functional programming, zero deployment or event-based processing provided by GridGain.

8.1. Logging

GridGain provides pluggable logging by allowing the user to specify their own logging framework. This is especially convenient when GridGain runs inside a hosting environment such as a servlet container or an application server.

In such a case, GridGain can easily be configured to route all its logging through the host’s logging framework, eliminating the need for multiple log file locations. This also dramatically simplifies debugging, since multiple log files don’t have to be synchronized line by line.

To provide this pluggability GridGain relies on its own GridLogger interface, which provides an absolutely minimal API for logging. GridGain uses this interface throughout the entire product and also ships complete out-of-the-box implementations of it for popular logging frameworks, including the Java logging, Log4j, JCL and SLF4J loggers configured below.

Users, of course, are free to provide their own implementations and many often do for integration with existing log analysis or health monitoring solutions.

8.1.1. Configuration

The GridGain logger can be configured either from code, by modifying GridConfiguration when starting GridGain, or via Spring XML. The following examples demonstrate both ways, for the Log4J and JCL loggers respectively:

GridConfiguration cfg = new GridConfigurationAdapter();
...
// Log4J logger.
URL xml = U.resolveGridGainUrl("modules/tests/config/log4j-test.xml");
GridLogger log = new GridLog4jLogger(xml);
...
cfg.setGridLogger(log);

or via Spring XML:

...
<property name="gridLogger">
    <bean class="org.gridgain.grid.logger.jcl.GridJclLogger">
        <constructor-arg type="org.apache.commons.logging.Log">
            <bean class="org.apache.commons.logging.impl.Log4JLogger">
                <constructor-arg type="java.lang.String" value="config/default-log4j.xml"/>
            </bean>
        </constructor-arg>
    </bean>
</property>
...

Configuring Java Logging

Here is an example of configuring the Java logger in a GridGain Spring configuration file, wrapping the global java.util.logging.Logger:

...
<property name="gridLogger">
    <bean class="org.gridgain.grid.logger.java.GridJavaLogger">
        <constructor-arg type="java.util.logging.Logger">
            <bean class="java.util.logging.Logger" factory-method="getLogger">
                <constructor-arg type="java.lang.String" value="global"/>
            </bean>
        </constructor-arg>
    </bean>
</property>
...

or

...
<property name="gridLogger">
    <bean class="org.gridgain.grid.logger.java.GridJavaLogger"/>
</property>
...

And the same configuration if you’d like to configure GridGain in your code:

GridConfiguration cfg = new GridConfigurationAdapter();
...
GridLogger log = new GridJavaLogger(Logger.global);
...
cfg.setGridLogger(log);

or, equivalently:

GridConfiguration cfg = new GridConfigurationAdapter();
...
GridLogger log = new GridJavaLogger();
...
cfg.setGridLogger(log);

Configuring Log4j

Here is a typical example of configuring the Log4j logger in a GridGain configuration file:

<property name="gridLogger">
    <bean class="org.gridgain.grid.logger.log4j.GridLog4jLogger">
        <constructor-arg type="java.lang.String" value="config/default-log4j.xml"/>
    </bean>
</property>

and from your code:

GridConfiguration cfg = new GridConfigurationAdapter();
...
URL xml = U.resolveGridGainUrl("modules/tests/config/log4j-test.xml");
GridLogger log = new GridLog4jLogger(xml);
...
cfg.setGridLogger(log);

Configuring JBoss logging

Information about configuring JBoss logging with GridGain can be found at http://docs.jboss.org/process-guide/en/html/logging.html.

Configuring Tomcat logging

Please refer to http://tomcat.apache.org/tomcat-6.0-doc/logging.html for more information on how to configure GridGain with Tomcat logging.

Configuring JCL

This logger wraps any JCL (Jakarta Commons Logging) logger. The implementation simply delegates to the underlying JCL logger. It should be used by loaders that have JCL-based internal logging (e.g., Websphere).

Here is an example of configuring the JCL logger in a GridGain Spring configuration file to work over the Log4J implementation. Note that we use the same configuration file as we provide by default:

...
<property name="gridLogger">
    <bean class="org.gridgain.grid.logger.jcl.GridJclLogger">
        <constructor-arg type="org.apache.commons.logging.Log">
            <bean class="org.apache.commons.logging.impl.Log4JLogger">
                <constructor-arg type="java.lang.String" value="config/default-log4j.xml"/>
            </bean>
        </constructor-arg>
    </bean>
</property>
...

If you are using system properties to configure the JCL logger, use the following configuration:

...
<property name="gridLogger">
    <bean class="org.gridgain.grid.logger.jcl.GridJclLogger"/>
</property>
...

And the same configuration if you’d like to configure GridGain in your code:

GridConfiguration cfg = new GridConfigurationAdapter();
...
GridLogger log = new GridJclLogger(new Log4JLogger("config/default-log4j.xml"));
...
cfg.setGridLogger(log);

or the following when JCL is configured by means of system properties:

GridConfiguration cfg = new GridConfigurationAdapter();
...
GridLogger log = new GridJclLogger();
...
cfg.setGridLogger(log);

Configuring SLF4J

This logger should be used by hosts that have slf4j-based logging.

Here is an example of configuring SLF4J logger in GridGain configuration Spring file:

<property name="gridLogger">
    <bean class="org.gridgain.grid.logger.slf4j.GridSlf4jLogger"/>
</property>

8.1.2. Injection vs. Instantiation

An instance of the GridLogger interface can be obtained at any point via the Grid.log() method. However, when a logger is needed in grid tasks and/or jobs, it is preferable to use resource injection via the @GridLoggerResource annotation, which annotates a field or a setter method for injection of a GridLogger instance.

Logger can be injected into instances of following classes:

  • GridTask

  • GridJob

  • GridSpi (and all its implementations)

  • GridLifecycleBean

  • Any object annotated with @GridUserResource annotation

Here is how injection would typically happen:

public class MyGridJob implements GridJob {
    ...
    @GridLoggerResource
    private GridLogger log;
    ...
}

or

public class MyGridJob implements GridJob {
    ...
    private GridLogger log;
    ...
    @GridLoggerResource
    public void setGridLogger(GridLogger log) {
        this.log = log;
    }
    ...
}
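
To make the two injection styles less magical, here is a minimal sketch of how annotation-driven injection can be done with plain reflection. It is not GridGain’s injector: the @LoggerResource annotation, the two job classes and LoggerInjector are hypothetical stand-ins, and java.util.logging.Logger is used in place of GridLogger so that the example runs on its own.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.logging.Logger;

// Hypothetical marker annotation standing in for @GridLoggerResource.
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.FIELD, ElementType.METHOD})
@interface LoggerResource { }

// Field-based injection target.
class FieldJob {
    @LoggerResource
    private Logger log;

    Logger log() { return log; }
}

// Setter-based injection target.
class SetterJob {
    private Logger log;

    @LoggerResource
    public void setLogger(Logger log) { this.log = log; }

    Logger log() { return log; }
}

class LoggerInjector {
    // Injects the logger into every annotated field and single-argument method.
    static void inject(Object target, Logger log) {
        try {
            for (Field f : target.getClass().getDeclaredFields()) {
                if (f.isAnnotationPresent(LoggerResource.class)) {
                    f.setAccessible(true); // Private fields are allowed.
                    f.set(target, log);
                }
            }

            for (Method m : target.getClass().getDeclaredMethods()) {
                if (m.isAnnotationPresent(LoggerResource.class) && m.getParameterTypes().length == 1) {
                    m.setAccessible(true);
                    m.invoke(target, log);
                }
            }
        }
        catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException("Failed to inject logger.", e);
        }
    }

    public static void main(String[] args) {
        Logger log = Logger.getLogger("demo");

        FieldJob a = new FieldJob();
        SetterJob b = new SetterJob();

        inject(a, log);
        inject(b, log);

        System.out.println((a.log() == log) + " " + (b.log() == log));
    }
}
```

The practical consequence is the same in both styles: the job never looks the logger up itself, so the same job class works unchanged whichever logging back-end the node was configured with.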

8.1.3. Quiet Mode

GridGain 3.0 introduced a quiet logging mode. Essentially, this mode suppresses most of the INFO and all DEBUG logging and provides very concise logging output. This mode is very useful for examples and demonstration as well as for everyday development where full output of INFO or DEBUG is not necessary.

By default, starting with version 3.0, GridGain starts in quiet mode, suppressing INFO and DEBUG log output. If the system property GRIDGAIN_QUIET is set to false, then GridGain will operate in normal, unsuppressed logging mode (with whatever logging back-end is configured). Note that all output in quiet mode goes to standard output (STDOUT).

Note that GridGain’s standard startup scripts $GRIDGAIN_HOME/bin/ggstart.{sh|bat} start in quiet mode by default. Both scripts accept the -v argument to turn quiet mode off.
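
If your own launcher or test harness wants to mirror this behavior, the check can be sketched as follows. Only the property name GRIDGAIN_QUIET comes from the text above; the QuietMode class and the exact parsing rules (case-insensitive, surrounding whitespace ignored) are assumptions made for this example and may differ from what GridGain does internally.

```java
class QuietMode {
    // Quiet unless the property is explicitly set to "false".
    static boolean isQuiet(String propValue) {
        return propValue == null || !"false".equalsIgnoreCase(propValue.trim());
    }

    public static void main(String[] args) {
        // Reads the same system property that GridGain consults.
        boolean quiet = isQuiet(System.getProperty("GRIDGAIN_QUIET"));

        System.out.println(quiet ? "quiet mode" : "normal logging");
    }
}
```

Running the class with the standard JVM option -DGRIDGAIN_QUIET=false selects the normal logging branch; without it the quiet branch is taken.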

8.2. Loaders

Grid loaders provide the basic boilerplate code for starting GridGain in different environments. For example, when starting within application servers such as JBoss, Weblogic or Websphere, the provided loaders will configure GridGain to use "native" logging, JMX facility, discovery and execution services (JSR-237), which makes GridGain basically blend into the hosting environment. Loaders do not need to implement any interface; their sole responsibility is to configure and start the grid.

8.2.1. Command Line Loader

The command line loader is located in the org.gridgain.grid.loaders.cmdline package.

The command line loader is used to start the grid from a command line script. GridGain comes with the ggstart.{sh|bat} startup script located in the $GRIDGAIN_HOME/bin folder. By default this script uses the configuration defined in $GRIDGAIN_HOME/config/default-spring.xml. This configuration picks defaults for all grid internal components and SPIs, which is sufficient for running examples and doing your own development and testing.

If you wish to provide your own configuration file, simply pass its path as parameter to the script.

ggstart.bat C:\myfolder\mygrid.xml

To stop grid, simply press CTRL-C which will initiate GridGain stop routine.

Tip
Script Startup
Note that in addition to starting grid nodes on separate physical machines, GridGain supports starting multiple grid nodes on the same machine as well as in the same VM. The only requirement for default configuration is that IP-Multicast is supported.
Tip
Custom Jars
Starting from 2.1.0 you can add your libraries to the class path without changing startup scripts. Just put them into the $GRIDGAIN_HOME/libs/ext directory and GridGain will pick them up automatically.
Warning
GRIDGAIN_HOME Environment Variable
If you get the following error: Exception in thread "main" java.lang.NoClassDefFoundError: org/gridgain/grid/loaders/cmdline/GridCommandLineLoader, then your $GRIDGAIN_HOME environment variable is not set or is set incorrectly. Please set $GRIDGAIN_HOME environment variable to your GridGain installation folder.

8.2.2. GlassFish Loader

The GlassFish loader is located in the org.gridgain.grid.loaders.glassfish package.

The GlassFish loader is used to start GridGain within the GlassFish application server. It is implemented as a GlassFish life-cycle listener module and should be used to provide tight integration between GridGain and GlassFish AS. The current loader implementation works on both GlassFish v1 and GlassFish v2 servers.

The following steps should be taken to configure this loader:

  1. Add GridGain libraries in GlassFish common loader. See GlassFish Class Loaders

  2. Create life-cycle listener module. Use command line or administration GUI:

    asadmin> create-lifecycle-module --user admin --passwordfile ../adminpassword.txt --classname "org.gridgain.grid.loaders.glassfish.GridGlassfishLoader" --property cfgFilePath="config/default-spring.xml" GridGain

For more information consult GlassFish Project - Documentation Home Page.

Note that GlassFish is not shipped with GridGain. If you don’t have GlassFish, you need to download it separately. See https://glassfish.dev.java.net for more information.

8.2.3. Tomcat Loader

The Tomcat loader is located in the org.gridgain.grid.loaders.tomcat package.

The Tomcat loader is used to start GridGain within the Tomcat server. It is implemented as a Tomcat LifecycleListener and should be used to provide tight integration between GridGain and the Tomcat web container (logging, MBean server).

The following steps should be taken to configure this loader:

  1. Add the GridGain libraries to the Tomcat common loader: in the file $TOMCAT_HOME/conf/catalina.properties, append $GRIDGAIN_HOME/gridgain.jar,$GRIDGAIN_HOME/libs/*.jar to the common.loader property (replace $GRIDGAIN_HOME with the absolute path).

  2. Add GridGain LifeCycle Listener in $TOMCAT_HOME/conf/server.xml.

<!-- GridGain loader -->
<Listener className="org.gridgain.grid.loaders.tomcat.GridTomcatLoader"
          configurationFile="config/default-spring.xml"/>

Note that Tomcat is not shipped with GridGain. If you don’t have Tomcat, you need to download it separately. See http://tomcat.apache.org for more information.

8.2.4. JBoss Loader

The JBoss loader is located in the org.gridgain.grid.loaders.jboss package.

The JBoss loader is used to start GridGain within JBoss as a JBoss service. Note that jboss-service.xml has a configuration parameter pointing to the Spring XML configuration. At startup, the JBoss loader looks for the Spring configuration XML file specified in jboss-service.xml.

GridGain ships with a pre-built SAR directory, located in the $GRIDGAIN_HOME/config/jboss folder. You can deploy GridGain into JBoss in 2 steps:

  • Change the codebase in jboss-service.xml in META-INF sub-folder to point to correct location.

  • Copy entire SAR directory from $GRIDGAIN_HOME/config/jboss folder to deploy directory of the JBoss.

Here is how $GRIDGAIN_HOME/config/jboss/jboss-service.xml looks (note we use 1.5.0 version as an example):

<!DOCTYPE server PUBLIC "-//JBoss//DTD MBean Service 4.0//EN" "http://www.jboss.org/j2ee/dtd/jboss-service_4_0.dtd">

<!--
    JBoss service descriptor for GridGain JBoss Loader.

    Classpath should contain the following libraries:
    - $GRIDGAIN_HOME/libs/*.jar
    - $GRIDGAIN_HOME/gridgain_1.5.0.jar

    For example, if GridGain is installed into /opt/gridgain-1.5.0 then
    you can use the following classpath settings to includes all
    necessary JARs:

    <classpath codebase="/opt/gridgain-1.5.0/gridgain-1.5.0.jar"/>
    <classpath codebase="/opt/gridgain-1.5.0/libs" archives="*"/>
-->
<server>
    <classpath codebase=".." /> <!-- FIX IT BEFORE USING. -->

    <mbean code="org.gridgain.grid.loaders.jboss.GridJbossLoader" name="gridgain:service=loader">
        <!--
            config/default-spring.xml - Default GridGain configuration.
            config/jboss/ha/jboss-gridgain-ha-spring.xml - JBoss specific configuration that
                will use JBoss SPIs for communication and discovery. Requires JBoss HA enabled.
        -->
        <attribute name="ConfigurationFile">config/default-spring.xml</attribute>
    </mbean>
</server>

Warning The currently provided JBoss loader doesn’t work with JBoss 7. Use the servlet context listener loader instead.

8.2.5. WebLogic Loader

The Weblogic loader is located in the org.gridgain.grid.loaders.weblogic package.

The Weblogic loader is used to start GridGain within the Weblogic application server. It is implemented as a pair of startup and shutdown classes; please consult the WebLogic documentation on how to configure startup classes in Weblogic. The loader provides tight integration with Weblogic AS: specifically, it integrates GridGain with Weblogic logging, MBean server, and work manager (JSR-237).

The following steps should be taken to configure startup and shutdown classes:

  1. Add Startup and Shutdown Class in admin console (Environment → Startup & Shutdown Classes → New).

  2. Add the following parameters for startup class:

    • Name: GridWeblogicStartup

    • Classname: org.gridgain.grid.loaders.weblogic.GridWeblogicStartup

    • Arguments: cfgFilePath=config/default-spring.xml

  3. Add the following parameters for shutdown class:

    • Name: GridWeblogicShutdown

    • Classname: org.gridgain.grid.loaders.weblogic.GridWeblogicShutdown

  4. Change classpath for WebLogic server in startup script: CLASSPATH="$CLASSPATH:$GRIDGAIN_HOME/gridgain.jar:$GRIDGAIN_HOME/libs/"

Note that Weblogic is not shipped with GridGain. If you don’t have Weblogic, you need to download it separately. See http://www.bea.com for more information.

8.2.6. WebSphere Loader

The Websphere loader is located in the org.gridgain.grid.loaders.websphere package.

The Websphere loader is used to start GridGain within the Websphere application server. It is implemented as a Websphere custom service (MBean) and is used to provide tight integration between GridGain and Websphere AS. Specifically, the loader integrates GridGain with Websphere logging, MBean server and work manager (JSR-237).

The following steps should be taken to configure this loader:

  1. Add CustomService in admin console (Application Servers → server1 → Custom Services → New).

  2. Add custom property for this service: cfgFilePath=config/default-spring.xml.

  3. Add the following parameters:

    • Classname: org.gridgain.grid.loaders.websphere.GridWebsphereLoader

    • Display Name: GridGain

    • Classpath (replace $GRIDGAIN_HOME with absolute path): "$GRIDGAIN_HOME/gridgain.jar:$GRIDGAIN_HOME/libs/". Note that forward slash (/) at the end is critical.

Note that Websphere is not shipped with GridGain. If you don’t have Websphere, you need to download it separately. See http://www.ibm.com/software/websphere/ for more information.

8.2.7. Servlet context listener loader

The servlet context listener loader is located in the org.gridgain.grid.loaders.servlet package.

This loader can be used to start GridGain inside any web container as a servlet context listener. The loader must be defined in the web.xml file.

<context-param>
    <param-name>cfgFilePath</param-name>
    <param-value>config/default-spring.xml</param-value>
</context-param>

<listener>
    <listener-class>org.gridgain.grid.loaders.servlet.GridServletContextListenerLoader</listener-class>
</listener>

The servlet-based loader may be used in any web container, such as Tomcat or Jetty. Depending on the way this loader is deployed, the GridGain instance can be accessed either by all web applications or by only one; see your web container’s class loading architecture.

Tip

To start GridGain in a web container, you have to create WAR file with the following structure:

gridgain.war
    |-- WEB-INF/
        |-- lib/
        |   |-- gridgain.jar
        |   `-- GridGain libraries (contents of $GRIDGAIN_HOME/libs folder)
        `-- web.xml (shipped with GridGain in $GRIDGAIN_HOME/config/servlet folder)

This file should be copied to deployments directory.

8.3. Life Cycle Beans

GridLifecycleBean reacts to grid lifecycle events defined in GridLifecycleEventType. Use this bean whenever you need to plug some custom logic before or after grid startup and stopping routines.

There are four events you can react to:

GridLifecycleEventType.BEFORE_GRID_START

Invoked before the grid startup routine is initiated. Note that the grid is not available during this event; therefore, if you injected a grid instance via the GridInstanceResource annotation, you cannot use it yet.

GridLifecycleEventType.AFTER_GRID_START

Invoked right after the grid has started. At this point, if you injected a grid instance via the GridInstanceResource annotation, you can start using it.

GridLifecycleEventType.BEFORE_GRID_STOP

Invoked right before the grid stop routine is initiated. The grid is still available at this stage, so if you injected a grid instance via the GridInstanceResource annotation, you can use it.

GridLifecycleEventType.AFTER_GRID_STOP

Invoked right after grid has stopped. Note that grid is not available during this event.
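
The availability rules for the four events can be summarized in a small sketch. To keep it runnable without GridGain on the class path, it uses stand-in types: the LifecycleEvent enum reuses the four constant names listed above, while the LifecycleBean interface and its onLifecycleEvent(..) signature are assumptions made for illustration, so check the GridLifecycleBean Javadoc for the real contract.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for GridLifecycleEventType; constant names are taken from the text above.
enum LifecycleEvent { BEFORE_GRID_START, AFTER_GRID_START, BEFORE_GRID_STOP, AFTER_GRID_STOP }

// Stand-in for the lifecycle bean callback (hypothetical signature).
interface LifecycleBean {
    void onLifecycleEvent(LifecycleEvent evt);
}

class AuditBean implements LifecycleBean {
    final List<String> audit = new ArrayList<String>();

    @Override public void onLifecycleEvent(LifecycleEvent evt) {
        switch (evt) {
            case AFTER_GRID_START:
            case BEFORE_GRID_STOP:
                // The grid is available: an injected grid instance may be used here.
                audit.add(evt + ": grid available");
                break;

            default:
                // BEFORE_GRID_START and AFTER_GRID_STOP: do not touch the grid.
                audit.add(evt + ": grid unavailable");
        }
    }
}

class LifecycleDemo {
    // Fires the four events in declaration order, as a container would during one start/stop cycle.
    static List<String> run(AuditBean bean) {
        for (LifecycleEvent evt : LifecycleEvent.values())
            bean.onLifecycleEvent(evt);

        return bean.audit;
    }

    public static void main(String[] args) {
        for (String line : run(new AuditBean()))
            System.out.println(line);
    }
}
```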

8.3.1. Resource Injection

Lifecycle beans can have grid resources injected into them using IoC (dependency injection). Both field-based and method-based injection are supported. The following grid resources can be injected:

  • GridLoggerResource

  • GridLocalNodeIdResource

  • GridHomeResource

  • GridMBeanServerResource

  • GridExecutorServiceResource

  • GridMarshallerResource

  • GridSpringApplicationContextResource

  • GridSpringResource

  • GridInstanceResource

8.3.2. Usage

If you need to tie your application logic into the GridGain lifecycle, you can configure lifecycle beans via standard grid configuration, add your application library dependencies into the $GRIDGAIN_HOME/libs/ext folder, and simply start the $GRIDGAIN_HOME/bin/ggstart.{sh|bat} script.

8.3.3. Configuration

Grid lifecycle beans can be configured programmatically as follows:

GridConfigurationAdapter cfg = new GridConfigurationAdapter();

cfg.setLifecycleBeans(new FooBarLifecycleBean1(), new FooBarLifecycleBean2());

GridFactory.start(cfg);

or from Spring XML configuration file as follows:

<bean id="grid.cfg" class="org.gridgain.grid.GridConfigurationAdapter" scope="singleton">
    ...
    <property name="lifecycleBeans">
        <list>
            <bean class="foo.bar.FooBarLifecycleBean1"/>
            <bean class="foo.bar.FooBarLifecycleBean2"/>
        </list>
    </property>
    ...
</bean>

8.4. Metadata & Meta Programming

TODO

8.5. Marshaling

8.6. Messaging

Messaging, an exchange of messages between grid nodes, is one of the main functional areas that is often used standalone in GridGain (without using the main Compute and Data Grid capabilities). Given GridGain’s sophisticated topology management and auto-discovery, it makes sense for many applications to simply piggy-back on this functionality and use GridGain as an intelligent message bus.

Note
Intelligent Message Bus

GridGain messaging support provides unique features that make it an advanced message bus:

  • Zero Deployment support

  • Auto-discovery

  • Actor-based message exchange

  • Pluggable discovery implementation

  • Pluggable communication transport implementation

  • Pluggable security and QoS via communication SPI

8.7. Events

TODO

8.8. Grid-Enabled Executor Service

The Grid.executor() method creates an ExecutorService that executes all submitted Callable and Runnable tasks on the grid. You may run Callable and Runnable tasks just as you normally would with java.util.concurrent.ExecutorService, but these tasks must implement the Serializable interface.

The execution will happen either locally or remotely, depending on the configuration of the Load Balancing SPI and Topology SPI. The distributed ExecutorService delegates command execution to an already started Grid instance. Every submitted task is serialized and may be transferred to any node in the grid.

Here is an example of an ExecutorService to show how it can be used.

public static void main(String[] args) throws Exception {
    GridFactory.start();

    try {
        Grid grid = GridFactory.grid();

        ExecutorService srvc = grid.executor();

        List<Callable<String>> cmds = new ArrayList<Callable<String>>(2);

        String testVal1 = "test-value-1";
        String testVal2 = "test-value-2";

        cmds.add(new FooCallable<String>(testVal1));
        cmds.add(new FooCallable<String>(testVal2));

        List<Future<String>> futures = srvc.invokeAll(cmds);

        // Wait for command completion.
        String res1 = futures.get(0).get();
        String res2 = futures.get(1).get();

        // Print out results.
        System.out.println("Results [res1=" + res1 + ", res2=" + res2 + ']');
    }
    finally {
        GridFactory.stop(true);
    }
}

where simple FooCallable is:

private static class FooCallable<T> implements Callable<T>, Serializable {
    /** */
    private T data = null;

    /**
     * @param data Some data.
     */
    FooCallable(T data) {
        this.data = data;
    }

    /**
     * {@inheritDoc}
     */
    public T call() throws Exception {
        System.out.println("Message: " + data);

        return data;
    }
}
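
The Serializable requirement is easy to check before a task ever reaches the grid. The sketch below uses only the JDK: it pushes a task through Java serialization and back, which roughly approximates what has to happen before a task can run on another node. EchoCallable and SerializationRoundTrip are hypothetical names, and GridGain’s actual marshalling may differ from plain Java serialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.Callable;

// A task that satisfies both requirements: Callable and Serializable.
class EchoCallable implements Callable<String>, Serializable {
    private static final long serialVersionUID = 1L;

    private final String data;

    EchoCallable(String data) { this.data = data; }

    @Override public String call() { return "echo:" + data; }
}

class SerializationRoundTrip {
    // Serializes the object to bytes and reads it back, standing in for a network hop.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T copy(T obj) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();

            try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
                out.writeObject(obj);
            }

            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
                return (T) in.readObject();
            }
        }
        catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Task cannot be transferred.", e);
        }
    }

    public static void main(String[] args) {
        EchoCallable original = new EchoCallable("test-value-1");
        EchoCallable remote = copy(original);

        // The copy is a distinct object that carries the same state.
        System.out.println(remote.call() + " " + (remote != original));
    }
}
```

A task that captures a non-serializable field (an open socket, a thread, a GUI component) fails in copy(..) with an IllegalStateException, which is a much cheaper place to discover the problem than a remote node.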

8.9. Segmenting Grid Nodes

8.9.1. Why Segment Nodes?

Often in deployments you need to segment your grid nodes into several groups, having each group perform one or more subsets of jobs only. For example, let’s say you have a scenario where you have some nodes only submitting jobs to grid (masters), and other groups of nodes only executing these jobs (workers). Then you would segment your grid into 2 groups, masters and workers, and have each group do only what it is supposed to do.

Multiple Sub-Grids

Node segmentation allows you to create multiple sub-grids within your grid. Every sub-grid may have its own static physical characteristics and logical responsibilities. All node characteristics, physical or logical, if they are static, can be specified in Spring configuration and used in your Topology SPI or GridTask.map(..) logic to implement the segmentation (this is shown in the example below).

Note, that based on its attributes, every node can participate in one or multiple segments.

Dynamic Sub-Grids

You may also wish to segment your grid based on dynamic rather than static characteristics. For example, what if you only want to include nodes that have less than 50% CPU utilization? In GridGain you can achieve this by using the dynamic GridNodeMetrics provided by GridNode. All you have to do is grab the current CPU utilization from the node metrics and, in your GridTask.map(..) method, pick only the nodes whose CPUs are loaded under 50%.
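
The selection step itself is a simple filter. The sketch below shows it on a hypothetical NodeInfo type that carries a CPU load in the range [0, 1]; in a real task you would read the load from the node’s metrics inside map(..) instead. NodeInfo and DynamicSegment are illustrative names, not GridGain classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a grid node exposing its current CPU load (0.0 to 1.0).
class NodeInfo {
    final String id;
    final double cpuLoad;

    NodeInfo(String id, double cpuLoad) {
        this.id = id;
        this.cpuLoad = cpuLoad;
    }
}

class DynamicSegment {
    // Returns IDs of nodes whose load is strictly below the threshold.
    static List<String> underLoaded(List<NodeInfo> topology, double maxLoad) {
        List<String> ids = new ArrayList<String>();

        for (NodeInfo n : topology)
            if (n.cpuLoad < maxLoad)
                ids.add(n.id);

        return ids;
    }

    public static void main(String[] args) {
        List<NodeInfo> topology = Arrays.asList(
            new NodeInfo("a", 0.2), new NodeInfo("b", 0.5), new NodeInfo("c", 0.9));

        // Only nodes loaded under 50% qualify; a node at exactly 50% does not.
        System.out.println(underLoaded(topology, 0.5));
    }
}
```

Because the load is sampled at mapping time, the resulting segment is only a snapshot: a node picked now may be busy a moment later, so the threshold is best treated as a heuristic rather than a guarantee.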

8.9.2. Node Segmentation Example

This example shows how you can segment your grid into static segments using GridGain. In GridGain such segmentation can be easily achieved with node attributes (see GridNode.getAttribute(String)). Let’s say that you want to segment your grid into 3 segments: master, worker1, and worker2.

Every node at startup should get a certain number of attributes assigned to it. Here is how this can be done from Spring XML configuration:

<bean id="grid.cfg" class="org.gridgain.grid.GridConfigurationAdapter" scope="singleton">
    ...
    <property name="userAttributes">
        <map>
            <!--
                In our example, segment value can be either
                'master', 'worker1', or 'worker2'.
            -->
            <entry key="segment" value="worker1"/>
        </map>
    </property>
    ...
</bean>

Then you can restrict the topology passed into the GridTask.map(..) method by configuring GridNodeFilterTopologySpi to include only nodes from segments worker1 and worker2 and always exclude nodes belonging to the master segment. Here is an example:

<bean id="grid.custom.cfg" class="org.gridgain.grid.GridConfigurationAdapter" singleton="true">
    ...
    <property name="topologySpi">
        <bean class="org.gridgain.grid.spi.topology.nodefilter.GridNodeFilterTopologySpi">
            <property name="filter">
                 <bean class="org.gridgain.grid.lang.GridJexlPredicate2">
                     <constructor-arg index="0">
                         <value>
                             <![CDATA[
                                 node.attributes().get('segment') == 'worker1' ||
                                 node.attributes().get('segment') == 'worker2'
                             ]]>
                         </value>
                     </constructor-arg>
                     <constructor-arg index="1" value="node"/>
                 </bean>
             </property>
        </bean>
    </property>
    ...
</bean>

Alternatively, you can implement your GridTask.map(..) method to map your jobs only to worker node segments. You can check which segment a node belongs to by checking its attributes via the GridNode.getAttribute(String) method. Here is an example:

public class FooBarGridTask extends GridTaskAdapter<String, String> {
    ...
    public Map<GridJob, GridNode> map(List<GridNode> topology, String arg) throws GridException {
        Map<GridJob, GridNode> jobs = new HashMap<GridJob, GridNode>(topology.size());

        for (GridNode node : topology) {
            String segment = node.attribute("segment");

            if (segment != null) {
                if (segment.equals("worker1"))
                    // This type of job should only execute on 'worker1' segment.
                    jobs.put(new FooBarWorker1Job(arg), node);
                else if (segment.equals("worker2"))
                    // This type of job should only execute on 'worker2' segment.
                    jobs.put(new FooBarWorker2Job(arg), node);
            }
            else
                throw new GridException("Node does not belong to any segment.");
        }

        return jobs;
    }
    ...
}

8.9.3. Grid Node Filters

You can filter nodes by providing your own implementation of the GridPredicate<? super GridRichNode> interface. Instances of classes that implement this interface are used to filter nodes in the method GridProjection.nodes(GridPredicate<? super GridRichNode>…). They are also used by GridNodeFilterTopologySpi to provide task topology based on user-defined node filters.

GridGain also comes with the GridJexlPredicate implementation, which allows you to conveniently filter nodes based on the Apache JEXL expression language. For the specifics of the JEXL expression language, refer to the Apache JEXL documentation.

Together with GridNodeFilterTopologySpi, GridJexlPredicate2 allows for a fairly simple way to provide complex SLA-based task topology specifications. For example, the expression below shows how the SPI can be configured with GridJexlPredicate2 to include all Windows XP nodes that have more than one processor or core and are not loaded over 50%.

GridNodeFilterTopologySpi topSpi = new GridNodeFilterTopologySpi();

GridJexlPredicate2<GridRichNode> filter = new GridJexlPredicate2<GridRichNode>(
    "node.metrics().availableProcessors > 1 && " +
    "node.metrics().averageCpuLoad < 0.5 && " +
    "node.attributes().get('os.name') == 'Windows XP'",
    "node");

// Add filter.
topSpi.setFilter(filter);

GridConfigurationAdapter cfg = new GridConfigurationAdapter();

// Override topology SPI.
cfg.setTopologySpi(topSpi);

// Start grid.
GridFactory.start(cfg);

or from Spring configuration file:

<bean id="grid.custom.cfg" class="org.gridgain.grid.GridConfigurationAdapter" singleton="true">
    ...
    <property name="topologySpi">
        <bean class="org.gridgain.grid.spi.topology.nodefilter.GridNodeFilterTopologySpi">
            <property name="filter">
                 <bean class="org.gridgain.grid.lang.GridJexlPredicate2">
                     <constructor-arg index="0">
                         <value>
                             <![CDATA[
                                 node.metrics().availableProcessors > 1 &&
                                 node.metrics().averageCpuLoad < 0.5 &&
                                 node.attributes().get('os.name') == 'Windows XP'
                             ]]>
                         </value>
                     </constructor-arg>
                     <constructor-arg index="1" value="node"/>
                 </bean>
             </property>
        </bean>
    </property>
    ...
</bean>

8.9.4. GridProjection-based Segmentation

GridGain 3.0 introduced another way to segment topology: GridProjection. Described later in more detail, GridProjection represents a dynamic view of the global topology filtered by a predicate. GridProjection is also a monad, providing a monadic set of operations for any arbitrary set of nodes in the projection.
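
The phrase "dynamic view filtered by a predicate" is the key idea, and it can be illustrated without the GridGain API. In the sketch below, ProjectionView and SegNode are hypothetical stand-ins (this is not the GridProjection interface): the view stores only a predicate and re-evaluates it on every call, so it follows topology changes, and narrowing a view produces another view over the same topology.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for a node tagged with a segment attribute.
class SegNode {
    final String id;
    final String segment;

    SegNode(String id, String segment) {
        this.id = id;
        this.segment = segment;
    }
}

// Hypothetical stand-in for a projection: a live, predicate-filtered view of a topology.
class ProjectionView {
    private final List<SegNode> topology; // Shared list that may change over time.
    private final Predicate<SegNode> filter;

    ProjectionView(List<SegNode> topology, Predicate<SegNode> filter) {
        this.topology = topology;
        this.filter = filter;
    }

    // Re-evaluates the predicate on every call, so the result tracks the current topology.
    List<String> nodeIds() {
        List<String> ids = new ArrayList<String>();

        for (SegNode n : topology)
            if (filter.test(n))
                ids.add(n.id);

        return ids;
    }

    // Narrowing yields another view over the same topology.
    ProjectionView narrow(Predicate<SegNode> extra) {
        return new ProjectionView(topology, filter.and(extra));
    }
}

class ProjectionDemo {
    public static void main(String[] args) {
        List<SegNode> topology = new ArrayList<SegNode>();

        topology.add(new SegNode("n1", "master"));
        topology.add(new SegNode("n2", "worker1"));

        ProjectionView workers = new ProjectionView(topology, n -> n.segment.startsWith("worker"));

        System.out.println(workers.nodeIds());

        topology.add(new SegNode("n3", "worker2")); // A node joins later.

        System.out.println(workers.nodeIds());
        System.out.println(workers.narrow(n -> n.segment.equals("worker2")).nodeIds());
    }
}
```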

9. Deployment

Prior to being used, a Grid Task needs to be deployed:

  • If peer class loading is enabled (see property GridConfiguration.isPeerClassLoadingEnabled() in Grid Configuration):

    • The task class is loaded from the local class path, unless it is defined as Local P2P Exclude.

    • If the task class is not in the local class path, or it needs to be peer loaded, it is downloaded from the task originating node.

  • If peer class loading is disabled:

    • GridGain checks that the task class was deployed. If you are using GAR Deployment, your task is implicitly deployed every time the GAR file or directory changes. Otherwise, the task can be deployed explicitly in code via the Grid.deployTask(Class<? extends GridTask<?>>) method.

    • If the task class was not deployed, GridGain tries to find it in the local class path by task name. If you are not using the @GridTaskName annotation to provide a custom task name, the task name defaults to the actual class name of the task and the task is auto-deployed the first time it is executed (no explicit deployment step is required in this case).

    • If the task has a custom name (one that does not correspond to the task class name) and the task was not deployed before, an exception is thrown.

9.1. Peer Class Loading

Peer class loading (P2P) is turned on by default. To turn it off, set the GridConfiguration.isPeerClassLoadingEnabled() property to false in Grid Configuration.

Although the internals of peer class loading are rather complex, in a nutshell it means that when a JVM on a remote node needs to find a certain class as part of grid task execution, it checks the local class loader first. If the class cannot be found there, it asks the node that originated the grid task execution (the one that should have this class by design) to provide it. In essence, GridGain class loading becomes grid-aware.

This technique is invaluable during grid application development. It allows for a grid-transparent development cycle: you write your application as you normally do (in a single node environment of Eclipse, IDEA, etc.), compile and run it, and it runs on the grid without any extra deployment or build steps. Moreover, GridGain supports hot redeployment, so you don’t have to restart GridGain every time you change the grid task code. Just modify the code, compile and run, and all your changes will be picked up on the grid.

Peer class loading sequence works as follows:

  1. GridGain will check if class was loaded at system startup, and if it was, it will be returned. No class loading from a peer node will take place in this case.

  2. If class is not locally loaded, then a request will be sent to task originating node to provide class definition. Originating node will send class byte code definition and the class will be loaded on a peer node.
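
The two-step sequence above can be sketched with a standard java.lang.ClassLoader, whose loadClass method already delegates to the parent (local) loader before calling findClass. This is a minimal illustration and not GridGain's implementation: PeerClassLoadingSketch, PeerClassLoader and the Function that stands in for the network request to the originating node are all hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

final class PeerClassLoadingSketch {
    /** Class loader that asks a "peer" for byte code only when the class is not available locally. */
    static final class PeerClassLoader extends ClassLoader {
        /** Hypothetical channel to the task originating node: class name -> byte code (or null). */
        private final Function<String, byte[]> peer;

        /** Number of requests that had to go to the peer. */
        final AtomicInteger peerRequests = new AtomicInteger();

        PeerClassLoader(ClassLoader local, Function<String, byte[]> peer) {
            super(local);
            this.peer = peer;
        }

        /** Invoked by loadClass() only after the parent (local) loader failed to find the class. */
        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            peerRequests.incrementAndGet();
            byte[] code = peer.apply(name);
            if (code == null)
                throw new ClassNotFoundException("Peer has no class: " + name);
            return defineClass(name, code, 0, code.length);
        }
    }

    /** Returns "local", "peer" or "missing" depending on where the class was resolved. */
    static String resolve(PeerClassLoader ldr, String name) {
        int before = ldr.peerRequests.get();
        try {
            ldr.loadClass(name);
            return ldr.peerRequests.get() == before ? "local" : "peer";
        } catch (ClassNotFoundException e) {
            return "missing";
        }
    }

    public static void main(String[] args) {
        // The peer in this demo knows no classes at all.
        PeerClassLoader ldr = new PeerClassLoader(ClassLoader.getSystemClassLoader(), name -> null);

        System.out.println(resolve(ldr, "java.lang.String"));      // local
        System.out.println(resolve(ldr, "foo.bar.SomeGridTask1")); // missing
    }
}
```

Classes found locally never trigger a peer request, which is why keeping third-party libraries in the class path of every node avoids transfer overhead.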

Peer class loading should be used in most situations, especially during development with Java IDEs. It dramatically reduces the overhead of grid-enabled application development, making it as quick and productive as local application development. You simply change code and run, and your modified application runs on the grid.

Note: When utilizing peer class loading, you should be aware of which libraries get loaded from peer nodes and which are already available locally in the class path. Our suggestion is to include all 3rd party libraries in the class path of every node. This way you will not transfer megabytes of 3rd party classes to remote nodes every time you change a line of code.

Tip: Error Messages
Some frameworks, like Spring or CGLib, look up certain classes to detect whether other frameworks are available. For example, Spring looks for the Groovy framework at startup, so when peer class loading is on this class/resource request might be sent to a remote node. If no such class/resource is available, you may get the message "Requested resource not found" on the remote (task originating) node and "Failed to get resource due to remote failure" on the local node. These messages are printed for your information only.

9.1.1. Local P2P Exclude

Note that giving preference to local deployment (as GridGain does by default) does not always work. For example, GridGain utilizes Spring for its own implementation, so Spring is always loaded locally by system class loader at startup. This may create a problem if user also utilizes Spring to load some beans reflectively. For example, Spring Hibernate support will attempt to load Hibernate classes with its own class loader (system class loader in this case) and if Hibernate is not in local class path, class definitions will not be found.

There are 2 ways to solve this problem:

  1. Include Hibernate jars into class path on every node. This will perform better, as Hibernate jars will not have to be loaded with every task deployment.

  2. If the above does not work, you can make sure that Spring and Hibernate classes are always loaded from a peer node by specifying their packages in the GridConfiguration.getP2PLocalClassPathExclude() configuration property in Grid Configuration.
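
The effect of such an exclude list can be sketched as a check performed before the local class path is consulted. The code below is only an illustration: it assumes that excludes are plain package prefixes, and the exact pattern syntax accepted by GridGain may differ. P2PExcludeSketch and loadFromPeer are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

final class P2PExcludeSketch {
    /**
     * Returns true if the class belongs to an excluded package (or one of its
     * sub-packages) and therefore should bypass the local class path.
     */
    static boolean loadFromPeer(String className, List<String> excludedPackages) {
        for (String pkg : excludedPackages) {
            if (className.startsWith(pkg + "."))
                return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed exclude list for the Spring + Hibernate case described above.
        List<String> excludes = Arrays.asList("org.springframework", "org.hibernate");

        System.out.println(loadFromPeer("org.hibernate.Session", excludes)); // true
        System.out.println(loadFromPeer("java.util.ArrayList", excludes));   // false
    }
}
```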

9.2. Deployment Modes

Deployment mode is specified at grid startup via the GridConfiguration.getDeploymentMode() configuration property (it can also be specified in the Spring XML configuration file). The main difference between the deployment modes is how classes and user resources are loaded on remote nodes via the peer class loading mechanism. User resources can be instances of caches, database connections, or any other class specified by the user with the @GridUserResource annotation.

The following deployment modes are supported:

GridDeploymentMode.PRIVATE

In this mode deployed classes do not share user resources (see @GridUserResource).

Basically, user resources are created once per deployed task class and then get reused for all executions. Note that classes deployed within the same class loader on master node, will still share the same class loader remotely on worker nodes. However, tasks deployed from different master nodes will not share the same class loader on worker nodes, which is useful in development when different developers can be working on different versions of the same classes. Also note that resources are associated with task deployment, not task execution. If the same deployed task gets executed multiple times, then it will keep reusing the same user resources every time.

GridDeploymentMode.ISOLATED

Unlike PRIVATE mode, where different deployed tasks never use the same instance of user resources, in ISOLATED mode tasks or classes deployed within the same class loader share the same instances of user resources (see @GridUserResource). This means that if multiple task classes are loaded by the same class loader on the master node, they will share instances of user resources on worker nodes. In other words, user resources get initialized once per class loader and are then reused for all subsequent executions. Note that classes deployed within the same class loader on the master node will still share the same class loader remotely on worker nodes. However, tasks deployed from different master nodes will not share the same class loader on worker nodes, which is especially useful when different developers are working on different versions of the same classes.

GridDeploymentMode.SHARED

Same as GridDeploymentMode.ISOLATED, but now tasks from different master nodes with the same user version and same class loader will share the same class loader on remote nodes.

Classes are undeployed whenever all master nodes leave the grid or the user version changes. The advantage of this approach is that it allows tasks coming from different master nodes to share the same instances of user resources (see @GridUserResource) on worker nodes. This allows all tasks executing on remote nodes to reuse, for example, the same instances of connection pools or caches. When using this mode, you can start up multiple stand-alone GridGain worker nodes, define user resources on master nodes and have them initialized once on worker nodes regardless of which master node they came from. This mode is particularly useful in production: in comparison to the GridDeploymentMode.ISOLATED deployment mode, which has the scope of a single class loader on a single master node, GridDeploymentMode.SHARED mode broadens the deployment scope to all master nodes. User version can be specified in the META-INF/gridgain.xml file as a Spring bean property with the name userVersion. This file has to be in the class path of the class used for task execution. SHARED is the default deployment mode used by the grid.

GridDeploymentMode.CONTINUOUS

Same as SHARED deployment mode, but user resources (see @GridUserResource) are not undeployed even after all master nodes have left the grid.

Tasks from different master nodes with the same user version and same class loader will share the same class loader on remote worker nodes. Classes are undeployed only when the user version changes. The advantage of this approach is that it allows tasks coming from different master nodes to share the same instances of user resources (see @GridUserResource) on worker nodes. This allows all tasks executing on remote nodes to reuse, for example, the same instances of connection pools or caches. When using this mode, you can start up multiple stand-alone GridGain worker nodes, define user resources on master nodes and have them initialized once on worker nodes regardless of which master node they came from. This mode is particularly useful in production: in comparison to ISOLATED deployment mode, which has the scope of a single class loader on a single master node, CONTINUOUS mode broadens the deployment scope to all master nodes. User version can be specified in the META-INF/gridgain.xml file as a Spring bean property with the name userVersion. This file has to be in the class path of the class used for task execution.
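
One way to compare the four modes is to ask which combination of identifiers decides whether two task executions see the same user resource instance. The sketch below models that scope as a string key. It is a simplified reading of the descriptions above, not GridGain code: DeploymentModeSketch, resourceScope and all parameters are hypothetical, and the behavior assumed for PRIVATE and ISOLATED when master nodes leave is an assumption of the sketch.

```java
final class DeploymentModeSketch {
    enum Mode { PRIVATE, ISOLATED, SHARED, CONTINUOUS }

    /** Two executions share user resources on a worker node when their scope keys are equal. */
    static String resourceScope(Mode mode, String masterNodeId, String classLoaderId,
                                String taskClass, String userVersion) {
        switch (mode) {
            case PRIVATE:
                // Once per deployed task class.
                return masterNodeId + "/" + classLoaderId + "/" + taskClass;
            case ISOLATED:
                // Once per class loader on a single master node.
                return masterNodeId + "/" + classLoaderId;
            case SHARED:
            case CONTINUOUS:
                // Master node is not part of the key: scope spans all master nodes.
                return userVersion + "/" + classLoaderId;
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }

    /**
     * Whether a deployment outlives the departure of all master nodes.
     * Only CONTINUOUS does; for PRIVATE and ISOLATED this is an assumption of the sketch.
     */
    static boolean survivesAllMastersLeaving(Mode mode) {
        return mode == Mode.CONTINUOUS;
    }

    public static void main(String[] args) {
        String a = resourceScope(Mode.SHARED, "master-1", "ldr-A", "foo.bar.SomeGridTask1", "0");
        String b = resourceScope(Mode.SHARED, "master-2", "ldr-A", "foo.bar.SomeGridTask2", "0");
        System.out.println(a.equals(b)); // true: different masters, same scope

        String c = resourceScope(Mode.ISOLATED, "master-1", "ldr-A", "foo.bar.SomeGridTask1", "0");
        String d = resourceScope(Mode.ISOLATED, "master-2", "ldr-A", "foo.bar.SomeGridTask1", "0");
        System.out.println(c.equals(d)); // false: ISOLATED never spans master nodes
    }
}
```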

9.2.1. User Version

User version comes into play whenever you would like to redeploy tasks deployed in SHARED or CONTINUOUS modes. By default, GridGain will automatically detect if class-loader changed or a node is restarted. However, if you would like to change and redeploy code on a subset of nodes, or in case of CONTINUOUS mode to kill the ever living deployment, you should change the user version.

User version is specified in META-INF/gridgain.xml file as follows:

<!-- User version. -->
<bean id="userVersion" class="java.lang.String">
    <constructor-arg value="0"/>
</bean>

By default, all GridGain startup scripts (ggstart.sh or ggstart.bat) pick up the user version from the GRIDGAIN_HOME/config/userversion folder. Usually it is enough to update the user version under that folder. However, in case of GAR or JAR deployment, remember to provide a META-INF/gridgain.xml file with the desired user version in it.

9.2.2. Always-Local Development

GridGain deployment (regardless of mode) allows you to develop everything as you would locally. You never need to write any code specifically for remote nodes. For example, if you need to use a distributed cache from your GridJob, you can do the following:

  1. Start up stand-alone GridGain nodes by executing the GRIDGAIN_HOME/ggstart.{sh|bat} scripts.

  2. Inject your cache instance into your jobs via the @GridUserResource annotation. The cache can be initialized and destroyed with the @GridUserResourceOnDeployed and @GridUserResourceOnUndeployed annotations.

  3. Now all jobs executing locally or remotely can have a single instance of the cache on every node, and all jobs can access instances stored by any other job without any need for explicit deployment.

9.3. JEE Deployment

When deploying grid tasks into JEE container, you can keep using standard JEE deployment artifacts. For example, if you are deploying a WAR file into JEE container, simply add your grid task classes to the WAR file and that’s it.

9.4. GAR Deployment

GAR deployment is a traditional deployment model, similar to JAR/WAR/EAR deployment in JEE, where you create a Grid ARchive (GAR) file that contains all necessary classes for the grid task and deploy it. GridGain comes with a URL-based GridDeploymentSpi implementation so that you can deploy your GAR files on any URLs accessible via FTP, HTTP(S), POP3 or FILE protocols.

For example, when properly configured, you can just drop your GARs into certain folder on your web server and they will be deployed on the grid.

9.4.1. GAR File

A GAR file is a deployable unit used by GridUriDeploymentSpi. Like a plain JAR file, a GAR file is based on the ZLIB compression format, and its structure is similar to a WAR archive. A GAR file has the .gar extension.

GAR file structure (file or directory ending with .gar):

META-INF/
        |
        - gridgain.xml
        - ...
lib/
   |
   -some-lib.jar
   - ...
xyz.class
...

  • The META-INF entry may contain the gridgain.xml file, which is a task descriptor XML file. The purpose of the task descriptor is to specify all tasks to be deployed. This file is a regular Spring XML definition file. The META-INF entry may also contain any other file specified by the JAR format.

  • lib entry contains all library dependencies.

  • Compiled Java classes must be placed in the root of a GAR file.

A GAR file may be deployed without a descriptor file. If there is no descriptor file, GridDeploymentSpi scans all classes in the archive and instantiates those that implement the GridTask interface. In that case, all grid task classes must have a public no-argument constructor (you can always use the GridTaskAdapter adapter for convenience when creating grid tasks).

Note: gridgain.xml
The GAR descriptor gridgain.xml is optional. If it is not provided, GridDeploymentSpi scans all classes in the GAR archive.

By default, all downloaded GAR files that have digital signature in META-INF folder will be verified and deployed only if signature is valid.
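
To make the layout concrete, the sketch below assembles an archive with the entries shown above using the standard java.util.zip classes and then lists them back. It assumes a GAR can be produced like any ZIP/JAR archive and uses placeholder bytes instead of a real library and a real compiled class, so the result is not deployable. GarLayoutSketch and its methods are hypothetical names, and this is not the GAR Ant task shipped with GridGain.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class GarLayoutSketch {
    /** Builds an in-memory archive that follows the GAR layout: descriptor, libraries, classes. */
    static byte[] buildGar(String descriptorXml, byte[] libJar, byte[] taskClass) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            put(zip, "META-INF/gridgain.xml", descriptorXml.getBytes(StandardCharsets.UTF_8));
            put(zip, "lib/some-lib.jar", libJar);
            put(zip, "xyz.class", taskClass); // Compiled classes start at the archive root.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    private static void put(ZipOutputStream zip, String name, byte[] content) throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        zip.write(content);
        zip.closeEntry();
    }

    /** Lists entry names in the order they were written. */
    static List<String> listEntries(byte[] gar) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(gar))) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry())
                names.add(e.getName());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        // Placeholder content only; a real GAR holds a Spring descriptor, jars and class files.
        byte[] gar = buildGar("<beans/>", new byte[] {1, 2, 3}, new byte[] {4, 5, 6});
        System.out.println(listEntries(gar));
    }
}
```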

gridgain.xml

gridgain.xml GAR descriptor file is a standard Spring XML that should contain zero or more java.util.List beans. Each list should contain fully qualified class names for grid tasks. Here’s an example of gridgain.xml:

<?xml version="1.0" encoding="UTF-8"?>

<!--
    Spring configuration file for test classes in gar-file.
-->
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:util="http://www.springframework.org/schema/util"
       xsi:schemaLocation="
        http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.0.xsd
        http://www.springframework.org/schema/util http://www.springframework.org/schema/util/spring-util-2.0.xsd">
    <description>Gridgain Spring configuration file in gar-file.</description>

    <!--
        Test tasks specification.
    -->
    <util:list id="tasks">
        <value>foo.bar.SomeGridTask1</value>
        <value>foo.bar.SomeGridTask2</value>
    </util:list>
</beans>

9.4.2. Ant GAR Task

GridGain is shipped with a GAR Ant task: GridGarAntTask. This task extends the zip Ant task and can be used exactly like the standard jar Ant task. The GAR Ant task archives class files and necessary dependencies (like resource files and libraries) together with an optional descriptor file (gridgain.xml). GridGain comes with an example of GAR deployment, including build.xml. Here’s an example of how to use the GAR Ant task in a typical build.xml:

<!--
    Special task for creating GAR files.
-->
<taskdef name="gar" classname="org.gridgain.grid.tools.ant.gar.GridGarAntTask"
    classpathref="gg.libs.path"/>

<!-- Create GAR file. -->
<gar destfile="${examples.gar.deploy.dir}/${gar.name}"
    descrdir="${examples.gar.dir}/META-INF"
    basedir="${examples.gar.deploy.dir}/tmpgar"/>

9.5. Deployment SPIs

GridGain comes with 2 deployment SPIs:

  • GridLocalDeploymentSpi - use it for P2P and JEE deployments.

  • GridUriDeploymentSpi - use it for GAR deployment.

10. Network Segmentation

10.1. Overview

Network segmentation (a.k.a. "split-brain" problem) does happen in production and must be accounted for. The cluster may become segmented because of temporary network problems when nodes (or groups of nodes) become isolated from the rest of topology. If not addressed, such segmentation can cause inconsistent clusters and, what’s worse, inconsistent data. GridGain addresses this issue by making sure that no inconsistent write is allowed into the system if segmentation occurred, while writes that are certain to be consistent are still allowed to proceed.

Each node checks network segment individually, using configured segmentation resolvers. Segmentation resolvers are pluggable and can be tailored to any environment. GridGain comes with several implementations out of the box. Segment check is performed in following cases:

  • Before discovery SPI start.

  • When any node leaves topology.

  • When any node in topology fails.

  • Periodically (see GridConfigurationAdapter.setSegmentCheckFrequency(long)).

Each segmentation resolver checks the segment for validity. Typically, a resolver should run a single light-weight check (e.g. one IP address or one shared folder). Compound segment checks may be performed using several resolvers. If a segmentation resolver determines that the local grid node belongs to an incorrect segment, the node acts in accordance with the configured segmentation policy. For details on available policies refer to the GridSegmentationPolicy documentation.

10.2. Configuration

Here is the list of GridConfiguration properties intended to configure segmentation logic. All configuration parameters below are optional.

Note: Segment check is disabled by default.

Setter Method: setSegmentationPolicy(GridSegmentationPolicy)
Description: Segmentation policy.
Optional: Yes
Default: STOP

Setter Method: setSegmentationResolvers(GridSegmentationResolver…)
Description: Segmentation resolvers. The following segmentation resolver implementations are built-in:

  • GridReachabilitySegmentationResolver

  • GridSharedFsSegmentationResolver

  • GridTcpSegmentationResolver

Optional: Yes
Default: null

Setter Method: setSegmentCheckFrequency(long)
Description: Network segment check frequency.
Optional: Yes
Default: 10000ms

Setter Method: setWaitForSegmentOnStart(boolean)
Description: Wait for segment on start flag.
Optional: Yes
Default: true

Setter Method: setAllSegmentationResolversPassRequired(boolean)
Description: All segmentation resolvers pass required flag.
Optional: Yes
Default: true

10.3. Events

GridGain has the following built-in event types to notify on segmentation events:

  • EVT_NODE_SEGMENTED

  • EVT_NODE_RECONNECTED

10.4. Segmentation Policies

GridGain has the following built-in segmentation policies.

RESTART_JVM

When segmentation policy is RESTART_JVM, all listeners receive the EVT_NODE_SEGMENTED event and then the JVM is restarted.

Note that this works only if GridGain is started with GridCommandLineLoader via the standard ggstart.{sh|bat} shell script.

STOP

When segmentation policy is STOP, all listeners receive the EVT_NODE_SEGMENTED event and then the particular grid node is stopped via a call to GridFactory.stop(GRID_NAME, true, false).

RECONNECT

When segmentation policy is RECONNECT, all listeners receive the EVT_NODE_SEGMENTED event and then the discovery manager tries to reconnect the discovery SPI to the topology, issuing the EVT_NODE_RECONNECTED event on reconnect.

Note that this policy is not allowed when the distributed data grid is enabled.

This policy can be used only with a GridDiscoverySpi implementation that supports reconnect (i.e. one annotated with the GridDiscoverySpiReconnectSupport annotation).

NOOP

When segmentation policy is NOOP, all listeners receive the EVT_NODE_SEGMENTED event and it is up to the user to implement logic to handle this event.

10.5. Segmentation Resolvers

GridGain has the following built-in segmentation resolvers. You can specify multiple resolvers, in which case all the specified resolvers will be checked. Use setAllSegmentationResolversPassRequired(boolean) to require that every resolver passes the segmentation check; otherwise the segment is declared valid if at least one resolver check passes.

GridTcpSegmentationResolver

Segmentation resolver implementation that checks whether the node is in the correct segment by establishing a TCP connection to the configured host and port and immediately closing it. This is a multi-purpose resolver, as it can be used to check connectivity to any web server (e.g. try to connect to port 80), database (e.g. establish a TCP connection to the JDBC port), etc.

GridSharedFsSegmentationResolver

Segmentation resolver implementation that checks whether the node is in the correct segment by writing to and reading from a shared directory.

GridReachabilitySegmentationResolver

Segmentation resolver implementation that uses java.net.InetAddress.isReachable(NetworkInterface, int, int) to check whether the node is in the correct segment.
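
The TCP check and the "all resolvers must pass" flag can be sketched with the standard java.net API. The code below is an illustration of the idea, not the built-in resolver implementation: SegmentationCheckSketch, tcpCheck and segmentValid are hypothetical names, resolvers are modeled as plain BooleanSupplier instances, and treating an empty resolver list as "check disabled" is an assumption of the sketch.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Arrays;
import java.util.List;
import java.util.function.BooleanSupplier;

final class SegmentationCheckSketch {
    /** Opens a TCP connection to host:port and closes it right away; true if the connect succeeded. */
    static boolean tcpCheck(String host, int port, int timeoutMs) {
        try (Socket sock = new Socket()) {
            sock.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /** Combines individual resolver checks according to the "all pass required" flag. */
    static boolean segmentValid(List<BooleanSupplier> resolvers, boolean allPassRequired) {
        if (resolvers.isEmpty())
            return true; // Assumption: no resolvers means the check is disabled.

        return allPassRequired
            ? resolvers.stream().allMatch(BooleanSupplier::getAsBoolean)
            : resolvers.stream().anyMatch(BooleanSupplier::getAsBoolean);
    }

    public static void main(String[] args) throws IOException {
        // A local server socket stands in for the web server or database being probed.
        try (ServerSocket server = new ServerSocket(0)) {
            int port = server.getLocalPort();

            BooleanSupplier reachable = () -> tcpCheck("127.0.0.1", port, 1000);
            BooleanSupplier failing = () -> false;

            System.out.println("all required: " + segmentValid(Arrays.asList(reachable, failing), true));
            System.out.println("any suffices: " + segmentValid(Arrays.asList(reachable, failing), false));
        }
    }
}
```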

11. Compute Grid

11.1. MapReduce

11.1.1. Overview

MapReduce is a distributed computing paradigm that lets you map your task into smaller jobs based on some key, execute these jobs on grid nodes, and reduce multiple job results into one task result.

Here is a diagram that explains how MapReduce works based on Shape Counter example. Given a collection of Shapes we split this collection into 2 parts and send every part to a grid node. Each node will count number of Shapes provided and will return it back to caller. The caller then will add results received from remote nodes and provide the reduced result back to the user (the counts are displayed next to every shape).

http://www.gridgain.com/images/mapreduce_small.png

In GridGain, the MapReduce paradigm is implemented via the GridTask interface.

Map Operation

The GridTask.map(..) method splits a task into multiple instances of GridJob and maps every GridJob to a grid node.

Result Operation

Upon completion of any job, the GridTask.result(..) method is invoked. It tells GridGain whether to Wait for more job results, Reduce now, or Failover this job to another node.

Reduce Operation

This operation is responsible for taking multiple results from remote jobs and reducing them into one aggregate result. This aggregated result will be returned to the user.
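
The Shape Counter example can be mimicked in plain Java to show the three phases. In the sketch below a thread pool stands in for grid nodes; it does not use GridTask or GridJob, and ShapeCounterSketch and countShapes are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ShapeCounterSketch {
    /** Splits the shapes into parts, counts every part on its own worker and sums the counts. */
    static int countShapes(List<String> shapes, int parts) {
        ExecutorService nodes = Executors.newFixedThreadPool(parts); // Stand-in for grid nodes.
        try {
            // Map: one job per part of the collection.
            int chunk = (shapes.size() + parts - 1) / parts;
            List<Future<Integer>> results = new ArrayList<>();
            for (int from = 0; from < shapes.size(); from += chunk) {
                List<String> part = shapes.subList(from, Math.min(from + chunk, shapes.size()));
                Callable<Integer> job = part::size; // The job counts the shapes it was given.
                results.add(nodes.submit(job));
            }

            // Result + reduce: wait for every job and add up the partial counts.
            int total = 0;
            for (Future<Integer> res : results)
                total += res.get();
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("Job failed", e);
        } finally {
            nodes.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> shapes = Arrays.asList("circle", "square", "triangle", "circle", "star");
        System.out.println(countShapes(shapes, 2)); // 5
    }
}
```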

11.1.2. Pull vs. Push MapReduce

One of the fundamental differences between GridGain’s implementation of MapReduce and the ones in the existing or legacy systems like Sun GridEngine, GigaSpaces, Hadoop and Globus is the cardinality or the type of the mapping operation. In conventional approach the worker nodes pull the sub-tasks for execution. In GridGain, sub-tasks are pushed to the worker nodes and this process is initially controlled by the task. The latter has fundamental advantage that was largely missing in grid computing frameworks before GridGain.

Note: The GridGain approach of giving the task control over sub-task distribution enables early and late load balancing algorithms. This effectively helps to adapt task execution to the non-deterministic nature of execution on the grid. Not having this capability significantly narrows the deployment options where optimal performance and scalability can be achieved.

This unique property of GridGain’s MapReduce implementation has profound effect on ability to develop grid applications with the advanced load balancing, failover and collision resolution logic.

See Early And Late Load Balancing for more information.

11.2. GridTask and GridJob

11.2.1. GridTask And GridJob Interfaces

To create a grid task you need to implement the GridTask interface. When implementing this interface you will also need to be aware of the GridJob interface. Together, these two interfaces define practically everything you need to know to create a grid task. In a nutshell, GridTask is responsible for splitting business logic into multiple grid jobs, receiving results from individual grid jobs executing on remote nodes, and reducing (aggregating) the received job results into the final grid task result.

A grid task gets split into jobs when the GridTask.map(List, Object) method is called. This method returns all jobs for the task mapped to their corresponding grid nodes for execution. The grid then serializes these jobs and sends them to the requested nodes for execution.

See the GridTask and GridJob Javadoc documentation for more information about their API.

11.2.2. Executing Grid Tasks

Grid-enabling is a process of making a piece of Java code to execute on the grid. In GridGain, there are two ways to do grid-enabling: API-based and annotation-based.

Tip: Direct Execution vs. Annotation-Based AOP
Neither of these two methods is better or worse than the other; each has its own area of applicability. When creating a grid task you have basically the same programming and development model as in JEE: you create a component, deploy it and execute it. With annotation-based grid-enabling you have the extra option of transparently attaching grid-enabling logic to existing code without modifying it (except for the additional annotation).

API-Based Grid Task Execution

This method lets you grid-enable any arbitrary Java code. You have full control over the split and aggregate logic and all other aspects of grid task execution. Here is an example of direct grid task execution:

public static void main(String[] args) throws GridException {
    GridFactory.start();

    try {
        Grid grid = GridFactory.getGrid();

        // Execute task.
        GridTaskFuture<String> future = grid.execute(FooBarTask.class, "Argument");

        // Wait for task completion.
        String result = future.get();

        // Print out task result.
        System.out.println("Task result: " + result);
    }
    finally {
        GridFactory.stop(true);
    }
}

Annotate Existing Method With Gridify Annotation

The only difference of this method vs. directly executing grid task is that you can annotate a regular Java method and it will become grid-enabled. Using this technique you can still have custom grid task that will handle annotation-based grid-enabling (including split & aggregate logic or passing state to remote jobs) but you will be limited to the boundaries of the method you are grid-enabling. Here is an example of such usage:

@Gridify(taskClass = FooBarTask.class, timeout = 3000)
public void sayIt(String arg) {
    // Some business logic.
}

For information on how to configure AOP, refer to AOP Configuration section.

Tip: Serializable State
Note that when using the @Gridify annotation on non-static methods without specifying an explicit grid task, the state of the whole instance is serialized and sent to the remote node. Therefore the class must implement the java.io.Serializable interface. If you cannot make the class Serializable, you must implement a custom grid task that takes care of proper state initialization. In either case, GridGain must be able to serialize the state passed to the remote node.

11.2.3. Configuring Grid Tasks

Starting with GridGain 2.1 you can start multiple instances of Topology SPI, Load Balancing SPI, Failover SPI and Checkpoint SPI. If you do that, you need to tell a task which SPI to use (by default it will use the first SPI in the list).

Add the @GridTaskSpis annotation to your task to specify which SPIs it should use. If this annotation is omitted, GridGain picks the first corresponding SPI implementation from the array provided in configuration.

For more information and examples refer to Specifying Different SPIs Per GridTask documentation.

11.2.4. Grid Task Execution Sequence

The sequence of task execution can be described as following:

  1. Upon request to execute a grid task with given task name system will find deployed task with given name.

  2. System will create a new Distributed Grid Task Session. Also see GridTaskSession.

  3. System will inject all annotated resources (including Distributed Grid Task Session) into grid task instance. See Resources Injection for more information.

  4. System will call method map(…) on the GridTask interface. This method is responsible for splitting the business logic of the grid task into multiple grid jobs (units of execution) and mapping them to grid nodes. Method map(…) returns a map of grid jobs keyed by the grid nodes. Consider using @GridLoadBalancerResource to inject a load balancer into the task for assigning jobs to the best available nodes.

  5. System will start sending grid jobs to their respective nodes.

  6. Upon arrival at a remote node, a grid job is put on the waiting list, which is passed to the underlying GridCollisionSpi SPI.

  7. The Collision SPI on remote node will decide one of the following scheduling policies:

    WAIT

    Grid job is kept on the waiting list. In this case, the job will not get a chance to execute until the next time the Collision SPI is called. Collision SPI gets called every time a new job arrives or an active one completes.

    EXECUTE

    Grid job is moved to the active list (i.e. activated). In this case the system proceeds with job execution.

    REJECT

    Jobs on the waiting list can be rejected before they get a chance to start executing. In this case the GridJobResult passed into the GridTask.result(GridJobResult, List) method will contain a GridExecutionRejectedException exception. If you are using any of the task adapters shipped with GridGain, the job will be failed over automatically for execution on another node.

    CANCEL

    If a GridJob is on the active list and is currently executing, it can be canceled by calling the GridJob.cancel() method. Note that in this case the job will still complete and return a result from the GridJob.execute() method.

  8. For activated jobs on remote nodes, system will inject all annotated resources (including Distributed Grid Task Session) into grid job instance. See Resources Injection for more information.

  9. Remote nodes will execute the jobs by calling the GridJob.execute() method.

  10. If a job gets canceled while executing on a remote node, the GridJob.cancel() method will be called. Note that, just like with the Thread.interrupt() method, grid job cancellation serves as a hint that a job should stop executing or exhibit some other user-defined behavior. Generally it is up to the job to decide whether it wants to react to cancellation or ignore it. Job cancellation can happen for several reasons:

    • Collision SPI has canceled an active job.

    • Parent task has completed without waiting for this job’s result.

    • User canceled task by calling the GridTaskFuture.cancel() method.

  11. Once job execution is complete, the return value is sent back to the parent task and passed into the GridTask.result(GridJobResult, List) method. If job execution resulted in a checked exception, GridJobResult.getException() returns that exception. If job execution threw a runtime exception or error, it is wrapped into a GridUserUndeclaredException exception.

    Method GridTask.result(GridJobResult, List) is called for each job result and, based on the returned policy, decides whether to continue waiting for the remaining results, fail over the current result, or reduce immediately.

    GridJobResultPolicy.WAIT

    If this policy is returned, the grid task continues to wait for other job results. If this result is the last job result, the GridTask.reduce(List) method is called.

    GridJobResultPolicy.REDUCE

    If this policy is returned, method GridTask.reduce(List) is called right away without waiting for other jobs' completion (all remaining jobs will receive a cancel request).

    GridJobResultPolicy.FAILOVER

    If this policy is returned, the job is failed over to another node for execution. The node to which the job gets failed over is decided by the GridFailoverSpi SPI implementation. Note that if you use any of the task adapters, they will automatically fail over jobs to other nodes for 2 known failure cases: node crash and job rejection.

  12. When enough results are received, the GridTask.reduce(List) method is called to aggregate (reduce) these results into one final grid task result.

  13. After reduce(…) is complete, the result is returned to the user as the grid task result and can be retrieved with the GridTaskFuture.get() method.

  14. System will clean up all task session resources (such as checkpoints with session scope). Execution of the grid task is considered finished at this point.
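
The result-handling steps above can be sketched in plain Java. The sketch below does not use the GridGain API: Policy, JobResult, and decide are hypothetical stand-ins for GridJobResultPolicy, GridJobResult, and GridTask.result(...), and the rule it applies (fail over on any exception, reduce early once a quorum of results has arrived) is only one possible choice.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-ins for illustration; this is not the GridGain API. */
final class ResultPolicySketch {
    /** Mirrors the three policies described above. */
    enum Policy { WAIT, REDUCE, FAILOVER }

    /** Minimal job result: either data or an exception. */
    static final class JobResult {
        final Object data;
        final Exception error;

        JobResult(Object data, Exception error) {
            this.data = data;
            this.error = error;
        }
    }

    /**
     * Decides what to do after one more job result has arrived.
     *
     * @param res Result that has just arrived.
     * @param received Successful results received so far (including res if it succeeded).
     * @param quorum Number of successful results considered "enough" (an assumption of this sketch).
     */
    static Policy decide(JobResult res, List<JobResult> received, int quorum) {
        if (res.error != null)
            return Policy.FAILOVER; // Retry the failed job on another node.

        if (received.size() >= quorum)
            return Policy.REDUCE;   // Enough results: reduce now, remaining jobs get canceled.

        return Policy.WAIT;         // Keep waiting for the remaining results.
    }

    public static void main(String[] args) {
        List<JobResult> received = new ArrayList<>();

        JobResult ok1 = new JobResult(5, null);
        received.add(ok1);
        System.out.println(decide(ok1, received, 2));    // WAIT

        JobResult failed = new JobResult(null, new RuntimeException("node left"));
        System.out.println(decide(failed, received, 2)); // FAILOVER

        JobResult ok2 = new JobResult(7, null);
        received.add(ok2);
        System.out.println(decide(ok2, received, 2));    // REDUCE
    }
}
```

A task that needs every result would simply never return REDUCE early; in that case the reduce step runs after the last WAIT.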

11.2.5. Grid Task Coding Guidelines

There are certain known patterns and anti-patterns to be aware of when developing grid tasks and jobs.

Serialization and Deserialization

Jobs created by a task are moved from one grid node to another. Before being sent they are serialized into a byte stream and thus need to implement the java.io.Serializable interface. On the remote node every job is deserialized with a class loader that depends on the deployment method (see Grid Deployment).

Prior to GridGain 2.1, every grid job class member (including members of super classes) except for static members needs to implement java.io.Serializable. Static class members will not be sent to the remote node and should be initialized on the remote node. Note also that task parameters passed into the GridJob.execute() method are sent to remote nodes and need to implement java.io.Serializable as well.

Starting with GridGain 2.1, you can configure different Grid Marshallers and depending on a marshaller, serialization may either be required or not.
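
The serialization requirement can be checked locally with standard Java serialization before a job ever leaves the node. The following is a minimal sketch using only the JDK; GoodJob, BadJob, roundTrip, and isSerializable are hypothetical names, and the sketch assumes the default Java serialization rather than any particular Grid Marshaller.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class JobSerializationSketch {
    /** A job-like object whose instance members are all serializable. */
    static final class GoodJob implements Serializable {
        private static final long serialVersionUID = 1L;

        /** Static state belongs to the JVM and is not written to the stream. */
        static int sharedCounter = 0;

        final String arg;

        GoodJob(String arg) {
            this.arg = arg;
        }
    }

    /** A job-like object holding a member that is not serializable. */
    static final class BadJob implements Serializable {
        private static final long serialVersionUID = 1L;

        final Thread worker = new Thread(); // java.lang.Thread is not Serializable.
    }

    /** Serializes and deserializes an object, as happens when a job moves between nodes. */
    static Object roundTrip(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();

            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }

            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        }
        catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Serialization round trip failed", e);
        }
    }

    /** Returns true if the object survives a serialization round trip. */
    static boolean isSerializable(Object obj) {
        try {
            roundTrip(obj);

            return true;
        }
        catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        GoodJob copy = (GoodJob)roundTrip(new GoodJob("hello"));

        System.out.println(copy.arg);                     // hello
        System.out.println(isSerializable(new BadJob())); // false
    }
}
```

A check like this in a unit test catches a non-serializable member early, instead of at job submission time.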

Inner and Anonymous Classes

Inner classes and anonymous classes of any kind are allowed. Write your code as you usually do and GridGain will distribute it. You can implement your job as an anonymous class within the grid task class and use task class members inside your job. Here is an example of an anonymous job:

import java.io.*;
import java.util.*;
import org.gridgain.grid.*;

/**
 * Test task with anonymous job which uses method scope variable.
 */
public class TestGridTask extends GridTaskSplitAdapter<String> {
    /** Dummy multiplier. */
    private int multiplier = 3;

    /**
     * This method is responsible for splitting a task into multiple jobs.
     */
    @Override
    protected Collection<? extends GridJob> split(int gridSize, final String arg) throws GridException {
        List<GridJobAdapter<String>> jobs = new ArrayList<GridJobAdapter<String>>(gridSize);

        for (int i = 0; i < gridSize; i++) {
            jobs.add(new GridJobAdapter<String>() {
                /**
                 * Every job simply multiplies number of characters in the argument by some multiplier.
                 */
                public Serializable execute() throws GridException {
                    return multiplier * arg.length();
                }
            });
        }

        return jobs;
    }

    /**
     * Reduces multiple job results into one task result.
     */
    public Object reduce(List<GridJobResult> results) throws GridException {
        int sum = 0;

        // For the sake of this example, let's sum all results.
        for (GridJobResult res : results) {
            sum += (Integer)res.getData();
        }

        return sum;
    }
}

Here the anonymous job class created inside the split method uses the method-scope variable arg, which is declared final in the split method signature, as well as the task class member multiplier. Both are used in the job's execute() method.

Overriding Methods with Gridify Annotation

If you have the following code:

public class A {
    @Gridify
    protected void methodA() {
        ...
    }
}

public class B extends A {
    @Override
    protected void methodA() {
        ...
        super.methodA();
        ...
    }
}

and use aspects, B.methodA() will be called twice: first on your local node and then on a remote node, regardless of class or method modifiers. This is a consequence of how aspects are implemented, and we don’t recommend using @Gridify in parent classes.

Here is a step-by-step explanation:

  1. You create an object of class B.

  2. You call B.methodA(). Since this method is not annotated in class B, the aspect is not applied.

  3. Your B.methodA() executes and calls super.methodA().

  4. A.methodA() is annotated, so the aspect calls GridGain, which distributes your object of class B and the method call to a grid node.

  5. On the grid node (local or remote), B.methodA() is called again (note that the object is of class B).

  6. Your B.methodA() executes and calls super.methodA().

  7. A.methodA() is annotated, but GridGain detects this situation: the call is not distributed a second time and is simply executed.

As you can see we have 2 executions of B.methodA() and only one A.methodA().
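
The call sequence can be reproduced without GridGain or any AOP framework. The sketch below is a plain-Java simulation: the onGrid flag and the re-invocation inside A.methodA() are hypothetical stand-ins for what the @Gridify aspect does, and the "grid node" is just the same JVM.

```java
import java.util.ArrayList;
import java.util.List;

final class GridifyOverrideSketch {
    /** Records the order in which method bodies run. */
    static final List<String> trace = new ArrayList<>();

    /** True while the simulated "grid node" is executing the call. */
    static boolean onGrid = false;

    static class A {
        /** Stands in for a method annotated with @Gridify. */
        void methodA() {
            if (!onGrid) {
                // Simulated aspect: instead of running the body, hand the target
                // object to a "node" and invoke the method on it again.
                onGrid = true;

                try {
                    this.methodA(); // Virtual dispatch: runs B.methodA() for a B instance.
                }
                finally {
                    onGrid = false;
                }

                return;
            }

            // Already on the grid: the call is not distributed a second time.
            trace.add("A.methodA");
        }
    }

    static class B extends A {
        @Override
        void methodA() {
            trace.add("B.methodA");

            super.methodA();
        }
    }

    /** Runs the scenario from the steps above and returns the recorded trace. */
    static List<String> run() {
        trace.clear();

        new B().methodA();

        return new ArrayList<>(trace);
    }

    public static void main(String[] args) {
        System.out.println(run()); // [B.methodA, B.methodA, A.methodA]
    }
}
```

The trace shows the body of B.methodA() running twice and the body of A.methodA() running once, which matches the seven steps.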

11.2.6. Resources Injection

GridTask and GridJob implementations can be injected with grid resources using IoC (dependency injection). Both field-based and method-based injection are supported.

The following grid resources can be injected:

Resource Description

@GridTaskSessionResource

Injects the distributed task session (GridTaskSession) of the current task execution.

@GridInstanceResource

Injects the actual instance of Grid this task is executed on.

@GridLoggerResource

Injects an instance of the GridLogger logger used by this grid instance.

@GridHomeResource

Injects a path to the GridGain installation home.

@GridExecutorServiceResource

Injects an instance of java.util.concurrent.ExecutorService used by this grid.

@GridLocalNodeIdResource

Injects the local grid node ID of type java.util.UUID.

@GridMBeanServerResource

Injects an instance of javax.management.MBeanServer used by this grid node.

@GridJobIdResource

This resource can only be injected into Grid Jobs, not Grid Tasks. It injects a unique job execution ID of type java.util.UUID into an instance of Grid Job.

@GridSpringApplicationContextResource

This resource injects the Spring application context into tasks and jobs. You can use it to access Spring beans or any other information available in the Spring application context. By default, this application context is the same as the one used for configuring GridGain, but you can pass a custom one by calling the GridFactory.start(GridConfiguration, ApplicationContext) method.

Note: Spring Application Context is local to every node and is not distributed. Make sure that all bean classes and resources declared in the Spring file are available on the node’s classpath.

@GridUserResource

Use this annotation to inject custom resources into tasks and jobs. The scope of this resource is per-task, so it will be initialized once the task is deployed and de-initialized once the task is undeployed. Also see @GridUserResourceOnDeployed and @GridUserResourceOnUndeployed for controlling the resource life cycle.

@GridMarshallerResource

This resource can be injected into a task, job, or SPI and gives you a simple way of marshalling and unmarshalling data or objects (since 2.1.0).

@GridSpringBeanResource

Injects any custom resource declared in the provided Spring ApplicationContext. It can be injected into grid tasks and grid jobs. The resource is looked up in the provided Spring ApplicationContext by name. Note that an injected Spring bean must be declared in the Spring ApplicationContext on every grid node where it is accessed (since 2.1.0).

Refer to Resources Injection for more information.

11.2.7. Convenience Adapters

GridTask and GridJob come with several convenience adapters to make them easier to use:

Adapter Description

GridTaskAdapter

Grid Task adapter that provides a default implementation of the GridTask.result(GridJobResult, List) method. It implements automatic failover to another node if a remote job has failed due to a node crash (detected by GridTopologyException) or due to job execution rejection (detected by GridExecutionRejectedException).

GridTaskSplitAdapter

Grid Task adapter that hides the job-to-node mapping logic from the user and provides a convenient GridTaskSplitAdapter.split(int, Object) method for splitting a task into sub-jobs in homogeneous environments.

GridJobAdapter

Grid Job adapter that provides a default empty implementation of the GridJob.cancel() method and also allows the user to set and get the job argument, if there is one.

Refer to corresponding adapter documentation for more information.

11.2.8. Distributed Session Attributes And Checkpoints

Both Grid Tasks and Grid Jobs can use the Distributed Grid Task Session to coordinate with each other via session attributes and checkpoints.

Session Attributes

Jobs can communicate with the parent task and with sibling jobs from the same task by setting session attributes (see GridTaskSession). Other jobs can wait for an attribute to be set either synchronously or asynchronously. This allows jobs to synchronize their execution with other jobs at any point, and it is useful when other jobs within the task need to be made aware of a certain event or state change that occurred during job execution.

Saving Checkpoints

Long running jobs may wish to save intermediate checkpoints to protect themselves from failures. There are three checkpoint management methods available on Grid Task Session which allow user to save, load, and remove checkpoints.

Jobs that use checkpoint functionality should attempt to load a checkpoint at the beginning of execution. If a non-null value is returned, the job can continue from where it failed last time; otherwise it starts from scratch. Throughout its execution the job should periodically save its intermediate state to avoid starting from scratch in case of a failure.

Refer to Distributed Grid Task Session documentation for more information.
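
The load, save, and remove pattern described above can be sketched as follows. This is not the GridTaskSession API: the checkpoints map is a hypothetical in-memory stand-in for checkpoint storage, and failAt exists only to simulate a crash.

```java
import java.util.HashMap;
import java.util.Map;

final class CheckpointSketch {
    /** Hypothetical checkpoint storage: key -> {last completed step, partial sum}. */
    static final Map<String, long[]> checkpoints = new HashMap<>();

    /**
     * Sums the numbers 1..n, saving a checkpoint every 'interval' steps.
     * If failAt is positive, a failure is simulated right before that step.
     */
    static long sum(String key, int n, int interval, int failAt) {
        // Load: a null value means there is nothing to resume from.
        long[] saved = checkpoints.get(key);

        int start = saved == null ? 1 : (int)saved[0] + 1;
        long total = saved == null ? 0 : saved[1];

        for (int i = start; i <= n; i++) {
            if (i == failAt)
                throw new IllegalStateException("Simulated failure at step " + i);

            total += i;

            // Save: record intermediate state periodically.
            if (i % interval == 0)
                checkpoints.put(key, new long[] {i, total});
        }

        // Remove: the job is done, the checkpoint is no longer needed.
        checkpoints.remove(key);

        return total;
    }

    public static void main(String[] args) {
        try {
            sum("job-1", 10, 3, 8);
        }
        catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }

        // The second attempt resumes after step 6 instead of starting from step 1.
        System.out.println(sum("job-1", 10, 3, 0)); // 55
    }
}
```

The checkpoint interval is a trade-off: saving more often loses less work on failure but adds storage traffic to every job.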

11.2.9. MapReduce Paradigm

The design of GridTask is heavily influenced by Google's MapReduce paradigm. For more information about MapReduce, refer to the article "MapReduce: Simplified Data Processing on Large Clusters" from Google.

11.2.10. Example

Below is a grid task implementation that is responsible for the split and aggregate (a.k.a. map/reduce) logic. Note that this implementation uses GridTaskSplitAdapter, which simplifies the API for grid tasks in homogeneous grids (which is often the case). The two main methods implemented here are split and reduce. The split method creates one job per word of the phrase; the reduce method aggregates the results (the number of characters in each word) returned by the jobs.

package org.gridgain.examples.helloworld.api;

import org.gridgain.grid.*;
import org.gridgain.grid.logger.*;
import org.gridgain.grid.resources.*;
import java.util.*;
import java.io.*;

public class GridHelloWorldTask extends GridTaskSplitAdapter<String, Integer> {
    /** Auto-injected grid logger. */
    @GridLoggerResource
    private GridLogger log = null;

    @Override
    public Collection<? extends GridJob> split(int gridSize, String phrase) throws GridException {
        // Split the passed in phrase into multiple words separated by spaces.
        String[] words = phrase.split(" ");

        List<GridJob> jobs = new ArrayList<GridJob>(words.length);

        for (String word : words) {
            // Every job gets its own word as an argument.
            jobs.add(new GridJobAdapter<String>(word) {
                /*
                 * Simply prints the word passed into the job and
                 * returns number of letters in that word.
                 */
                public Serializable execute() {
                    String word = getArgument();

                    if (log.isInfoEnabled()) {
                        log.info(">>>");
                        log.info(">>> Printing '" + word + "' on this node from grid job.");
                        log.info(">>>");
                    }

                    // Return number of letters in the word.
                    return word.length();
                }
            });
        }

        return jobs;
    }

    /**
     * Sums up all letters from all jobs and returns a
     * total number of letters in the phrase.
     *
     * @param results Job results.
     * @return Number of letters for the phrase passed into
     *      <tt>split(gridSize, phrase)</tt> method above.
     * @throws GridException If reduce failed.
     */
    public Integer reduce(List<GridJobResult> results) throws GridException {
        int totalCharCnt = 0;

        for (GridJobResult res : results) {
            // Every job returned a number of letters
            // for the word it was responsible for.
            Integer charCnt = res.getData();

            totalCharCnt += charCnt;
        }

        // Account for spaces. For simplicity we assume one space between words.
        totalCharCnt += results.size() - 1;

        // Total number of characters in the phrase
        // passed into task execution.
        return totalCharCnt;
    }
}

11.3. GridProjection

TODO

11.4. GridTaskSession

11.4.1. Overview

A distributed task session is created for every task execution. It is defined by the GridTaskSession interface. The task session is distributed across the parent task and all grid jobs spawned by it, so attributes set on a task or on a job can be viewed by other jobs. Correspondingly, attributes set on any of the jobs can also be viewed by the task.

The session has two main features: attribute management and checkpoint management (see Checkpoint SPI for more details). Both attributes and checkpoints can be used from the task itself and from the jobs belonging to this task, and they can be set from any task or job method. Session attribute and checkpoint consistency is fault tolerant and is preserved whenever a job gets failed over to another node for execution. Whenever task execution ends, all checkpoints saved within the session with the GridTaskSessionScope.SESSION_SCOPE scope will be removed from checkpoint storage. Checkpoints saved with GridTaskSessionScope.GLOBAL_SCOPE will outlive the session and can be viewed by other tasks.

The sequence in which session attributes are set is consistent across the task and all job siblings within it. There will never be a case when one job sees attribute A before attribute B, and another job sees attribute B before A. Attribute order is identical across all session participants. Attribute order is also fault tolerant and is preserved whenever a job gets failed over to another node.

11.4.2. Connected Tasks

Note that apart from setting and getting session attributes, tasks or jobs can choose to wait for a certain attribute to be set using any of the GridTaskSession.waitForAttribute(..) methods. Tasks and jobs can also receive asynchronous notifications about a certain attribute being set through the GridTaskSessionAttributeListener listener. This feature allows grid jobs and tasks to remain connected in order to synchronize their execution with each other, and it opens up solutions to a whole new range of problems.

Imagine for example that you need to compress a very large file (let’s say terabytes in size). To do that in grid environment you would split such file into multiple sections and assign every section to a remote job for execution. Every job would have to scan its section to look for repetition patterns. Once this scan is done by all jobs in parallel, jobs would need to synchronize their results with their siblings so compression would happen consistently across the whole file. This can be achieved by setting repetition patterns discovered by every job into the session. Once all patterns are synchronized, all jobs can proceed with compressing their designated file sections in parallel, taking into account repetition patterns found by all the jobs in the split. Grid task would then reduce (aggregate) all compressed sections into one compressed file. Without session attribute synchronization step this problem would be much harder to solve.
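
The publish-then-wait pattern from the compression example can be sketched with plain Java threads. Nothing here is GridGain API: Session is a hypothetical in-memory stand-in for the task session, threads play the role of remote jobs, and upper-casing a section stands in for discovering its repetition patterns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SessionAttributeSketch {
    /** Hypothetical stand-in for a task session shared by all jobs of one task. */
    static final class Session {
        private final Map<String, String> attrs = new HashMap<>();

        synchronized void setAttribute(String key, String val) {
            attrs.put(key, val);

            notifyAll(); // Wake up jobs waiting for an attribute.
        }

        synchronized String waitForAttribute(String key) throws InterruptedException {
            while (!attrs.containsKey(key))
                wait();

            return attrs.get(key);
        }
    }

    /** Runs one "job" per section; every job returns the patterns published by all jobs. */
    static List<String> runJobs(List<String> sections) {
        final Session ses = new Session();
        final int n = sections.size();
        final String[] results = new String[n];

        List<Thread> threads = new ArrayList<>();

        for (int i = 0; i < n; i++) {
            final int id = i;
            final String section = sections.get(i);

            Thread t = new Thread(() -> {
                try {
                    // Phase 1: scan own section and publish the discovered pattern.
                    ses.setAttribute("job-" + id, section.toUpperCase());

                    // Phase 2: wait until every sibling has published its pattern.
                    StringBuilder all = new StringBuilder();

                    for (int j = 0; j < n; j++) {
                        if (j > 0)
                            all.append(',');

                        all.append(ses.waitForAttribute("job-" + j));
                    }

                    // Phase 3: proceed using the combined patterns.
                    results[id] = all.toString();
                }
                catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            threads.add(t);
            t.start();
        }

        for (Thread t : threads) {
            try {
                t.join();
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();

                throw new IllegalStateException("Interrupted while waiting for jobs", e);
            }
        }

        return Arrays.asList(results);
    }

    public static void main(String[] args) {
        System.out.println(runJobs(Arrays.asList("ab", "cd", "ef")));
    }
}
```

Every job ends up with the same combined view ("AB,CD,EF" for the input above), which is what lets the second phase proceed consistently across all sections.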

11.4.3. Session Injection

The session can be injected into a task or a job using IoC (dependency injection). See Resources Injection for additional details.

11.4.4. Example

Below is a grid task implementation that is responsible for the split and aggregate (a.k.a. map/reduce) logic. Note that this implementation uses GridifyTaskSplitAdapter, which simplifies the API for grid tasks in homogeneous grids (which is often the case). The two main methods implemented here are split and reduce. The reduce method aggregates the results (sums up all numbers returned by the jobs) to calculate the length of the initial string.

This task splits the passed-in string into separate words and then passes each word into its own job for execution on different nodes. Every job will do the following:

  1. Execute grid-enabled method with argument passed in.

  2. Add its argument to the session.

  3. Wait for other jobs to add their arguments to the session.

  4. Execute grid-enabled method with all session attributes concatenated into one string as an argument.

package org.gridgain.examples.helloworld.gridify.session;

import java.io.*;
import java.util.*;
import org.gridgain.grid.*;
import org.gridgain.grid.gridify.*;
import org.gridgain.grid.logger.*;
import org.gridgain.grid.resources.*;

/**
 * Grid task for {@link GridifyHelloWorldSessionExample} example. It handles splitting
 * this example into multiple jobs for execution on remote nodes.
 * <p>
 * Every job will do the following:
 * <ol>
 * <li>Execute grid-enabled method with argument passed in.</li>
 * <li>Add its argument to the session.</li>
 * <li>Wait for other jobs to add their arguments to the session.</li>
 * <li>Execute grid-enabled method with all session attributes concatenated into one string as an argument.</li>
 * </ol>
 */
public class GridifyHelloWorldSessionTask extends GridifyTaskSplitAdapter<Integer> {
    /** Grid task session will be injected. */
    @GridTaskSessionResource
    private GridTaskSession ses = null;

    /** Grid logger will be injected; it is used by the jobs created in split(). */
    @GridLoggerResource
    private GridLogger log = null;

    /**
     * {@inheritDoc}
     */
    @Override
    protected Collection<? extends GridJob> split(int gridSize, GridifyArgument arg) throws GridException {
        String[] words = ((String)arg.getMethodParameters()[0]).split(" ");

        List<GridJobAdapter<String>> jobs = new ArrayList<GridJobAdapter<String>>(words.length);

        for (String word : words) {
            jobs.add(new GridJobAdapter<String>(word) {
                /** Job context will be injected. */
                @GridJobContextResource
                private GridJobContext jobCtx = null;

                /**
                 * Executes grid-enabled method once with all
                 * session attributes concatenated into string
                 * as an argument and again with passed in argument.
                 */
                public Serializable execute() throws GridException {
                    String word = getArgument();

                    // Set session attribute with value of this job's word.
                    ses.setAttribute(jobCtx.getJobId(), word);

                    try {
                        // Wait for all other jobs within this task to set their attributes on
                        // the session.
                        for (GridJobSibling sibling : ses.getJobSiblings()) {
                            // Waits for attribute with sibling's job ID as a key.
                            if (ses.waitForAttribute(sibling.getJobId()) == null) {
                                throw new GridException("Failed to get session attribute from job: " +
                                    sibling.getJobId());
                            }
                        }
                    }
                    catch (InterruptedException e) {
                        throw new GridException("Got interrupted while waiting for session attributes.", e);
                    }

                    // Create a string containing all attributes set by all jobs
                    // within this task (in this case an argument from every job).
                    StringBuilder msg = new StringBuilder();

                    // Formatting.
                    msg.append("All session attributes [ ");

                    for (Serializable jobArg : ses.getAttributes().values()) {
                        msg.append(jobArg).append(' ');
                    }

                    // Formatting.
                    msg.append(']');

                    // For the purpose of example, we simply log session attributes.
                    log.info(msg.toString());

                    // Execute gridified method again and return the number
                    // of characters in the passed in word.
                    return GridifyHelloWorldSessionExample.sayIt(word);
                }
            });
        }

        return jobs;
    }


    /**
     * Sums up all characters from all jobs and returns a
     * total number of characters in the initial phrase.
     *
     * @param results Job results.
     * @return Number of letters for the word passed into
     *      {@link GridifyHelloWorldSessionExample#sayIt(String)} method.
     * @throws GridException If reduce failed.
     */
    public Integer reduce(List<GridJobResult> results) throws GridException {
        int totalCharCnt = 0;

        for (GridJobResult res : results) {
            // Every job returned a number of letters
            // for the phrase it was responsible for.
            Integer charCnt = res.getData();

            totalCharCnt += charCnt;
        }

        // Account for spaces. For simplicity we assume one space between words.
        totalCharCnt += results.size() - 1;

        // Total number of characters in the phrase
        // passed into task execution.
        return totalCharCnt;
    }
}

11.5. Zero Deployment

Zero Deployment is a unique GridGain feature that automatically monitors the resources deployed on the grid and loads all necessary JVM classes and resources on demand. This enables users to simply launch default GridGain nodes, which then immediately become part of the data and compute grid topology without any need for explicit deployment of user classes or resources. Their resources are automatically utilized.

Zero Deployment technology seamlessly delivers code updates throughout the grid/cloud topology, eliminating any need for rebuilding, redeployment, restarting, awkward IDE plugins, or tool chains. All you do is keep writing and changing your code, and whenever you need to execute it, just hit the Run button in your IDE. Your new code will be automatically deployed on all grid nodes. This feature works with both the compute grid and the data grid. GridGain further provides three different modes of peer-to-peer deployment, supporting even the most complex deployment environments, such as custom class loaders or WAR/EAR files.

Note: Distributed class loading and class sharing are supported and allow fine-grained control over how classes and user resources are loaded and shared on remote nodes. You can find details in the section about Deployment Modes.

Note: Provisioning on cloud infrastructure, such as Amazon AWS, is dramatically simplified by GridGain’s CloudBoot technology. It minimizes dependencies on cloud provider images by dynamically loading necessary parts of an application during image startup. This eliminates the need to rebuild cloud images every time an application changes.

11.5.1. Example

// Create a new object Runnable() and execute it on all grid nodes, local and remote.
G.grid().run(BROADCAST, new Runnable() {
    @Override public void run() {
        // Send the text string to all nodes and print it out on each.
        System.out.println("Hello World from all nodes");
    }
});

The console output looks like this:

[15:17:11] Node JOINED [nodeId8=72d78a0b, addr=[10.1.10.23], order=1340662631022, CPUs=4]
[15:17:11] Topology snapshot [nodes=4, CPUs=4, hash=0xD2ED1710]
Hello World from all nodes
[15:17:16] Node LEFT [nodeId8=72d78a0b, addr=[10.1.10.23], order=1340662631022, CPUs=4]
[15:17:16] Topology snapshot [nodes=3, CPUs=4, hash=0x4CF83855]

Note that the new Runnable() anonymous class was created and deployed onto the grid automatically. The remote nodes were started with their default configuration and without prior knowledge of the new class. The class was loaded and deployed on demand at execution time.

11.6. Resource Injection

GridGain allows dependency injection of both internal GridGain resources as well as user resources. It supports field-based and method-based injection. Any resources with the proper annotations will be injected into the corresponding task, job or SPI before it is initialized.

The following internal resources can be injected:

  • Executor Service Resource

  • GridGain Home Path Resource

  • Grid Instance Resource

  • LocalNodeId Resource

  • Logger Resource

  • Marshaller Resource

  • MBean Server Resource

  • Spring Application Context Resource

  • Spring Resource

  • Task Session Resource

User resources can be injected with these annotations:

  • GridUserResource

  • GridUserResourceOnDeployed

  • GridUserResourceOnUndeployed

11.6.1. Examples

The complete source code for the examples is located on GitHub.

Logger Resource Example

The grid logger is provided to the grid via GridConfiguration. The logger resource gives tasks, jobs, and SPIs a handle on the configured logger. Use the @GridLoggerResource annotation to inject this resource. Here is how a variable injection would typically happen:

public class MyGridJob implements GridJob {
    ...
    @GridLoggerResource
    private GridLogger log = null;
    ...
}

Here is what a method injection looks like:

public class MyGridJob implements GridJob {
    ...
    private GridLogger log = null;
    ...
    @GridLoggerResource
    public void setLogger(GridLogger log) {
        this.log = log;
    }
    ...
}
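
To make the difference between field-based and method-based injection concrete, here is a minimal reflection-based sketch using only the JDK. It is not how GridGain implements injection: @LoggerResource, FieldJob, SetterJob, and inject are hypothetical names, and java.util.logging.Logger stands in for GridLogger.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.logging.Logger;

final class InjectionSketch {
    /** Hypothetical marker annotation, playing the role of @GridLoggerResource. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.FIELD, ElementType.METHOD})
    @interface LoggerResource { }

    /** Job using field-based injection. */
    static final class FieldJob {
        @LoggerResource
        private Logger log;

        Logger logger() {
            return log;
        }
    }

    /** Job using method-based injection. */
    static final class SetterJob {
        private Logger log;

        @LoggerResource
        public void setLogger(Logger log) {
            this.log = log;
        }

        Logger logger() {
            return log;
        }
    }

    /** Injects the logger into every annotated field and every annotated one-argument method. */
    static void inject(Object target, Logger log) {
        try {
            for (Field f : target.getClass().getDeclaredFields()) {
                if (f.isAnnotationPresent(LoggerResource.class)) {
                    f.setAccessible(true); // Allows writing to private fields.
                    f.set(target, log);
                }
            }

            for (Method m : target.getClass().getDeclaredMethods()) {
                if (m.isAnnotationPresent(LoggerResource.class)) {
                    m.setAccessible(true);
                    m.invoke(target, log);
                }
            }
        }
        catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Injection failed", e);
        }
    }

    public static void main(String[] args) {
        Logger log = Logger.getLogger("sketch");

        FieldJob fieldJob = new FieldJob();
        SetterJob setterJob = new SetterJob();

        inject(fieldJob, log);
        inject(setterJob, log);

        System.out.println(fieldJob.logger() == log && setterJob.logger() == log); // true
    }
}
```

Method-based injection is the better fit when the job needs to react to the injected value, for example to derive a child logger, since the setter body runs at injection time.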

User Resource

@GridUserResource injects a custom user resource into grid tasks or grid jobs. Use it when you would like to use something like a JDBC connection pool from your tasks or jobs. This way your connection pool will be instantiated only once per task and then reused for all executions of this task.

The resource will be created based on the resourceClass value. If resourceClass is not specified, then the field type or setter parameter type will be used to infer the class type of the resource. Set resourceClass to a specific value if the class of the resource cannot be inferred from the field or setter declaration (for example, if the field is an interface).

The user resource will be instantiated once on every node where a given task is deployed. For resource deployment and undeployment callbacks, use the @GridUserResourceOnDeployed and @GridUserResourceOnUndeployed annotations. These callbacks are typically used to initialize the injectable resource, for example by opening a database or network connection or by reading configuration settings.

Tip: User resources are never serialized (they get instantiated) and should always be declared as transient. For the same reason, resources should not be sent to remote nodes.

Tip: The scope of a user resource depends on the Deployment Mode used. You can configure your user resources to be deployed on a per-task, per-class-loader, or per-grid basis. Take a look at the Deployment Mode documentation for more information.

Tip: GridNodeLocal can also be used to create a singleton local state per grid node that is reused between various job executions.

Use the @GridUserResource annotation to inject this resource. Here is how a variable injection would typically happen:

public class MyGridJob implements GridJob {
    ...
    @GridUserResource
    private transient MyUserResource rsrc = null;
    ...
}

where the corresponding resource class can look like this:

MyUserResource class
public class MyUserResource {
    ...
    // Connection shared by all executions of the task. It must be a field so that
    // undeploy() can see it. connPool is assumed to be declared in the elided part.
    private Connection conn;

    // Establish a connection to be shared.
    @GridUserResourceOnDeployed private void deploy() {
        conn = connPool.getConnection();
    }
    ...
    // Close the connection when the resource is undeployed.
    @GridUserResourceOnUndeployed private void undeploy() {
        connPool.closeConnection(conn);
    }
    ...
}

11.7. GridNodeLocal

When working in a distributed environment you often need a consistent local state per grid node that is reused between various job executions. For example, what if multiple jobs require a database connection pool for their execution? How do they get this connection pool initialized once and then reused by all jobs running on the same grid node? Essentially you can think of it as a per-grid-node singleton service, but the idea is not limited to services; it can be just a regular Java bean that holds some state to be shared by all jobs running on the same grid node.

Before GridGain 3.0 this was handled by using the @GridUserResource annotation on fields within GridTask or GridJob classes to specify singleton beans. However, this approach depended on the GridDeploymentMode configuration and, for the ISOLATED or PRIVATE deployment modes, the resource could be initialized multiple times, once per GridTask. This forced users to resort to various hacks in their logic and was generally not very convenient.

GridGain 3.0 introduced GridNodeLocal, a per-grid-node local cache. The name was borrowed from the ThreadLocal class in Java: just as ThreadLocal provides a unique space per thread in Java, GridNodeLocal provides a unique space per grid node in GridGain. GridNodeLocal implements the java.util.concurrent.ConcurrentMap interface. In fact, it simply extends the java.util.concurrent.ConcurrentHashMap implementation and therefore inherits all the methods available there, along with its concurrency behavior (retrievals generally do not block).

Here is an example of how GridNodeLocal could be used to create some user specific singleton connection pool from a simple GridGain job:

GridNodeLocal Example
final Grid grid = G.start(..);
...
// Execute runnable job on some remote grid node.
grid.run(GridClosureCallMode.BALANCE, new Runnable() {
    public void run() {
        GridNodeLocal<String, MySingletonConnectionPool> nodeLocal = grid.nodeLocal();

        // 1. First see if someone already stored connection pool in node-local storage.
        MySingletonConnectionPool pool = nodeLocal.get("connPool");

        if (pool == null) {
            // 2. Create new connection pool and store it in node-local storage.
            // putIfAbsent must be called on the node-local map (pool is still null here).
            MySingletonConnectionPool old = nodeLocal.putIfAbsent("connPool", pool = new MySingletonConnectionPool(..));

            if (old != null)
                pool = old;
        }

        // Perform operations with connection pool.
        ...
    }
});
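
The check-then-putIfAbsent idiom above is not specific to GridGain; it works with any ConcurrentMap. Below is a minimal, JDK-only sketch of the same pattern. The names NodeLocalSketch and DemoConnectionPool are hypothetical stand-ins invented for this illustration, and the static map merely plays the role of node-local storage within a single JVM:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical stand-in for a real connection pool. */
final class DemoConnectionPool {
    final String url;

    DemoConnectionPool(String url) {
        this.url = url;
    }
}

final class NodeLocalSketch {
    /** Plays the role of node-local storage; a plain JDK map, not the GridGain class. */
    static final ConcurrentMap<String, DemoConnectionPool> NODE_LOCAL =
        new ConcurrentHashMap<String, DemoConnectionPool>();

    /** Returns the single shared pool for this JVM, creating it on first use. */
    static DemoConnectionPool pool() {
        // 1. See if someone already stored the pool.
        DemoConnectionPool pool = NODE_LOCAL.get("connPool");

        if (pool == null) {
            // 2. Create a candidate and try to publish it atomically.
            DemoConnectionPool created = new DemoConnectionPool("jdbc:demo");
            DemoConnectionPool old = NODE_LOCAL.putIfAbsent("connPool", created);

            // 3. If another thread won the race, use its instance instead.
            pool = old != null ? old : created;
        }

        return pool;
    }
}
```

Note that under contention two threads may both construct a candidate pool, but only the one that wins putIfAbsent is kept, so all callers end up sharing one instance. If constructing the resource is expensive or has side effects, ConcurrentHashMap.computeIfAbsent (Java 8 and later), which applies the factory function at most once per key, may be a better fit.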

11.8. Failover

If a node crashes, its jobs are automatically failed over to another node. However, in GridGain you can also treat any result that comes back from a remote job execution as a failure. The remote node can still be alive, but it may be running low on CPU, I/O, disk space, etc. There are many conditions that may result in a failure within your application and that you can use to trigger a failover. GridGain allows you to optionally fail over a job based on any job result. Moreover, you can choose the node to which a job should be failed over, as the choice could differ between applications or between computations within the same application.

The Failover SPI is responsible for handling the failed execution of a grid job. Failover is triggered when the method GridTask.result(GridJobResult, List) returns the GridJobResultPolicy.FAILOVER policy. The SPI then takes the failed job and a list of all grid nodes and selects another node on which the job execution will be retried. The Failover SPI ensures that the job is not re-mapped to the node it failed on. GridGain comes with several built-in, customizable Failover SPI implementations.

11.8.1. Example

This example illustrates how the Failover SPI can be invoked by returning the FAILOVER policy. See the Failover SPI for examples of how to configure it.

The source code for this example is on GitHub.
    ...
    // In case of any exception, the GridJobResultPolicy.FAILOVER policy is returned.
    @Override public GridJobResultPolicy result(GridJobResult result, List<GridJobResult> received)
        throws GridException {
        return result.getException() != null ? GridJobResultPolicy.FAILOVER : GridJobResultPolicy.WAIT;
    }
    ...
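
To make the two halves of failover concrete (deciding that a result is a failure, and then picking a different node), here is a minimal, JDK-only sketch. Everything in it is hypothetical and invented for illustration: the class FailoverSketch, its Policy enum (which only mirrors the WAIT and FAILOVER names used above) and the representation of nodes as plain strings. It is not the Failover SPI interface itself:

```java
import java.util.Collection;
import java.util.List;

final class FailoverSketch {
    /** Hypothetical enum mirroring the two policy names used in the text. */
    enum Policy { WAIT, FAILOVER }

    /** Decision step: any exception from the job triggers failover. */
    static Policy result(Throwable jobError) {
        return jobError != null ? Policy.FAILOVER : Policy.WAIT;
    }

    /**
     * Selection step: returns the first node in the topology on which the job
     * has not failed yet, or null if no such node is left.
     */
    static String nextNode(List<String> topology, Collection<String> failedOn) {
        for (String node : topology)
            if (!failedOn.contains(node))
                return node;

        return null;
    }
}
```

A real implementation would typically also cap the number of failover attempts per job; that is left out here to keep the sketch short.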

11.9. Topology Management

TODO

11.10. Collision Resolution

Custom logic may be used to determine how grid jobs are scheduled and executed when they arrive on a destination grid node. In general, a grid node will have multiple jobs arriving for execution and potentially multiple jobs already executing or waiting for execution on it. There are multiple possible strategies for dealing with this situation, which can be configured with the Collision SPI.

Every time a new job arrives, it is placed on the waiting queue, and it is up to the Collision SPI to either reject or activate a waiting job, cancel an active job, or do nothing. Generally, the Collision SPI is invoked in the following cases:

  • A new job has arrived.

  • An existing job has finished.

  • A node metrics update has been received.

  • The SPI is invoked explicitly.

For more on how to configure collision resolution, refer to the Collision SPI. The Load Balancing section that follows explains how collision resolution can help with job stealing for late load balancing.
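
As an illustration of one possible strategy, the sketch below keeps at most a fixed number of jobs active and activates waiting jobs in arrival order whenever a job arrives or finishes. It is a JDK-only sketch under stated assumptions: the class FifoCollisionSketch, its method names and the use of strings as job identifiers are all hypothetical, and it does not implement the actual Collision SPI interface:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class FifoCollisionSketch {
    private final int maxParallel;
    private final Deque<String> waiting = new ArrayDeque<String>();
    private final List<String> active = new ArrayList<String>();

    FifoCollisionSketch(int maxParallel) {
        this.maxParallel = maxParallel;
    }

    /** Trigger: a new job has arrived. It always enters the waiting queue first. */
    void onJobArrived(String job) {
        waiting.addLast(job);
        resolve();
    }

    /** Trigger: an existing job has finished. */
    void onJobFinished(String job) {
        active.remove(job);
        resolve();
    }

    /** Activates waiting jobs in FIFO order while there is free capacity. */
    private void resolve() {
        while (active.size() < maxParallel && !waiting.isEmpty())
            active.add(waiting.pollFirst());
    }

    /** Snapshot of the currently active jobs. */
    List<String> active() {
        return new ArrayList<String>(active);
    }

    /** Snapshot of the currently waiting jobs, in queue order. */
    List<String> waiting() {
        return new ArrayList<String>(waiting);
    }
}
```

With a limit of two, a third arriving job stays in the waiting queue until one of the first two finishes. Setting the limit to one serializes all jobs on the node.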

11.11. Load Balancing

11.11.1. Overview

In the MapReduce pattern, mapping is the process of splitting the initial task into sub-tasks and assigning them to grid nodes. Mapping generally involves the splitting logic itself, the assignment of sub-tasks to nodes (including load balancing), and potentially failover and collision resolution. In the conventional approach, worker nodes pull sub-tasks for execution. In GridGain, sub-tasks are pushed to the worker nodes, and this process is initially controlled by the task. The latter approach has a fundamental advantage that was largely missing in grid computing frameworks before GridGain.

Tip

GridGain’s approach of giving the task control over sub-task distribution enables both early and late load balancing algorithms. This helps to adapt task execution to the non-deterministic nature of execution on the grid. Without this capability, the set of deployments in which optimal performance and scalability can be achieved is significantly narrower.

11.11.2. Early And Late Load Balancing

The sequence of steps below shows when the early and late load balancing policies come into play:

  1. A caller invokes one of the GridProjection.execute(..) methods, passing a grid task and its argument, to initiate grid task execution in the system.

  2. The method GridTask.map(..) is called on the task to perform the initial mapping. This method is responsible for splitting the task into a number of sub-tasks and mapping every sub-task to one or more grid nodes. It returns a set of sub-task:node pairs. This is what we call Early Load Balancing, as it is done during the initial mapping operation, using only the information available when execution is initiated (see the Load Balancing SPI documentation).

  3. Once mapping is done, the sub-tasks travel to their respective remote nodes for execution.

  4. When a sub-task arrives at the destination grid node, it is subject to collision (scheduling) resolution via the Collision SPI. This SPI is called every time a new sub-task arrives, an existing sub-task finishes its execution, or a metrics update is received (with every heartbeat). The Collision SPI looks at its queue of sub-tasks (including a newly received one, if any) and can cancel a sub-task, leave it waiting in the queue, transfer it to another node for execution, or start its execution locally. This is what we call Late Load Balancing: it happens later in the execution process, on the destination node where the sub-task is about to be executed.

An important characteristic of late load balancing is that there can be a significant time difference between mapping (early load balancing) and the moment execution of the sub-task actually begins on the remote node. Late load balancing accounts for this non-deterministic aspect of grid execution and can re-balance the sub-task on the grid.

For example, our Job Stealing Collision SPI does exactly that. It monitors the number of queued sub-tasks on each node and preemptively moves waiting sub-tasks from a "busy" node to an "idle" node for execution.
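
The core idea of job stealing can be sketched in a few lines. The following is a deliberately simplified, single-JVM illustration, not the actual Job Stealing Collision SPI: the class JobStealingSketch and its rebalance method are hypothetical, both queues live in one process, and a real implementation would have to exchange messages between nodes instead of touching a remote queue directly:

```java
import java.util.Deque;

final class JobStealingSketch {
    /**
     * Moves waiting jobs from the tail of the busy node's queue to the idle
     * node's queue until the queue sizes differ by at most one.
     *
     * @return the number of jobs moved
     */
    static int rebalance(Deque<String> busy, Deque<String> idle) {
        int moved = 0;

        while (busy.size() - idle.size() > 1) {
            // Steal the most recently queued job; jobs at the head are
            // closest to being executed on the busy node.
            idle.addLast(busy.pollLast());
            moved++;
        }

        return moved;
    }
}
```

For a busy queue of five jobs and an empty idle queue, two jobs are moved, leaving three and two. Stealing from the tail is a design choice made for this sketch; it leaves the jobs that have waited longest where they are.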

Load balancing capabilities in GridGain are among the more advanced features, and not everyone needs them. For example, in a homogeneous grid with homogeneous tasks, load balancing is achieved naturally. However, in many other cases, when conditions are closer to real life, sophisticated load balancing capabilities are about the only way to get the most out of your grid.

For more information on MapReduce, refer to the "MapReduce: Simplified Data Processing on Large Clusters" article from Google.

Early Load Balancing

Load balancing is the process of assigning jobs to the nodes where they will be executed in a way that is as close to optimal as possible. Like almost all kernel-level functionality in GridGain, load balancing is designed as an SPI (Service Provider Interface). It consists of the public SPI and several implementations. A number of pre-built implementations ship with GridGain, and users can easily develop their own.

The Load Balancing SPI provides the next best balanced node for job execution. This SPI is used either implicitly or explicitly whenever a job is mapped to a node during a GridTask.map(..) invocation.

This kind of load balancing is usually referred to as early load balancing because it happens early in grid task execution, during the mapping phase of the MapReduce process. Note that late load balancing happens during collision resolution and is handled by the Collision SPI.
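
The simplest policy for picking "the next best balanced node" is round-robin. The sketch below shows that policy applied during a mapping step. It is a JDK-only illustration: RoundRobinBalancerSketch, balancedNode and map are hypothetical names, jobs and nodes are plain strings, and this is not a claim about how any of the built-in implementations work:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

final class RoundRobinBalancerSketch {
    /** Index of the next node to hand out; shared across all mappings. */
    private final AtomicInteger next = new AtomicInteger();

    /** Returns the next node in round-robin order over the given topology. */
    String balancedNode(List<String> topology) {
        // floorMod keeps the index non-negative even after the counter overflows.
        int idx = Math.floorMod(next.getAndIncrement(), topology.size());

        return topology.get(idx);
    }

    /** Mapping step: assigns every job to a node, producing job:node pairs. */
    Map<String, String> map(List<String> jobs, List<String> topology) {
        Map<String, String> mapping = new LinkedHashMap<String, String>();

        for (String job : jobs)
            mapping.put(job, balancedNode(topology));

        return mapping;
    }
}
```

Because the decision uses only what is known at mapping time, it cannot react to a node that becomes overloaded afterwards; that gap is what late load balancing addresses.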

Late Load Balancing

Grid jobs are said to be in collision when a job arrives at a node that already has one or more jobs either waiting or executing on it. Job collision resolution provides the means to resolve this collision by allowing you to:

  • put the newly arrived job into the waiting queue

  • schedule it for immediate execution

  • cancel it (and preempt it by failing it over to another node)

  • wake up an already waiting job from the queue and schedule it for immediate execution

Like almost any kernel-level functionality in GridGain, collision resolution is designed as an SPI (Service Provider Interface). It consists of the public API and several implementations. As always, several pre-built implementations ship with GridGain, and custom ones can easily be built.

The Collision SPI regulates how grid jobs are executed when they arrive on a destination node. In general, a grid node will have multiple jobs arriving for execution and potentially multiple jobs already executing or waiting for execution on it. There are multiple possible strategies for dealing with this situation: all jobs can proceed in parallel; jobs can be serialized, i.e., only one job can execute at any given point in time; only a certain number or certain types of grid jobs can proceed in parallel; and so on.

The Collision SPI doesn’t expose any public APIs and works implicitly behind the scenes. As with any SPI, developers can provide their own implementation and plug it into GridGain.

Collision resolution is generally referred to as late load balancing because it happens late in the execution process, when the job has already arrived at the destination node. In effect, it load balances jobs in the context of the given node. Note that early load balancing is handled by the Load Balancing SPI and occurs during the initial mapping phase of the MapReduce process.

11.12. AOP-Based Grid-Enabling

TODO

11.13. Closure Execution

TODO

11.14. Executor Service

TODO

11.15. Cron-Based Scheduling

TODO

11.16. Remote Actors vs. GridGain and Concurrency Unification

This is a somewhat off-topic chapter discussing the differences between the popular Actors concept (remote actors specifically) and the functionality provided by GridGain. I was convinced to write about it after getting questions on Actors vs. GridGain Scalar at almost every conference where I spoke about our Scalar DSL.

When talking about Actors there is always the bigger topic of the Concurrency Unification trend, which aims to combine the principles of local multithreaded concurrency and distributed programming. We at GridGain are strong supporters of concurrency unification. The example later on in this chapter will show some of our current work in this direction.

Note
actor vs. Actor

We use lowercase actor to denote an instance of an actor class or type, and uppercase Actor to denote the concept of actors.

Back to Actors… After my presentation at GeeCON in the spring of 2011 I got an email asking for a GridGain version of the Pi calculation example from Akka, a very popular (and deservedly so) Actor framework in Scala. The sender was asking for help comparing the Akka/Scala actors approach to basic distributed programming with GridGain Scalar’s approach.

I hadn’t seen Akka’s example before, so I first downloaded Akka 1.1 and looked at it…

Now, before we get to it, I want to reiterate a few points related to Actor-based concurrency (note that these points are implementation-agnostic and apply equally to Scala actors and Akka actors).

I believe that Actors are an important "new" abstraction for elegantly resolving multithreaded concurrency. I do not, however, subscribe to the idealistic view that they are a drop-in replacement for threads and java.util.concurrent utilities. Most of the real-life examples and applications that I’ve seen use all of these mechanisms together with actors.

Note
Actors

I believe that Actors are an important "new" abstraction for elegantly resolving multithreaded concurrency.

It is often repeated that Actors work best when they are used throughout the application: not just for solving one particular synchronization problem, but to build an entire subsystem or module. I tend to agree. Mixing and matching shared-state concurrency with actors produces a rather awkward combination and negates much of the advantage Actors bring.

There are, of course, many use cases where Actors simply don’t work well. Any time you need fine-grained control, general performance fine-tuning, or determinism in threading, or when you need more sophisticated locking algorithms (read/write, counting critical sections, etc.), or when shared state is unavoidable, Actors tend to produce a more verbose and less flexible solution. For example, I have several times seen attempts to implement pseudo-semaphore synchronization on a group of actors, and the result was rather ugly.

Now, despite my positive outlook on Actors in general for better concurrency management, I have more reservations about Remote Actors, i.e. applying the same Actors concept in a distributed context.

11.16.1. Remote Actors and Concurrency Unification

Remote Actors basically allow actors in different JVMs to exchange messages.

One of the major appeals of remote actors is that they attempt to bridge local JVM multithreading and distributed concurrency. At first glance it seems rather elegant to extend the share-nothing message-passing metaphor into a distributed context to provide the long sought-after Concurrency Unification.

That advantage, however, is illusory: the fit is a poor one.

The obvious objection is that in distributed systems:

  • State is not shared between JVMs by default

  • State that is not shared is not exposed to parallel access from multiple JVMs

  • Data is already passed as serialized messages

Important
Actors In Distributed Context

The key features of actors in local JVM multithreading are already present in distributed systems.

Yet distributed systems introduce a host of their own challenges compared to local JVM multithreading (JVM-M):

  • much larger latencies

  • cost of message passing is not negligible anymore and can easily exceed the processing time

  • resource starvation and deadlocking due to conditions not present in JVM-M

  • topology management & discovery

  • heterogeneous environment (different CPUs, number of cores, memory sizes, OSes, networking, language runtime, etc.)

  • failover is very different conceptually from JVM-M

  • distributed load balancing not present in JVM-M

  • data sharing (a.k.a. In-Memory Data Grid) is fundamentally different from sharing data in JVM-M

  • compute sharing (a.k.a. Compute Grid) is fundamentally different from sharing computations in JVM-M

  • deployment and provisioning of the code is dramatically different from JVM-M

It should be pretty obvious that the challenges of distributed systems are far wider and more complex in nature than those of local JVM multithreading. It therefore makes more sense to adapt distributed practices to local JVM multithreading (and not vice versa) to gain true Concurrency Unification. In fact, you need to design from more general to more specific APIs when attempting to unify two related concepts.

11.16.2. Back To Example

So, here is the Pi calculation example from the Akka 1.1 tutorial, verbatim except that I took the liberty of removing some excessive comments to make the code shorter:

Pi Calculation using Akka 1.1 Actors
package akka.tutorial.first.scala

import akka.actor.{Actor, PoisonPill}
import Actor._
import akka.routing.{Routing, CyclicIterator}
import Routing._
import java.util.concurrent.CountDownLatch

object Pi extends App {
    calculate(nrOfWorkers = 4, nrOfElements = 10000, nrOfMessages = 10000)

    sealed trait PiMessage
    case object Calculate extends PiMessage
    case class Work(start: Int, nrOfElements: Int) extends PiMessage
    case class Result(value: Double) extends PiMessage

    class Worker extends Actor {
        // define the work
        def calculatePiFor(start: Int, nrOfElements: Int): Double = {
            var acc = 0.0
            for (i <- start until (start + nrOfElements))
                acc += 4.0 * (1 - (i % 2) * 2) / (2 * i + 1)
            acc
        }

        def receive = {
            case Work(start, nrOfElements) =>
                self reply Result(calculatePiFor(start, nrOfElements)) // perform the work
        }
    }

    class Master(nrOfWorkers: Int, nrOfMessages: Int, nrOfElements: Int, latch: CountDownLatch)
        extends Actor {
        var pi: Double = _
        var nrOfResults: Int = _
        var start: Long = _

        // create the workers
        val workers = Vector.fill(nrOfWorkers)(actorOf[Worker].start())

        // wrap them with a load-balancing router
        val router = Routing.loadBalancerActor(CyclicIterator(workers)).start()

        // message handler
        def receive = {
            case Calculate =>
                // schedule work
                for (i <- 0 until nrOfMessages) router ! Work(i * nrOfElements, nrOfElements)

                // send a PoisonPill to all workers telling them to shut down themselves
                router ! Broadcast(PoisonPill)

                // send a PoisonPill to the router, telling him to shut himself down
                router ! PoisonPill
            case Result(value) =>
                // handle result from the worker
                pi += value
                nrOfResults += 1
                if (nrOfResults == nrOfMessages) self.stop()
        }

        override def preStart() {
            start = System.currentTimeMillis
        }

        override def postStop() {
            // tell the world that the calculation is complete
            println("\n\tPi estimate: \t\t%s\n\tCalculation time: \t%s millis"
                .format(pi, (System.currentTimeMillis - start)))
            latch.countDown()
        }
    }

    def calculate(nrOfWorkers: Int, nrOfElements: Int, nrOfMessages: Int) {
        // this latch is only plumbing to know when the calculation is completed
        val latch = new CountDownLatch(1)

        // create the master
        val master = actorOf(new Master(nrOfWorkers, nrOfMessages, nrOfElements, latch)).start()

        // start the calculation
        master ! Calculate

        // wait for master to shut down
        latch.await()
    }
}

Now, there are several observations about this example:

  • It obviously works only in a single local JVM.

  • Making it use Remote Actors will require more code, more configuration, and more build and deployment steps.

  • It’s probably not the best example of where Actors apply, as you can quite easily rewrite it with, for example, parallel collections from Scala 2.9. Nonetheless, it is the code snippet featured in the Akka tutorial as a prime example.
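
To make the last point concrete, here is a minimal local-only sketch of the same calculation using JDK parallel streams (Java 8 and later) in place of Scala parallel collections. The class LocalPiSketch and its method names are hypothetical; the formula and the chunk size mirror the calcPi logic used throughout this chapter:

```java
import java.util.stream.IntStream;

final class LocalPiSketch {
    /** Number of series elements per chunk. */
    static final int N = 10000;

    /** Partial sum of the series 4 * (-1)^i / (2i + 1) over [start, start + N). */
    static double calcPi(int start) {
        double acc = 0.0;

        for (int i = start; i < start + N; i++)
            acc += 4.0 * (1 - (i % 2) * 2) / (2 * i + 1);

        return acc;
    }

    /** Sums the given number of chunks in parallel on the local JVM. */
    static double estimate(int chunks) {
        return IntStream.range(0, chunks)
            .parallel()
            .mapToDouble(c -> calcPi(c * N))
            .sum();
    }
}
```

Calling LocalPiSketch.estimate(4) sums 40000 terms of the series across the common fork-join pool. No actors, routers or latches are involved, which is the point the bullet above is making.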

When I looked at the Akka example for the first time, I was really surprised by its complexity, because in a nutshell it is simply a multithreaded calculation of a trivial math formula.

It took me about 15 minutes to write the equivalent in GridGain (Scala, Groovy and Java versions below):

Pi Calculation using GridGain 3.0 Scalar DSL
import org.gridgain.scalar._
import scalar._
import org.gridgain.grid.GridClosureCallMode._
import org.gridgain.grid.Grid

object ScalarPiCalculationExample {
    /** Number of calculations per node. */
    private val N = 10000

    // Entry point.
    def main(args: Array[String]) = scalar { g: Grid =>
        println("Pi estimate: " +
            g.@<[Double, Double](SPREAD, for (i <- 0 until g.size()) yield () => calcPi(i * N), _.sum))
    }

    // Basic Pi formula.
    def calcPi(start: Int): Double =
        (start until (start + N)) map (i => 4.0 * (1 - (i % 2) * 2) / (2 * i + 1)) sum
}

Pi Calculation using GridGain 3.0 Grover DSL
import org.gridgain.grid.*
import static org.gridgain.grid.GridClosureCallMode.*
import static org.gridgain.grover.Grover.*
import org.gridgain.grover.categories.*

@Typed
@Use(GroverProjectionCategory)
class GroverPiCalculationExample {
    private static int N = 10000

    static void main(String[] args) {
        grover { Grid g ->
            println("Pi estimate: " +
                g.reduce$(SPREAD, (0..<g.size()).collect { { -> calcPi(it * N) } }, { it.sum() } )
            )
        }
    }

    private static double calcPi(int start) {
        (start..<(start + N)).inject(0) { double sum, int i ->
            sum + (4.0 * (1 - (i % 2) * 2) / (2 * i + 1))
        }
    }
}

Pi Calculation using GridGain 3.0 Java APIs
import org.gridgain.grid.*;
import org.gridgain.grid.typedef.*;
import static org.gridgain.grid.GridClosureCallMode.*;

/**
 * This example calculates Pi number in parallel on the grid.
 */
public final class GridPiCalculationExample {
    /** Number of calculations per node. */
    private static final int N = 1000;

    // Basic Pi formula.
    private static double calcPi(int start) {
        double acc = 0.0;

        for (int i = start; i < start + N; i++)
            acc += 4.0 * (1 - (i % 2) * 2) / (2 * i + 1);

        return acc;
    }

    // Entry point.
    public static void main(String[] args) throws GridException {
        G.start();

        try {
            Grid g = G.grid();

            System.out.println("Pi estimate: " +
                g.reduce(SPREAD, F.yield(F.range(0, g.size()), new C1<Integer, Double>() {
                    @Override public Double apply(Integer i) {
                        return calcPi(i * N);
                    }
                }), F.sumDoubleReducer())
            );
        }
        finally {
            G.stop(true);
        }
    }
}

As I see it, the GridGain version has a distinct set of advantages:

  • It’s shorter… almost 4 times shorter for the Scala version, and it’s much easier to understand

  • It’s distributed by default: it works equally well on one node (like Akka’s version) and on thousands of nodes, without any code or configuration changes or any deployment steps

  • It doesn’t require any low-level synchronization utilities like the latch in Akka’s version

  • Its implementation is a lot more Scala-friendly: you just use closures as if you were doing everything locally, and they get automatically deployed and distributed on demand

Note
GridGain’s Version is Fully Distributed

What is startling is that the fully distributed GridGain version, which includes:

  • auto topology discovery

  • auto load balancing

  • distributed failover

  • collision resolution

  • zero code deployment & provisioning

  • pluggable marshalling & communication

is almost 4 times shorter and much easier to understand than the local-only, non-distributed Akka version.

Rich Hickey, the inventor of the Clojure programming language, provides another critical view of the actor model (albeit specifically in a local context). In his own words:

  • It is a much more complex programming model, requiring 2-message conversations for the simplest data reads, and forcing the use of blocking message receives, which introduce the potential for deadlock. Programming for the failure modes of distribution means utilizing timeouts etc. It causes a bifurcation of the program protocols, some of which are represented by functions and others by the values of messages.

  • It doesn’t let you fully leverage the efficiencies of being in the same process. It is quite possible to efficiently directly share a large immutable data structure between threads, but the actor model forces intervening conversations and, potentially, copying. Reads and writes get serialized and block each other, etc.

  • It reduces your flexibility in modeling - this is a world in which everyone sits in a windowless room and communicates only by mail. Programs are decomposed as piles of blocking switch statements. You can only handle messages you anticipated receiving. Coordinating activities involving multiple actors is very difficult. You can’t observe anything without its cooperation/coordination - making ad-hoc reporting or analysis impossible, instead forcing every actor to participate in each protocol.

  • It is often the case that taking something that works well locally and transparently distributing it doesn’t work out - the conversation granularity is too chatty or the message payloads are too large or the failure modes change the optimal work partitioning, i.e. transparent distribution isn’t transparent and the code has to change anyway.

And just in case you really, really need a local-only version of the same code, you can easily "pin" the execution to the local node simply by replacing the global monadic projection with the local node. Just replace this line in the Scala example:

g.@<[Double, Double](SPREAD, for (i <- 0 until g.size()) yield () => calcPi(i * N), _.sum))

with this line:

g.localNode.@<[Double, Double](SPREAD, for (i <- 0 until g.size()) yield () => calcPi(i * N), _.sum))

and this will provide the exact functionality of the Akka-based example (in case you really need it).

That’s Concurrency Unification at work!

12. In-Memory Data Grid

Data Grid, or In-Memory Data Grid, is a fancy term for distributed data caching. In a nutshell, it gives applications the ability to keep data in memory for high availability rather than constantly fetching it from slower storage elsewhere, such as an RDBMS or shared file systems.

Another way to look at In-Memory Data Grids is to see their complementary value to Compute Grids. As you recall, Compute Grids are responsible for parallelization of processing (or computations). Once processing is distributed, it is only natural to aim for distribution and partitioning of the data that will be processed by the Compute Grid as well. Otherwise the non-distributed or centralized data storage (like an RDBMS) will quickly become a performance bottleneck in your system.

Note
Compute and In-Memory Data Grid

There is a great deal of synergy between Compute Grids and In-Memory Data Grids. In fact, almost any real-life high performance distributed system will have both to some degree.

On top of high availability, data grids generally also allow you to scale to large amounts of data through what is called data partitioning. When data is partitioned, every key/value pair stored in the data grid is assigned to a designated primary node and, optionally, a configurable number of designated backup nodes (which can optionally be active or inactive). Data should never be lost as long as at least one backup node for it remains. This approach makes it possible to use the memory available on all nodes within the data grid as one whole shared memory, with each node responsible for caching the portion of data allocated to it.

The picture below illustrates these basic points about In-Memory Data Grids:

http://www.gridgain.com/images/data_grid.png
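
The assignment of a key to a primary node plus backups can be sketched as follows. This is a deliberately naive, JDK-only illustration based on modulo hashing: the class PartitionSketch and its owners method are hypothetical, nodes are plain strings, and this is not the affinity function GridGain actually uses:

```java
import java.util.ArrayList;
import java.util.List;

final class PartitionSketch {
    /**
     * Returns the nodes responsible for the given key: the primary node first,
     * followed by up to {@code backups} backup nodes taken from the nodes that
     * follow the primary in the list (wrapping around at the end).
     */
    static List<String> owners(Object key, List<String> nodes, int backups) {
        int n = nodes.size();

        // floorMod keeps the index non-negative for negative hash codes.
        int primary = Math.floorMod(key.hashCode(), n);

        // There cannot be more copies than there are nodes.
        int copies = Math.min(1 + backups, n);

        List<String> owners = new ArrayList<String>(copies);

        for (int i = 0; i < copies; i++)
            owners.add(nodes.get((primary + i) % n));

        return owners;
    }
}
```

With nodes A, B and C and one backup, the Integer key 4 maps to primary B and backup C. A known weakness of plain modulo hashing is that most keys change owners whenever a node joins or leaves, which is why production-grade partitioning schemes usually rely on more stable mappings.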

GridGain’s In-Memory Data Grid provides very comprehensive functionality that includes these key features:

  • Local, Replicated, and Partitioned cache modes

  • Collocation of computations and data

  • Extremely rich post-functional APIs

  • Zero Provisioning and Deployment for cached data

  • Support for batch reads and writes

  • Synchronous and asynchronous modes (including commits)

  • Replication and invalidation modes

  • Concurrent CAS-like atomic operations

  • Advanced data querying, including support for SQL, TEXT, and FULL SCAN queries

  • Local and remote reducers and transformers for distributed data queries

  • Pluggable persistent storage with support for read-through and write-through semantics

  • Pluggable data affinity for data partitioning and collocation of computations with data

  • Synchronous and Asynchronous data preloading

  • Pluggable Data Overflow or Swap to disk for effective memory management

  • Optimistic and Pessimistic transactions with Read-Committed, Repeatable-Read, and Serializable isolation levels

  • Extremely scalable, feather-weight Eventually-Consistent transactions

  • Pluggable Eviction Policies, including out of the box support for LIRS, LRU, LFU, FIFO, and RANDOM eviction modes

  • Flexible Cache Projections for fine grained control over cache behavior and custom cache views

  • JEE/JTA integration

  • REST-based operations support

  • Write-behind cache (asynchronous cache store updates)

  • Eventually consistent behavior support

  • Full support for Document-style data structures such as JSON

In the following chapter we’ll discuss some of the key concepts of GridGain’s In-Memory Data Grid.

12.1. Key Concepts

In this chapter we will give an overview of some of the key features of the In-Memory Data Grid. However, this documentation is not meant to replace the main Javadoc API documentation, and you should still refer to the Javadoc for detailed information on APIs.

Note
Cache vs. Data Grid vs. In-Memory Data Grid

We’ll be using the terms cache, data grid and in-memory data grid, in both upper case and lower case, interchangeably throughout this chapter and later on.

12.1.1. Collocation of Computations and Data

One of the major scalability problems in utilizing data grids is unnecessary noise traffic, which may consume a significant amount of bandwidth and can often bring a server to its knees. Imagine a scenario in which you are using a partitioned cache and have to constantly retrieve various key-value pairs from the cache and perform some computation on them. In partitioned mode, however, a given key-value pair may or may not be cached on the local node, so it may need to be fetched from a remote node. Once the data is fetched and brought to the local node, you perform the computation on it and, once you are done, the data you just requested is most likely discarded.

It may be cached in the Near cache on the local node (which is the default GridGain behavior), but Near caches are generally much smaller than partitioned caches (size limitation) and have more aggressive eviction policies. To summarize, most of the data fetched from caches is either discarded immediately or will be discarded shortly after, thus creating unnecessary noise traffic.

It is much more effective to bring the computation to the node where the data resides than to bring the data to the computation. This is because in the vast majority of cases computations are much smaller to transport over the network and change much less frequently, if at all.

Note
Collocation

Collocation of computations and data is often called Affinity Routing, highlighting that computations and data have an affinity between them and that computation jobs are routed based on this affinity.

In GridGain such collocation is easily achieved via compute and data grid integration. Here is how this can be achieved using the @GridCacheAffinityMapped annotation.

Grid g = G.grid();

final GridCache<Integer, String> cache = g.cache(); // Get default cache.

final Integer key = 1;

String result = g.call(BALANCE, new Callable<String>() {
    // Affinity key for the job. The job will travel
    // to the node where the key is cached.
    @GridCacheAffinityMapped
    public Integer affinityKey() {
        return key;
    }

    // The logic below will be executed on remote node which
    // is responsible for caching the specified affinity key.
    @Override public String call() throws Exception {
        // Get locally cached value.
        String val = cache.get(key);

        // Perform some computation on retrieved value.
        ...

        return "OK";
    }
});

12.1.2. Zero Deployment

As we already discussed zero deployment is a GridGain feature which automatically monitors deployed resources on the grid and redeploys it whenever they change. With GridGain you can basically startup several grid nodes, and just leave them running. You will never have to deploy or redeploy anything on them. All you do is keep writing and changing your code, and whenever you need to execute it, just hit the Run button in your IDE and your new code will be automatically deployed on all grid nodes.

This feature works with both compute and data grid. However, unlike compute grid, data grid keeps auto-deployed objects cached on remote nodes, and hence behaves a little differently. GridGain cache can be deployed in two different GridDeploymentModes - SHARED and CONTINUOUS.

Note
Zero Deployment in Data Grid

Unlike compute grid, data grid keeps auto-deployed objects cached on remote nodes, and hence behaves a little differently.

In SHARED mode, objects will be auto-deployed on remote nodes, but they will be automatically undeployed (hence, removed from cache) whenever either code changes or last node from which a resource has been deployed leaves. The class loader on remote nodes is shared only for the nodes that have common classes, thus if nodes don’t share the same code base, their class loaders remotely will not be shared. This mode is ideal for development, when code changes quite frequently, and after every change it is generally best to start off fresh.

In CONTINUOUS mode objects are also automatically deployed on remote nodes, but they all share the same class loader remotely and never get automatically undeployed or removed unless a user explicitly requests it by changing userVersion in the META-INF/gridgain.xml file. This mode is ideal for production, when it is generally undesirable to undeploy (and hence remove) any object from cache.

12.1.3. Cache Modes

GridGain data grid can be deployed in any of the following 3 modes defined by GridCacheMode: LOCAL, REPLICATED, or PARTITIONED.

Note
Cache Modes

You can have as many named caches as you like and each can be of LOCAL, REPLICATED, or PARTITIONED type.

LOCAL mode is the most lightweight mode of cache operation, as no data is distributed to other cache nodes. It is ideal for scenarios where data is either read-only, or can be periodically refreshed at some expiration frequency. It also works very well with read-through behavior where data is loaded from persistent storage on misses. Other than distribution, local caches still have all the features of a distributed cache, such as automatic data eviction, expiration, disk swapping, data querying, transactions, and more.

REPLICATED mode provides the utmost availability as data is available on every grid node. However, in this mode every data update must be propagated to all other nodes which can have an impact on performance and scalability. As the same data is stored on all grid nodes, the size of replicated cache is limited by the amount of memory available on a node. This mode is ideal for scenarios where updates are infrequent and data availability is most important.

PARTITIONED cache is the most scalable distributed cache mode. In this mode the overall data set is divided equally into partitions and all partitions are split equally between participating nodes, essentially creating one huge distributed memory for caching data. This approach allows for storing as much data as can be fit in the total memory available across all nodes, hence allowing for loading gigabytes and terabytes of data into cache memory. Partitioned cache is always fronted by a smaller local cache, also known as Near cache, which stores most recently or most frequently accessed data. Such combination provides for high availability of data that is accessed often together with high scalability of partitioned cache. This mode is ideal for scenarios where data volumes are large and updates are relatively frequent.
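
To make the partitioning idea concrete, here is a minimal sketch in plain Java. It assumes a simplistic scheme (key hash modulo partition count, partitions assigned to nodes round-robin); the names are hypothetical and GridGain's actual partition assignment is not described in this section.

```java
import java.util.Arrays;

/** Toy model of PARTITIONED mode: keys map to partitions, partitions to nodes. */
class PartitionSketch {
    /** Maps a key to one of the given number of partitions by hash. */
    static int partitionOf(Object key, int partitions) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    /** Assigns partitions to nodes round-robin (an assumed, simplistic scheme). */
    static int nodeOf(int partition, int nodes) {
        return partition % nodes;
    }

    /** Counts how many of the keys 0..keyCount-1 land on each node. */
    static int[] keysPerNode(int keyCount, int partitions, int nodes) {
        int[] counts = new int[nodes];
        for (int key = 0; key < keyCount; key++)
            counts[nodeOf(partitionOf(key, partitions), nodes)]++;
        return counts;
    }

    public static void main(String[] args) {
        // 1200 keys, 12 partitions, 3 nodes.
        System.out.println(Arrays.toString(keysPerNode(1200, 12, 3))); // prints [400, 400, 400]
    }
}
```

Each node holds only its share of the keys, which is why the total capacity grows with the number of nodes.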

12.1.4. Rich Post-Functional APIs

The majority of data grid products provide a simple java.util.concurrent.ConcurrentMap API for working with data grids. However, the plain ConcurrentMap API is quite limiting and often does not provide the desired convenience or usability. For example, imagine that you need to store objects of different types in cache, say Person and Organization, keyed by an Integer.

Using plain ConcurrentMap<K, V> generics you would have to lose strong typing provided by generics and declare the map as ConcurrentMap<Integer, Object>… ouch! Or take a look at methods like Map.put(…), ConcurrentMap.putIfAbsent(…), or Map.remove(…). If you follow the standard Map API, then all of these methods have to return a previous value. However, when working with caches, returning a previous value is expensive as it may involve a trip to a persistent data store or to a neighboring node - why make that extra network trip for cases when the previous value is not needed?

To address these issues, and many others, GridCacheProjection, which is the main caching API in GridGain, has over 200 methods that cover a wide range of use cases. Here is some of the functionality available on the GridCacheProjection API:

  • Various get(…) methods to synchronously or asynchronously get values from cache.

  • Various put(…), putIfAbsent(…), and replace(…) methods to synchronously or asynchronously put single or multiple entries into cache.

  • Various remove(…) methods to synchronously or asynchronously remove single or multiple keys from cache.

  • Various contains(…) methods to check if cache contains certain keys or values.

  • Various forEach(…), forAny(…), and reduce(…) methods to visit every cache entry within this projection.

  • Various flagsOn(…), flagsOff(…), and projection(…) methods to set specific flags and filters on a cache projection.

  • Methods like keySet(…), values(…), and entrySet(…) to provide views on cache keys, values, and entries.

  • Various peek(…) methods to peek at values in global or transactional memory, swap storage, or persistent storage.

  • Various reload(…) methods to reload latest values from persistent storage.

  • Various unswap(…) methods to load specified keys from swap storage into global cache memory.

  • Various invalidate(…) methods to set cached values to null.

  • Various lock(…), unlock(…), and isLocked(…) methods to acquire, release, and check on distributed locks on a single or multiple keys in cache.

  • Various clear(…) methods to clear elements from cache, and optionally from swap storage.

  • Various evict(…) methods to evict elements from cache, and optionally store them in underlying swap storage for later access.

  • Various txStart(…) and inTx(…) methods to perform various cache operations within a transaction.

  • Various createXxxQuery(…) methods to query cache using either SQL, LUCENE, H2TEXT text search, or SCAN for filter-based full scan.

  • Various mapKeysToNodes(…) methods which provide node affinity mapping for given keys.

  • Various gridProjection(…) methods which provide GridProjection only for nodes on which given keys reside.

12.1.5. Extended put and remove Operations

All methods that end with x, such as putx(…) or removex(…), provide the same functionality as their sibling methods that don’t end with x, however, instead of returning a previous value, they return a boolean flag indicating whether operation succeeded or not. Returning a previous value may involve a network trip or a persistent store lookup and should be avoided whenever not needed.
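
The cost difference can be shown with a small plain-Java sketch. It assumes a toy cache in front of a slow store and counts store lookups; the class, its fields, and the way the previous value is fetched are all hypothetical and only mimic the put(…) versus putx(…) contract described above.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy cache in front of a slow store, contrasting put(...) with putx(...). */
class PutxSketch {
    private final Map<String, Integer> memory = new HashMap<>();
    private final Map<String, Integer> store = new HashMap<>(); // Stands in for a database.
    int storeLookups; // Counts the expensive trips to the store.

    PutxSketch(Map<String, Integer> initialStore) {
        store.putAll(initialStore);
    }

    /** Map-style put: must find the previous value, even if it is only in the store. */
    Integer put(String key, Integer val) {
        Integer prev = memory.get(key);
        if (prev == null) {
            storeLookups++;
            prev = store.get(key);
        }
        memory.put(key, val);
        return prev;
    }

    /** putx-style put: updates the cache and reports success; no previous-value lookup. */
    boolean putx(String key, Integer val) {
        memory.put(key, val);
        return true;
    }
}
```

In this sketch a put(…) on a key that is not yet in memory pays for one store lookup, while putx(…) never does.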

12.1.6. Cache Projection

Cache projections, defined by the GridCacheProjection API, are responsible for providing the above-mentioned rich API for GridGain data grid. However, you can also use projections to create various views on cache data or to enable/disable certain cache features programmatically. For example, here is how you would create cache views that work explicitly on objects of type Person or objects of type Company:

// Only objects of type Person.
GridCacheProjection<Integer, Person> people = grid.cache().projection(Integer.class, Person.class);

// Only objects of type Company.
GridCacheProjection<Integer, Company> companies = grid.cache().projection(Integer.class, Company.class);

Or here is how you would programmatically enable synchronousCommit mode for the view on objects of type Person defined above:

GridCacheProjection<Integer, Person> syncCommitPeople = people.flagsOn(GridCacheFlag.SYNC_COMMIT);
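
The idea of a typed view over a cache that holds mixed value types can be sketched in plain Java as follows. This is only an illustration: the helper below returns a filtered snapshot rather than a live view, and its name and signature are hypothetical, not GridGain's.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy version of a typed projection over a cache holding mixed value types. */
class TypedViewSketch {
    /** Returns the entries whose value is an instance of the given type, strongly typed. */
    static <K, V> Map<K, V> projection(Map<K, ?> cache, Class<V> type) {
        Map<K, V> view = new LinkedHashMap<>();
        for (Map.Entry<K, ?> e : cache.entrySet()) {
            if (type.isInstance(e.getValue()))
                view.put(e.getKey(), type.cast(e.getValue()));
        }
        return view;
    }

    public static void main(String[] args) {
        Map<Integer, Object> cache = new LinkedHashMap<>();
        cache.put(1, "a person's name"); // String stands in for Person.
        cache.put(2, 42L);               // Long stands in for Company.

        Map<Integer, String> people = projection(cache, String.class);
        System.out.println(people.keySet());
    }
}
```

The caller gets back a strongly typed map and never has to cast from Object.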

12.1.7. Cache Transactions

Most data grid products support transactions. However, in many cases they only provide automatic enlisting into an ongoing JEE/JTA transaction, which is quite limiting, especially when not running in a JEE container. In many cases it is a lot more convenient to use cache transactions directly. GridGain supports both automatic enlistment into an ongoing JEE transaction and explicit cache transactions. Explicit transactions are supported via GridCacheTx.

GridGain supports the following concurrency levels:

  • OPTIMISTIC

  • PESSIMISTIC

  • EVENTUALLY_CONSISTENT

as well as the following isolation levels:

  • READ_COMMITTED

  • REPEATABLE_READ

  • SERIALIZABLE

Such wide support for concurrency and isolation levels makes it possible to model a broad range of concurrent access patterns.

Here are examples of how transactions can be used:

GridCache<String, Integer> cache = G.grid().cache();
...
GridCacheTx tx = cache.txStart();

try {
    // Perform transactional operations.
    Integer v1 = cache.get("k1");
    Integer old1 = cache.put("k2", 2);

    cache.removex("k3");

    // Commit the transaction.
    tx.commit();
}
finally {
    tx.end(); // Rollback, if was not committed.
}

Or, the same logic as above can be executed by passing one or more closures to any of the GridCache.inTx(..) methods as follows:

GridCache<String, Integer> cache = G.grid().cache();
...
cache.inTx(new CI1<GridCacheProjection<String, Integer>>() {
   @Override public void apply(GridCacheProjection<String, Integer> cache) {
      // Perform transactional operations.
      Integer v1 = cache.get("k1");
      Integer old1 = cache.put("k2", 2);

      cache.removex("k3");
   }
});
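
The commit-or-rollback pattern used in both examples can be sketched in plain Java. The class below is a hypothetical toy, not GridGain's implementation: it buffers writes and applies them on commit, does not handle removes, and provides no isolation or locking.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy transaction: writes are buffered and applied only on commit. */
class TxSketch implements AutoCloseable {
    private final Map<String, Integer> cache;
    private final Map<String, Integer> pending = new HashMap<>();
    private boolean committed;

    TxSketch(Map<String, Integer> cache) {
        this.cache = cache;
    }

    /** Buffers a write; the underlying cache is not touched yet. */
    void put(String key, Integer val) {
        pending.put(key, val);
    }

    /** Reads see this transaction's own buffered writes first. */
    Integer get(String key) {
        return pending.containsKey(key) ? pending.get(key) : cache.get(key);
    }

    /** Applies all buffered writes to the cache. */
    void commit() {
        cache.putAll(pending);
        committed = true;
    }

    /** Plays the role of tx.end(): discards buffered writes unless committed. */
    @Override public void close() {
        if (!committed)
            pending.clear();
    }
}
```

Used in a try-with-resources block, close() plays the part of the finally clause: leaving the block without calling commit() drops all buffered writes.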

12.1.8. Cache Queries

Why would you ever query cached data if you can query your persistent store, such as database? Well, the answer is the same as for accessing data by key from cache vs. getting it from database - for performance and scalability. However, querying cache is not exactly the same as querying your database - the main difference is that if cache only has a subset of data stored in database, then you will be only querying that subset, so query result will be reflecting only in-memory state. Does this matter? Depends on your application requirements and also depends on the amount of data you are able to store in cache.

With the introduction of cloud computing and virtual instances, the amount of memory available to your grid on the cloud becomes virtually limitless. Adding nodes to your grid has become as simple as calling the AWS API on EC2 whenever your application demands it. On top of that, if GridGain swap space is configured, all the data that cannot fit in memory on a single node will overflow to disk. Also, your application may not even have that much data, or querying cached data, which usually contains data that has been accessed relatively recently, is often good enough. Thus in many cases querying cache is starting to look more and more like querying your database.

Now that you have made a decision in your project that you want to query cached data, the next question becomes how to cache query results. Most of us are familiar with Hibernate and its support for 2nd Level Caching, which also comes with Query Cache. The way query cache works in Hibernate is generally the way we are used to thinking of caching queried data.

In a nutshell, a query is issued against the database and the results of the query are then stored in cache in a single collection. If you have multiple queries, then multiple collections containing query results are stored. Now if you ever update a single bean in Hibernate which can potentially affect the query result (pretty much any change to the queried tables), Hibernate is forced to invalidate (remove) the cached query results from cache and reload them on-demand next time. This significantly increases memory consumption, and frequent cache invalidations of query results perform horribly and do not scale at all. Even Hibernate itself discourages its users from using it. Here is the quote from Hibernate documentation:

Hibernate Documentation

Most queries do not benefit from caching or their results. So by default, individual queries are not cached even after enabling query caching …

So, how does querying of cached data help? It helps by removing the need for a query result cache altogether. SQL queries on your indexed cached data are executed in memory and perform very fast, so there is no more need to cache query results. Just run your SQL query on your cached data and get the results whenever you need them. However, it is important to note that without rich SQL support for cache queries, they will not be able to replace database queries within your project. In the example below, where Person relates to Organization, if your cache does not support SQL joins, then you would not be able to find all people working for the same organization, which may be quite limiting.

Note
Querying Cache

Querying cached data removes the need for query result cache altogether.

In GridGain the support for cache queries is virtually without limitations. If you know SQL, you can run queries against cached data, including support for any type of joins, any where clause keywords, order by, group by, etc. In addition to SQL queries, GridGain also supports text queries using Lucene or H2 TEXT underlying indexing. You can also run predicate-based FULL SCAN queries, which iterate over all cache elements on remote nodes and include only the ones that pass the predicate filter.

Note
Cache Query Types

GridGain supports four types of cache queries:

  • SQL queries with joins, any where clause keywords, order by, group by, etc

  • LUCENE text queries

  • H2 TEXT queries

  • FULL SCAN predicate-based queries

SQL Queries

The GridCacheQueryType.SQL query type allows you to execute distributed cache queries using standard SQL syntax. All values participating in where clauses or joins must be annotated with the GridCacheQuerySqlField annotation for indexing. There are almost no restrictions as to which SQL syntax can be used. All inner, outer, or full joins are supported, as well as a rich set of SQL grammar and functions. GridGain relies on the H2 SQL Engine for SQL compilation and indexing. For the full set of supported Numeric, String, and Date/Time SQL functions please refer to the H2 Functions documentation directly. For the full set of supported SQL syntax refer to the H2 SQL Select Grammar.

Note that whenever using group by queries, only individual page results will be sorted and not the full result sets. However, if a single node is queried, then the result set will be accurate.

Text Queries

GridGain supports two types of text queries:

  • GridCacheQueryType.LUCENE

  • GridCacheQueryType.H2TEXT

All fields that are expected to show up in text query results must be annotated with GridCacheQueryLuceneField or GridCacheQueryH2TextField accordingly. The Lucene-based text search utilizes Apache Lucene internally for text indexing, and the H2 TEXT search stores text indexes in special H2 index tables.

Scan Queries

Sometimes when it is known in advance that SQL query will cause a full data scan, or whenever data set is relatively small, the GridCacheQueryType.SCAN query type may be used. With this query type GridGain will iterate over all cache entries, skipping over the entries that don’t pass the optionally provided key or value filters. In this mode the query clause should not be provided.

Execute vs. Visit

If there is no need to return results to the caller node, you can save on a potentially significant network overhead by visiting all query results directly on remote nodes by calling the GridCacheQuery.visit(GridPredicate, GridProjection…) method. With this method, all the logic is performed inside the query predicate directly on the queried nodes. If the predicate returns false while visiting, then visiting will finish immediately.
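
The visiting contract (do the work inside the predicate, stop as soon as it returns false) can be sketched in plain Java. The helper below is a hypothetical local stand-in and does not use the GridGain API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Toy visit(...): the predicate does the work and can stop the iteration. */
class VisitSketch {
    /** Visits items in order until the visitor returns false; returns how many were visited. */
    static <T> int visit(Iterable<T> items, Predicate<T> visitor) {
        int visited = 0;
        for (T item : items) {
            visited++;
            if (!visitor.test(item))
                break; // Visitor asked to stop.
        }
        return visited;
    }

    public static void main(String[] args) {
        List<Double> salaries = Arrays.asList(900.0, 1500.0, 2500.0, 700.0);
        final double[] total = {0};

        // Sum salaries where the data lives, stopping once the total exceeds 2000.
        int visited = visit(salaries, s -> (total[0] += s) <= 2000);

        System.out.println(visited + " visited, total=" + total[0]);
    }
}
```

Nothing is returned to the caller except what the visitor chooses to accumulate, which is where the network savings come from.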

Optional Key and Value Filters

Note that all query results may be additionally filtered by specifying predicates for key and value filtering via the GridCacheQuery.remoteKeyFilter(GridOutClosure) and GridCacheQuery.remoteValueFilter(GridOutClosure) methods. These additional filters are useful whenever filtering is based on logic or methods not available in SQL or TEXT queries. For SCAN queries these filters should usually be provided, as they are used directly to filter the query results during full scan.
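
A full scan with optional key and value filters can be sketched in plain Java as follows. The method name, its signature, and the convention that a null filter accepts everything are assumptions made for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Toy SCAN query: a full pass over entries with optional key and value filters. */
class ScanSketch {
    /** Returns entries that pass both filters; a null filter accepts everything. */
    static <K, V> Map<K, V> scan(Map<K, V> cache, Predicate<K> keyFilter, Predicate<V> valFilter) {
        Map<K, V> result = new LinkedHashMap<>();
        for (Map.Entry<K, V> e : cache.entrySet()) {
            boolean keyOk = keyFilter == null || keyFilter.test(e.getKey());
            boolean valOk = valFilter == null || valFilter.test(e.getValue());
            if (keyOk && valOk)
                result.put(e.getKey(), e.getValue());
        }
        return result;
    }
}
```

Because every entry is touched, the cost grows with the size of the cache, which is why scans suit small data sets or queries that would scan everything anyway.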

Query Future Iterators

Note that GridCacheQueryFuture implements the Iterable interface directly and therefore can be used in regular iterator or foreach loops. The iterator will immediately return all query results that are currently available and will block on page boundaries, whenever the next page is not available yet. Whenever the full result set is needed as a collection, then the GridCacheQuery.keepAll(boolean) flag should be set to true and any of the future’s get(…) methods should be called.
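
The behavior of an iterator that hands out available results at once and blocks only on page boundaries can be sketched in plain Java. The class below is hypothetical: it assumes pages arrive on a BlockingQueue and that an empty page marks the end of the result set.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.BlockingQueue;

/** Toy paged iterator: returns buffered items at once, blocks only between pages. */
class PagedIteratorSketch<T> implements Iterator<T> {
    private final BlockingQueue<List<T>> pages;
    private Iterator<T> current = Collections.emptyIterator();
    private boolean done;

    PagedIteratorSketch(BlockingQueue<List<T>> pages) {
        this.pages = pages;
    }

    @Override public boolean hasNext() {
        while (!current.hasNext() && !done) {
            try {
                List<T> page = pages.take(); // Blocks until the next page arrives.
                if (page.isEmpty())
                    done = true; // Empty page is the assumed end-of-results marker.
                else
                    current = page.iterator();
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                done = true;
            }
        }
        return current.hasNext();
    }

    @Override public T next() {
        if (!hasNext())
            throw new NoSuchElementException();
        return current.next();
    }
}
```

A consumer can start processing the first page while later pages are still in flight.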

Query Example

As an example, suppose we have a data model consisting of Person and Organization classes defined as follows:

public class Organization {
    @GridCacheQuerySqlField(unique = true)
    private long id;

    @GridCacheQuerySqlField
    private String name;
    ...
}

public class Person {
    // Unique index.
    @GridCacheQuerySqlField(unique=true)
    private long id;

    @GridCacheQuerySqlField
    private long orgId; // Organization ID.

    // Not indexed.
    private String name;

    // Non-unique index.
    @GridCacheQuerySqlField
    private double salary;

    // Index for text search.
    @GridCacheQueryLuceneField
    private String resume;
    ...
}

Then you can create and execute queries that check various salary ranges like so:

GridCache<Long, Person> cache = G.grid().cache();
...
// Create query which selects salaries based on range for all employees
// that work for a certain company.
GridCacheQuery<Long, Person> qry = cache.createQuery(SQL, Person.class,
    "from Person, Organization where Person.orgId = Organization.id " +
        "and Organization.name = ? and Person.salary > ? and Person.salary <= ?");

// Query all nodes to find all cached GridGain employees
// with salaries less than 1000.
qry.queryArguments("GridGain", 0, 1000).execute(grid);

// Query only remote nodes to find all remotely cached GridGain employees
// with salaries greater than 1000 and less than 2000.
qry.queryArguments("GridGain", 1000, 2000).execute(grid.remoteProjection());

// Query local node only to find all locally cached GridGain employees
// with salaries greater than 2000.
qry.queryArguments("GridGain", 2000, Integer.MAX_VALUE).execute(grid.localNode());

Here is a possible query that will use Lucene text search to scan all resumes to check if employees have a Master’s degree:

GridCacheQuery<Long, Person> mastersQry = cache.createQuery(LUCENE, Person.class, "Master");

// Query all cache nodes.
mastersQry.execute(grid);

12.1.9. Eviction Policies

Selecting proper cache eviction strategy is one of the main parts of cache configuration. Generally, eviction controls the maximum number of elements that can be stored in cache, just so cache does not grow indefinitely. However, every eviction strategy will evict elements in different order and selecting a wrong strategy can have a significant impact on cache hit ratio and performance.

It is also important to note that eviction policy is pluggable in GridGain, and users can plug their own eviction policy whenever none of the ones provided out of the box is adequate. The following eviction policies are available out of the box:

  • GridCacheLruEvictionPolicy

  • GridCacheLirsEvictionPolicy

  • GridCacheFifoEvictionPolicy

  • GridCacheRandomEvictionPolicy

  • GridCacheAlwaysEvictionPolicy

  • GridCacheNeverEvictionPolicy

Note that if GridCacheConfiguration.isSwapEnabled() is set to true, then evicted entries will overflow to a swap storage, which is, by default, a file-based disk storage defined by GridFileSwapSpaceSpi.
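
To see how the choice of policy changes which entry is evicted, here is a minimal sketch built on java.util.LinkedHashMap. It is not one of the GridGain policy classes listed above; it only contrasts LRU ordering with FIFO ordering on a size-bounded map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy bounded cache showing how LRU and FIFO evict different entries. */
class EvictionSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    /** Pass true for least-recently-used order, false for insertion (FIFO) order. */
    EvictionSketch(int maxSize, boolean lru) {
        super(16, 0.75f, lru); // Third argument switches on access ordering.
        this.maxSize = maxSize;
    }

    /** Called by LinkedHashMap after each insert; returning true evicts the eldest entry. */
    @Override protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxSize;
    }
}
```

With a maximum size of 2, putting a and b, reading a, and then putting c evicts b under LRU but evicts a under FIFO, so a frequently read entry survives only under LRU.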

12.1.10. Preloading

Preloading newly started cache nodes is important whenever it is necessary to have a common data set in memory on all nodes (you may not need it if cache can always read-through a missing value from a persistent store). When preloading is enabled (i.e. has value other than GridCachePreloadMode.NONE), distributed caches will attempt to preload all necessary values from other grid nodes. GridGain supports the following preloading modes defined in GridCachePreloadMode:

  • GridCachePreloadMode.SYNC mode is a synchronous preload mode. Distributed caches will not start until all necessary data is loaded from other available grid nodes.

  • GridCachePreloadMode.ASYNC mode is asynchronous preload mode (this mode is configured by default). Distributed caches will start immediately and will load all necessary data from other available grid nodes in the background.

  • GridCachePreloadMode.NONE mode is there to disable preloading. In this mode no preloading will take place which means that caches will be either loaded on demand from persistent store whenever data is accessed, or will be populated explicitly.

Note that REPLICATED caches will try to load the full set of cache entries from other nodes (or as defined by pluggable GridCacheAffinity), while PARTITIONED caches will only load the entries for which the current node is primary or backup.

Also note that preload mode does not make sense for LOCAL caches as they are local by definition and, therefore, cannot preload any values from neighboring nodes.

12.1.11. Cache Store

Persistent storage in GridGain is defined by the GridCacheStore API. Providing a proper cache store implementation is important whenever read-through or write-through behavior is desired. Read-through means that data will be read from persistent store whenever it’s not available in cache, and write-through means that data will be automatically persisted whenever it is updated in cache.

Note that there is also a refresh-ahead mode specified by the GridCacheConfiguration.getRefreshAheadRatio() configuration parameter. If the value is other than zero, then an entry will be reloaded in the background whenever it is accessed and the refresh ratio of its total time-to-live has passed. This feature ensures that entries are always automatically re-cached whenever they are nearing expiration.

Note
Example
For example, if refresh ratio is set to 0.75 and entry’s time-to-live is 1 minute, then if this entry is accessed any time after 45 seconds since last update (which is 0.75 of a minute), the cached value will be immediately returned, but entry will be asynchronously reloaded from persistent store in the background.
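
The arithmetic of this example can be written down as a small plain-Java check. The method is hypothetical, and treating the exact boundary (45 seconds sharp) as already eligible for refresh is an assumption.

```java
/** Toy check for refresh-ahead: is an accessed entry old enough to reload? */
class RefreshAheadSketch {
    /**
     * @param ageMillis time since the entry was last updated
     * @param ttlMillis the entry's total time-to-live
     * @param ratio refresh-ahead ratio; zero disables the feature
     */
    static boolean shouldRefresh(long ageMillis, long ttlMillis, double ratio) {
        return ratio > 0 && ageMillis >= ratio * ttlMillis;
    }
}
```

With a ratio of 0.75 and a time-to-live of 60000 ms the threshold is 45000 ms, so an access at 46 seconds triggers a background reload while an access at 30 seconds does not.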

Example implementations of cache stores (one is backed by JDBC and another one by Hibernate) can be found under GRIDGAIN_HOME/examples/java/org/gridgain/examples/cache/store folder.

12.1.12. Write-Behind Cache

In a simple write-through mode each cache put and remove operation will involve a corresponding request to the storage and therefore the overall duration of the cache update might be relatively long. Additionally, an intensive cache update rate can cause an extremely high storage load.

For such cases GridGain offers an option to perform asynchronous storage updates, also known as write-behind. The key concept of this approach is to add a persist request to the queue and postpone data persistence to a certain point in the future. The actual data persistence can be triggered by time-based events (the maximum time that a data entry can reside in the queue is limited), by queue-size events (the queue is flushed when its size reaches some particular point), or by using both of them in combination, in which case either event will trigger the flush.

What benefits does a write-behind cache store provide? In addition to obvious performance benefits, because cache writes simply become faster, this approach scales a lot better as long as your application can tolerate delayed persistence updates. When the number of nodes in the data grid grows and every node performs frequent updates, it is very easy to overload the underlying system of record, such as a database. The write-behind approach allows you to maintain high throughput of writes in the system without bottlenecking at the persistence layer. Moreover, cache can continue operating even if your database crashes or goes down. In this case the persistence queue will keep storing all the updates until the database comes back up.

Note
Example
With write-behind approach only the last update to an entry will be written to the underlying storage. If cache entry with key key1 is sequentially updated with values value1, value2 and value3 respectively, then only single store request for (key1, value3) pair will be propagated to the persistent storage.

Note
Example
Batch store operations are usually more efficient than a sequence of single store operations, so one can exploit this feature by enabling batch operations in write-behind mode. Update sequences of similar types (put or remove) can be grouped to a single batch. For example, sequential cache puts of (key1, value1), (key2, value2), (key3, value3) will be batched into a single storeAll operation.
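
Both effects, keeping only the last update per key and flushing in batches, can be sketched in plain Java. The class below is a hypothetical toy: it flushes on queue size only (the time-based trigger is omitted), handles puts but not removes, and records each batch instead of calling a real store.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy write-behind queue: coalesces updates per key and flushes them in batches. */
class WriteBehindSketch {
    private final Map<String, String> pending = new LinkedHashMap<>();
    private final int flushSize;
    /** Each element is one batch handed to the store; stands in for storeAll calls. */
    final List<Map<String, String>> storeAllCalls = new ArrayList<>();

    WriteBehindSketch(int flushSize) {
        this.flushSize = flushSize;
    }

    /** Queues an update; a later update to the same key replaces the earlier one. */
    void put(String key, String val) {
        pending.put(key, val);
        if (pending.size() >= flushSize)
            flush();
    }

    /** Writes everything queued so far as a single batch. */
    void flush() {
        if (pending.isEmpty())
            return;
        storeAllCalls.add(new LinkedHashMap<String, String>(pending));
        pending.clear();
    }
}
```

Three updates to key1 followed by flush() yield one batch containing only (key1, value3), and puts to three distinct keys with a flush size of 3 yield one batch of three entries.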

Note that a GridCacheStore implementation should take into account possible side effects of write-behind if you want to use this feature. For cases described in the first example only the last data update is written to the database. In some cases cache updates may be reordered. Both cases may cause referential constraint violations in the persistent storage, so the GridCacheStore implementation should either have these constraints disabled or provide some way to resolve possible conflicts.

Write-behind can be enabled and configured with GridCacheConfigurationAdapter.

12.2. Distributed Data Structures

Did you ever wish you could take a data structure you are familiar with and distribute it over the grid? For example, why not take java.util.concurrent.BlockingDeque and add something to it on one node and poll it from another node? Or why not have a distributed primary key generator which would guarantee uniqueness on all nodes? Or how about a distributed java.util.concurrent.atomic.AtomicLong which can be updated and read from any node on the grid? GridGain gives you such capability. What GridGain did is take most of the data structures from the java.util.concurrent framework and make sure they can be used in a distributed fashion.

Currently you can find the following distributed data structures in GridGain:

  • Distributed blocking and non-blocking queues with FIFO, LIFO, or Priority policies

  • Distributed atomic sequences (or primary key generators)

  • Distributed AtomicLong

  • Distributed CountDownLatch

12.2.1. Distributed Queues

Distributed queues are realized by the GridCacheQueue API. They are created directly from the GridCache API and support different modes of operation based on your application requirements. Cache queues implement all methods from the java.util.Collection API and support adding and removing elements from either side, head or tail. Additionally you can get the elements from any position within the queue without having to iterate through the queue - as a matter of fact, most cache queue methods have O(1) complexity.

Here is an example of how a simple unbounded collocated FIFO queue can be created in GridGain.

GridCacheQueue<String> fifoUnboundedCollocatedQueue = grid.cache().queue("myqueue");

Collocated vs. Non-Collocated Queues

If you plan to create just a few queues containing lots of data, then you would create a non-collocated queue. This will make sure that about equal portion of each queue will be stored on each grid node. On the other hand, if you plan to have many queues relatively small in size (compared to the whole cache), then you would most likely create collocated queues. In this mode all queue elements will be stored on the same grid node, but about equal amount of queues will be assigned to every node.

Both collocated and non-collocated modes have their advantages and disadvantages. As you probably already guessed, all elements from a collocated queue are stored on the same grid node, hence the name, so you are bounded by whatever can fit in the memory of a single node (or if you use swap storage, queue elements will overflow to disk if memory runs out). Collocated queues are usually used with bounded mode as well to make sure there is an upper limit on the size, however, it is not a requirement. Iteration over collocated queues is extremely fast as all of it happens on the same node. Also, getting elements at certain positions of the queue or finding a position of a certain element is very fast as well, as all these operations happen locally on the node responsible for caching the queue. Generally all operations on collocated queues have O(1) complexity.

Non-collocated queues on the other hand are distributed across all participating grid nodes and essentially have no memory limitations (or limited to the overall available memory on the whole grid). They can effectively store a lot more data than collocated queues, but at expense of certain operations becoming slower or unsupported altogether. For example iteration over non-collocated queues is a distributed operation and requires querying every participating cache node. Methods like GridCacheQueue.position(T item) or GridCacheQueue.items(Integer… positions) are not supported in non-collocated mode due to poor performance and excessive distribution these operations would require (all other methods on the GridCacheQueue API are supported).

Here is an example of how an unbounded collocated LIFO queue could be created:

GridCacheQueue<String> lifoUnboundedCollocatedQueue = grid.cache().queue(
    "myqueue",               // Queue name.
    GridCacheQueueType.LIFO, // Queue type.
    0,                       // Maximum capacity, 0 for unlimited.
    true                     // Collocation flag.
);

Bounded Queues

Bounded queues allow you to specify a maximum size for a queue. Bounded queues can be either collocated or non-collocated. Cache queues have two sets of methods: blocking and non-blocking. If a blocking method is used, like the put(T item) method, and a bounded queue reaches its maximum size, then all attempts to put additional elements to it will block until an element is taken from the queue. There are also fail-fast methods, like boolean add(T item), which will return false if the queue is full.

Bounded queues allow users to have many queues with maximum size which gives a better control over overall cache capacity. As mentioned above, when bounded queues are relatively small and can be used in collocated mode, all queue operations become extremely fast. Moreover, when used in combination with compute grid, users can collocate their compute jobs with grid nodes on which queues are located to make sure that all operations are local and there is none (or minimal) data distribution.
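
The difference between fail-fast and waiting inserts on a full bounded queue can be tried locally with java.util.concurrent. This is only an analogy on a single JVM and does not use GridCacheQueue; note that in java.util.concurrent the fail-fast method that returns false is offer(…), whereas add(…) on a full ArrayBlockingQueue throws IllegalStateException.

```java
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Local analogy for a bounded queue: fail-fast versus waiting inserts. */
class BoundedQueueSketch {
    /** Returns the outcome of three insert attempts on a queue bounded at two elements. */
    static boolean[] demo() {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(2); // Maximum size 2.
        queue.offer("a");
        queue.offer("b");

        // Fail-fast insert: the queue is full, so this returns false at once.
        boolean failFast = queue.offer("c");

        // Waiting insert: blocks for up to 50 ms hoping for room, then gives up.
        boolean timed;
        try {
            timed = queue.offer("c", 50, TimeUnit.MILLISECONDS);
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            timed = false;
        }

        queue.poll(); // A consumer takes one element, freeing a slot.
        boolean afterPoll = queue.offer("c"); // Now there is room again.

        return new boolean[] {failFast, timed, afterPoll};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo())); // prints [false, false, true]
    }
}
```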

Here is an example of how a job could be sent directly to the node on which a queue resides:

grid.run(GridClosureCallMode.BALANCE, new Runnable() {
    // Specifies key used to determine queue affinity - must be the same as queue name.
    @GridCacheAffinityMapped
    public String queueAffinity() {
        return "myqueue";
    }

    @Override public void run() throws GridException {
        // The queue holds Integer items, matching the values added below.
        GridCacheQueue<Integer> q = grid.cache().queue("myqueue");

        // Add queue elements (local operation due to collocation).
        for (int i = 0; i < 20; i++)
            q.add(i);

        // Remove queue elements (local operation due to collocation).
        for (int i = 0; i < 20; i++) {
            Integer item = q.poll();

            assert item != null;
        }

        // Make sure that queue is empty.
        assert q.isEmpty();
        assert q.poll() == null;
    }
});

Queue Types

Queue types are specified in the GridCacheQueueType enumeration. GridGain supports three queue types out of the box: FIFO, LIFO, and PRIORITY.

  • FIFO queue provides first-in-first-out order of queue elements and is generally the most common way we are used to thinking about queues. Elements are added at the tail of the queue and are polled from the head of the queue.

  • LIFO queue provides last-in-first-out order of queue elements and generally works as a stack. Elements are added to and retrieved from the tail of the queue.

  • PRIORITY queue orders elements using a priority-based order specified by the user. The priority of a queue item is specified using the @GridCacheQueuePriority annotation. If the priority attribute is not found, a priority of 0 is assigned by default. Here is an example of how a priority queue can be created and used:

public void priorityQueueExample() {
    Random rand = new Random();

    Grid grid = G.grid();

    // Initialize new unbounded collocated priority queue.
    GridCacheQueue<PriorityItem> queue = grid.cache().queue("myqueue", PRIORITY);

    // Store 20 elements in queue with random priority.
    for (int i = 0; i < 20; i++) {
        int priority = rand.nextInt(20);

        queue.put(new PriorityItem(priority, "somedata-" + i));
    }

    PriorityItem item;

    int lastPriority = 0;

    // Poll until the queue is empty; poll() returns null when no elements are left.
    while ((item = queue.poll()) != null) {
        // Ensure the elements are correctly ordered based on priority.
        assert lastPriority <= item.priority();

        lastPriority = item.priority();
    }
}

...

// Class defining sample queue element with its priority specified via
// @GridCacheQueuePriority annotation attached to priority field.
private static class PriorityItem implements Serializable {
    // Priority of queue item.
    @GridCacheQueuePriority
    private final int priority;

    private final String data;

    private PriorityItem(int priority, String data) {
        this.priority = priority;
        this.data = data;
    }

    public int priority() {
        return priority;
    }
}
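
The three orderings can also be compared side by side on a single JVM. The following sketch uses the JDK's ArrayDeque and PriorityQueue as local analogues only (these are not GridGain classes, and QueueOrderSketch is a name made up for this illustration). Because the example above asserts non-decreasing priority values across successive polls, the PRIORITY analogue here polls the smallest value first.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.PriorityQueue;

// JDK-only analogue of the three queue orderings; not the GridGain API.
final class QueueOrderSketch {
    /** FIFO: elements are added at the tail and polled from the head. */
    static List<Integer> fifo(List<Integer> in) {
        Deque<Integer> d = new ArrayDeque<>(in);
        List<Integer> out = new ArrayList<>();

        while (!d.isEmpty())
            out.add(d.pollFirst());

        return out;
    }

    /** LIFO: elements are added at the tail and polled from the tail, like a stack. */
    static List<Integer> lifo(List<Integer> in) {
        Deque<Integer> d = new ArrayDeque<>(in);
        List<Integer> out = new ArrayList<>();

        while (!d.isEmpty())
            out.add(d.pollLast());

        return out;
    }

    /** PRIORITY: elements are polled in non-decreasing order of their value. */
    static List<Integer> priority(List<Integer> in) {
        PriorityQueue<Integer> pq = new PriorityQueue<>(in);
        List<Integer> out = new ArrayList<>();

        while (!pq.isEmpty())
            out.add(pq.poll());

        return out;
    }
}
```
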
Cache Queues and Load Balancing

Given that elements will remain in the queue until someone takes them, and that no two nodes should ever receive the same element from the queue, cache queues can be used as an alternate work distribution and load balancing approach within GridGain. For example, you could simply put computations, such as instances of Runnable or GridAbsClosure, into the queue and have threads on remote nodes call the GridCacheQueue.take() method, which will block if the queue is empty. Once the take() method returns a job, a thread will process it and call take() again to get the next job. Given this approach, threads on remote nodes will only start working on the next job when they have completed the previous one, hence creating an ideally balanced system where every node takes only the number of jobs it can process, and not more.
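
The pull-based pattern described above can be sketched locally with plain JDK classes. In this illustration a LinkedBlockingQueue stands in for the cache queue and threads stand in for remote nodes; the STOP sentinel, the class name and the method name are assumptions of the sketch, not part of GridGain.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Local analogue of workers pulling jobs from a shared queue; not the GridGain API.
final class PullWorkersSketch {
    // Sentinel job telling a worker to stop; exists only in this sketch.
    private static final Runnable STOP = () -> { };

    /** Runs the given number of counting jobs on the given number of workers and returns the final count. */
    static int runJobs(int jobs, int workers) {
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        AtomicInteger done = new AtomicInteger();

        Thread[] pool = new Thread[workers];

        for (int i = 0; i < workers; i++) {
            pool[i] = new Thread(() -> {
                try {
                    // A worker asks for the next job only after finishing the previous one.
                    for (Runnable job = queue.take(); job != STOP; job = queue.take())
                        job.run();
                }
                catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            pool[i].start();
        }

        // Each job simply increments the shared counter.
        for (int i = 0; i < jobs; i++)
            queue.add(done::incrementAndGet);

        // One sentinel per worker so that every worker terminates.
        for (int i = 0; i < workers; i++)
            queue.add(STOP);

        for (Thread t : pool) {
            try {
                t.join();
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        return done.get();
    }
}
```

No scheduler decides which worker gets which job: a fast worker simply comes back to the queue more often, which is what makes the approach self-balancing.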

12.2.2. Distributed Sequences

12.2.3. Distributed AtomicLong

12.2.4. Distributed CountDownLatch

12.3. Cache Configuration

12.3.1. Overview

The GridCacheConfiguration interface defines cache runtime configuration. This configuration is passed to the GridConfigurationAdapter.setCacheConfiguration(GridCacheConfiguration…) method. It defines all configuration parameters required to start a cache instance.

Note that every configuration property in GridCacheConfiguration is optional.

The following configuration parameters can be used to configure a cache with GridCacheConfigurationAdapter:

| Setter Method | Description | Optional | Default |
|---|---|---|---|
| setName(String) | Cache name. | Yes | null |
| setCacheMode(GridCacheMode) | Caching mode. | Yes | REPLICATED |
| setStartSize(int) | Initial size for internal hash map. | Yes | 1024 |
| setDefaultLockTimeout(long) | Default lock timeout in milliseconds. | Yes | 0 |
| setDefaultTimeToLive(long) | Time to live for all objects in cache. | Yes | 0 |
| setDefaultTxConcurrency(GridCacheTxConcurrency) | Default transaction concurrency. | Yes | OPTIMISTIC |
| setDefaultTxIsolation(GridCacheTxIsolation) | Default transaction isolation. | Yes | REPEATABLE_READ |
| setDefaultTxTimeout(long) | Default transaction timeout in milliseconds. | Yes | 0 |
| setAffinity(GridCacheAffinity) | Affinity for cache keys. | Yes | null |
| setAffinityMapper(GridCacheAffinityMapper) | Custom affinity mapper. | Yes | GridCacheDefaultAffinityMapper |
| setEvictionPolicy(GridCacheEvictionPolicy) | Cache eviction policy. | Yes | GridCacheLirsEvictionPolicy |
| setEvictionKeyBufferSize(int) | Eviction key buffer size. | Yes | 10240 |
| setEvictSynchronized(boolean) | Flag indicating whether entries should be evicted from both primary and backup nodes for partitioned caches, or from the rest of the nodes for replicated caches. | Yes | false |
| setEvictNearSynchronized(boolean) | Flag indicating whether entries should be evicted from near caches when they are evicted on primary nodes. | Yes | true |
| setMaxEvictionOverflowRatio(float) | Maximum eviction overflow ratio. | Yes | 10 |
| setDgcFrequency(int) | Frequency in milliseconds for internal distributed garbage collector. Pass 0 to disable. | Yes | 10000ms |
| setDgcRemoveLocks(boolean) | Flag indicating whether DGC should clear obsolete flags or not. | Yes | true |
| setDgcSuspectLockTimeout(int) | Suspect lock timeout in milliseconds for internal distributed garbage collector. | Yes | 10000ms |
| setIndexAnalyzeFrequency(long) | Frequency of running H2 "ANALYZE" command. | Yes | 600000ms |
| setIndexAnalyzeSampleSize(long) | Number of samples used to run H2 "ANALYZE" command. | Yes | 10000 |
| setIndexCleanup(boolean) | Flag indicating whether indexes should be deleted on system shutdown or startup. | Yes | true |
| setIndexFixedTyping(boolean) | Fixed typing flag. | Yes | true |
| setIndexFullClassName(boolean) | Flag indicating whether full or simple class names should be used for querying. | Yes | false |
| setIndexH2Options(String) | Any additional options for the underlying H2 database used for querying. | Yes | null |
| setIndexMaxOperationMemory(int) | Maximum memory used for delete and insert in bytes. 0 means no limit. | Yes | 100000 |
| setIndexMemoryOnly(boolean) | Flag indicating whether query indexes should be kept only in memory or offloaded on disk as well. | Yes | false |
| setIndexPath(String) | File path (absolute or relative to GRIDGAIN_HOME) to store cache indexes. | Yes | GRIDGAIN_HOME/work/cache/indexes |
| setIndexUsername(String) | Username to login to index database. | Yes | null |
| setIndexPassword(String) | Password to login to index database. | Yes | null |
| setStore(GridCacheStore) | Persistent storage for cache data. | Yes | null |
| setStoreEnabled(boolean) | Flag indicating whether store is enabled. | Yes | true |
| setStoreValueBytes(boolean) | Flag indicating if cached values should be additionally stored in serialized form. | Yes | true |
| setWriteFromBehindEnabled(boolean) | Flag indicating whether write-behind store is enabled. | Yes | false |
| setWriteFromBehindBatchSize(int) | Maximum size of batch update in write-behind mode. | Yes | 512 |
| setWriteFromBehindFlushFrequency(long) | Frequency of write queue flush events, in milliseconds (0 to disable time-based flush). | Yes | 5000 |
| setWriteFromBehindFlushSize(int) | Size of write queue at which flush event will be triggered. | Yes | 10240 |
| setWriteFromBehindFlushThreadCount(int) | Number of threads that will flush the write queue when flush event is triggered. | Yes | 1 |
| setPreloadMode(GridCachePreloadMode) | Cache preload mode. | Yes | ASYNC |
| setPreloadBatchSize(int) | Preload batch size. | Yes | 102400 |
| setPreloadThreadPoolSize(int) | Size of preloading thread pool. | Yes | 2 |
| setNearEnabled(boolean) | Flag indicating whether near cache is enabled in case of PARTITIONED mode. | Yes | true |
| setNearEvictionPolicy(GridCacheEvictionPolicy) | Eviction policy for near cache. | Yes | GridCacheLirsEvictionPolicy |
| setNearStartSize(int) | Start size for near cache. | Yes | 256 |
| setAtomicSequenceReserveSize(int) | Default number of sequence values reserved for GridCacheAtomicSequence instances. | Yes | 1000 |
| setAutoIndexQueryTypes(Collection<GridCacheQueryType>) | Query types to use to auto index values of primitive types. | Yes | null |
| setBatchUpdateOnCommit(boolean) | Flag indicating if persistent store should be updated after every cache operation or once at commit time. | Yes | true |
| setCloner(GridCacheCloner) | Cloner to be used if CLONE flag is set on projection. | Yes | null |
| setInvalidate(boolean) | Invalidation flag for this transaction. | Yes | false |
| setRefreshAheadRatio(double) | Refresh-ahead ratio for cache entries. Values other than zero specify how soon entries will be auto-reloaded from persistent store prior to expiration. | Yes | 0 |
| setSwapEnabled(boolean) | Flag indicating whether swap storage is enabled or not. | Yes | false |
| setSynchronousCommit(boolean) | Flag indicating whether nodes on which user transaction completed should wait for the same transaction on remote nodes to complete. | Yes | false |
| setSynchronousRollback(boolean) | Flag indicating whether nodes on which user transaction was rolled back should wait for the same transaction on remote nodes to complete. | Yes | false |
| setTransactionManagerLookup(GridCacheTmLookup) | Look up mechanism for available TransactionManager implementation, if any. | Yes | null |

Some of the most commonly used configuration properties are explained in more detail below.

Cache Mode

The following cache modes are supported:

  • LOCAL - specifies local-only cache behavior. In this mode, caches residing on different grid nodes will not know about each other.

  • REPLICATED - specifies fully replicated cache behavior. In this mode, all the keys are distributed to all participating nodes. The user still has affinity control over the subset of nodes for any given key via GridCacheAffinity configuration.

  • PARTITIONED - specifies partitioned cache behavior. In this mode, the overall key set will be divided into partitions and all partitions will be split equally between participating nodes. The user has affinity control over key assignment via GridCacheAffinity configuration.

Affinity

Cache key affinity maps keys to nodes. The GridCacheAffinity interface is used for both replicated and partitioned caches.

Whenever a key is given to the cache, it is first passed to a pluggable GridCacheAffinityMapper, which may map this key to an alternate key that should be used for affinity. The key returned from the affinityKey(Object) method is then passed to the partition(Object) method to find the partition for the key. This partition, together with all participating nodes, is then passed to the nodes(int, Collection) method, which returns a collection of nodes. This collection of nodes is used for node affinity. In REPLICATED cache mode the key will be cached on all returned nodes; generally, all caching nodes participate in caching every key in replicated mode. In PARTITIONED mode, only primary and backup nodes are returned, with the primary node always in the first position. So if there is 1 backup node, the returned collection will have 2 nodes in it: the primary node in the first position and the backup node in the second.
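
The three-step flow (key to affinity key, affinity key to partition, partition to nodes) can be illustrated with a deliberately simple stand-alone sketch. Everything in it is an assumption made for illustration: the class name, the modulo-based partitioning, the "walk the node list" placement and the extra parts and backups parameters are not GridGain's actual algorithm or method signatures; only the order of the steps follows the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative three-step affinity flow; NOT GridGain's real affinity implementation.
final class AffinitySketch {
    /** Step 1: map a cache key to the key used for affinity (identity in this sketch). */
    static Object affinityKey(Object key) {
        return key;
    }

    /** Step 2: map the affinity key to a partition number in the range [0, parts). */
    static int partition(Object affKey, int parts) {
        return Math.floorMod(affKey.hashCode(), parts);
    }

    /** Step 3: return the primary node first, followed by up to the given number of backup nodes. */
    static List<String> nodes(int part, List<String> allNodes, int backups) {
        List<String> res = new ArrayList<>();

        int cnt = Math.min(1 + backups, allNodes.size());

        // Walk the node list starting at an offset derived from the partition.
        for (int i = 0; i < cnt; i++)
            res.add(allNodes.get((part + i) % allNodes.size()));

        return res;
    }
}
```

With one backup the returned list has two entries, primary first and backup second, which mirrors the PARTITIONED case described above.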

12.3.2. Examples

GridCacheConfiguration may be defined in code:

GridConfigurationAdapter cfg = new GridConfigurationAdapter();

GridCacheConfigurationAdapter cacheCfg = new GridCacheConfigurationAdapter();

cacheCfg.setName("mycache");
cacheCfg.setCacheMode(GridCacheMode.LOCAL);

cfg.setCacheConfiguration(cacheCfg);

...

// Start grid.
GridFactory.start(cfg);

or from Spring configuration file:

<bean id="grid.cfg" class="org.gridgain.grid.GridConfigurationAdapter" scope="singleton">
    <property name="cacheConfiguration">
        <list>
            <bean class="org.gridgain.grid.cache.GridCacheConfigurationAdapter">
                <property name="name" value="mycache"/>

                <property name="cacheMode" value="LOCAL"/>
            </bean>
        </list>
    </property>
    ...
</bean>

13. GridGain Scalar - Scala DSL

TODO

14. GridGain Grover - Groovy++ DSL

TODO

15. REST APIs

The GridGain REST API supports external connectivity to GridGain via REST. It comes in handy whenever the GridGain Java API is not available directly but you still need to execute GridGain tasks or retrieve cached data. For example, you can conveniently use the GridGain REST API from non-JVM languages such as Ruby or Python whenever a local instance of GridGain is not available.

15.1. Overview

Currently there are two ways to utilize GridGain REST API:

  • over HTTP protocol.

  • over Memcache binary protocol.

Note Memcache binary protocol will become available in 3.5.1 release.

15.1.1. HTTP protocol

All REST HTTP commands have the following format:

http://localhost:8080/gridgain?cmd=exe&...

where cmd is the name of the command, followed by other command parameters. Each command may have different parameters, some mandatory and some optional. Command parameters may be passed via either HTTP GET or POST, whichever is preferred.

All commands return a response in JSON format with the following fields (note that some commands may return additional fields as well):

success

Boolean flag to indicate whether or not command completed successfully

response

Command response serialized as JSON - it is a requirement that responses comply with Java Bean standard (i.e. have getters and setters for fields)

error

Description of error associated with failed command execution. It is only provided if success flag is false
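
A client in any language only has to assemble such a URL. The following is a minimal Java sketch of a URL builder for the cmd=...&name=value format described above; the class and method names are hypothetical, and URL-encoding of parameter values is an assumption of the sketch (ordinary practice for query strings, not something this chapter specifies).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

// Hypothetical helper that assembles a REST command URL; not part of GridGain.
final class RestUrlSketch {
    /** Builds base?cmd=...&name=value... with parameters in the map's iteration order. */
    static String commandUrl(String base, String cmd, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(base).append("?cmd=").append(encode(cmd));

        for (Map.Entry<String, String> e : params.entrySet())
            sb.append('&').append(encode(e.getKey())).append('=').append(encode(e.getValue()));

        return sb.toString();
    }

    /** URL-encodes a single name or value as UTF-8. */
    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        }
        catch (UnsupportedEncodingException e) {
            // UTF-8 is always supported by the JVM.
            throw new IllegalStateException(e);
        }
    }
}
```

Passing an insertion-ordered map (for example a LinkedHashMap) keeps the parameters in the order in which they were added, which makes the resulting URLs easier to compare with the examples in this chapter.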

15.1.2. Memcache binary protocol

GridGain implements the Memcache binary protocol. This makes it possible to execute most cache commands using any available Memcache client. Note that the client you choose must support the binary protocol.

Note Memcache binary protocol will become available in 3.5.1 release.

15.2. Cache Commands

All cache commands in GridGain have one additional field in responses, affinityNodeId, which gives the node ID of the primary node responsible for caching the requested data. Users can use this ID to send future requests for the same data to the primary affinity node for better performance. Otherwise, whenever a request for data arrives on some node, that node has to figure out the primary affinity node responsible for caching the requested data and then send the request there. This involves an extra network round trip, which could have been avoided if the request had come to the primary node directly.

The following commands are available to access the GridGain cache.

GET

GET command is used to get a value stored in cache. It is analogous to invoking the GridCacheProjection.get(someKey) method. GET command supports the following parameters:

cmd

get

key

Mandatory parameter to specify the cache key for the value to be retrieved.

cacheName

Optional cache name, if omitted, default cache will be used.

Example Request

http://localhost:8080/gridgain?cmd=get&cacheName=mycache&key=mykey

Example Response

{"affinityNodeId":"d2ee8ea4-a2f0-4f41-9edd-ea25d68de6f8","error":"","response":"some-value","success":true}
GET_ALL

GET_ALL command is used to get several values stored in cache. It is analogous to invoking the GridCacheProjection.getAll(keys) method. GET_ALL command supports the following parameters:

cmd

getall

k1…kN

Keys for the values to be retrieved. At least one must be specified.

cacheName

Optional cache name, if omitted, default cache will be used.

Example Request

http://localhost:8080/gridgain?cmd=getall&cacheName=mycache&k1=mykey1&k2=mykey2

Example Response

{"affinityNodeId":"","error":"","response":{"mykey2":"myval2","mykey1":"myval1"},"success":true}
Note GET_ALL command will become available in 3.5.1 release.
PUT

PUT command is used to store a value in cache. It is analogous to invoking the GridCacheProjection.putx(someKey, someValue) method. PUT command supports the following parameters:

cmd

put

key

Mandatory parameter to specify the cache key for the value to be stored.

val

Mandatory parameter to specify the value to cache, cannot be null or empty.

cacheName

Optional cache name, if omitted, default cache will be used.

Example Request

http://localhost:8080/gridgain?cmd=put&cacheName=mycache&key=mykey&val=myval

Example Response

{"affinityNodeId":"d2ee8ea4-a2f0-4f41-9edd-ea25d68de6f8","error":"","response":true,"success":true}
PUT_ALL

PUT_ALL command is used to store several values in cache. It is analogous to invoking the GridCacheProjection.putAll(map) method. PUT_ALL command supports the following parameters:

cmd

putall

k1…kN

Keys for the values to be stored. At least one must be specified.

v1…vN

Values to be stored in cache, cannot be null or empty. The number of values must be equal to the number of keys.

cacheName

Optional cache name, if omitted, default cache will be used.

Example Request

http://localhost:8080/gridgain?cmd=putall&cacheName=mycache&k1=mykey1&v1=myval1&k2=mykey2&v2=myval2

Example Response

{"affinityNodeId":"","error":"","response":true,"success":true}
Note PUT_ALL command will become available in 3.5.1 release.
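
Since PUT_ALL takes numbered parameters, a client typically has to expand a map of entries into k1/v1 … kN/vN pairs. A small sketch of such an expansion is shown below; the helper name is hypothetical, and numbering from 1 follows the example request above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that turns cache entries into numbered PUT_ALL parameters.
final class PutAllParamsSketch {
    /** Expands entries into k1/v1 ... kN/vN parameters, numbered from 1 in iteration order. */
    static Map<String, String> numbered(Map<String, String> entries) {
        Map<String, String> params = new LinkedHashMap<>();

        int i = 1;

        for (Map.Entry<String, String> e : entries.entrySet()) {
            params.put("k" + i, e.getKey());
            params.put("v" + i, e.getValue());

            i++;
        }

        return params;
    }
}
```

Building the parameters from a single map guarantees that the number of values always equals the number of keys, which the command requires.
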
REMOVE

REMOVE command is used to remove a mapping stored in cache. It is analogous to invoking the GridCacheProjection.removex(someKey) method. REMOVE command supports the following parameters:

cmd

rmv

key

Mandatory parameter to specify the cache key for the value to be removed.

cacheName

Optional cache name, if omitted, default cache will be used.

Example Request

http://localhost:8080/gridgain?cmd=rmv&cacheName=mycache&key=mykey

Example Response

{"affinityNodeId":"d2ee8ea4-a2f0-4f41-9edd-ea25d68de6f8","error":"","response":true,"success":true}
REMOVE_ALL

REMOVE_ALL command is used to remove several mappings stored in cache. It is analogous to invoking the GridCacheProjection.removeAll(keys) method. REMOVE_ALL command supports the following parameters:

cmd

rmvall

k1…kn

Keys for the values to be removed. At least one must be specified.

cacheName

Optional cache name, if omitted, default cache will be used.

Example Request

http://localhost:8080/gridgain?cmd=rmvall&cacheName=mycache&k1=mykey1&k2=mykey2

Example Response

{"affinityNodeId":"","error":"","response":true,"success":true}
Note REMOVE_ALL command will become available in 3.5.1 release.
REPLACE

REPLACE command is used to replace a value in cache only if there is already an existing mapping for the specified key. It is analogous to invoking the GridCacheProjection.replacex(someKey, someValue) method. REPLACE command supports the following parameters:

cmd

rep

key

Mandatory parameter to specify the cache key for the value to be replaced.

val

Mandatory parameter to specify the value to cache, cannot be null or empty.

cacheName

Optional cache name, if omitted, default cache will be used.

Example Request

http://localhost:8080/gridgain?cmd=rep&cacheName=mycache&key=mykey&val=myval

Example Response

{"affinityNodeId":"d2ee8ea4-a2f0-4f41-9edd-ea25d68de6f8","error":"","response":false,"success":true}
CAS

CAS command stands for compare-and-set and is used to replace a value in cache only if it matches the provided value. Based on the values passed in, it has different behavior.

  • If both val1 and val2 are null or empty, then this command is analogous to the REMOVE command.

  • If val1 is not null or empty, but val2 is, then this command will store a value in cache only if there is no existing mapping for the provided key. It is analogous to invoking the GridCacheProjection.putxIfAbsent(someKey, someValue) method.

  • If val1 is null or empty, but val2 is not, then this command will remove a mapping for the provided key only if the current value is equal to val2. It is analogous to invoking the GridCacheProjection.remove(someKey, someValue) method.

  • If both val1 and val2 are not null or empty, then this command will replace a mapping for the provided key only if the current value is equal to val2. It is analogous to invoking the GridCacheProjection.replace(someKey, oldValue, newValue) method.

CAS command supports the following parameters:

cmd

cas

key

Mandatory parameter to specify the cache key for the value to be set.

val1

Existing value stored in cache used for compare operation.

val2

New value to store in cache only if old value is equal to val1.

cacheName

Optional cache name, if omitted, default cache will be used.

Example Request

http://localhost:8080/gridgain?cmd=cas&cacheName=mycache&key=mykey&val1=oldVal&val2=newVal

Example Response

{"affinityNodeId":"d2ee8ea4-a2f0-4f41-9edd-ea25d68de6f8","error":"","response":false,"success":true}
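
The four cases of CAS can be sketched against a local ConcurrentMap. This is only an illustration of the decision logic, not the server-side implementation: the class name is made up, and the sketch reads val1 as the value to store and val2 as the expected current value, which is how the four bulleted cases above use them.

```java
import java.util.concurrent.ConcurrentMap;

// Local illustration of the four CAS cases; not GridGain's implementation.
final class CasSketch {
    private static boolean empty(String s) {
        return s == null || s.isEmpty();
    }

    /** val1 is the value to store, val2 the expected current value (as in the bulleted cases). */
    static boolean cas(ConcurrentMap<String, String> cache, String key, String val1, String val2) {
        if (empty(val1) && empty(val2))
            return cache.remove(key) != null;            // Plain remove.

        if (empty(val2))
            return cache.putIfAbsent(key, val1) == null; // Store only if there is no mapping.

        if (empty(val1))
            return cache.remove(key, val2);              // Remove only if current value equals val2.

        return cache.replace(key, val2, val1);           // Replace only if current value equals val2.
    }
}
```

Each branch delegates to a single atomic ConcurrentMap operation, so the comparison and the update cannot be separated by a concurrent writer.
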
AFFINITY

AFFINITY command is used to retrieve the primary affinity node responsible for storing a cache key. It is analogous to invoking the GridCacheProjection.mapKeyToNode(someKey) method. AFFINITY command supports the following parameters:

cmd

aff

key

Mandatory parameter to specify the cache key to get affinity node ID for.

cacheName

Optional cache name, if omitted, default cache will be used.

Example Request

http://localhost:8080/gridgain?cmd=aff&cacheName=mycache&key=mykey

Example Response

{"affinityNodeId":"d2ee8ea4-a2f0-4f41-9edd-ea25d68de6f8","error":"","response":true,"success":true}
METRICS

METRICS command is used to retrieve cache metrics or cache entry metrics.

METRICS command supports the following parameters:

cmd

cache

key

Optional parameter to specify the cache entry to get metrics for. If omitted, cache metrics will be returned.

cacheName

Optional cache name. If omitted, default cache metrics will be returned.

Example Request

http://localhost:8080/gridgain?cmd=cache&cacheName=mycache

Example Response

{"affinityNodeId":"","error":"","response":{"createTime":1298362596532,"hits":1,"misses":1,
"readTime":1298363347487,"reads":2,"writeTime":1298362597375,"writes":7},"success":true}
INCREMENT

INCREMENT command is used to increment integer value stored in cache. It supports the following parameters:

cmd

incr

key

Mandatory parameter to specify the cache key for the value to be incremented.

init

Parameter to specify initial (default) value. It will be set if value for provided key is not in cache.

delta

Parameter to specify value that will be added to value stored in cache.

cacheName

Optional cache name, if omitted, default cache will be used.

Example Request

http://localhost:8080/gridgain?cmd=incr&cacheName=mycache&key=key&init=0&delta=3

Example response

{"affinityNodeId":"d6f2d18d-22ee-4e10-9986-af71e75fc066","error":"","response":3,"success":true}
Note INCREMENT command will become available in 3.5.1 release.
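
The interplay of init and delta can be sketched locally as follows. The semantics are inferred from the example (a missing key with init=0 and delta=3 yields 3), so treat the "start from init, then apply delta" rule as an assumption; the class and method names are hypothetical, and the same helper with a negated delta mirrors the DECREMENT command described next.

```java
import java.util.concurrent.ConcurrentMap;

// Local illustration of init/delta handling; not GridGain's implementation.
final class IncrementSketch {
    /** Adds delta to the stored value, starting from init when the key is absent. */
    static long incr(ConcurrentMap<String, Long> cache, String key, long init, long delta) {
        // merge() stores init + delta for a missing key, otherwise adds delta atomically.
        return cache.merge(key, init + delta, (cur, ignored) -> cur + delta);
    }

    /** Subtracts delta from the stored value, starting from init when the key is absent. */
    static long decr(ConcurrentMap<String, Long> cache, String key, long init, long delta) {
        return incr(cache, key, init, -delta);
    }
}
```
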
DECREMENT

DECREMENT command is used to decrement integer value stored in cache. It supports the following parameters:

cmd

decr

key

Mandatory parameter to specify the cache key for the value to be decremented.

init

Parameter to specify initial (default) value. It will be set if value for provided key is not in cache.

delta

Parameter to specify value that will be subtracted from value stored in cache.

cacheName

Optional cache name, if omitted, default cache will be used.

Example Request

http://localhost:8080/gridgain?cmd=decr&cacheName=mycache&key=key&init=0&delta=3

Example response

{"affinityNodeId":"d6f2d18d-22ee-4e10-9986-af71e75fc066","error":"","response":-3,"success":true}
Note DECREMENT command will become available in 3.5.1 release.
APPEND

APPEND command is used to append the provided string to a string value stored in cache. It supports the following parameters:

cmd

append

key

Mandatory parameter to specify the cache key for the value to be updated.

val

Parameter to specify string that will be appended to stored value.

cacheName

Optional cache name, if omitted, default cache will be used.

Example Request

http://localhost:8080/gridgain?cmd=append&cacheName=mycache&key=key&val=_suffix

Example response

{"affinityNodeId":"d6f2d18d-22ee-4e10-9986-af71e75fc066","error":"","response":true,"success":true}
Note APPEND command will become available in 3.5.1 release.
PREPEND

PREPEND command is used to prepend the provided string to a string value stored in cache. It supports the following parameters:

cmd

prepend

key

Mandatory parameter to specify the cache key for the value to be updated.

val

Parameter to specify string that will be prepended to stored value.

cacheName

Optional cache name, if omitted, default cache will be used.

Example Request

http://localhost:8080/gridgain?cmd=prepend&cacheName=mycache&key=key&val=prefix_

Example response

{"affinityNodeId":"d6f2d18d-22ee-4e10-9986-af71e75fc066","error":"","response":true,"success":true}
Note PREPEND command will become available in 3.5.1 release.

15.3. Topology Commands

Topology commands are used to retrieve various grid topology information from GridGain. The following commands are available to access GridGain topology:

TOPOLOGY

TOPOLOGY command is used to retrieve list of available GridGain nodes in grid topology.

TOPOLOGY command supports the following parameters:

cmd

top

mtr

true or false. Optional parameter to specify whether node metrics should be included in the response. If omitted, metrics will not be included.

attr

true or false. Optional parameter to specify whether node attributes should be included in the response. If omitted, attributes will not be included.

Example Request

http://localhost:8080/gridgain?cmd=top&mtr=false&attr=false

Example Response

{"error":"","response":[{"attributes":null,"externalAddresses":[],"internalAddresses":["localhost"],
"metrics":null,"nodeId":"4ffa1248-0d4f-4e4a-bf79-e8b586b0dc31"}],"success":true}
NODE

NODE command is used to retrieve information about a single GridGain node based on either node ID or any of node’s available IP addresses.

NODE command supports the following parameters:

cmd

node

id

ID of the node to retrieve information about. If omitted, ip should be provided. If id and ip are provided, both are used.

ip

IP (external or internal) of the node to retrieve information about. If omitted, id should be provided. If id and ip are provided, both are used.

Note: if multiple nodes have the same IP, then there are no guarantees on what node is returned.

mtr

true or false. Optional parameter to specify whether node metrics should be included in the response. If omitted, metrics will not be included.

attr

true or false. Optional parameter to specify whether node attributes should be included in the response. If omitted, attributes will not be included.

Example Request

http://localhost:8080/gridgain?cmd=node&ip=1.2.3.4&id=4ffa1248-0d4f-4e4a-bf79-e8b586b0dc31

Example Response

{"error":"","response":{"attributes":null,"externalAddresses":[],"internalAddresses":["1.2.3.4"],
"metrics":null,"nodeId":"4ffa1248-0d4f-4e4a-bf79-e8b586b0dc31"},"success":true}

15.4. Task Execution Commands

Task execution commands provide a way to execute GridGain tasks over HTTP.

Task execution commands respond with the special entity having the following fields:

error

Description of the error that occurred during task execution. Not to be confused with the error field of the enclosing response.

finished

Boolean flag indicating whether task execution is finished or not.

id

ID of the task to query results in case of asynchronous execution.

result

Task execution result serialized as JSON.

The following commands are available for GridGain task execution:

EXE

EXE command is used to execute GridGain task remotely with specified parameters and returns task execution result back (or task ID to query results in case of asynchronous execution).

EXE command supports the following parameters:

cmd

exe

name

Mandatory parameter. Task name or task class name.

timeout

Optional parameter. Task execution timeout in milliseconds. If not provided or equal to 0, the system will wait indefinitely for execution to complete. If provided, it must be greater than or equal to 0.

p1,..,pN

Optional task parameters. Any number of parameters is possible. If only one parameter is provided, it is passed as is; if two or more are provided, they are passed as an array.

async

true or false. Optional sync/async execution flag. If omitted, task will be executed synchronously. If value is true then result may be queried further via RESULT command (task ID will be returned in response).

Example Request

http://localhost:8080/gridgain?cmd=exe&name=org.gridgain.grid.kernal.processors.rest.TestTask2

Example Response

{"error":"","response":{"error":"","finished":true,"id":"0e731fb3-77ba-4932-b625-a2e197bc444c~16cd1450-fa4f-4bb0-8029-6048b905a5dc",
"result":"Task 2 result."},"success":true}

Example Request

http://localhost:8080/gridgain?cmd=exe&name=org.gridgain.grid.kernal.processors.rest.TestTask2&timeout=1&async=true

Example Response

{"error":"","response":{"error":"","finished":false,"id":"7b3d682e-759c-4310-aaa2-ddfba54fb0b8~16cd1450-fa4f-4bb0-8029-6048b905a5dc",
"result":null},"success":true}
RESULT

RESULT command is used to retrieve results of GridGain task execution (initiated by EXE command).

RESULT command supports the following parameters:

cmd

res

id

Mandatory parameter. ID of the task (returned in response to EXE command).

Example Request

http://localhost:8080/gridgain?cmd=res&id=80ae2a49-029e-439a-bee5-8bad67381173~4186cc96-0f62-45dc-976f-979bfea08a90

Example Response

{"error":"","response":{"error":"","finished":true,"id":"80ae2a49-029e-439a-bee5-8bad67381173~4186cc96-0f62-45dc-976f-979bfea08a90",
"result":"Task 2 result."},"success":true}
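
For asynchronous execution a client would typically start the task with EXE and async=true, remember the returned id, and then issue RESULT requests until finished is true. The sketch below shows only the polling half. The HTTP call is passed in as a function so that the example stays independent of any particular HTTP client; the class name, the attempt limit and the plain substring check (used instead of a real JSON parser) are assumptions made for illustration.

```java
import java.util.function.Function;

// Hypothetical polling helper for the RESULT command; not part of GridGain.
final class TaskPollSketch {
    /**
     * Requests cmd=res for the given task ID until the response reports "finished":true.
     * Returns the final response body, or null if the attempts run out.
     */
    static String awaitResult(Function<String, String> httpGet, String baseUrl, String taskId, int maxAttempts) {
        String url = baseUrl + "?cmd=res&id=" + taskId;

        for (int i = 0; i < maxAttempts; i++) {
            String body = httpGet.apply(url);

            // Crude check that avoids a JSON dependency; a real client should parse the JSON.
            if (body != null && body.contains("\"finished\":true"))
                return body;
        }

        return null;
    }
}
```

A real client would also pause between attempts and URL-encode the task ID if it could contain reserved characters.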

15.5. REST Authentication

To control access to the REST API, you may require authentication by providing a REST secret key (GridConfiguration.getRestSecretKey()).

If a secret key is provided, then all requests must contain an authentication token.

For REST over HTTP(S), the token is sent via the X-Signature header.

The token is built using the following algorithm:

  1. The client builds a string from the timestamp (in milliseconds) and the secret key, separated by a colon: timestamp:secretKey;

  2. The client calculates the SHA-1 hash of the string;

  3. Finally, the client builds the token from the timestamp value and the Base64-encoded hash calculated in the previous step: timestamp:hash_base64.

Protocol implementations split the token to fetch the timestamp, perform the same operations, and then compare the resulting hash with the provided one. If the results are equal, the request is authenticated.

Tip For more security it is recommended to access REST API via https instead of http.

Example

secretKey

secret-key

timestamp

1298966938803

hash of 1298966938803:secret-key (base64)

emcRg3ZcVuce4AwDGXn4e4n2kqA=

X-Signature

1298966938803:emcRg3ZcVuce4AwDGXn4e4n2kqA=
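
The token algorithm maps directly onto standard JDK classes. The following is a minimal sketch, assuming Java 8 or later for java.util.Base64; the class and method names are hypothetical, and because the text above does not state which character encoding is used to turn the string into bytes, UTF-8 is an assumption here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical client-side helper that builds the X-Signature value.
final class RestSignatureSketch {
    /** Builds timestamp:base64(sha1(timestamp:secretKey)). */
    static String token(long timestamp, String secretKey) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");

            // Steps 1 and 2: hash the "timestamp:secretKey" string (UTF-8 is an assumption).
            byte[] hash = sha1.digest((timestamp + ":" + secretKey).getBytes(StandardCharsets.UTF_8));

            // Step 3: prepend the timestamp to the Base64-encoded hash.
            return timestamp + ":" + Base64.getEncoder().encodeToString(hash);
        }
        catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }
}
```

The result would be sent as the value of the X-Signature header, with System.currentTimeMillis() as the usual source of the timestamp.
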

16. Grid Enabling JUnits

16.1. Overview

The ability to distribute JUnit tests allows you to get test results from your build server 2, 3, or 5 times faster, depending on the number of nodes you allocate to run your tests. You can also run your distributed JUnits directly from your IDE, and all IDE native JUnit integration semantics will be preserved.

Distributed JUnit support was added in the GridGain 1.6.0 release. In a nutshell, it simply takes your regular JUnit TestSuite and runs it in parallel on remote nodes. Even if you don’t have remote nodes, the tests within your TestSuite will run in parallel on the local node. Both individual tests and test suites are supported. If you have nested test suites inside your distributed test suite, then the whole nested suite will be executed in parallel on a remote node (note that tests within a nested suite will still execute sequentially).

GridGain distributed JUnit support gives you the following benefits:

  • Minimal to Zero Code Change: Simply switch to using GridJunit3TestSuite, or attach the @GridifyTest annotation to your existing static suite() method or JUnit4 suite, and you are good to go.

  • Peer Class Loading: You do not need to explicitly deploy your tests or your code on every grid node; the deployment happens automatically.

  • Nested Test Grouping: You have full control over how tests are grouped for parallel or remote execution by combining tests within nested test suites.

  • Customizable Test Routing: You have full control over how every test gets routed to a remote node for execution by providing your own GridTestRouter implementation. By default, GridTestRouterAdapter is used, which routes tests between nodes in round-robin fashion.

  • Local Test Suites: Support for tests that can only be executed locally (usually due to environment issues) but can still benefit from parallel execution.

  • Configurable Test Scheduling: You can configure how many tests can run in parallel on local or remote nodes via the parallelJobsNumber configuration parameter on the GridCollisionSpi SPI.

  • Native IDE Integration: You can run your JUnit tests directly from any IDE (IDEA, Eclipse, NetBeans, etc.), and your distributed tests will execute as if it were a local execution: all logging and failures will be preserved.

16.2. Supported Implementations

You can distribute your JUnit tests in two different ways. One way is to use the distributed test suites for JUnit3 and JUnit4 directly; the other is to use the @GridifyTest annotation with AOP.

16.2.1. Distributed JUnit3

Distributed JUnit support was added in the GridGain 1.6.0 release.

You can distribute your JUnit3 test suites in two different ways. One way is to use GridJunit3TestSuite directly instead of the usual JUnit3 TestSuite. This is perhaps the easiest and most straightforward way to grid-enable your JUnit3 test suites.

Another way is to attach the @GridifyTest annotation to the static suite() methods of your JUnit3 test suites. The advantage of this approach is that you can provide test configuration parameters, such as a timeout or a custom test router, directly as annotation parameters in code, without having to change your existing test suites or pass extra VM arguments.

GridJunit3TestSuite

GridJunit3TestSuite is the test suite that handles distributing JUnit tests automatically. Simply add tests to this suite just as you would for regular JUnit3 suites, and these tests will be executed in parallel on the grid. Note that if there are no other grid nodes, this suite will still ensure parallel test execution within a single JVM.

Below is an example of a distributed JUnit3 test suite:

public class GridJunit3ExampleTestSuite {
    // Standard JUnit3 static suite method.
    public static TestSuite suite() {
        TestSuite suite = new GridJunit3TestSuite("Example Grid Test Suite");

        // Add tests.
        suite.addTestSuite(TestA.class);
        suite.addTestSuite(TestB.class);
        suite.addTestSuite(TestC.class);

        return suite;
    }
}

If you have four tests A, B, C, and D, and you need to run A and B sequentially, then you should create a nested test suite with tests A and B as follows:

public class GridJunit3ExampleTestSuite {
    // Standard JUnit3 static suite method.
    public static TestSuite suite() {
        TestSuite suite = new GridJunit3TestSuite("Example Grid Test Suite");

        // Nested test suite to run tests A and B sequentially.
        TestSuite nested = new TestSuite("Example Nested Sequential Suite");

        nested.addTestSuite(TestA.class);
        nested.addTestSuite(TestB.class);

        // Add tests A and B.
        suite.addTest(nested);

        // Add other tests.
        suite.addTestSuite(TestC.class);
        suite.addTestSuite(TestD.class);

        return suite;
    }
}
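
The scheduling behavior of nested suites can be modeled without GridGain or JUnit. The sketch below is an assumption-laden illustration, not the actual implementation: the class NestedSuiteModel is hypothetical, a thread pool stands in for grid nodes, and each top-level entry (a single test or a nested suite) is treated as one unit of work whose tests run in order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical model of suite scheduling; uses no GridGain or JUnit classes. */
final class NestedSuiteModel {
    /**
     * Runs every top-level unit concurrently. The tests inside one unit
     * (a nested suite) run one after another on the same worker.
     * Returns the order in which the tests were started.
     */
    static List<String> run(List<List<String>> units) {
        List<String> log = Collections.synchronizedList(new ArrayList<String>());
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, units.size()));
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (List<String> unit : units) {
                futures.add(pool.submit(() -> {
                    for (String test : unit) {
                        log.add(test); // Stand-in for executing the test.
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // Wait for every unit to finish.
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return new ArrayList<>(log);
    }
}
```

Passing the units [TestA, TestB], [TestC], and [TestD] mirrors the suite above: TestA always precedes TestB in the returned order, while the position of TestC and TestD relative to them is not fixed.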

GridJunit3LocalTestSuite

Some tests can only be executed locally, mostly due to environment issues. However, they can still benefit from parallel execution with other tests. GridGain supports this via GridJunit3LocalTestSuite suites, which can be nested within a GridJunit3TestSuite test suite.

To use a local test suite within a distributed test suite, simply add it to the distributed test suite as follows:

public class GridJunit3ExampleTestSuite {
    // Local test suite example.
    public static TestSuite suite() {
        TestSuite suite = new GridJunit3TestSuite("Example Grid Test Suite");

        // Local nested test suite to always run tests A and B
        // on the local node.
        TestSuite nested = new GridJunit3LocalTestSuite("Example Nested Sequential Suite");

        nested.addTestSuite(TestA.class);
        nested.addTestSuite(TestB.class);

        // Add local tests A and B.
        suite.addTest(nested);

        // Add other tests.
        suite.addTestSuite(TestC.class);
        suite.addTestSuite(TestD.class);

        return suite;
    }
}

Logging

When running distributed JUnit tests, all logging done to System.out or System.err is preserved. GridGain accumulates all logging done on remote nodes, sends it back to the originating node, and associates all log statements with their corresponding tests. This way, for example, if you are running tests from IDEA or Eclipse (or any other IDE), you still see the logs as if it were a local run. However, since remote nodes keep all log statements produced within a single test case in memory, you must make sure that enough memory is allocated on every node and that individual test cases do not produce gigabytes of log statements. Also note that logs are sent back to the originating node upon completion of every test, so do not be alarmed if you do not see any log statements for a while and then all of them appear at once.

GridGain achieves this log transparency by reassigning System.out and System.err to an internal PrintStream implementation. However, when using Log4J (or any other logging framework) within your tests, you must make sure that it is configured with ConsoleAppender and that the ConsoleAppender.setFollow(boolean) attribute is set to true, so that the appender honors the reassigned streams. Logging to files is not supported yet and is planned for future releases.
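
The stream reassignment described above can be illustrated with standard Java alone. The following is a rough sketch of the general technique, not GridGain's internal PrintStream: the class OutputCapture is hypothetical, it captures only System.out, and it buffers the whole output in memory, which is also why very chatty tests are a concern.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

/** Hypothetical illustration of capturing System.out for a single test. */
final class OutputCapture {
    /** Runs the test body and returns everything it printed to System.out. */
    static String capture(Runnable test) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            // Reassign System.out so that all output lands in the in-memory buffer.
            System.setOut(new PrintStream(buffer, true, "UTF-8"));
            test.run();
            System.out.flush();
            return buffer.toString("UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        } finally {
            // Always restore the original stream, even if the test fails.
            System.setOut(original);
        }
    }
}
```

The captured text only becomes available after the test body returns, which matches the observation that logs appear all at once when a test completes.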

Test Nesting

GridJunit3TestSuite instances can be nested within each other as deep as needed. However, all nested distributed test suites are treated just like regular JUnit test suites, not as distributed test suites. This is convenient when you have several distributed test suites that you would like to execute separately in distributed fashion, but that you would also like to execute as part of larger distributed suites.

GridifyTest Annotation

To grid-enable JUnit3 tests using the @GridifyTest annotation, simply attach this annotation to the static suite() method of the test suite you would like to grid-enable.

public class GridifyJunit3ExampleTestSuite {
    // Standard JUnit3 suite method. Note we attach @GridifyTest
    // annotation to it, so it will be grid-enabled.
    @GridifyTest
    public static TestSuite suite() {
        TestSuite suite = new TestSuite("Example Test Suite");

        // Nested test suite to run tests A and B sequentially.
        TestSuite nested = new TestSuite("Example Nested Sequential Suite");

        nested.addTestSuite(TestA.class);
        nested.addTestSuite(TestB.class);

        // Add tests A and B.
        suite.addTest(nested);

        // Add other tests.
        suite.addTestSuite(TestC.class);
        suite.addTestSuite(TestD.class);

        return suite;
    }
}

Configuration

To run distributed JUnit tests, you need to start other instances of GridGain. You can do so by running the GRIDGAIN_HOME/bin/ggjunit.{sh|bat} script, which starts a node with the default configuration. If a configuration other than the default is required, use the regular GRIDGAIN_HOME/bin/ggstart.{sh|bat} script and pass your own Spring configuration file as a parameter to the script.

You can use the following configuration parameters to configure a distributed test suite locally. Note that many parameters can be overridden by setting the corresponding JVM parameters defined in GridTestVmParameters at VM startup.

  • setDisabled(boolean) (default: false): If true, GridGain will be turned off and the suite will run locally. This value can be overridden by setting the GridTestVmParameters.GRIDGAIN_DISABLED JVM parameter to true. This parameter comes in handy when you would like to turn off GridGain without changing the actual code.

  • setConfigurationPath(String) (default: config/junit/junit-spring.xml): Optional path to the GridGain Spring configuration file for running JUnit tests. This property can be overridden by setting the GridTestVmParameters.GRIDGAIN_CONFIG VM parameter. Note that the value can be either an absolute path or a path relative to the GRIDGAIN_HOME installation folder.

  • setRouterClassName(String) (default: GridTestRouterAdapter class name): Optional name of a test router class that implements the GridTestRouter interface. If not provided, tests will be routed in round-robin fashion using the default GridTestRouterAdapter. The value of this parameter can be overridden by setting the GridTestVmParameters.GRIDGAIN_TEST_ROUTER VM parameter to the name of your own custom router class.

  • GRIDGAIN_ROUTER_PREFER_REMOTE (default: false): This value can only be set as a VM parameter. Set it to true, e.g. -DGRIDGAIN_ROUTER_PREFER_REMOTE=true, if you would like the test router not to route tests to the local node when remote nodes are present. Note that this property works only with the default test router.

  • setRouterClass(Class) (default: null): Same as setRouterClassName(String), but sets the actual class instead of the name.

  • setTimeout(long) (default: 0, which means that tests will never time out): Maximum timeout value in milliseconds after which the test suite will return without waiting for the remaining tests to complete. This value can be overridden by setting the GridTestVmParameters.GRIDGAIN_TEST_TIMEOUT JVM parameter to the timeout value for the tests.
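
To make the routing options above more concrete, here is a small model of round-robin routing combined with a "prefer remote" switch. It is only a sketch under stated assumptions: the class RoundRobinRouterSketch is hypothetical, nodes are represented by plain strings, and it does not implement the GridTestRouter interface, whose exact signature is not shown in this chapter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical round-robin router model; nodes are plain strings here. */
final class RoundRobinRouterSketch {
    private final AtomicInteger next = new AtomicInteger();
    private final String localNode;
    private final boolean preferRemote;

    RoundRobinRouterSketch(String localNode, boolean preferRemote) {
        this.localNode = localNode;
        this.preferRemote = preferRemote;
    }

    /** Picks the node for the next test from the current topology. */
    String route(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("No nodes to route to");
        }
        List<String> candidates = nodes;
        if (preferRemote) {
            List<String> remote = new ArrayList<>(nodes);
            remote.remove(localNode);
            // Skip the local node only when at least one remote node exists.
            if (!remote.isEmpty()) {
                candidates = remote;
            }
        }
        int index = Math.floorMod(next.getAndIncrement(), candidates.size());
        return candidates.get(index);
    }
}
```

In such a model, the switch could be read at startup with Boolean.getBoolean("GRIDGAIN_ROUTER_PREFER_REMOTE"), which returns true when the JVM was started with -DGRIDGAIN_ROUTER_PREFER_REMOTE=true.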

Test Scheduling

With GridGain you can configure how many tests run in parallel by specifying the parallelJobsNumber configuration parameter on GridCollisionSpi. Simply uncomment the following section in the GRIDGAIN_HOME/config/junit/junit-spring.xml file:

<property name="collisionSpi">
    <bean class="org.gridgain.grid.spi.collision.fifoqueue.GridFifoQueueCollisionSpi">
        <property name="parallelJobsNumber" value="1"/>
    </bean>
</property>

The XML configuration above guarantees that only one test can run at a time on a local or remote node. This way you can ensure that, although your tests run in parallel on different nodes, within a single node only one test is running while all others wait.
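
The effect of a per-node limit can be modeled with a fixed-size thread pool. The sketch below is an analogy only and does not use GridCollisionSpi: the class ParallelJobsModel is hypothetical, the pool size plays the role of parallelJobsNumber on one node, and each job simply sleeps for a few milliseconds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of a per-node limit on concurrently running jobs. */
final class ParallelJobsModel {
    /** Runs jobCount short jobs and returns the highest concurrency observed. */
    static int peakConcurrency(int parallelJobs, int jobCount) {
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        // The pool size stands in for parallelJobsNumber; extra jobs wait in a queue.
        ExecutorService pool = Executors.newFixedThreadPool(parallelJobs);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < jobCount; i++) {
                futures.add(pool.submit(() -> {
                    int now = active.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(10); // Stand-in for the test body.
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        active.decrementAndGet();
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get();
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }
}
```

With a limit of 1, the observed peak is always 1 no matter how many jobs are queued, which is the behavior the configuration above asks for on each node.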

Starting Grid Node

To start a remote node for JUnit tests, open a terminal window on Linux/Mac OS X or a Command Prompt on Windows, change directory to GRIDGAIN_HOME/bin, and run the ggstart.{sh|bat} script. However, distributed JUnit tests have to use GridTestExecutorService, which is pre-configured in the GRIDGAIN_HOME/config/junit/junit-spring.xml Spring configuration file. You need to pass the path to this file to the GridGain startup script as follows:

ggstart.bat config/junit/junit-spring.xml

or, starting with GridGain 1.6.1, simply execute the ggjunit.{sh|bat} script:

ggjunit.bat

It takes 2-3 seconds for a grid node to start, and if everything worked fine, you should see the startup log ending with a successful start acknowledgment.

Distributed JUnit3 Example

This example demonstrates how GridGain can distribute your long-running JUnit3 tests or test suites across the grid and thereby dramatically speed up the overall execution of all tests.

To try this example, open GridJunit3ExampleTestSuite.java in IDEA, Eclipse, or any other IDE and run this JUnit3 suite using the standard IDE JUnit integration. You will observe how execution of the tests is offloaded to remote nodes and how the results then appear in the IDE just as if it were a local run.

To run this example, you need to start one or more additional grid nodes. For simplicity, you can start these nodes on the same machine on which you are running the example.

Create GridJunit3TestSuite Suite

The only difference from standard JUnit3 suites is that instead of creating a new TestSuite, we create a new GridJunit3TestSuite suite.

TestSuite suite = new GridJunit3TestSuite("Example Grid Test Suite");

Running Tests Sequentially

Sometimes certain tests need to run in sequence, yet in parallel with other tests. To achieve this, simply create a nested suite; the whole nested suite will then be executed remotely as a single unit. For example, the following lines of code guarantee that TestA and TestB always run in sequence.

// Nested test suite to run tests A and B sequentially.
TestSuite nested = new TestSuite("Example Nested Sequential Suite");

nested.addTestSuite(TestA.class);
nested.addTestSuite(TestB.class);

// Add tests A and B.
suite.addTest(nested);

Running Tests Locally

Certain tests must run locally no matter what, often due to environment issues. Yet these tests can still benefit from parallel execution with other tests. GridGain supports this via the GridJunit3LocalTestSuite suite. For example, the code below guarantees that TestC will always run locally.

// Add TestC to execute always on the local node but still in
// parallel with other tests.
suite.addTest(new GridJunit3LocalTestSuite(TestC.class, "Local suite"));

Full Source Code

public final class GridJunit3ExampleTestSuite {
    /**
     * Standard JUnit3 static suite method.
     *
     * @return JUnit3 suite.
     */
    public static TestSuite suite() {
        TestSuite suite = new GridJunit3TestSuite("Example Grid Test Suite");

        // Nested test suite to run tests A and B sequentially.
        TestSuite nested = new TestSuite("Example Nested Sequential Suite");

        nested.addTestSuite(TestA.class);
        nested.addTestSuite(TestB.class);

        // Add tests A and B.
        suite.addTest(nested);

        // Add TestC to execute always on the local node but still in
        // parallel with other tests.
        suite.addTest(new GridJunit3LocalTestSuite(TestC.class, "Local suite"));

        // Add other tests.
        suite.addTestSuite(TestD.class);

        return suite;
    }
}

Distributed JUnit3 Example With @GridifyTest Annotation

This example demonstrates how GridGain can distribute your long-running JUnit3 tests or test suites across the grid using the @GridifyTest annotation.

To try this example, open GridifyJunit3ExampleTestSuite in IDEA, Eclipse, or any other IDE and run this JUnit3 suite using the standard IDE JUnit integration. You will observe how execution of the tests is offloaded to remote nodes and how the results then appear in the IDE just as if it were a local run. All you have to do is attach the @GridifyTest annotation to the standard static suite() method of your JUnit3 test suite.

Configuration

In order to enable @GridifyTest, you must enable either AspectJ or JBoss AOP.

JBoss AOP

Note that GridGain is not shipped with JBoss and does not include the necessary JBoss libraries. We assume that if you choose to use JBoss AOP, you already have these libraries. The following configuration needs to be applied to enable JBoss byte code weaving:

  • The following JVM configuration must be present:

    • -javaagent:[path to jboss-aop-jdk50-4.x.x.jar]

    • -Djboss.aop.class.path=[path to gridgain.jar]

    • -Djboss.aop.exclude=org,com -Djboss.aop.include=org.gridgain.examples

  • The following JARs should be in a classpath:

    • javassist-4.x.x.jar

    • jboss-aop-jdk50-4.x.x.jar

    • jboss-aspect-library-jdk50-4.x.x.jar

    • jboss-common-4.x.x.jar

    • trove-1.0.x.jar

AspectJ AOP

The following configuration needs to be applied to enable AspectJ byte code weaving.

  • JVM configuration should include: -javaagent:GRIDGAIN_HOME/libs/aspectjweaver-1.5.3.jar

  • Classpath should contain the GRIDGAIN_HOME/config/aop/aspectj folder.

Attach @GridifyTest Annotation

The only difference from standard JUnit3 suites is that we need to attach the @GridifyTest annotation to the standard static suite() method as follows:

@GridifyTest
public static TestSuite suite() {
    ...
}

You can pass configuration parameters into the @GridifyTest annotation. Refer to the @GridifyTest documentation for more information.

Running Tests Sequentially

Sometimes certain tests need to run in sequence, yet in parallel with other tests. To achieve this, simply create a nested suite; the whole nested suite will then be executed remotely as a single unit. For example, the following lines of code guarantee that TestA and TestB always run in sequence.

// Nested test suite to run tests A and B sequentially.
TestSuite nested = new TestSuite("Example Nested Sequential Suite");

nested.addTestSuite(TestA.class);
nested.addTestSuite(TestB.class);

// Add tests A and B.
suite.addTest(nested);

Full Source Code

/**
 * Regular JUnit3 suite. Note that because of {@link GridifyTest} annotation,
 * all tests will execute in parallel on the grid.
 * <p>
 * Note that since {@link TestA} and {@link TestB} are added to this
 * suite from within another nested suite and not directly, they will
 * always execute sequentially, however still in parallel with other
 * tests.
 */
public final class GridifyJunit3ExampleTestSuite {
    /**
     * Standard JUnit3 static <tt>suite()</tt> method. Note we attach {@link GridifyTest}
     * annotation to it, so it will be grid-enabled.
     *
     * @return JUnit3 suite.
     */
    @GridifyTest
    public static TestSuite suite() {
        TestSuite suite = new TestSuite("Example Test Suite");

        // Nested test suite to run tests A and B sequentially.
        TestSuite nested = new TestSuite("Example Nested Sequential Suite");

        nested.addTestSuite(TestA.class);
        nested.addTestSuite(TestB.class);

        // Add tests A and B.
        suite.addTest(nested);

        // Add other tests.
        suite.addTestSuite(TestC.class);
        suite.addTestSuite(TestD.class);

        return suite;
    }
}

16.2.2. Distributed JUnit4

Distributed JUnit support was added in the GridGain 1.6.0 release.

You can distribute your JUnit4 test suites in two different ways. One way is to use GridJunit4Suite directly instead of the usual JUnit4 Suite class. This is perhaps the easiest and most straightforward way to grid-enable your JUnit4 test suites.

Another way is with AOP, by attaching the @GridifyTest annotation to the same class that carries the @RunWith(Suite.class) annotation. The advantage of this approach is that you can provide test configuration parameters, such as a timeout or a custom test router, directly as annotation parameters in code, without having to pass extra VM arguments.

GridJunit4Suite

GridJunit4Suite is a standard JUnit4 test suite runner for distributing JUnit4 tests. Simply add tests to this suite runner just as you would for regular JUnit4 suites, and these tests will be executed in parallel on the grid. Note that if there are no other grid nodes, this suite runner will still ensure parallel test execution within a single VM.

Below is an example of a distributed JUnit4 test suite:

@RunWith(GridJunit4Suite.class)
@SuiteClasses({
    TestA.class, // TestA will run in parallel on the grid.
    TestB.class, // TestB will run in parallel on the grid.
    TestC.class, // TestC will run in parallel on the grid.
    TestD.class // TestD will run in parallel on the grid.
})
public class GridJunit4ExampleSuite {
    // No-op.
}

If you have four tests A, B, C, and D, and you need to run A and B sequentially, then you should create a nested test suite with tests A and B as follows:

@RunWith(GridJunit4Suite.class)
@SuiteClasses({
    GridJunit4ExampleNestedSuite.class, // Nested suite that will execute tests A and B added to it sequentially.
    TestC.class, // TestC will run in parallel on the grid.
    TestD.class // TestD will run in parallel on the grid.
})
public class GridJunit4ExampleSuite {
    // No-op.
}

The nested suite is a regular JUnit4 suite:

@RunWith(Suite.class)
@SuiteClasses({
    TestA.class,
    TestB.class
})
public class GridJunit4ExampleNestedSuite {
    // No-op.
}

Note that you can also grid-enable existing JUnit4 tests using the @GridifyTest annotation, which you attach to the same class that carries the @RunWith annotation.

GridJunit4LocalSuite

Some tests can only be executed locally, mostly due to environment issues. However, they can still benefit from parallel execution with other tests. GridGain supports this via GridJunit4LocalSuite suites, which can be nested within GridJunit4Suite test suites.

To use a local test suite within a distributed test suite, simply add it to the distributed test suite as follows:

@RunWith(GridJunit4Suite.class)
@SuiteClasses({
    TestA.class,
    TestB.class,
    GridJunit4ExampleNestedLocalSuite.class // Local suite that will execute its tests C and D locally.
})
public class GridJunit4ExampleSuite {
    // No-op.
}

The nested local suite uses the GridJunit4LocalSuite runner:

@RunWith(GridJunit4LocalSuite.class) // Specify local suite to run tests.
@SuiteClasses({
    TestC.class,
    TestD.class
})
public class GridJunit4ExampleNestedLocalSuite {
    // No-op.
}

Logging

When running distributed JUnit tests, all logging done to System.out or System.err is preserved. GridGain accumulates all logging done on remote nodes, sends it back to the originating node, and associates all log statements with their corresponding tests. This way, for example, if you are running tests from IDEA or Eclipse (or any other IDE), you still see the logs as if it were a local run. However, since remote nodes keep all log statements produced within a single test case in memory, you must make sure that enough memory is allocated on every node and that individual test cases do not produce gigabytes of log statements. Also note that logs are sent back to the originating node upon completion of every test, so do not be alarmed if you do not see any log statements for a while and then all of them appear at once.

GridGain achieves this log transparency by reassigning System.out and System.err to an internal PrintStream implementation. However, when using Log4J (or any other logging framework) within your tests, you must make sure that it is configured with ConsoleAppender and that the ConsoleAppender.setFollow(boolean) attribute is set to true, so that the appender honors the reassigned streams. Logging to files is not supported yet and is planned for future releases.

Test Nesting

GridJunit4Suite instances can be nested within each other as deep as needed. However, all nested distributed test suites are treated just like regular JUnit test suites, not as distributed test suites. This is convenient when you have several distributed test suites that you would like to execute separately in distributed fashion, but that you would also like to execute as part of larger distributed suites.

GridifyTest Annotation

To grid-enable JUnit4 tests using the @GridifyTest annotation, attach this annotation to the same class that has the @RunWith(Suite.class) annotation (only Suite runners can be grid-enabled in JUnit4).

@RunWith(Suite.class)
@SuiteClasses({
    GridJunit4ExampleNestedSuite.class, // Nested suite that will execute tests A and B added to it sequentially.
    TestC.class, // Test C will run in parallel with other tests.
    TestD.class // TestD will run in parallel with other tests.
})
@GridifyTest // Run this suite on the grid.
public class GridifyJunit4ExampleSuite {
    // No-op.
}

The nested suite is a regular JUnit4 suite:

@RunWith(Suite.class)
@SuiteClasses({
    TestA.class,
    TestB.class
})
public class GridJunit4ExampleNestedSuite {
    // No-op.
}

Configuration

To run distributed JUnit tests, you need to start other instances of GridGain. You can do so by running the GRIDGAIN_HOME/bin/ggjunit.{sh|bat} script, which starts a node with the default configuration. If a configuration other than the default is required, use the regular GRIDGAIN_HOME/bin/ggstart.{sh|bat} script and pass your own Spring configuration file as a parameter to the script.

You can use the following configuration parameters to configure a distributed test suite locally. These parameters are set via the @GridifyTest annotation. Note that GridGain will check these parameters even if AOP is not enabled. Also note that many parameters can be overridden by setting the corresponding VM parameters defined in GridTestVmParameters at VM startup.

  • @GridifyTest.disabled() (default: false): If true, GridGain will be turned off and the suite will run locally. This value can be overridden by setting the GridTestVmParameters.GRIDGAIN_DISABLED JVM parameter to true. This parameter comes in handy when you would like to turn off GridGain without changing the actual code.

  • @GridifyTest.configPath() (default: config/junit/junit-spring.xml): Optional path to the GridGain Spring configuration file for running JUnit tests. This property can be overridden by setting the GridTestVmParameters.GRIDGAIN_CONFIG VM parameter. Note that the value can be either an absolute path or a path relative to the GRIDGAIN_HOME installation folder.

  • @GridifyTest.routerClass() (default: GridTestRouterAdapter class): Optional router class that implements the GridTestRouter interface. If not provided, tests will be routed in round-robin fashion using the default GridTestRouterAdapter. The value of this parameter can be overridden by setting the GridTestVmParameters.GRIDGAIN_TEST_ROUTER VM parameter to the name of your own custom router class.

  • GRIDGAIN_ROUTER_PREFER_REMOTE (default: false): This value can only be set as a VM parameter. Set it to true, e.g. -DGRIDGAIN_ROUTER_PREFER_REMOTE=true, if you would like the test router not to route tests to the local node when remote nodes are present. Note that this property works only with the default test router.

  • @GridifyTest.timeout() (default: 0, which means that tests will never time out): Maximum timeout value in milliseconds after which the test suite will return without waiting for the remaining tests to complete. This value can be overridden by setting the GridTestVmParameters.GRIDGAIN_TEST_TIMEOUT JVM parameter to the timeout value for the tests.
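
The timeout semantics above (return when the time is up, without waiting for the remaining tests; 0 means wait indefinitely) can be sketched with the standard ExecutorService.invokeAll method. This is a simplified model, not GridGain code: the class SuiteTimeoutModel is hypothetical, each test is simulated by a sleep of the given duration, and a thread pool stands in for the grid.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical model of a suite-level timeout; 0 means "never time out". */
final class SuiteTimeoutModel {
    /** Returns the names of the simulated tests that finished before the timeout. */
    static List<String> run(List<String> names, List<Long> durationsMs, long timeoutMs) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, names.size()));
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (int i = 0; i < names.size(); i++) {
                final String name = names.get(i);
                final long duration = durationsMs.get(i);
                tasks.add(() -> {
                    Thread.sleep(duration); // Stand-in for the test body.
                    return name;
                });
            }
            List<Future<String>> futures;
            if (timeoutMs == 0) {
                futures = pool.invokeAll(tasks);
            } else {
                // Tasks still running when the timeout expires are cancelled.
                futures = pool.invokeAll(tasks, timeoutMs, TimeUnit.MILLISECONDS);
            }
            List<String> finished = new ArrayList<>();
            for (Future<String> f : futures) {
                if (!f.isCancelled()) {
                    finished.add(f.get());
                }
            }
            return finished;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

With a 500 ms timeout, a 10 ms test is reported as finished while a 5-second test is abandoned; with a timeout of 0, the call waits for every test.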

Test Scheduling

With GridGain you can configure how many tests run in parallel by specifying the parallelJobsNumber configuration parameter on GridCollisionSpi. Simply uncomment the following section in the GRIDGAIN_HOME/config/junit/junit-spring.xml file:

<property name="collisionSpi">
    <bean class="org.gridgain.grid.spi.collision.fifoqueue.GridFifoQueueCollisionSpi">
        <property name="parallelJobsNumber" value="1"/>
    </bean>
</property>

The XML configuration above guarantees that only one test can run at a time on a local or remote node. This way you can ensure that, although your tests run in parallel on different nodes, within a single node only one test is running while all others wait.

Starting Grid Node

To start a remote node for JUnit tests, open a terminal window on Linux/Mac OS X or a Command Prompt on Windows, change directory to GRIDGAIN_HOME/bin, and run the ggstart.{sh|bat} script. However, distributed JUnit tests have to use GridTestExecutorService, which is pre-configured in the GRIDGAIN_HOME/config/junit/junit-spring.xml Spring configuration file. You need to pass the path to this file to the GridGain startup script as follows:

ggstart.bat config/junit/junit-spring.xml

or, starting with GridGain 1.6.1, simply execute the ggjunit.{sh|bat} script:

ggjunit.bat

It takes 2-3 seconds for a grid node to start, and if everything worked fine, you should see the startup log ending with a successful start acknowledgment.

Distributed JUnit4 Example

This example demonstrates how GridGain can distribute your long-running JUnit4 tests or test suites across the grid and thereby dramatically speed up the overall execution of all tests.

To try this example, open GridJunit4ExampleSuite.java in IDEA, Eclipse, or any other IDE and run this JUnit4 suite using the standard IDE JUnit integration. You will observe how execution of the tests is offloaded to remote nodes and how the results then appear in the IDE just as if it were a local run.

To run this example, you need to start one or more additional grid nodes. For simplicity, you can start these nodes on the same machine on which you are running the example.

Specify GridJunit4Suite Runner

The only difference from standard JUnit4 suites is that instead of specifying the Suite runner, we must specify the GridJunit4Suite runner as follows:

@RunWith(GridJunit4Suite.class)

Running Tests Sequentially

Sometimes certain tests need to run in sequence, yet in parallel with other tests. To achieve this, simply create a nested suite; the whole nested suite will then be executed remotely as a single unit. For example, the following lines of code guarantee that TestA and TestB always run in sequence.

@RunWith(Suite.class)
@SuiteClasses({
    TestA.class,
    TestB.class
})
public class GridJunit4ExampleNestedSuite {
    // No-op.
}

Running Tests Locally

Certain tests must run locally no matter what, often due to environment issues. Yet these tests can still benefit from parallel execution with other tests. GridGain supports this via the GridJunit4LocalSuite suite runner.

@RunWith(GridJunit4LocalSuite.class) // Specify local suite to run tests.
@SuiteClasses(TestC.class)
public class GridJunit4ExampleNestedLocalSuite {
    // No-op.
}

Full Source Code

@RunWith(GridJunit4Suite.class)
@SuiteClasses({
    // Nested suite that will execute tests A and B added to it sequentially.
    GridJunit4ExampleNestedSuite.class,
    // Local suite that will execute its test C locally.
    GridJunit4ExampleNestedLocalSuite.class,
    // TestD will run in parallel with (A and B) and C tests.
    TestD.class
})
public class GridJunit4ExampleSuite {
    // No-op.
}

Distributed JUnit4 Example With @GridifyTest Annotation

This example demonstrates how GridGain can distribute your long-running JUnit4 tests or test suites across the grid using the @GridifyTest annotation.

Configuration

In order to enable @GridifyTest, you must enable either AspectJ or JBoss AOP.

JBoss AOP

Note that GridGain is not shipped with JBoss and does not include the necessary JBoss libraries. We assume that if you choose to use JBoss AOP, you already have these libraries. The following configuration needs to be applied to enable JBoss byte code weaving:

  • The following JVM configuration must be present:

    • -javaagent:[path to jboss-aop-jdk50-4.x.x.jar]

    • -Djboss.aop.class.path=[path to gridgain.jar]

    • -Djboss.aop.exclude=org,com -Djboss.aop.include=org.gridgain.examples

  • The following JARs should be in a classpath:

    • javassist-4.x.x.jar

    • jboss-aop-jdk50-4.x.x.jar

    • jboss-aspect-library-jdk50-4.x.x.jar

    • jboss-common-4.x.x.jar

    • trove-1.0.x.jar

AspectJ AOP

The following configuration needs to be applied to enable AspectJ byte code weaving:

  • JVM configuration should include: -javaagent:GRIDGAIN_HOME/libs/aspectjweaver-1.5.3.jar

  • Classpath should contain the GRIDGAIN_HOME/config/aop/aspectj folder.

Attach @GridifyTest Annotation

The only difference from standard JUnit4 suites is that you need to attach the @GridifyTest annotation to the same class that has the @RunWith(Suite.class) annotation.

Running Tests Sequentially

Sometimes certain tests must run in sequence, yet in parallel with other tests. To achieve this, create a nested suite; the whole nested suite is then executed remotely as one unit. For example, the following code guarantees that TestA and TestB always run in sequence:

@RunWith(Suite.class)
@SuiteClasses({
    TestA.class,
    TestB.class
})
public class GridifyJunit4ExampleNestedSuite {
    // No-op.
}

Full Source Code

@RunWith(Suite.class)
@SuiteClasses({
    // Nested suite that will execute tests A and B added to it sequentially.
    GridifyJunit4ExampleNestedSuite.class,
    // Test C will run in parallel with other tests.
    TestC.class,
    // TestD will run in parallel with other tests.
    TestD.class
})
@GridifyTest // Run this suite on the grid.
public class GridifyJunit4ExampleSuite {
    // No-op.
}

16.3. Bamboo Integration

When plugging JUnit3 tests into a Bamboo continuous build, Bamboo may display a wrong test count even though all tests do execute. This happens because Bamboo cannot properly process the names of test classes augmented by Javassist that it finds in the JUnit XML files produced by Ant.

To fix this, add the following Ant task to your Ant script so that it runs after all JUnit targets. It goes through all JUnit XML result files and removes the Javassist suffixes from test class names.

<replaceregexp byline="true">
    <regexp pattern='(&#60;testcase.*classname=\".*)(_\$\$_javassist_\d+)'/>
    <substitution expression='\1'/>
    <fileset dir="/foo/bar/test-results">
        <include name="TEST-*.xml"/>
    </fileset>
</replaceregexp>
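
To see what the expression does to a single report line, the same pattern can be tried in plain Java. This is only a sketch: the class name JavassistSuffixStripper and the sample report line are made up for illustration, and the pattern is the one from the Ant task with < written literally instead of the XML entity.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; it is not part of GridGain, Ant, or Bamboo.
final class JavassistSuffixStripper {
    // Same expression as in the Ant task, with '<' written literally instead of '&#60;'.
    private static final Pattern SUFFIX =
        Pattern.compile("(<testcase.*classname=\".*)(_\\$\\$_javassist_\\d+)");

    /** Removes the Javassist suffix from one line of a JUnit XML report. */
    static String strip(String line) {
        // "$1" keeps the first group and drops the suffix captured by the second one.
        return SUFFIX.matcher(line).replaceAll("$1");
    }

    public static void main(String[] args) {
        // Made-up report line with a Javassist-generated class name.
        String line = "<testcase classname=\"org.example.MyTest_$$_javassist_12\" name=\"testA\"/>";
        System.out.println(strip(line));
    }
}
```

Lines that do not contain a Javassist suffix pass through unchanged, which is why the task can safely be applied to every TEST-*.xml file.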

16.4. Log4j Integration

Currently only console appenders are supported. If you need an output file to be generated, for example on Bamboo, we recommend the following Ant feature: the test element of the Ant JUnit task accepts a formatter and lets you redirect test output to a file or files. Use it as follows:

<junit>
    <test todir="your_path">
        <formatter type="plain"/>
    </test>
</junit>

This example saves the entire test output to the your_path directory.

See http://ant.apache.org/manual/Tasks/junit.html for additional information.

17. Concurrency Unification and Virtual JVM

TODO

18. GridGain CloudBoot - Dynamic VM Image Update

CloudBoot can download the GridGain installation folder and then optionally re-download (override) it from a second URL, using the s3, ftp, or file protocol, and then start the node from the installation directory constructed this way.

Note CloudBoot feature is available only in Enterprise Edition.

18.1. Usage

To run CloudBoot, use the cloudboot.{sh|bat} start script located in the cloudboot/bin folder. It accepts the following arguments:

Short form  Long form        Description                                                                    Required                                 Default
-m          --download-uri   URL to download GridGain from                                                  Yes
-o          --override-uri   URL to override GridGain from                                                  No                                       None
-g          --gridgain-home  GridGain home folder to download into                                          No                                       Taken from GRIDGAIN_HOME environment variable
-c          --start-script   Node start script path                                                         No                                       bin/ggstart.{sh|bat}
-p          --start-params   Node start script parameters                                                   No                                       None
-a          --s3-access-key  S3 access key ID                                                               Yes, if S3 protocol is used
-s          --s3-secret-key  S3 secret access key                                                           Yes, if S3 protocol is used
-u          --ftp-user       FTP user name                                                                  Yes, if FTP server needs authentication
-w          --ftp-passwd     FTP password                                                                   Yes, if FTP server needs authentication
-f          --no-excludes    Download all files and folders (by default javadoc and examples are excluded)  No                                       false

18.2. Examples

Below are examples of using cloudboot.{sh|bat} script with different protocols.

s3
cloudboot -m s3:gg-3.0-bucket -a <s3_key_id> -s <s3_secret_key> -g /custom/gg-home

Downloads GridGain from S3 bucket to custom location and starts node with default configuration.

ftp
cloudboot -m ftp://ftp.org/gg-3.0 -u <ftp_user> -w <ftp_passwd> -c /custom/gg.sh

Downloads GridGain from FTP server and starts node using custom start script.

file
cloudboot -m file:///path/gg-3.0 -p custom/gg.xml -f

Copies all GridGain files and folders including javadoc and examples and starts node with custom configuration.
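
The argument list makes the S3 credentials required only when the download URI uses the S3 protocol. As a rough illustration of that rule, the following Java sketch derives the required options from the URI. The class and method names are assumptions made for this example and are not part of CloudBoot.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration; it is not part of CloudBoot.
final class CloudBootOptions {
    /** Returns the protocol part of a download URI, e.g. "s3" for "s3:gg-3.0-bucket". */
    static String protocol(String uri) {
        int colon = uri.indexOf(':');
        if (colon <= 0)
            throw new IllegalArgumentException("No protocol in URI: " + uri);
        return uri.substring(0, colon).toLowerCase();
    }

    /** Lists the credential options that are always required for the given download URI. */
    static List<String> requiredCredentials(String uri) {
        if (protocol(uri).equals("s3"))
            return Arrays.asList("--s3-access-key", "--s3-secret-key");
        // FTP credentials are needed only when the server asks for authentication,
        // and the file protocol needs none.
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        System.out.println(requiredCredentials("s3:gg-3.0-bucket"));
        System.out.println(requiredCredentials("file:///path/gg-3.0"));
    }
}
```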

19. Management and Monitoring

TODO

19.1. JMX Instrumentation

TODO

19.2. GridGain Visor - Scriptable Monitoring

19.2.1. Overview

Visor provides scriptable monitoring capabilities for GridGain. What emacs does for code editing - Visor does for GridGain monitoring.

Visor is a library that can be used on its own, but it is most often used within the Scala REPL as an interactive monitoring environment. This is the preferred way to use Visor for basic monitoring.

19.2.2. Usage

GridGain ships with the GRIDGAIN_HOME/bin/ggvisor.{sh|bat} script, which starts the Scala REPL with Visor loaded automatically.

Alternatively, you can load Visor manually. GridGain ships with the GRIDGAIN_HOME/bin/ggscala.{sh|bat} script, which starts the Scala REPL (Scala should be on PATH) with GridGain on the classpath. Note that GridGain currently supports Scala 2.8 and higher only.

Once the REPL has started, you can pre-load Visor with the following command (assuming you are in the GridGain installation folder):

:load bin/visor.scala

The GRIDGAIN_HOME/bin/visor.scala script contains Scala code that pre-loads Visor. You can modify this script freely to pre-load any other necessary code. By default, it pre-loads all necessary imports and runs the visor status command. Note also that Visor starts as a daemon node and is therefore not visible in the normal topology.

To get help and get started, type:

visor ?

19.2.3. Commands

The following commands are available in Visor:

Command  Alias  Description
ack             Acks arguments on all remote nodes.
alert           Email alerts for user-defined events.
cache           Prints cache statistics.
close           Disconnects visor from the grid.
config          Prints node configuration.
dash            Opens Visor UI dashboard.
deploy          Copies file or directory to remote host.
disco           Prints topology change log.
events          Prints events from a node.
gc              Runs GC on remote nodes.
help     ?      Prints visor help.
kill            Kills or restarts node.
license         Shows information about licenses and updates them.
log             Starts or stops grid-wide events logging.
mclear          Clears visor memory variables.
mget            Gets visor memory variable.
mlist           Prints visor memory variables.
node            Prints node statistics.
open            Connects visor to the grid.
ping            Pings node.
start           Starts or restarts nodes on remote hosts.
status   !      Prints visor status.
tasks           Prints tasks execution statistics.
top             Prints current topology.
vvm             Opens VisualVM for nodes in topology.

To get full information on the cmd command, type:

visor ? "cmd"
open

open command connects visor to the grid.

Note P2P class loading should be enabled on all nodes.
Specification
visor open "{-cpath=<path>|-curl=<url>} {-g=<gridName>} {-dl}"
visor open "{-d} {-g=<gridName>} {-dl}"
visor open "{-e} {-g=<gridName>} {-dl}"
visor open
Arguments

Following arguments can be provided:

Argument Description Optional Default

cpath

Spring configuration path. Either -cpath or -curl can be specified - but not both.

Yes

Asked interactively

curl

Spring configuration URL. Either -cpath or -curl can be specified - but not both.

Yes

Asked interactively

g

Grid name.

Yes

Default grid

d

Flag forces the command to connect to the default grid without interactive mode.

Yes

e

Flag forces the command to connect to the existing grid without interactive mode. If there is no existing grid, the command fails.

Yes

dl

Flag disables remote log collection.

Yes

Examples
  1. Prompts user to select XML Spring configuration file in interactive mode:

    visor open
  2. Connects visor using default XML configuration:

    visor open "-d"
  3. Connects visor to mygrid grid using default configuration:

    visor open "-g=mygrid"
  4. Connects visor to mygrid grid using configuration from provided Spring file:

    visor open "-cpath=/gg/config/mycfg.xml -g=mygrid"
close

close command disconnects visor from the grid.

Specification
visor close
Arguments

No arguments can be provided.

Examples

Disconnects visor from the grid:

visor close
status

status command prints visor status.

Specification
visor status {"-q"}
Arguments

Following arguments can be provided:

Argument Description Optional Default

q

Quiet output without ASCII logo.

Yes

Examples
  1. Prints visor status:

    visor !
  2. Prints visor status in quiet mode:

    visor status "-q"
  3. Disconnected status:

    images/visor/disconnected.png

  4. Connected status:

    images/visor/connected.png

ack

ack command acks arguments on all remote nodes.

When using this command from Scala code (not from the REPL), make sure to properly import all necessary types and implicit conversions:

import org.gridgain.visor._
import commands.ack.VisorAckCommand._

Note that VisorAckCommand object contains necessary implicit conversions so that this command would be available via visor keyword.

Specification
visor ack {"s"}
visor ack ("s", f)
Arguments

Following arguments can be provided:

Argument Description Optional Default

s

String to print on each remote node.

Yes

Local node ID.

f

Scala predicate on ScalarRichNodePimp filtering nodes in the topology.

Yes

Examples
  1. Prints Howdy! on all nodes in the topology:

    visor ack "Howdy!"
  2. Prints Howdy! on all nodes satisfying this predicate:

    visor ack("Howdy!", _.id8.startsWith("123"))
  3. Prints local node ID on all nodes in the topology:

    visor ack
alert

alert command generates email alerts for user-defined events. Node events and grid-wide events are defined via mnemonics.

When using this command from Scala code (not from the REPL), make sure to properly import all necessary types and implicit conversions:

import org.gridgain.visor._
import commands.alert.VisorAlertCommand._

Note that VisorAlertCommand object contains necessary implicit conversions so that this command would be available via visor keyword.

Specification
visor alert
visor alert "-u {-id=<alert-id>|-a}"
visor alert "-r {-t=<sec>} -c1=e1<num> -c2=e2<num> ... -ck=ek<num>"
Arguments

Following arguments can be provided:

Argument Description Optional Default

u

Unregisters alert(s). Either the -a flag or the -id parameter is required. Note that only one of -u or -r is allowed. If neither -u nor -r is provided, all alerts will be printed.

Yes

a

When provided with -u - all alerts will be unregistered.

Yes

id

When provided with -u - alert with matching ID will be unregistered.

Yes

r

Registers a new alert with mnemonic predicate(s). Note that only one of -u or -r is allowed. If neither -u nor -r is provided, all alerts will be printed.

Yes

t

Defines notification frequency in seconds. This parameter can only appear with -r.

Yes

900 (15 minutes)

ck

Defines a mnemonic for the metric that will be measured.

Grid-wide metrics (not node specific):

  • -cc: total number of available CPUs in the grid.

  • -nc: total number of nodes in the grid.

  • -hc: total number of physical hosts in the grid.

  • -cl: current average CPU load (in %) in the grid.

Per-node current metrics:

  • -aj: active jobs on the node.

  • -cj: cancelled jobs on the node.

  • -tc: thread count on the node.

  • -it: idle time on the node.

    Note <num> can have s, m, or h suffix indicating seconds, minutes, and hours. By default (no suffix provided) value is assumed to be in milliseconds.
  • -ut: up time on the node.

    Note <num> can have s, m, or h suffix indicating seconds, minutes, and hours. By default (no suffix provided) value is assumed to be in milliseconds.
  • -je: job execute time on the node.

  • -jw: job wait time on the node.

  • -wj: waiting jobs count on the node.

  • -rj: rejected jobs count on the node.

  • -hu: heap memory used (in MB) on the node.

  • -cd: current CPU load on the node.

  • -hm: heap memory maximum (in MB) on the node.

Comparison part of the mnemonic predicate:

  • =eq<num>: equal to <num> (=).

  • =neq<num>: not equal to <num> (!=).

  • =gt<num>: greater than <num> (>).

  • =gte<num>: greater than or equal to <num> (>=).

  • =lt<num>: less than <num> (<).

  • =lte<num>: less than or equal to <num> (<=).

Note Email notification will be sent for the alert only when all provided mnemonic predicates evaluate to true.

Yes

Examples
  1. Prints all currently registered alerts:

    visor alert

    Output:

    images/visor/alerts.png

  2. Unregisters all currently registered alerts:

    visor alert "-u -a"
  3. Unregisters alert with provided ID:

    visor alert "-u -id=12345678"
  4. Notifies every 10 min if grid has >= 4 CPUs and > 50% CPU load:

    visor alert "-r -t=600 -cc=gte4 -cl=gt50"
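
The structure of a mnemonic predicate (metric, comparison, number) can be made concrete with a small Java sketch. It is only an illustration of how such a predicate might be parsed and evaluated, not Visor's implementation: the AlertPredicate class and its methods are hypothetical, and the s, m, and h time suffixes are not handled.

```java
// Hypothetical sketch; class and method names are not part of Visor.
final class AlertPredicate {
    final String metric;      // mnemonic without the dash, e.g. "cc"
    final String comparison;  // eq, neq, gt, gte, lt, or lte
    final double threshold;   // the <num> part

    private AlertPredicate(String metric, String comparison, double threshold) {
        this.metric = metric;
        this.comparison = comparison;
        this.threshold = threshold;
    }

    /** Parses one mnemonic predicate such as "-cc=gte4". */
    static AlertPredicate parse(String arg) {
        int eq = arg.indexOf('=');
        if (!arg.startsWith("-") || eq < 2)
            throw new IllegalArgumentException("Expected -<metric>=<cmp><num>: " + arg);
        String rest = arg.substring(eq + 1);
        // Try the longer mnemonics first so that "gte4" is not read as "gt" plus "e4".
        for (String cmp : new String[] {"neq", "gte", "lte", "eq", "gt", "lt"}) {
            if (rest.startsWith(cmp))
                return new AlertPredicate(arg.substring(1, eq), cmp,
                    Double.parseDouble(rest.substring(cmp.length())));
        }
        throw new IllegalArgumentException("Unknown comparison in: " + arg);
    }

    /** Evaluates the predicate against the current value of the metric. */
    boolean matches(double value) {
        switch (comparison) {
            case "eq":  return value == threshold;
            case "neq": return value != threshold;
            case "gt":  return value > threshold;
            case "gte": return value >= threshold;
            case "lt":  return value < threshold;
            case "lte": return value <= threshold;
            default:    throw new IllegalStateException(comparison);
        }
    }

    public static void main(String[] args) {
        // Mirrors the predicates of: visor alert "-r -t=600 -cc=gte4 -cl=gt50",
        // evaluated for an assumed grid with 8 CPUs and 75% CPU load.
        boolean fire = parse("-cc=gte4").matches(8) && parse("-cl=gt50").matches(75);
        System.out.println(fire);
    }
}
```

Combining the individual results with a logical AND reflects the rule that a notification is sent only when all provided predicates evaluate to true.
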
cache

cache command prints statistics about caches from a specified node or from the entire grid. Output sorting can be specified in arguments.

When using this command from Scala code (not from the REPL), make sure to properly import all necessary types and implicit conversions:

import org.gridgain.visor._
import commands.cache.VisorCacheCommand._

Note that VisorCacheCommand object contains necessary implicit conversions so that this command would be available via visor keyword.

Output abbreviations:

#    Number of nodes.
H/h  Number of cache hits.
M/m  Number of cache misses.
R/r  Number of cache reads.
W/w  Number of cache writes.

Specification
visor cache
visor cache "-i {-n=<name>}"
visor cache "{-n=<name>} {-id=<node-id>|-id8=<node-id8>} {-s=lr|lw|hi|mi|rd|wr} {-a} {-r}"
Arguments

Following arguments can be provided:

Argument Description Optional Default

id

Full ID of the node to get cache statistics from. Either -id8 or -id can be specified. If neither is specified, statistics will be gathered from all nodes.

Yes

id8

ID8 of the node to get cache statistics from. Either -id8 or -id can be specified. If neither is specified, statistics will be gathered from all nodes.

Yes

n

Name of the cache. By default - statistics for all caches will be printed.

Yes

s

Defines sorting type. Sorted by:

  • lr: last read.

  • lw: last write.

  • hi: hits.

  • mi: misses.

  • rd: reads.

  • wr: writes.

Yes

lr

i

Interactive mode. User can interactively select node for cache statistics.

Yes

r

Defines if sorting should be reversed. Can be specified only with -s argument.

Yes

Sorting is not reversed

a

Prints detailed statistics about each cache.

Yes

Only aggregated summary is printed.

Examples
  1. Prints summary statistics about caches from node with specified ID8 sorted by number of hits in reverse order:

    visor cache "-id8=12345678 -s=hi -r"
  2. Prints cache statistics for interactively selected node:

    visor cache "-i"
  3. Prints detailed statistics about all caches sorted by number of hits in reverse order:

    visor cache "-s=hi -r -a"
  4. Prints summary statistics about all caches:

    visor cache

    Output:

    images/visor/cache.png

  5. Prints detailed statistics about specified cache:

    visor cache "-n=partitioned -a"

    Output:

    images/visor/cache_all.png

config

config command prints node configuration.

When using this command from Scala code (not from the REPL), make sure to properly import all necessary types and implicit conversions:

import org.gridgain.visor._
import commands.config.VisorConfigurationCommand._

Note that VisorConfigurationCommand object contains necessary implicit conversions so that this command would be available via visor keyword.

Specification
visor config
visor config "{-id=<node-id>|-id8=<node-id8>}"
Arguments

Following arguments can be provided:

Argument Description Optional Default

id

Full node ID. Either -id8 or -id can be specified. If neither is specified, the command starts in interactive mode.

Yes

id8

Node ID8. Either -id8 or -id can be specified. If neither is specified, the command starts in interactive mode.

Yes

Examples
  1. Prints configuration for node with specified ID8:

    visor config "-id8=12345678"
  2. Starts command in interactive mode:

    visor config
dash

dash command opens UI Visor dashboard.

When using this command from Scala code (not from the REPL), make sure to properly import all necessary types and implicit conversions:

import org.gridgain.visor._
import commands.dash.VisorDashboardCommand._

Note that VisorDashboardCommand object contains necessary implicit conversions so that this command would be available via visor keyword.

Specification
visor dash
Arguments

No arguments can be provided.

Examples
  1. Opens UI Visor dashboard:

    visor dash
deploy

deploy command copies file or directory to remote host. Relies on SFTP protocol.

When using this command from Scala code (not from the REPL), make sure to properly import all necessary types and implicit conversions:

import org.gridgain.visor._
import commands.deploy.VisorDeployCommand._

Note that VisorDeployCommand object contains necessary implicit conversions so that this command would be available via visor keyword.

Specification
visor deploy "-h={<username>{:<password>}@}<host>{:<port>} {-u=<username>}
    {-p=<password>} {-k=<path>} -s=<path> {-d=<path>}"
Arguments

Following arguments can be provided:

Argument Description Optional Default

h

Host specification.

<host> can be a hostname, IP or range of IPs.

Example of range is 192.168.1.100~150, which means all IPs from 192.168.1.100 to 192.168.1.150 inclusively.

Default port number is 22.

This option can be provided multiple times.

No

u

Default username. Used if specification doesn’t contain username. If default is not provided as well, current local username will be used.

Yes

p

Default password. Used if specification doesn’t contain password. If default is not provided as well, it will be asked interactively.

Yes

k

Path to private key file. If provided, it will be used for all specifications that don’t contain a password.

Yes

s

Source path.

No

d

Destination path (relative to GRIDGAIN_HOME).

Yes

Root of GRIDGAIN_HOME

Examples
  1. Copies file or directory to remote host (password authentication):

    visor deploy "-h=uname:passwd@host -s=/local/path -d=remote/path"
  2. Copies file or directory to remote host (private key authentication):

    visor deploy "-h=uname@host -k=ssh-key.pem -s=/local/path -d=remote/path"
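
The IP range notation accepted by the -h argument (for example 192.168.1.100~150) can be illustrated with a short Java sketch that expands a range into individual addresses. The HostRange class is hypothetical and only mirrors the notation described above; it says nothing about how Visor itself parses host specifications.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; it is not part of Visor.
final class HostRange {
    /** Expands "192.168.1.100~150" into single addresses; other values are returned as is. */
    static List<String> expand(String host) {
        List<String> hosts = new ArrayList<>();
        int tilde = host.indexOf('~');
        if (tilde < 0) {
            // Plain hostname or single IP address.
            hosts.add(host);
            return hosts;
        }
        int lastDot = host.lastIndexOf('.', tilde);
        String prefix = host.substring(0, lastDot + 1);                 // "192.168.1."
        int from = Integer.parseInt(host.substring(lastDot + 1, tilde)); // 100
        int to = Integer.parseInt(host.substring(tilde + 1));            // 150
        // Both ends of the range are inclusive.
        for (int i = from; i <= to; i++)
            hosts.add(prefix + i);
        return hosts;
    }

    public static void main(String[] args) {
        List<String> hosts = expand("192.168.1.100~150");
        System.out.println(hosts.size() + " hosts, first " + hosts.get(0));
    }
}
```
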
disco

disco command prints topology change log as seen from the oldest node. Timeframe for querying events can be specified in arguments.

Note

This command depends on GridGain events.

GridGain events can be individually enabled and disabled and disabled events can affect the results produced by this command. Note also that configuration of Event Storage SPI that is responsible for temporary storage of generated events on each node can also affect the functionality of this command.

By default - all events are enabled and GridGain stores last 10,000 local events on each node. Both of these defaults can be changed in configuration.

When using this command from Scala code (not from the REPL), make sure to properly import all necessary types and implicit conversions:

import org.gridgain.visor._
import commands.disco.VisorDiscoveryCommand._

Note that VisorDiscoveryCommand object contains necessary implicit conversions so that this command would be available via visor keyword.

Specification
visor disco
visor disco "{-t=<num>s|m|h|d} {-r} {-c=<n>}"
Arguments

Following arguments can be provided:

Argument Description Optional Default

t

Defines timeframe for querying events:

  • =<num>s: events fired during last <num> seconds.

  • =<num>m: events fired during last <num> minutes.

  • =<num>h: events fired during last <num> hours.

  • =<num>d: events fired during last <num> days.

Yes

All events

r

Defines whether sorting should be reversed.

Yes

Sorting is not reversed

c

Defines the maximum events count that can be shown.

Yes

All events

Examples
  1. Prints all discovery events sorted chronologically (oldest first):

    visor disco

    Output:

    images/visor/disco.png

  2. Prints all discovery events sorted chronologically in reversed order (newest first):

    visor disco "-r"

    Output:

    images/visor/disco_reversed.png

  3. Prints discovery events fired during last minute sorted chronologically:

    visor disco "-t=1m"

    Output:

    images/visor/disco_time.png
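
The -t timeframe values combine a number with a unit suffix (s, m, h, or d). As a minimal sketch of that notation, the following Java code converts such a value to milliseconds; the Timeframe class is made up for this example and is not part of Visor.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration; it is not part of Visor.
final class Timeframe {
    /** Converts a value such as "1m" or "30s" to milliseconds. */
    static long toMillis(String value) {
        if (value.length() < 2)
            throw new IllegalArgumentException("Expected <num>s|m|h|d: " + value);
        // Everything except the last character is the number.
        long num = Long.parseLong(value.substring(0, value.length() - 1));
        switch (value.charAt(value.length() - 1)) {
            case 's': return TimeUnit.SECONDS.toMillis(num);
            case 'm': return TimeUnit.MINUTES.toMillis(num);
            case 'h': return TimeUnit.HOURS.toMillis(num);
            case 'd': return TimeUnit.DAYS.toMillis(num);
            default:  throw new IllegalArgumentException("Unknown unit in: " + value);
        }
    }

    public static void main(String[] args) {
        // Timeframe used in: visor disco "-t=1m"
        System.out.println(toMillis("1m"));
    }
}
```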

events

events command prints events from a node.

Note

This command depends on GridGain events.

GridGain events can be individually enabled and disabled and disabled events can affect the results produced by this command. Note also that configuration of Event Storage SPI that is responsible for temporary storage of generated events on each node can also affect the functionality of this command.

By default - all events are enabled and GridGain stores last 10,000 local events on each node. Both of these defaults can be changed in configuration.

When using this command from Scala code (not from the REPL), make sure to properly import all necessary types and implicit conversions:

import org.gridgain.visor._
import commands.events.VisorEventsCommand._

Note that VisorEventsCommand object contains necessary implicit conversions so that this command would be available via visor keyword.

Specification
visor events
visor events "{-id=<node-id>|-id8=<node-id8>} {-e=<ch,cp,de,di,jo,ta,cl,ca,sw>}
    {-t=<num>s|m|h|d} {-s=e|t} {-r} {-c=<n>}"
Arguments

Following arguments can be provided:

Argument Description Optional Default

id

Full node ID. Either -id or -id8 can be specified. If called without the arguments - starts in interactive mode.

Yes

id8

Node ID8. Either -id or -id8 can be specified. If called without the arguments - starts in interactive mode.

Yes

e

Comma separated list of event types that should be queried:

  • ch: checkpoint events.

  • de: deployment events.

  • di: discovery events.

  • jo: job execution events.

  • ta: task execution events.

  • cl: cloud events.

  • ca: cache events.

  • cp: cache pre-loader events.

  • sw: swapspace events.

Yes

All events

t

Defines timeframe for querying events:

  • =<num>s: events fired during last <num> seconds.

  • =<num>m: events fired during last <num> minutes.

  • =<num>h: events fired during last <num> hours.

  • =<num>d: events fired during last <num> days.

Yes

All events

s

Defines sorting of queried events:

  • =e: sorted by event type.

  • =t: sorted chronologically.

Only one of =e or =t can be specified.

Yes

r

Defines if sorting should be reversed. Can be specified only with -s argument.

Yes

Sorting is not reversed

c

Defines the maximum events count that can be shown. Values in summary tables are calculated over the whole list of events.

Yes

All events

Examples
  1. Queries all events from specified node:

    visor events "-id8=@n0"

    Output:

    images/visor/events.png

  2. Queries discovery events from specified node:

    visor events "-id8=@n0 -e=di"

    Output:

    images/visor/events_disco.png

  3. Starts command in interactive mode:

    visor events
gc

gc command runs garbage collector on remote nodes.

When using this command from Scala code (not from the REPL), make sure to properly import all necessary types and implicit conversions:

import org.gridgain.visor._
import commands.gc.VisorGcCommand._

Note that VisorGcCommand object contains necessary implicit conversions so that this command would be available via visor keyword.

Specification
visor gc
visor gc "{-id8=<node-id8>|-id=<node-id>} {-c}"
Arguments

Following arguments can be provided:

Argument Description Optional Default

id

Full node ID. Either -id or -id8 can be specified.

Yes

id8

Node ID8. Either -id or -id8 can be specified.

Yes

c

Run DGC procedure on all caches.

Yes

Don’t run DGC

Examples
  1. Runs garbage collector on all nodes in topology:

    visor gc
  2. Runs garbage collector on specified node:

    visor gc "-id8=12345678"
  3. Runs garbage collector and DGC procedure on all caches:

    visor gc "-id8=12345678 -c"
kill

kill command kills or restarts node.

When using this command from Scala code (not from the REPL), make sure to properly import all necessary types and implicit conversions:

import org.gridgain.visor._
import commands.kill.VisorKillCommand._

Note that VisorKillCommand object contains necessary implicit conversions so that this command would be available via visor keyword.

Specification
visor kill
visor kill "-in|-ih"
visor kill "{-r|-k} {-id8=<node-id8>|-id=<node-id>}"
Arguments

Following arguments can be provided:

Argument Description Optional Default

in

Run command in interactive mode with ability to choose a node to kill or restart. Note that either -in or -ih can be specified. This mode is used by default.

Yes

ih

Run command in interactive mode with ability to choose a host where to kill or restart nodes. Note that either -in or -ih can be specified.

Yes

r

Restart node mode. Note that either -r or -k can be specified. If none provided - command starts in interactive mode.

Yes

k

Kill (stop) node mode. Note that either -r or -k can be specified. If none provided - command starts in interactive mode.

Yes

id8

ID8 of the node to kill or restart. Note that either -id8 or -id can be specified. If none provided - command starts in interactive mode.

Yes

id

ID of the node to kill or restart. Note that either -id8 or -id can be specified. If none provided - command starts in interactive mode.

Yes

Examples
  1. Starts command in interactive mode:

    visor kill
  2. Restarts node with specified ID8:

    visor kill "-id8=12345678 -r"
  3. Kills (stops) all nodes:

    visor kill "-k"
license

license command shows information about all licenses that are used on the grid. It can also be used to update one of the licenses.

When using this command from Scala code (not from the REPL), make sure to properly import all necessary types and implicit conversions:

import org.gridgain.visor._
import commands.license.VisorLicenseCommand._

Note that VisorLicenseCommand object contains necessary implicit conversions so that this command would be available via visor keyword.

Specification
visor license
visor license "-f=<path> -id=<license-id>"
Arguments

Following arguments can be provided:

Argument Description Optional Default

f

Path to new license XML file.

Yes

id

ID of the license to be updated.

Yes

Examples
  1. Shows all licenses that are used on the grid:

    visor license

    Output:

    images/visor/license.png

  2. Copies new license file to all nodes that use license with provided ID:

    visor license "-f=/path/to/new/license.xml
        -id=fbdea781-90e6-4d1b-b8b3-5b8c14aa2df7"
log

log command starts or stops logging of grid-wide discovery and failure events. Logging starts by default when Visor starts.

Events are logged to a file. If path is not provided, it will log into GRIDGAIN_HOME/work/visor/visor-log.

File is always opened in append mode. If file doesn’t exist, it will be created.

It is often convenient to tail -f the log file in a separate console window.

Log command prints periodic topology snapshots in the following format:

H/N/C |1   |1   |4   |=^========..........|
where:
   H - Hosts
   N - Nodes
   C - CPUs
   = - 5%-based marker of average CPU load across the topology
   ^ - 5%-based marker of average heap memory used across the topology
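
One way to read the bar is as 20 cells of 5% each. The Java sketch below renders such a bar under that assumption; the exact placement rule for the ^ marker is a guess chosen so that 50% CPU load and 10% heap usage reproduce the sample line above. The TopologyBar class is hypothetical and not part of Visor.

```java
// Hypothetical sketch; the cell placement rules are assumptions, not Visor's actual logic.
final class TopologyBar {
    /** Renders a 20-cell bar where each cell stands for 5%. */
    static String render(double cpuLoadPct, double heapUsedPct) {
        int cells = 20; // 100% divided into 5% steps
        char[] bar = new char[cells];
        int cpuCells = (int) Math.round(cpuLoadPct / 5);
        // Fill the CPU load part with '=' and the rest with '.'.
        for (int i = 0; i < cells; i++)
            bar[i] = i < cpuCells ? '=' : '.';
        // Put the heap marker into the cell that contains the heap usage value.
        int heapCell = Math.max(0, Math.min(cells - 1, (int) Math.round(heapUsedPct / 5) - 1));
        bar[heapCell] = '^';
        return new String(bar);
    }

    public static void main(String[] args) {
        System.out.println("|" + render(50, 10) + "|");
    }
}
```
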
Specification
visor log
visor log "-l {-f=<path>} {-p=<num>} {-t=<num>}"
visor log "-s"
Arguments

Following arguments can be provided:

Argument Description Optional Default

l

Starts logging. If logging is already started - it’s no-op.

Yes

f

Provides path to the file. Path can be absolute or relative to GRIDGAIN_HOME.

Yes

GRIDGAIN_HOME/work/visor/visor-log

p

Provides period of querying events (in seconds).

Yes

10

t

Provides period of logging topology snapshot (in seconds).

Yes

20

s

Stops logging. If logging is already stopped - it’s no-op.

Yes

Examples
  1. Prints log status:

    visor log
  2. Starts logging to file located at /home/user/visor-log:

    visor log "-l -f=/home/user/visor-log"
  3. Starts logging to file located at GRIDGAIN_HOME/log/visor-log:

    visor log "-l -f=log/visor-log"
  4. Starts logging with querying events period of 20 seconds:

    visor log "-l -p=20"
  5. Starts logging with topology snapshot logging period of 30 seconds:

    visor log "-l -t=30"
  6. Stops logging:

    visor log "-s"
mclear

mclear command clears visor memory variables.

Specification
visor mclear
visor mclear "<name>|-ev|-al|-ca|-no|-tn|-ex"
Arguments

Following arguments can be provided:

Argument Description Optional Default

<name>

Variable name to clear. Note that name doesn’t include @ symbol used to reference variable.

Yes

ev

Clears all event variables.

Yes

al

Clears all alert variables.

Yes

ca

Clears all cache variables.

Yes

no

Clears all node variables.

Yes

tn

Clears all task name variables.

Yes

ex

Clears all task execution variables.

Yes

Examples
  1. Clears all visor variables:

    visor mclear
  2. Clears all visor cache variables:

    visor mclear "-ca"
  3. Clears n2 visor variable:

    visor mclear "n2"
mget

mget command gets visor memory variable. Variable can be referenced with @ prefix.

Specification
visor mget "<name>"
Arguments

Following arguments can be provided:

Argument Description Optional Default

<name>

Variable name.

Yes

Examples
  1. Gets visor variable var:

    visor mget "var"
  2. Gets visor variable whose name is referenced by variable v:

    visor mget "@v"
mlist

mlist command prints visor memory variables.

Specification
visor mlist {"arg"}
Arguments

Following arguments can be provided:

Argument Description Optional Default

arg

String that contains start characters of variable names.

Yes

Examples
  1. Prints out all visor memory variables:

    visor mlist

    Output:

    images/visor/mlist.png

  2. Lists variables that start with n from visor memory:

    visor mlist "n"

    Output:

    images/visor/mlist_n.png

node

node command prints node statistics.

When using this command from Scala code (not from the REPL), make sure to properly import all necessary types and implicit conversions:

import org.gridgain.visor._
import commands.node.VisorNodeCommand._

Note that VisorNodeCommand object contains necessary implicit conversions so that this command would be available via visor keyword.

Specification
visor node "{-id8=<node-id8>|-id=<node-id>} {-a}"
visor node
Arguments

Following arguments can be provided:

Argument Description Optional Default

id8

ID8 of the node to print statistics for. Note that either -id8 or -id can be specified. If none provided - command starts in interactive mode.

Yes

id

ID of the node to print statistics for. Note that either -id8 or -id can be specified. If none provided - command starts in interactive mode.

Yes

a

Print extended information.

Yes

Only abbreviated statistics are printed.

Examples
  1. Starts command in interactive mode:

    visor node
  2. Prints statistics for specified node:

    visor node "-id8=c0023e3e"

    Output:

    images/visor/node.png

  3. Prints full statistics for specified node:

    visor node "-id8=c0023e3e -a"

    Output:

    images/visor/node_all.png

ping

ping command pings node.

When using this command from Scala code (not from the REPL), make sure to properly import all necessary types and implicit conversions:

import org.gridgain.visor._
import commands.ping.VisorPingCommand._

Note that VisorPingCommand object contains necessary implicit conversions so that this command would be available via visor keyword.

Specification
visor ping {"id81 id82 ... id8k"}
Arguments

The following arguments can be provided:

Argument Description Optional Default

id8k

ID8 of the node to ping.

Yes

All nodes are pinged.

Examples
  1. Pings the node whose ID8 is stored in the n0 memory variable:

    visor ping "@n0"

    Output:

    images/visor/ping.png

  2. Pings all nodes in the topology:

    visor ping

    Output:

    images/visor/ping_all.png

start

start command starts one or more nodes on remote host(s). Uses SSH protocol to execute commands.

Note

SSH remote execution requires that all environment properties be set globally on the remote node. Standard GridGain ggstart.{sh|bat} script needs both GRIDGAIN_HOME and JAVA_HOME environment variables set globally for SSH-based execution to work.

On Linux, you can use the /etc/environment file to set global environment variables at login time. Mac OS X currently doesn’t support automatic setting of global variables, so you need to provide a custom start script in this case. On Windows use standard way to set environment properti

When using this command from Scala code (not from the REPL), make sure to import all necessary types and implicit conversions:

import org.gridgain.visor._
import commands.start.VisorStartCommand._

Note that the VisorStartCommand object contains the implicit conversions that make this command available via the visor keyword.

Specification
visor start "-f=<path> {-u=<username>} {-p=<password>} {-k=<path>} {-n=<num>}
    {-s=<path>} {-c=<path>} {-m=<num>} {-r} {-l=<path>}"
visor start "-h={<username>{:<password>}@}<host>{:<port>}{#<num>}
    {-u=<username>} {-p=<password>} {-k=<path>} {-n=<num>} {-s=<path>}
    {-c=<path>} {-m=<num>} {-r} {-l=<path>}"
Arguments

The following arguments can be provided:

Argument Description Optional Default

f

Path to the file that contains the topology specification. Each line of the file represents one host. The format is the following: {<uname>{:<passwd>}@}<host>{:<port>}{#<number of nodes>}. Lines starting with # will be ignored.

<host> can be a hostname, IP or range of IPs. Example of range is 192.168.1.100~150, which means all IPs from 192.168.1.100 to 192.168.1.150 inclusively.

Default port is 22. Default number of nodes is 1.

Yes

h

Topology specification for one host.

<host> can be a hostname, IP or range of IPs. Example of range is 192.168.1.100~150, which means all IPs from 192.168.1.100 to 192.168.1.150 inclusively.

Default port number is 22. Default number of nodes is 1.

This option can be provided multiple times.

If used with -f, it will override specifications taken from file.

Yes

u

Default username. Used if specification doesn’t contain username.

Yes

Current local username

p

Default password. Used if specification doesn’t contain password.

Yes

Asked interactively

k

Path to the private key file. If provided, it will be used for all specifications that don’t contain a password.

Yes

n

Default number of starting nodes. Used if specification doesn’t contain number of nodes.

Yes

1

g

GridGain home path.

Yes

Taken from GRIDGAIN_HOME environment variable.

s

Path to start script (relative to GridGain home).

Yes

Default is bin/ggstart.sh for Unix or bin\ggstart.bat for Windows.

c

Path to configuration file.

Yes

Default GridGain configuration

m

Maximum number of nodes that can be started in parallel on one host.

Yes

5

r

Indicates that existing nodes on the host will be restarted.

Yes

Don’t restart

l

Prefix for the log file path (relative to GridGain home). Each node writes its log to a separate file, with the node number appended to the provided path.

Yes

work/log/gridgain.log

Examples
  1. Starts three nodes with default configuration (password authentication):

    visor start "-h=uname:passwd@10.1.1.10#3"
  2. Starts 3 nodes on 5 hosts with default configuration (key-based authentication):

    visor start "-h=uname@192.168.1.100~104#3 -k=ssh-key.pem"
  3. Reads hosts.txt file and starts nodes with provided configuration:

    visor start "-f=hosts.txt -c=config/spring.xml"
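
The IP range notation accepted in <host> can be illustrated with a small sketch. The Java helper below is hypothetical: HostRangeSketch and expand are names invented for this illustration and are not part of GridGain or Visor. It only shows how a value such as 192.168.1.100~104 maps to individual addresses, assuming the range always applies to the last octet, as in the range example given for the -f and -h arguments.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a GridGain class): expands the "a.b.c.X~Y" host range notation. */
class HostRangeSketch {
    /**
     * Expands "a.b.c.X~Y" into the addresses a.b.c.X through a.b.c.Y inclusive.
     * A host without '~' (plain hostname or IP) is returned as a single-element list.
     */
    static List<String> expand(String host) {
        List<String> res = new ArrayList<>();

        int tilde = host.indexOf('~');

        if (tilde < 0) {
            res.add(host);

            return res;
        }

        // Assumption: the range applies to the last octet only.
        int lastDot = host.lastIndexOf('.', tilde);

        if (lastDot < 0)
            throw new IllegalArgumentException("Range requires an IP address: " + host);

        String prefix = host.substring(0, lastDot + 1);
        int from = Integer.parseInt(host.substring(lastDot + 1, tilde));
        int to = Integer.parseInt(host.substring(tilde + 1));

        if (from > to)
            throw new IllegalArgumentException("Range start is greater than range end: " + host);

        for (int i = from; i <= to; i++)
            res.add(prefix + i);

        return res;
    }

    public static void main(String[] args) {
        // Host part of the key-based authentication example.
        System.out.println(expand("192.168.1.100~104"));
    }
}
```

For the host part of example 2, expand("192.168.1.100~104") yields the five addresses 192.168.1.100 through 192.168.1.104, which is why that command targets 5 hosts.
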
tasks

tasks command prints statistics about tasks and executions.

Note

This command depends on GridGain events.

GridGain events can be individually enabled and disabled, and disabled events can affect the results produced by this command. The configuration of the Event Storage SPI, which is responsible for temporary storage of generated events on each node, can also affect the functionality of this command.

By default, all events are enabled and GridGain stores the last 10,000 local events on each node. Both of these defaults can be changed in configuration.

When using this command from Scala code (not from the REPL), make sure to import all necessary types and implicit conversions:

import org.gridgain.visor._
import commands.tasks.VisorTasksCommand._

Note that the VisorTasksCommand object contains the implicit conversions that make this command available via the visor keyword.

Specification
visor tasks
visor tasks "-l {-t=<num>s|m|h|d} {-r}"
visor tasks "-s=<substring> {-t=<num>s|m|h|d} {-r}"
visor tasks "-g {-t=<num>s|m|h|d} {-r}"
visor tasks "-h {-t=<num>s|m|h|d} {-r}"
visor tasks "-n=<task-name> {-r}"
visor tasks "-e=<exec-id>"
Arguments

The following arguments can be provided:

Argument Description Optional Default

l

List all tasks and executions. Executions are sorted chronologically (see -r), and tasks alphabetically. This is the default mode when the command is called without parameters.

Yes

s

List all tasks and executions for a given task name substring. Executions sorted chronologically (see -r), and tasks alphabetically.

Yes

g

List all tasks grouped by nodes for a given time period. Tasks sorted alphabetically.

Yes

h

List all tasks grouped by hosts for a given time period. Tasks sorted alphabetically.

Yes

t

Defines time frame:

  • =<num>s: Last <num> seconds.

  • =<num>m: Last <num> minutes.

  • =<num>h: Last <num> hours.

  • =<num>d: Last <num> days.

Yes

1 hour

r

Reverse sorting of executions.

Yes

Sorting is not reversed.

n

Defines task name to print aggregated statistic.

Yes

e

Defines execution ID to print aggregated statistic.

Yes

Examples
  1. Prints list of all tasks and executions for the last hour (default):

    visor tasks

    Output:

    images/visor/tasks.png

  2. Prints list of tasks and executions that started during last 2 minutes:

    visor tasks "-l -t=2m"

    Output:

    images/visor/tasks_2m.png

  3. Prints list of all tasks and executions that have HelloWorld in task name:

    visor tasks "-s=HelloWorld"

    Output:

    images/visor/tasks_HelloWorld.png

  4. Prints list of tasks grouped by nodes:

    visor tasks "-g"

    Output:

    images/visor/tasks_nodes.png

  5. Prints list of tasks that started during last 6 minutes grouped by nodes:

    visor tasks "-g -t=6m"

    Output:

    images/visor/tasks_nodes_6m.png

  6. Prints summary for task named GridTask:

    visor tasks "-n=GridTask"

    Output:

    images/visor/tasks_name.png

  7. Traces task execution with ID taken from e1 memory variable:

    visor tasks "-e=@e1"

    Output:

    images/visor/tasks_exec.png
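
The -t time frame syntax (<num> followed by s, m, h, or d) can be read as a duration counted back from the current time. The sketch below is a hypothetical Java illustration of that conversion; TimeFrameSketch and toMillis are invented names, and this is not how Visor itself is implemented.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not a Visor class): converts "<num>s|m|h|d" into milliseconds. */
class TimeFrameSketch {
    static long toMillis(String frame) {
        if (frame == null || frame.length() < 2)
            throw new IllegalArgumentException("Expected <num>s|m|h|d: " + frame);

        // Everything except the last character is the number.
        long num = Long.parseLong(frame.substring(0, frame.length() - 1));

        // The last character is the unit suffix.
        switch (frame.charAt(frame.length() - 1)) {
            case 's': return TimeUnit.SECONDS.toMillis(num);
            case 'm': return TimeUnit.MINUTES.toMillis(num);
            case 'h': return TimeUnit.HOURS.toMillis(num);
            case 'd': return TimeUnit.DAYS.toMillis(num);
            default: throw new IllegalArgumentException("Unknown time unit in: " + frame);
        }
    }

    public static void main(String[] args) {
        // For "-t=2m", only executions started after this timestamp would be considered.
        long windowStart = System.currentTimeMillis() - toMillis("2m");

        System.out.println("2m = " + toMillis("2m") + " ms, window starts at " + windowStart);
    }
}
```

With this reading, "-t=2m" covers the last 120,000 milliseconds, and the default of 1 hour corresponds to "-t=1h".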

top

top command prints current topology.

When using this command from Scala code (not from the REPL), make sure to import all necessary types and implicit conversions:

import org.gridgain.visor._
import commands.top.VisorTopologyCommand._

Note that the VisorTopologyCommand object contains the implicit conversions that make this command available via the visor keyword.

Specification
visor top "{-c1=e1<num> -c2=e2<num> ... -ck=ek<num>}
{-h=<host1> ... -h=<hostk>} {-a}"
Arguments

The following arguments can be provided:

Argument Description Optional Default

ck

Defines a mnemonic for node filter:

  • -cc: Number of available CPUs on the node.

  • -cl: Average CPU load (in %) on the node.

  • -aj: Active jobs on the node.

  • -cj: Cancelled jobs on the node.

  • -tc: Thread count on the node.

  • -it: Idle time on the node.

    Note: <num> can have an s, m, or h suffix indicating seconds, minutes, or hours. Without a suffix, the value is assumed to be in milliseconds.

  • -ut: Up time on the node.

    Note: <num> can have an s, m, or h suffix indicating seconds, minutes, or hours. Without a suffix, the value is assumed to be in milliseconds.

  • -je: Job execute time on the node.

  • -jw: Job wait time on the node.

  • -wj: Waiting jobs count on the node.

  • -rj: Rejected jobs count on the node.

  • -hu: Heap memory used (in MB) on the node.

  • -hm: Heap memory maximum (in MB) on the node.

Comparison part of the mnemonic predicate:

  • =eq<num>: Equal (=) to <num>.

  • =neq<num>: Not equal (!=) to <num>.

  • =gt<num>: Greater than (>) <num>.

  • =gte<num>: Greater than or equal (>=) to <num>.

  • =lt<num>: Less than (<) <num>.

  • =lte<num>: Less than or equal (<=) to <num>.

Yes

h

This defines a host to show nodes from. Can be provided multiple times.

Yes

a

Defines whether to show a separate table of nodes with detail per-node information.

Yes

Examples
  1. Prints topology for all nodes with CPU load greater than 20%:

    visor top "-cl=gt20"

    Output:

    images/visor/top_load.png

  2. Prints full information for all nodes with CPU load less than 20%:

    visor top "-cl=lt20 -a"

    Output:

    images/visor/top_load_all.png

  3. Prints topology for provided host:

    visor top "-h=192.168.1.100"

    Output:

    images/visor/top_host.png

  4. Prints full topology:

    visor top

    Output:

    images/visor/top.png
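
The comparison part of a node filter, such as gt20 in "-cl=gt20", combines an operator mnemonic with a number. The following Java sketch is hypothetical (TopFilterSketch and parse are invented names, not Visor code); it shows one way such an expression could be turned into a predicate over a metric value. Time suffixes (s, m, h) are not handled here.

```java
import java.util.function.LongPredicate;

/** Hypothetical helper (not a Visor class): parses "eq5", "gt20", "lte100", ... into a predicate. */
class TopFilterSketch {
    static LongPredicate parse(String expr) {
        // Three-letter operators come first so that "gte20" is not read as "gt" followed by "e20".
        for (String op : new String[] {"neq", "gte", "lte", "eq", "gt", "lt"}) {
            if (expr.startsWith(op)) {
                final long num = Long.parseLong(expr.substring(op.length()));

                switch (op) {
                    case "eq":  return v -> v == num;
                    case "neq": return v -> v != num;
                    case "gt":  return v -> v > num;
                    case "gte": return v -> v >= num;
                    case "lt":  return v -> v < num;
                    default:    return v -> v <= num; // "lte"
                }
            }
        }

        throw new IllegalArgumentException("Unknown comparison: " + expr);
    }

    public static void main(String[] args) {
        // Corresponds to "-cl=gt20": keep nodes whose CPU load is greater than 20%.
        LongPredicate loadFilter = parse("gt20");

        System.out.println(loadFilter.test(25)); // Load of 25% passes the filter.
        System.out.println(loadFilter.test(20)); // Load of exactly 20% does not.
    }
}
```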

vvm

vvm command opens VisualVM.

When using this command from Scala code (not from the REPL), make sure to import all necessary types and implicit conversions:

import org.gridgain.visor._
import commands.vvm.VisorVvmCommand._

Note that the VisorVvmCommand object contains the implicit conversions that make this command available via the visor keyword.

Specification
visor vvm "{-home=dir} {-id8=<node-id8>} {-id=<node-id>}"
Arguments

The following arguments can be provided:

Argument Description Optional Default

home

VisualVM home directory.

Yes

PATH and JAVA_HOME are searched

id8

ID8 of node. Either -id8 or -id can be specified.

Yes

id

Full ID of node. Either -id8 or -id can be specified.

Yes

Examples
  1. Opens VisualVM connected to JVM for node with specified ID8:

    visor vvm "-id8=12345678"
  2. Opens VisualVM connected to JVM for node with given full node ID:

    visor vvm "-id=5B923966-85ED-4C90-A14C-96068470E94D"
  3. Opens VisualVM installed in C:\VisualVM directory for specified node:

    visor vvm "-home=C:\VisualVM -id8=12345678"
  4. Opens VisualVM connected to all nodes:

    visor vvm

20. Appendix A - SPIs

20.1. Discovery SPI

20.1.1. Overview

GridDiscoverySpi provides a mechanism by which every node can discover other nodes on the grid.

To discover remote nodes and get remote node attributes, the following public methods are available:

  • Grid.localNode()

  • Grid.pingNode(UUID)

  • GridProjection.node(UUID, GridPredicate…)

  • GridProjection.nodes(GridPredicate…)

  • GridProjection.remoteNodes(GridPredicate…)

and others.

20.1.2. Built-in Implementations

GridMulticastDiscoverySpi

Discovery SPI implementation using IP-multicast.

Configuration

The following configuration parameters can be used to configure GridMulticastDiscoverySpi:

Setter Method Description Optional Default

setMulticastGroup(String)

Multicast IP address.

Yes

228.1.2.4

setMulticastPort(int)

Port number to which multicast messages are sent.

Yes

47200

setTcpPort(int)

Local port number that is used by discovery SPI.

Yes

47300

setHeartbeatFrequency(long)

Delay in milliseconds between heartbeat requests. The SPI sends multicast messages to other nodes at this interval to notify them about its state.

Yes

3000

setMaxMissedHeartbeats(int)

Number of heartbeat requests that can be missed before a remote node is considered failed.

Yes

3

setLeaveAttempts(int)

Number of attempts to notify other nodes that this one is leaving the grid. Multiple leave requests are sent to increase the chance of successful delivery to every node, since the IP multicast protocol is unreliable. Note that on most networks the loss of IP multicast packets is negligible.

Yes

3

setLocalAddress(String)

Local host IP address that discovery SPI uses.

Yes

Local host address. Preference is given to a non-loopback address if one can be detected. Otherwise, the loopback address is used.

setTimeToLive(int)

Multicast messages time-to-live in router hops.

Yes

8

setLocalPortRange(int)

Local port range for the TCP and multicast ports (the value must be greater than or equal to 0). If the provided local port (see GridMulticastDiscoverySpi.setMulticastPort(int) or GridMulticastDiscoverySpi.setTcpPort(int)) is occupied, the implementation will keep incrementing the port number for as long as it is less than the initial value plus this range. If the port range is 0, the implementation will try to bind only to the ports provided by GridMulticastDiscoverySpi.setMulticastPort(int) or GridMulticastDiscoverySpi.setTcpPort(int) and will fail if binding to them does not succeed. A local port range is very useful during development, when more than one grid node needs to run on the same physical machine.

Yes

10

Note

IP-multicast

IP-multicast must be enabled for this SPI to function properly. If you run into problems, we advise searching for Java IP-multicast troubleshooting guides, as many IP-multicast intensive systems have good configuration and optimization documentation.

Examples

GridMulticastDiscoverySpi is used by default and should be explicitly configured only if some SPI configuration parameters need to be overridden. The examples below set a multicast group that differs from the default 228.1.2.4.

From code:

GridMulticastDiscoverySpi spi = new GridMulticastDiscoverySpi();

// Put another multicast group.
spi.setMulticastGroup("228.10.10.157");

GridConfigurationAdapter cfg = new GridConfigurationAdapter();

// Override default discovery SPI.
cfg.setDiscoverySpi(spi);

// Start grid.
GridFactory.start(cfg);

or from Spring configuration file:

<bean id="grid.custom.cfg" class="org.gridgain.grid.GridConfigurationAdapter" singleton="true">
    ...
    <property name="discoverySpi">
        <bean class="org.gridgain.grid.spi.discovery.multicast.GridMulticastDiscoverySpi">
            <property name="multicastGroup" value="228.10.10.157"/>
        </bean>
    </property>
    ...
</bean>

GridTcpDiscoverySpi

Note

In release 3.0.5 and earlier, this SPI was named org.gridgain.grid.spi.discovery.tcplite.GridTcpLiteDiscoverySpi.

This discovery SPI preserves the order in which nodes are added: if two nodes A and B join the topology in some order, then all other nodes in the topology receive their NODE_JOINED events in exactly the same order. All nodes in the topology are organized in a ring. The topology has a coordinator node (an ordinary node; no extra configuration is required) that is responsible for issuing heartbeat messages, adding new nodes to the topology, and cleaning the IP finder (if it is shared) and the metrics store (if one is used). If the coordinator leaves or fails, one of the remaining nodes takes over the role.

Configuration

The following configuration parameters can be used to configure GridTcpDiscoverySpi:

Setter Method Description Optional Default

setIpFinder(GridTcpDiscoveryIpFinder)

IP finder that is used to share information about node IP addresses.

No

No value

The following provided implementations can be used:

  • GridTcpDiscoverySharedFsIpFinder

  • GridTcpDiscoveryS3IpFinder

  • GridTcpDiscoveryJdbcIpFinder

  • GridTcpDiscoveryVmIpFinder

setMetricsStore(GridTcpDiscoveryMetricsStore)

When a metrics store is provided, metrics are not sent via heartbeat messages; instead, they are kept in the store and requested by nodes on demand. Each node updates its metrics in the store once per heartbeat period. Under certain conditions, using a metrics store may save network bandwidth.

Yes

No value. If not provided, metrics are updated via heartbeat messages.

The following provided implementations can be used (for configuration details refer to the Javadoc):

  • GridTcpDiscoverySharedFsMetricsStore

  • GridTcpDiscoveryS3MetricsStore

  • GridTcpDiscoveryJdbcMetricsStore

  • GridTcpDiscoveryVmMetricsStore

setLocalAddress(String)

Sets local host IP address that discovery SPI uses.

Yes

If not provided, the first non-loopback address found is used. If no non-loopback address is available, java.net.InetAddress.getLocalHost() is used.

setLocalPort(int)

Port the SPI listens on.

Yes

47500

setLocalPortRange(int)

Local port range. The local node will try to bind to the first available port, starting from the local port and going up to local port + local port range.

Yes

100

setHeartbeatFrequency(long)

Delay in milliseconds between heartbeat messages. The SPI sends messages to other nodes at this interval to notify them about its state.

Yes

2000

setMaxMissedHeartbeats(int)

Number of heartbeat requests that can be missed before the local node initiates a status check.

Yes

1

setReconnectCount(int)

Number of times a node tries to (re)establish a connection to another node.

Yes

2

setNetworkTimeout(long)

Sets maximum network timeout in milliseconds to use for network operations.

Yes

5000

setSocketTimeout(long)

Sets socket operations timeout. This timeout is used to limit connection time and write-to-socket time.

Yes

2000

setAckTimeout(long)

Sets timeout for receiving an acknowledgement for a sent message. If the acknowledgement is not received within this timeout, sending is considered failed and the SPI tries to send the message again.

Yes

2000

setJoinTimeout(long)

Sets join timeout. If a non-shared IP finder is used and the node fails to connect to any address from the IP finder, the node keeps trying to join within this timeout. If all addresses are still unresponsive, an exception is thrown and node startup fails. 0 means wait forever.

Yes

0

setThreadPriority(int)

Thread priority for threads started by SPI.

Yes

7

setStoresCleanFrequency(int)

IP finder and metrics store clean frequency in milliseconds. The coordinator cleans the IP finder and metrics store once per period.

Yes

60000

setStatisticsPrintFrequency(int)

Statistics print frequency in milliseconds. 0 indicates that no print is required. If the value is greater than 0 and the log is not quiet, then statistics are printed at INFO level once per period. This may be very helpful for tracing topology problems.

Yes

0

setFastForwardFailureDetection(boolean)

Sets fast forward failure detection flag. If this flag is set to true and a connection to some node times out, then the host will be considered unreachable and all other nodes on the same host will be considered failed. If multiple nodes are launched on the same machine, setting this property to true increases failure detection speed in case the network goes down on that host.

Yes

true

Using Metrics Store

Using a metrics store can improve network performance (especially in large topologies) and save network bandwidth: metrics are not sent via heartbeat messages but are kept in the store and requested by nodes on demand. Each node updates its metrics in the store once per heartbeat period.

Without a metrics store, heartbeat messages may grow too large to be transferred quickly and efficiently across all nodes in the topology, so for better performance we recommend using a metrics store.

The following provided implementations can be used (for configuration details refer to the Javadoc):

  • GridTcpDiscoverySharedFsMetricsStore

  • GridTcpDiscoveryS3MetricsStore

  • GridTcpDiscoveryJdbcMetricsStore

  • GridTcpDiscoveryVmMetricsStore

Using TCP Discovery With Large Topologies

When you are going to launch a large number of nodes (100 or more) in your grid, it is recommended to configure the SPI with a metrics store.

Use the following table for SPI configuration:

Property Value

networkTimeout

30000

heartbeatFrequency

10000

maxMissedHeartbeats

3
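
As a configuration sketch, the values from the table above could be applied in code as shown below. The setters and the class packages for the discovery SPI, IP finder, and configuration adapter are taken from the configuration table and Spring examples in this section; the package of GridFactory and the class name LargeTopologyDiscoveryExample are assumptions made for this illustration, and the IP finder address is a placeholder.

```java
import java.util.Arrays;

import org.gridgain.grid.GridConfigurationAdapter;
import org.gridgain.grid.GridFactory; // Assumed package.
import org.gridgain.grid.spi.discovery.tcp.GridTcpDiscoverySpi;
import org.gridgain.grid.spi.discovery.tcp.ipfinder.vm.GridTcpDiscoveryVmIpFinder;

public class LargeTopologyDiscoveryExample {
    public static void main(String[] args) throws Exception {
        GridTcpDiscoverySpi spi = new GridTcpDiscoverySpi();

        // Values from the large-topology table.
        spi.setNetworkTimeout(30000);
        spi.setHeartbeatFrequency(10000);
        spi.setMaxMissedHeartbeats(3);

        // IP finder is required; the address below is a placeholder.
        GridTcpDiscoveryVmIpFinder ipFinder = new GridTcpDiscoveryVmIpFinder();

        ipFinder.setAddresses(Arrays.asList("1.2.3.4:47500"));

        spi.setIpFinder(ipFinder);

        // A metrics store implementation would be set here via spi.setMetricsStore(...).

        GridConfigurationAdapter cfg = new GridConfigurationAdapter();

        // Override default discovery SPI.
        cfg.setDiscoverySpi(spi);

        // Start grid.
        GridFactory.start(cfg);
    }
}
```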

Examples

GridTcpDiscoverySpi can be configured directly from code:

GridTcpDiscoverySpi spi = new GridTcpDiscoverySpi();

GridTcpDiscoveryVmIpFinder ipFinder = new GridTcpDiscoveryVmIpFinder();

ipFinder.setAddresses(Arrays.asList("127.0.0.1", "1.2.3.4:47520"));

// IP finder is required.
spi.setIpFinder(ipFinder);

GridConfigurationAdapter cfg = new GridConfigurationAdapter();

// Override default discovery SPI.
cfg.setDiscoverySpi(spi);

// Start grid.
GridFactory.start(cfg);

from Spring configuration file:

<bean id="grid.custom.cfg" class="org.gridgain.grid.GridConfigurationAdapter" singleton="true">
    ...
    <property name="discoverySpi">
        <bean class="org.gridgain.grid.spi.discovery.tcp.GridTcpDiscoverySpi">
            <property name="ipFinder">
                <bean class="org.gridgain.grid.spi.discovery.tcp.ipfinder.vm.GridTcpDiscoveryVmIpFinder">
                    <property name="addresses">
                        <list>
                            <value>1.2.3.4:47500</value>
                        </list>
                    </property>
                    <property name="segmentCheckAddrs">
                        <list>
                            <bean class="java.net.InetAddress" factory-method="getByName">
                                <constructor-arg value="2.3.4.5"/>
                            </bean>
                        </list>
                    </property>
                </bean>
            </property>
        </bean>
    </property>
    ...
</bean>

or from Spring config file using JSON configuration:

<bean id="grid.custom.cfg" class="org.gridgain.grid.GridConfigurationAdapter" singleton="true">
    ...
    <property name="discoverySpi">
        <bean class="org.gridgain.grid.spi.discovery.tcp.GridTcpDiscoverySpi">
            <property name="json" value="{heartbeatFrequency: 5000; networkTimeout: 4000;
                ipFinder: {addresses: ['1.2.3.4:47500'];
                @class:'org.gridgain.grid.spi.discovery.tcp.ipfinder.vm.GridTcpDiscoveryVmIpFinder'}}"/>
        </bean>
    </property>
    ...
</bean>

Default Implementation

If no discovery SPI is provided in the configuration, GridMulticastDiscoverySpi is used by default.

20.1.3. Configuration

GridDiscoverySpi is provided in the grid configuration passed into GridFactory at startup. You can configure the discovery SPI implementation as follows:

GridConfigurationAdapter cfg = new GridConfigurationAdapter();

// Configure grid to use multicast discovery layer.
cfg.setDiscoverySpi(new GridMulticastDiscoverySpi());

GridFactory.start(cfg);

Note that the GridConfiguration interface is just a bean and can also be configured using Spring XML configuration.

20.2. Communication SPI

20.2.1. Overview

GridCommunicationSpi enables communication between different nodes within the grid. It provides the basic plumbing to send and receive grid messages and is used for all distributed grid operations, such as task execution, monitoring data exchange, distributed event querying, and others.

To send and receive messages from public API, the following public methods are available:

  • GridProjection.send(Object, GridPredicate…)

  • GridProjection.send(Collection, GridPredicate…)

  • GridProjection.listen(GridPredicate2…)

  • GridProjection.remoteListenAsync(Collection, GridPredicate2…)

  • GridProjection.remoteListenAsync(GridNode, GridPredicate2…)

  • GridProjection.remoteListenAsync(GridPredicate, GridPredicate2…)

Note

Messages can be received asynchronously by registering listeners with the listen(…) and remoteListenAsync(…) methods. GridGain also provides a convenient actor-based adapter for them: GridListenActor.

20.2.2. Built-in Implementations

GridGain comes with the following communication SPI supported out of the box.

GridTcpCommunicationSpi

GridTcpCommunicationSpi is the default communication SPI. It uses the TCP/IP protocol to communicate with other nodes.

To enable communication with other nodes, this SPI adds the GridTcpCommunicationSpi.ATTR_ADDR and GridTcpCommunicationSpi.ATTR_PORT local node attributes.

At startup, this SPI tries to start listening on the local port specified by the GridTcpCommunicationSpi.setLocalPort(int) method. If the local port is occupied, the SPI automatically increments the port number until it can successfully bind for listening. The GridTcpCommunicationSpi.setLocalPortRange(int) configuration parameter controls the maximum number of ports that the SPI will try before it fails. The port range comes in very handy when starting multiple grid nodes on the same machine or even in the same VM: all nodes can be brought up without a single change in configuration.
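
The port-increment behavior described above can be sketched with plain java.net classes. The code below is a hypothetical illustration, not the SPI's actual implementation: PortRangeSketch and bind are invented names, and treating the upper bound (start port plus range) as inclusive is an assumption.

```java
import java.io.IOException;
import java.net.ServerSocket;

/** Hypothetical illustration of port-range binding (not GridGain code). */
class PortRangeSketch {
    /**
     * Tries startPort, startPort + 1, ... up to startPort + range and returns
     * the first server socket that binds successfully.
     */
    static ServerSocket bind(int startPort, int range) throws IOException {
        if (startPort < 1 || startPort > 65535 || range < 0)
            throw new IllegalArgumentException("Invalid port or range");

        // Never go past the highest valid TCP port.
        int lastPort = (int)Math.min((long)startPort + range, 65535L);

        IOException lastErr = null;

        for (int port = startPort; port <= lastPort; port++) {
            try {
                return new ServerSocket(port);
            }
            catch (IOException e) {
                lastErr = e; // Port could not be bound (for example, it is occupied); try the next one.
            }
        }

        throw new IOException("No free port in range " + startPort + ".." + lastPort, lastErr);
    }

    public static void main(String[] args) throws IOException {
        // 47100 and 100 are the default local port and port range documented for this SPI.
        try (ServerSocket first = bind(47100, 100); ServerSocket second = bind(47100, 100)) {
            System.out.println(first.getLocalPort() + " " + second.getLocalPort());
        }
    }
}
```

Two calls with the same arguments end up on different ports, which mirrors how several nodes on one machine can share an unchanged configuration.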

Configuration

The following configuration parameters can be used to configure GridTcpCommunicationSpi:

Setter Method Description Optional Default

setLocalAddress(String)

Sets local host address for socket binding.

Yes

Any available local IP address.

setLocalPort(int)

Sets local port for socket binding.

Yes

47100 (specified in GridTcpCommunicationSpi.DFLT_PORT)

setLocalPortRange(int)

Controls maximum number of local ports tried if all previously tried ports are occupied.

Yes

100 (specified in GridTcpCommunicationSpi.DFLT_PORT_RANGE)

setTcpNoDelay(boolean)

Sets value for TCP_NODELAY socket option. Each socket will be opened using provided value.

Yes

true (specified in GridTcpCommunicationSpi.DFLT_TCP_NODELAY)

setConnectTimeout(long)

Sets connect timeout used when establishing connection with remote nodes.

Yes

1000 (specified in GridTcpCommunicationSpi.DFLT_CONN_TIMEOUT)

setIdleConnectionTimeout(long)

Sets maximum idle connection timeout upon which a connection to client will be closed.

Yes

30000 (specified in GridTcpCommunicationSpi.DFLT_IDLE_CONN_TIMEOUT)

setMaxOpenClients(int)

Sets the maximum count of simultaneously open clients per remote node.

Yes

1 (specified in GridTcpCommunicationSpi.DFLT_MAX_OPEN_CLIENTS)

setSelectorsCount(int)

Sets the number of selectors to be used in the TCP server.

Yes

Equal to the number of processors in the system (specified in GridTcpCommunicationSpi.DFLT_SELECTORS_CNT)

setDirectBuffer(boolean)

Switches between using NIO direct and NIO heap allocation buffers. Although direct buffers perform better, in some cases (especially on Windows) they may cause JVM crashes. If that happens in your environment, set this property to false.

Yes

true

setMessageThreads(int)

Number of threads responsible for receiving messages.

Yes

20 (specified in GridTcpCommunicationSpi.DFLT_MSG_THREADS)

Examples

GridTcpCommunicationSpi is used by default and should be explicitly configured only if some SPI configuration parameters need to be overridden.

From code:

GridTcpCommunicationSpi commSpi = new GridTcpCommunicationSpi();

// Override local port.
commSpi.setLocalPort(4321);

GridConfigurationAdapter cfg = new GridConfigurationAdapter();

// Override default communication SPI.
cfg.setCommunicationSpi(commSpi);

// Start grid.
GridFactory.start(cfg);

or from Spring configuration file:

<bean id="grid.custom.cfg" class="org.gridgain.grid.GridConfigurationAdapter" singleton="true">
    ...
    <property name="communicationSpi">
        <bean class="org.gridgain.grid.spi.communication.tcp.GridTcpCommunicationSpi">
            <!-- Override local port. -->
            <property name="localPort" value="4321"/>
        </bean>
    </property>
    ...
</bean>

Default Implementation

If no communication SPI is provided in the configuration, GridTcpCommunicationSpi is used by default.

20.2.3. Usage

Here is an example of how to send and receive messages using the public Grid API:

Grid grid = GridFactory.getGrid();

grid.addMessageListener(new GridMessageListener() {
    /**
     * @see GridMessageListener#onMessage(UUID,Serializable)
     */
    public void onMessage(UUID nodeId, Serializable msg) {
        System.out.println("Received message: " + msg);
    }
});

// Send message to itself.
grid.sendMessage(grid.getLocalNode(), "TEST");

20.2.4. Configuration

GridCommunicationSpi is provided in the grid configuration passed into GridFactory at startup. By default, GridTcpCommunicationSpi is used. You can set the communication SPI implementation explicitly as follows:

GridConfigurationAdapter cfg = new GridConfigurationAdapter();

// Configure grid to use TCP communication layer.
cfg.setCommunicationSpi(new GridTcpCommunicationSpi());

GridFactory.start(cfg);

Note that the GridConfiguration interface is just a bean and can also be configured using Spring XML configuration.

20.3. Collision SPI

20.3.1. Overview

GridCollisionSpi regulates how grid jobs get executed when they arrive on a destination node. In general, a grid node will have multiple jobs arriving for execution and potentially multiple jobs that are already executing or waiting for execution on it. There are several possible strategies for dealing with this situation:

  1. All jobs can proceed in parallel.

  2. Jobs can be sequenced, i.e., only one job can execute at any given point in time.

  3. Only a certain number or certain types of grid jobs can proceed in parallel.

  4. Jobs may proceed based on some time-based events.

Every time a new job arrives, it is placed on the waiting queue, and it is up to the collision SPI to reject or activate a waiting job, cancel an active job, or do nothing. Generally, the collision SPI is invoked in the following cases:

  • A new job has arrived.

  • An existing job has finished.

  • A node metrics update has been received.

  • Collision SPI implementation has called GridCollisionExternalListener.onExternalCollision() to force collision resolution.

To summarize, collision SPI provides developer with ability to use custom logic in determining how grid jobs should be scheduled and executed on a destination grid node.

Tip

Collision SPI only controls job execution; it does not control task execution. So if a node only emits tasks but does not execute jobs, collision SPI will never be invoked on that node.

Job Rejection

If a job is canceled while waiting for execution (the job is on the waiting list and execution has not started yet), then it is rejected, and the GridJobResult passed into the GridTask.result(GridJobResult, List<GridJobResult>) method will contain a GridExecutionRejectedException. You can access this exception by calling the GridJobResult.getException() method.

Tip

Automatic Failover

If you use any of the GridTask adapters, rejected jobs will be automatically failed over to another node. By default, jobs are automatically failed over only in case of job rejection or node failure.

Job Cancellation

If a job is canceled after it has already been scheduled to execute, then the GridJob.cancel() method will be called on it (this also calls Thread.interrupt() on the executing thread automatically). In this case cancellation serves as a notification to the job that it should stop executing. Just like with Java thread interruption, it is ultimately up to the job to finish executing and return a result to the caller. Your GridTask.result(GridJobResult, List<GridJobResult>) implementation should decide whether the job result is acceptable and whether the job should be failed over to another node.
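
Because cancellation is only a notification, the job body has to check for it. The sketch below shows that cooperative pattern with plain Java threads; CancellableJobSketch is a hypothetical class written for this illustration and does not implement the GridJob interface.

```java
/** Hypothetical job body showing cooperative cancellation (not a GridGain class). */
class CancellableJobSketch implements Runnable {
    /** Set by cancel(); volatile so the worker thread sees the change. */
    private volatile boolean cancelled;

    /** Number of work units completed so far. */
    private volatile long processed;

    /** Plays the role of a cancel notification: it only sets a flag. */
    void cancel() {
        cancelled = true;
    }

    long processed() {
        return processed;
    }

    @Override public void run() {
        // The loop stops only because the job itself checks the flag and the interrupt status.
        while (!cancelled && !Thread.currentThread().isInterrupted())
            processed++;
    }

    public static void main(String[] args) throws InterruptedException {
        CancellableJobSketch job = new CancellableJobSketch();

        Thread worker = new Thread(job);

        worker.start();

        Thread.sleep(100);

        // Notify the job and interrupt its thread, as described for job cancellation.
        job.cancel();
        worker.interrupt();

        worker.join();

        System.out.println("Job stopped after " + job.processed() + " work units");
    }
}
```

A job that never checks either signal would keep running after cancellation, which is why the decision about the returned result is left to the task.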

GridCollisionExternalListener

This listener is set on collision SPI for notification of external collision events (e.g. job stealing). Once grid receives such notification, it will immediately invoke collision resolution.

GridGain uses this listener in GridJobStealingCollisionSpi to enable job stealing from overloaded to underloaded nodes. However, you can also use it, for instance, to provide time-based collision resolution. To achieve this, you would mark a job that requires time-based scheduling by setting a certain attribute in its job context, and set a timer in your collision SPI implementation that wakes up after a certain period of time. Once this period has elapsed, you would notify this listener that collision resolution should take place. Then, inside your collision resolution logic, you would find the marked waiting job and activate it.
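The time-based approach described above can be sketched with a plain Java timer. This is only a model under stated assumptions: ExternalListener, WaitingJob and activateAfterDelay are hypothetical stand-ins, not GridGain types, and the boolean field timed plays the role of the marker attribute set in the job context.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Stand-alone model of timer-driven collision resolution (hypothetical, no GridGain API). */
class TimedCollisionSketch {
    /** Hypothetical stand-in for the external collision listener. */
    interface ExternalListener {
        void onExternalCollision();
    }

    /** Hypothetical waiting job; 'timed' models the marker attribute in the job context. */
    static final class WaitingJob {
        final String name;
        final boolean timed;
        boolean active;

        WaitingJob(String name, boolean timed) {
            this.name = name;
            this.timed = timed;
        }
    }

    /** Fires the listener once after delayMs and returns the names of the jobs it activated. */
    static List<String> activateAfterDelay(final List<WaitingJob> waiting, long delayMs) {
        final List<String> activated = new ArrayList<>();
        final CountDownLatch done = new CountDownLatch(1);

        // Collision resolution: find the marked waiting jobs and activate them.
        ExternalListener listener = () -> {
            for (WaitingJob job : waiting) {
                if (job.timed && !job.active) {
                    job.active = true;
                    activated.add(job.name);
                }
            }
            done.countDown();
        };

        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        Runnable tick = listener::onExternalCollision;
        try {
            // The timer wakes up after the configured period and notifies the listener.
            timer.schedule(tick, delayMs, TimeUnit.MILLISECONDS);
            if (!done.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timer did not fire");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            timer.shutdownNow();
        }
        return activated;
    }
}
```

In a real collision SPI the timer would live as long as the SPI itself and the activation would go through the SPI's own collision resolution method; the blocking wait here exists only to make the sketch easy to run.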

Note that most collision SPIs may not have external or time-based collisions. In that case, they should simply ignore this method and do nothing when the listener is set.

20.3.2. Built-in Implementations

GridFifoQueueCollisionSpi

GridFifoQueueCollisionSpi allows a certain number of jobs, in first-in first-out order, to proceed without interruptions. All other jobs are put on the waiting list until their turn.

Note that if the parallelJobsNumber configuration parameter is not set, this SPI will allow all concurrent jobs to proceed without interruptions. Make sure to set the parallelJobsNumber parameter to enforce an upper limit on the number of concurrent jobs that can proceed without interruptions. For example, to have only one job proceed at a time, set the parallelJobsNumber parameter to 1.
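The first-in first-out behavior can be modeled in a few lines of plain Java. The sketch below is not the SPI's implementation; FifoCollisionSketch and resolve are hypothetical names, and jobs are represented by their names only.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Stand-alone model of FIFO collision resolution (hypothetical, no GridGain API). */
class FifoCollisionSketch {
    /**
     * Activates waiting jobs in arrival order until the number of active jobs
     * reaches parallelJobsNumber. Returns the jobs activated by this call.
     */
    static List<String> resolve(Deque<String> waiting, Set<String> active, int parallelJobsNumber) {
        List<String> activated = new ArrayList<>();
        while (active.size() < parallelJobsNumber && !waiting.isEmpty()) {
            String job = waiting.pollFirst(); // oldest waiting job goes first
            active.add(job);
            activated.add(job);
        }
        return activated;
    }

    public static void main(String[] args) {
        Deque<String> waiting = new ArrayDeque<>(Arrays.asList("job1", "job2", "job3"));
        Set<String> active = new HashSet<>();

        System.out.println(resolve(waiting, active, 1)); // prints [job1]
        active.remove("job1");                           // job1 has finished
        System.out.println(resolve(waiting, active, 1)); // prints [job2]
    }
}
```

With the limit set to 1, each call activates at most one job, which corresponds to the sequential execution described above.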

Configuration

The following configuration parameters can be used to configure GridFifoQueueCollisionSpi:

Setter Method | Description | Optional | Default
setParallelJobsNumber(int) | Sets the upper limit for the number of jobs that will proceed without interruptions. | Yes | 95

Examples

Like any GridGain SPI, GridFifoQueueCollisionSpi can be configured either directly from code or from a Spring configuration file. Here is an example of GridFifoQueueCollisionSpi configuration from code:

GridFifoQueueCollisionSpi colSpi = new GridFifoQueueCollisionSpi();

// Execute all jobs sequentially by setting parallel job number to 1.
colSpi.setParallelJobsNumber(1);

GridConfigurationAdapter cfg = new GridConfigurationAdapter();

// Override default collision SPI.
cfg.setCollisionSpi(colSpi);

// Start grid.
GridFactory.start(cfg);

or from Spring configuration file:

<bean id="grid.custom.cfg" class="org.gridgain.grid.GridConfigurationAdapter" singleton="true">
    ...
    <property name="collisionSpi">
        <bean class="org.gridgain.grid.spi.collision.fifoqueue.GridFifoQueueCollisionSpi">
            <property name="parallelJobsNumber" value="1"/>
        </bean>
    </property>
    ...
</bean>
GridJobStealingCollisionSpi

GridJobStealingCollisionSpi supports job stealing from over-utilized nodes to under-utilized nodes. This SPI is especially useful if some jobs within a task complete fast while others sit in the waiting queue on slower nodes. In such a case, the waiting jobs will be stolen from the slower node and moved to the faster, under-utilized node.

The design and ideas for this SPI are significantly influenced by the Java Fork/Join Framework authored by Doug Lea and planned for Java 7. GridJobStealingCollisionSpi took similar concepts and applied them to the grid (as opposed to the within-VM support planned in Java 7).

Quite often grids are deployed across many computers, some of which will always be more powerful than others. This SPI helps you avoid jobs being stuck at a slower node, as they will be stolen by a faster node. In the following picture, when Node3 becomes free, it steals Job13 and Job23 from Node1 and Node2 respectively.

http://www.gridgain.com/images/job_stealing_white.png

Note
Usage

Note that this SPI must always be used in conjunction with GridJobStealingFailoverSpi. The responsibility of the Job Stealing Failover SPI is to properly route stolen jobs to the nodes that initially requested (stole) these jobs. GridJobStealingCollisionSpi maintains a counter of how many times a job was stolen and hence traveled to another node, and it will not allow a job to be stolen if this counter exceeds a certain threshold. The threshold value is configured in GridJobStealingCollisionSpi.

Keep in mind that collision resolution happens on job-executing nodes (workers), and failover happens on the task-initiating node (master). So, if you have a case where one group of nodes is responsible only for sending tasks (masters) and another group is responsible for executing jobs (workers), it should be sufficient to configure GridJobStealingFailoverSpi on master nodes only and GridJobStealingCollisionSpi on worker nodes only. You should also take a look at the setStealingEnabled(boolean) and setStealingAttributes(Map) configuration properties, as they also allow you to control which nodes participate in job stealing.
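The stealing counter mentioned in this note can be illustrated with a small stand-alone model. Everything in the sketch is an assumption made for illustration: the names JobStealingSketch, Node, Job and steal are hypothetical, the choice to steal from the tail of the victim's waiting queue is arbitrary, and whether the real SPI compares the counter strictly or not is not stated in this chapter.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

/** Stand-alone model of job stealing with a per-job steal counter (hypothetical, no GridGain API). */
class JobStealingSketch {
    static final class Job {
        final String name;
        int stealCount; // how many times this job has traveled to another node

        Job(String name) {
            this.name = name;
        }
    }

    static final class Node {
        final String id;
        final Deque<Job> waiting = new ArrayDeque<>();

        Node(String id) {
            this.id = id;
        }
    }

    /**
     * Moves one waiting job from victim to thief, skipping jobs that have
     * already been stolen maxStealingAttempts times. Returns the stolen job,
     * or null if no job is eligible.
     */
    static Job steal(Node victim, Node thief, int maxStealingAttempts) {
        Iterator<Job> it = victim.waiting.descendingIterator();
        while (it.hasNext()) {
            Job job = it.next();
            if (job.stealCount < maxStealingAttempts) {
                it.remove();                // the job leaves the victim's waiting queue
                job.stealCount++;           // record the trip
                thief.waiting.addLast(job); // and joins the thief's waiting queue
                return job;
            }
        }
        return null;
    }
}
```

In the model, a job that has reached the limit stays where it is, so it cannot bounce between nodes indefinitely.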

Tip
Disable Job Stealing
Use the GridJobStealingDisabled annotation to disable job stealing and make sure that jobs are executed exactly on the node they were mapped to. If a job fails on the selected node, it will be failed over as usual according to the failover policy configured in the Failover SPI.
Configuration

The following configuration parameters can be used to configure GridJobStealingCollisionSpi:

Setter Method | Description | Optional | Default
setActiveJobsThreshold(int) | Sets the number of jobs that are allowed to be executed in parallel on this node. Note that this attribute may be different for different grid nodes, as stronger nodes may be able to execute more jobs in parallel. | Yes | 95
setWaitJobsThreshold(int) | Sets the wait jobs threshold. If the number of jobs in the waiting queue goes below this threshold, the implementation will attempt to steal jobs from other, more overloaded nodes. Note that this value may be different (but does not have to be) for different nodes in the grid. You may wish to give stronger nodes a smaller waiting threshold so they can start stealing jobs from other nodes sooner. | Yes | 0
setMessageExpireTime(long) | Sets the message expire time. If no response is received from a busy node to a job stealing request, the implementation will assume that the message never got there, or that the remote node does not have this node included in the topology of any of the jobs it has. In either case, the job steal request will be resent (potentially to another node). | Yes | 1,000 ms (1 second)
setMaximumStealingAttempts(int) | Sets the maximum number of attempts for a single job to be stolen. Once a job reaches this threshold, no more attempts will be made by other nodes to steal it. Note that this attribute should be the same on all nodes. | Yes | 5
setStealingEnabled(boolean) | Enables or disables job stealing on this node. | Yes | true
setStealingAttributes(Map) | Enables stealing to/from only those nodes that have the given attributes set. | Yes | empty map

Examples

Like any GridGain SPI, GridJobStealingCollisionSpi can be configured either directly from code:

GridJobStealingCollisionSpi spi = new GridJobStealingCollisionSpi();

// Configure number of waiting jobs
// in the queue for job stealing.
spi.setWaitJobsThreshold(10);

// Configure message expire time (in milliseconds).
spi.setMessageExpireTime(500);

// Configure number of active jobs that are allowed to execute
// in parallel. This number should usually be equal to the number
// of threads in the pool (default is 100).
spi.setActiveJobsThreshold(50);

// Configure maximum stealing attempts.
spi.setMaximumStealingAttempts(10);

// Enable stealing.
spi.setStealingEnabled(true);

// Set stealing attribute to steal from/to nodes that have it.
spi.setStealingAttributes(Collections.singletonMap("node.segment", "foobar"));

GridConfigurationAdapter cfg = new GridConfigurationAdapter();

// Override default Collision SPI.
cfg.setCollisionSpi(spi);

or from Spring configuration file:

<bean id="grid.custom.cfg" class="org.gridgain.grid.GridConfigurationAdapter" singleton="true">
    ...
    <property name="collisionSpi">
        <bean class="org.gridgain.grid.spi.collision.jobstealing.GridJobStealingCollisionSpi">
            <property name="activeJobsThreshold" value="100"/>
            <property name="waitJobsThreshold" value="0"/>
            <property name="messageExpireTime" value="1000"/>
            <property name="maximumStealingAttempts" value="10"/>
            <property name="stealingEnabled" value="true"/>
            <property name="stealingAttributes">
                <map>
                    <entry key="node.segment" value="foobar"/>
                </map>
            </property>
        </bean>
    </property>
    ...
</bean>
GridPriorityQueueCollisionSpi

GridPriorityQueueCollisionSpi allows a certain number of jobs with the highest priority to proceed without interruptions. All other jobs are put on the waiting list until their turn. Job priority is retrieved from the job priority attribute. If no priority has been assigned to a job (the job priority attribute was not found), the default priority of 0 is used.

Note that if the parallelJobsNumber configuration parameter is not set, this SPI will allow all concurrent jobs to proceed without interruptions. Make sure to set the parallelJobsNumber parameter to enforce an upper limit on the number of concurrent jobs that can proceed without interruptions. For example, to have only one job with the highest priority execute at a time, set the parallelJobsNumber parameter to 1.
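The selection rule can be modeled with a stable sort in plain Java. The sketch below is an illustration only: PriorityCollisionSketch, Job and pickActive are hypothetical names, a null priority models a job whose priority attribute was not found, and keeping arrival order among jobs of equal priority is an assumption, not documented SPI behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Stand-alone model of priority-based collision resolution (hypothetical, no GridGain API). */
class PriorityCollisionSketch {
    static final class Job {
        final String name;
        final Integer priority; // null models a job without a priority attribute

        Job(String name, Integer priority) {
            this.name = name;
            this.priority = priority;
        }
    }

    /** Returns the names of the jobs allowed to proceed, highest priority first. */
    static List<String> pickActive(List<Job> waiting, int parallelJobsNumber, int defaultPriority) {
        List<Job> sorted = new ArrayList<>(waiting);
        // List.sort is stable, so jobs with equal priority keep their arrival order.
        sorted.sort(Comparator.comparingInt(
            (Job j) -> j.priority == null ? defaultPriority : j.priority).reversed());

        List<String> active = new ArrayList<>();
        for (Job job : sorted) {
            if (active.size() == parallelJobsNumber) {
                break; // limit reached; remaining jobs stay on the waiting list
            }
            active.add(job.name);
        }
        return active;
    }

    public static void main(String[] args) {
        List<Job> waiting = Arrays.asList(
            new Job("usual", 5), new Job("unmarked", null), new Job("urgent", 10));

        System.out.println(pickActive(waiting, 1, 0)); // prints [urgent]
    }
}
```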

GridTask Code Example

Here is an example of grid tasks that can be used when the priority collision SPI is configured. Note that priority collision resolution is transparent to the user and is simply a matter of proper grid configuration. Also, priority can be defined only for a task (it is set within the task, not at the job level). All split jobs will be started with the priority declared in their owner task.

This example demonstrates how an urgent task may be declared with a higher priority value. With the number of parallel jobs set to 1, jobs from MyGridUrgentTask will generally be activated first (one by one), while jobs from MyGridUsualTask, which has a lower priority, will wait. Once the higher-priority jobs complete, the lower-priority jobs will be scheduled.

public class MyGridUrgentTask extends GridTaskSplitAdapter<Object, Object> {
    public static final int SPLIT_COUNT = 5;

    @GridTaskSessionResource
    private GridTaskSession taskSes = null;

    @Override
    protected Collection<? extends GridJob> split(int gridSize, Object arg) throws GridException {
        ...
        // Set high task priority (note that attribute name is used by the SPI
        // and should not be changed).
        taskSes.setAttribute("grid.task.priority", 10);

        Collection<GridJob> jobs = new ArrayList<GridJob>(SPLIT_COUNT);

        for (int i = 1; i <= SPLIT_COUNT; i++) {
            jobs.add(new GridJobAdapter<Integer>(i) {
                ...
            });
        }
        ...
    }
}

and

public class MyGridUsualTask extends GridTaskSplitAdapter<Object, Object> {
    public static final int SPLIT_COUNT = 20;

    @GridTaskSessionResource
    private GridTaskSession taskSes = null;

    @Override
    protected Collection<? extends GridJob> split(int gridSize, Object arg) throws GridException {
        ...
        // Set low task priority (note that attribute name is used by the SPI
        // and should not be changed).
        taskSes.setAttribute("grid.task.priority", 5);

        Collection<GridJob> jobs = new ArrayList<GridJob>(SPLIT_COUNT);

        for (int i = 1; i <= SPLIT_COUNT; i++) {
            jobs.add(new GridJobAdapter<Integer>(i) {
                ...
            });
        }
        ...
    }
}
Configuration

The following configuration parameters can be used to configure GridPriorityQueueCollisionSpi:

Setter Method | Description | Optional | Default
setDefaultPriority(int) | Sets the default priority used if a job does not have the job priority attribute set in its context. | Yes | 0
setParallelJobsNumber(int) | Sets the upper limit for the number of jobs that will proceed without interruptions. | Yes | 95
setPriorityAttributeKey(String) | This key will be used to look up job priorities from the job context (GridJobContext.getAttribute(String) method). | Yes | grid.job.priority

Examples

Like any GridGain SPI, GridPriorityQueueCollisionSpi can be configured either directly from code:

GridPriorityQueueCollisionSpi colSpi = new GridPriorityQueueCollisionSpi();

// Allow up to 5 highest-priority jobs to execute in parallel.
colSpi.setParallelJobsNumber(5);

GridConfigurationAdapter cfg = new GridConfigurationAdapter();

// Override default collision SPI.
cfg.setCollisionSpi(colSpi);

// Start grid.
GridFactory.start(cfg);

or from Spring configuration file:

<bean id="grid.custom.cfg" class="org.gridgain.grid.GridConfigurationAdapter" singleton="true">
    ...
    <property name="collisionSpi">
        <bean class="org.gridgain.grid.spi.collision.priorityqueue.GridPriorityQueueCollisionSpi">
            <property name="parallelJobsNumber" value="5"/>
        </bean>
    </property>
    ...
</bean>
Default Implementation

If no collision SPI is provided in the configuration, GridFifoQueueCollisionSpi is used by default.

20.3.3. Configuration

GridCollisionSpi is provided in the grid configuration passed into GridFactory at startup. You can configure a different collision SPI implementation as follows:

GridConfigurationAdapter cfg = new GridConfigurationAdapter();

GridFifoQueueCollisionSpi colSpi = new GridFifoQueueCollisionSpi();

// Limit number of parallel jobs.
colSpi.setParallelJobsNumber(10);

// Configure your own collision SPI.
cfg.setCollisionSpi(colSpi);

GridFactory.start(cfg);

Note that the GridConfiguration interface is just a bean and can also be configured using Spring XML configuration.

20.4. Failover SPI

20.4.1. Overview

Starting with GridGain 2.1, you can provide multiple instances of failover SPIs and then specify which one to use at the per-task level via the @GridTaskSpis annotation attached to your GridTask implementation.

GridFailoverSpi lets developers supply custom logic for handling failed execution of a grid job. Failover is triggered when the GridTask.result(GridJobResult, List) method returns the GridJobResultPolicy.FAILOVER policy, indicating that the job must be failed over. Job execution can fail for a number of reasons:

  • Job execution threw an exception (this condition has to be handled by user explicitly).

  • Job returned bad result (this condition has to be handled by user explicitly).

  • The node on which the job was executing left the topology, crashed, or stopped (failover is handled by default in GridTaskAdapter).

  • The job was rejected before it got a chance to execute, while still on the waiting list (failover is handled by default in GridTaskAdapter).
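The two conditions that have to be handled explicitly (an exception or a bad result) can be sketched as a small decision function. This is a plain-Java model and not the GridTask API: ResultPolicySketch, its Policy enum and the decide method are hypothetical, and what counts as an acceptable result is entirely up to the application.

```java
/** Stand-alone model of a result-handling decision (hypothetical, no GridGain API). */
class ResultPolicySketch {
    /** Hypothetical enum modeling the possible outcomes of result handling. */
    enum Policy { WAIT, REDUCE, FAILOVER }

    /**
     * Decides what to do with a job result.
     *
     * @param error exception thrown by the job, or null if it completed normally
     * @param resultAcceptable application-specific verdict on the returned value
     * @param allResultsReceived whether every job of the task has reported back
     */
    static Policy decide(Exception error, boolean resultAcceptable, boolean allResultsReceived) {
        if (error != null) {
            return Policy.FAILOVER; // job threw an exception: retry on another node
        }
        if (!resultAcceptable) {
            return Policy.FAILOVER; // job returned a bad result: retry on another node
        }
        // Good result: either keep waiting for the remaining jobs or finish the task.
        return allResultsReceived ? Policy.REDUCE : Policy.WAIT;
    }

    public static void main(String[] args) {
        System.out.println(decide(new RuntimeException("boom"), true, false)); // prints FAILOVER
        System.out.println(decide(null, true, true));                          // prints REDUCE
    }
}
```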

In all cases, the failover SPI takes the failed job (as failover context) and the list of all grid nodes, and produces another node on which the job execution will be retried. It is up to the failover SPI to make sure that the job is not mapped to the node it failed on. The failed node can be retrieved by calling GridFailoverContext.getJobResult().getNode().

Note Note that for any job spawned by a task, failover SPI will be invoked only on the node that initiated the task (obviously, it cannot be invoked on failed node).

20.4.2. Built-in Implementations

GridAlwaysFailoverSpi

GridAlwaysFailoverSpi always reroutes a failed job to another node. First, an attempt is made to reroute the failed job to a node that was not part of the initial split, for a better chance of success. If no such node is available, an attempt is made to reroute the failed job to one of the nodes in the initial split, excluding the node the job failed on. If neither attempt succeeds, the job is not failed over and null is returned.
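The selection order described above can be written down as a short plain-Java function. The sketch is a model, not the SPI's code: AlwaysFailoverSketch and pickNode are hypothetical names, nodes are represented by string IDs, and taking the first eligible node in topology order is an assumption made to keep the example deterministic.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Stand-alone model of the always-failover node selection order (hypothetical, no GridGain API). */
class AlwaysFailoverSketch {
    /**
     * Picks a node to retry a failed job on, or returns null if none is available.
     *
     * @param topology all nodes currently in the grid
     * @param initialSplit nodes that received jobs in the initial split
     * @param failedNode node the job failed on
     */
    static String pickNode(List<String> topology, Set<String> initialSplit, String failedNode) {
        // First choice: a node that was not part of the initial split.
        for (String node : topology) {
            if (!node.equals(failedNode) && !initialSplit.contains(node)) {
                return node;
            }
        }
        // Second choice: a node from the initial split, except the failed one.
        for (String node : topology) {
            if (!node.equals(failedNode) && initialSplit.contains(node)) {
                return node;
            }
        }
        // No candidate left: the job is not failed over.
        return null;
    }

    public static void main(String[] args) {
        Set<String> split = new HashSet<>(Arrays.asList("n1", "n2"));

        System.out.println(pickNode(Arrays.asList("n1", "n2", "n3"), split, "n1")); // prints n3
        System.out.println(pickNode(Arrays.asList("n1", "n2"), split, "n1"));       // prints n2
    }
}
```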

Configuration

The following configuration parameters can be used to configure GridAlwaysFailoverSpi:

Setter Method | Description | Optional | Default
setMaximumFailoverAttempts(int) | Sets the maximum number of attempts to execute a failed job on another node. This parameter is available starting with GridGain 2.0. | Yes | 5

Examples

Like any GridGain SPI, GridAlwaysFailoverSpi can be configured either directly from code:

GridAlwaysFailoverSpi failSpi = new GridAlwaysFailoverSpi();

GridConfigurationAdapter cfg = new GridConfigurationAdapter();

// Override maximum failover attempts.
failSpi.setMaximumFailoverAttempts(5);

// Override default failover SPI.
cfg.setFailoverSpi(failSpi);

// Start grid.
GridFactory.start(cfg);

or from Spring configuration file:

<bean id="grid.custom.cfg" class="org.gridgain.grid.GridConfigurationAdapter" singleton="true">
    ...
    <property name="failoverSpi">
        <bean class="org.gridgain.grid.spi.failover.always.GridAlwaysFailoverSpi">
            <property name="maximumFailoverAttempts" value="5"/>
        </bean>
    </property>
    ...
</bean>
GridJobStealingFailoverSpi

GridJobStealingFailoverSpi must always be used in conjunction with GridJobStealingCollisionSpi. When GridJobStealingCollisionSpi receives a steal request and rejects jobs so they can be routed to the appropriate node, it is the responsibility of GridJobStealingFailoverSpi to make sure that the job is indeed rerouted to the node that sent the initial request to steal it.

GridJobStealingFailoverSpi knows where to route a job based on the GridJobStealingCollisionSpi.THIEF_NODE_ATTR job context attribute (see GridJobContext). Prior to rejecting a job, GridJobStealingCollisionSpi populates this attribute with the ID of the node that wants to steal the job. GridJobStealingFailoverSpi then reads the value of this attribute and routes the job to the specified node.

If the failure is caused by a node crash rather than by a steal request, this SPI behaves identically to GridAlwaysFailoverSpi and tries to find the next balanced node to fail the job over to.
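The routing rule can be sketched in plain Java. This is a model under stated assumptions, not the SPI's code: StealingFailoverSketch and route are hypothetical names, the attribute key THIEF_ATTR is a made-up placeholder (the actual value of THIEF_NODE_ATTR is not shown in this chapter), and the fallback simply takes the first node other than the failed one instead of a load-balanced choice.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Stand-alone model of thief-based failover routing (hypothetical, no GridGain API). */
class StealingFailoverSketch {
    /** Made-up attribute key used only by this sketch. */
    static final String THIEF_ATTR = "sketch.thief.node";

    /**
     * Chooses the node a rejected or failed job should be sent to.
     *
     * @param jobContext attributes attached to the job before it was rejected
     * @param topology all nodes currently in the grid
     * @param failedNode node the job was rejected by or failed on
     */
    static String route(Map<String, String> jobContext, List<String> topology, String failedNode) {
        // A steal request leaves the thief's ID in the job context.
        String thief = jobContext.get(THIEF_ATTR);
        if (thief != null && topology.contains(thief)) {
            return thief;
        }
        // No thief recorded (for example, a node crash): fall back to any other node.
        for (String node : topology) {
            if (!node.equals(failedNode)) {
                return node;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> topology = Arrays.asList("n1", "n2", "n3");
        Map<String, String> ctx = new HashMap<>();
        ctx.put(THIEF_ATTR, "n3");

        System.out.println(route(ctx, topology, "n1"));                                  // prints n3
        System.out.println(route(Collections.<String, String>emptyMap(), topology, "n1")); // prints n2
    }
}
```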

Note
Usage

GridJobStealingFailoverSpi must always be used in conjunction with GridJobStealingCollisionSpi. Please refer to the GridJobStealingCollisionSpi documentation for more information.

Keep in mind that collision resolution happens on job-executing nodes (workers), and failover happens on the task-initiating node (master). So, if you have a case where one group of nodes is responsible only for sending tasks (masters) and another group is responsible for executing jobs (workers), it should be sufficient to configure GridJobStealingFailoverSpi on master nodes only and GridJobStealingCollisionSpi on worker nodes only.

Configuration

The following configuration parameters can be used to configure GridJobStealingFailoverSpi:

Setter Method | Description | Optional | Default
setMaximumFailoverAttempts(int) | Sets the maximum number of attempts to execute a failed job on another node. | Yes | 5

Examples

Like any GridGain SPI, GridJobStealingFailoverSpi can be configured either directly from code:

GridJobStealingFailoverSpi failSpi = new GridJobStealingFailoverSpi();

GridConfigurationAdapter cfg = new GridConfigurationAdapter();

// Override maximum failover attempts.
failSpi.setMaximumFailoverAttempts(5);

// Override default failover SPI.
cfg.setFailoverSpi(failSpi);

// Start grid.
GridFactory.start(cfg);

or from Spring configuration file:

<bean id="grid.custom.cfg" class="org.gridgain.grid.GridConfigurationAdapter" singleton="true">
    ...
    <property name="failoverSpi">
        <bean class="org.gridgain.grid.spi.failover.jobstealing.GridJobStealingFailoverSpi">
            <property name="maximumFailoverAttempts" value="5"/>
        </bean>
    </property>
    ...
</bean>
GridNeverFailoverSpi

GridNeverFailoverSpi never fails over a failed job: it always returns null from the GridFailoverSpi.failover(GridFailoverContext, List) method.

Configuration

This SPI has no configuration parameters.

Examples

Like any GridGain SPI, GridNeverFailoverSpi can be configured either directly from code:

GridNeverFailoverSpi failSpi = new GridNeverFailoverSpi();

GridConfigurationAdapter cfg = new GridConfigurationAdapter();

// Override default failover SPI.
cfg.setFailoverSpi(failSpi);

// Start grid.
GridFactory.start(cfg);

or from Spring configuration file:

<bean id="grid.custom.cfg" class="org.gridgain.grid.GridConfigurationAdapter" singleton="true">
    ...
    <property name="failoverSpi">
        <bean class="org.gridgain.grid.spi.failover.never.GridNeverFailoverSpi"/>
    </property>
    ...
</bean>
Default Implementation

If no failover SPI is provided in the configuration, GridAlwaysFailoverSpi is used by default.

20.4.3. Configuration

GridFailoverSpi is provided in the grid configuration passed into GridFactory at startup. You can configure a different failover SPI implementation as follows:

GridConfigurationAdapter cfg = new GridConfigurationAdapter();

GridNeverFailoverSpi failSpi = new GridNeverFailoverSpi();

// Configure your own failover SPI.
cfg.setFailoverSpi(failSpi);

GridFactory.start(cfg);

Note that the GridConfiguration interface is just a bean and can also be configured using Spring XML configuration.

20.5. Topology SPI

20.5.1. Ove