GridGain is a software middleware that enables development of high performance compute and data intensive distributed applications for real-time Big Data processing.
This book provides an in-depth knowledge on how to use GridGain software.
1. Introduction
First of all, the whole GridGain team thanks you for picking up this book and devoting your time to know more about GridGain project. Although this book is being written primarily by Nikita Ivanov and Dmitri Setrakyan - all of us here at GridGain are pitching in with reviews, tests, proof-reading, examples and creative ideas.
GridGain project has been an amazing journey for all of us to this point and as you read these lines we are continuing our work on adding new features and improving existing ones, fixing bugs (happens to the best of us) and keep thinking on how to make GridGain more enjoyable and productive to use.
GridGain open source project started in the spring of 2005 with just Nikita and Dmitriy working on it in their spare time. We’ve managed to get our first official release out in the summer of 2007. In three years since then - in the late 2010 (when this introduction is written) - our software is now starting every 10 seconds around the globe and undoubtedly is one of the most popular distributed programing frameworks in JVM ecosystem - so we must be doing something right.
This is even more exciting for us since GridGain Systems is an engineering company first and foremost. We remain small as we believe in small "surgical" teams and every member of our company still writes code (some of us less so as we need to travel, speak and write books like this one - which we enjoy greatly). We’ve visited more than 50 conferences and Java Use Groups around the globe in the last 3 years to talk about GridGain - and we are grateful to each and everyone of you who came up to our talks. That was the only "marketing" that we could afford but hour and a half was usually enough to convince folks to try GridGain out. In fact, where else you could see a full-fledged MapReduce application running on multiple nodes written from scratch in front of your eyes in less than 5 minutes?
We want to thank you again and we hope that you’ll find this book useful and effective guide for discovering GridGain.
1.1. What is this book about?
This book is about how to use GridGain software to develop innovative distributed compute and data intensive applications that run on any managed infrastructure - from a simple laptop on which this book was written to a large grids and all types of clouds. When developing with GridGain and reading this book you can use both Java, Scala or Groovy programming languages. At the time of this writing - GridGain was the only distributed computing middleware with native Scala support.
This book does not replace API manual references, and you would still need to consult them from time to time for up to date method signatures, parameter description, etc. As we are always saying in our project - documentation is the code too and we pay great deal of attention to API References (Javadoc, Scaladoc and Groovydoc) - in fact we have one of the best organized and maintained code level documentation among any relevant projects.
This book is short considering its subject - and it is on purpose. We strongly believe that one of the main reason for slow adoption of grid and cloud computing in the last decade was over-complication and unnecessary "dramatization" of entire subject. In fact, the original idea behind GridGain came from rather unproductive experience developing application with Globus toolkit - innovative piece of software for early 90s but awfully over-engineered and out of place by the turn of the century with rapid advancements in server-side JVM-based programming.
|
|
We’ve designed this book in a way that you can take a first pass over it during the weekend and have a pretty good grasp on all major APIs and concepts. As you continue to work with GridGain you’ll be coming back to specific chapters for more details and to refresh on some of the less obvious parts. This is a perfectly normal way to read this book and we highly encourage it. |
As you discover in this book the distributed computing is not necessarily complex or unwieldy as it may seem from the outside - in fact, most of its concepts are readily familiar for the most of you. Where it was usually getting complicated in the past is the tooling and framework support. That was a sore state of the affairs for a long time - most of engineering community just accepted that the distributed programming (grid and cloud computing included) just has to be "involved" because…. it is so, right?
In reality - it’s largely inaccurate. With the right tooling and framework support the distributed computing can be relatively simple and very productive. One of the main ideas in this book is to try to convince you that it is so. Every month or so we at GridGain receive email or a forum post where someone relates his or her experience of downloading GridGain during the day and by the midnight having the first MapReduce application running on Amazon EC2. By the time you finish this book it won’t seem like an exaggeration and you will have your first application running on EC2 much, much quicker.
This book covers GridGain starting with version 3.0 and provides in-depth manual for all three main technologies that are tightly integrated into GridGain cloud application platform (as of October 2010 GridGain is the only distribute middleware that provides all three technologies in the same platform - let alone for Java, Scala and Groovy languages):
-
computational grids (a.k.a. MapReduce)
-
data grid (a.k.a. distributed caching)
-
zero deployment with auto-scaling (a.k.a. elasticity on clouds)
Each of these main topics is covered in depth with plenty of examples in both Java, Scala and Groovy.
All in all, this book’s character perfectly reflects on what we think modern distributed programming should be - simple, effective and amazingly productive.
That is if you use GridGain…
1.2. About Authors…
This book is written primarily by Nikita Ivanov and Dmitriy Setrakyan. As always we encourage to contact us directly should you have any questions about this book or about GridGain. You can reach Nikita at nivanov@gridgain.com and Dmitriy at dsetrakyan@gridgain.com.
Both Nikita and Dmitriy have over 30 years of combined experience in distributed programming mostly in Java (some MPI C/C++ back in 90s - ouch) and now Scala. Nikita and Dmitriy were at the beginning of the project and to this day coordinate and lead most of the development work.
They both write code almost every day for GridGain despite heavy travel and frequent speaking engagements. And when they don’t write code - you’ll find them rooting for San Jose Sharks - the favorite hockey team of GridGain.
There are many way to stay in touch with GridGain project:
-
Quite naturally, http://www.gridgain.com is an excellent starting point for anything GridGain related.
-
Great source of information is our public forums at http://jive.gridgain.org.
-
Follow us on @twitter
-
Follow us on Facebook
-
Nikita’s blog: http://gridgaintech.wordpress.com/
-
Dmitriy’s blog: http://gridgain.blogspot.com/
2. Overview
In this chapter we’ll lay down some of the basic ideas about grid and cloud computing and how GridGain fits into it. The goal of this chapter is to make sure we are on the same page with you, the reader, as far as fundamentals of grid and cloud computing (and you won’t believe how far apart we can be on these…)
2.1. What is GridGain?
In a nutshell - GridGain is a JVM-based middleware software that enables the development of compute and data intensive High Performance Distributed Applications. Applications developed with GridGain can scale up on any infrastructure - from a single Android device to a large cloud.
GridGain provides two major areas of functionality:
-
Compute Grids
-
In-Memory Data Grids
On top of that it provides the multitude of surrounding technologies many of which are frequently used by our clients on their own.
With GridGain your applications can:
-
Work in a zero-deployment mode.
-
Scale up or down based on demand.
-
Cache distributed data in data grid.
-
Co-locate data and computations.
-
Run sql queries against cached data.
-
Store and query JSON objects.
-
Speed up task using MapReduce processing.
-
Use distributed thread pools.
-
Distribute the workload on the grid.
-
Use distributed queues and atomics.
-
Effectively exchange messages.
-
Auto-discover all grid resources.
-
Execute closures on the grid.
-
Grid-enable Java, Groovy and Scala code.
-
… and much more
2.2. Why Compute and In-Memory Data Grid?
Compute and In-Memory Data Grid act as two main axiomatic technologies for a modern distributed programming. They are fundamental because they solve two underlying problems faced by any distributed system:
-
distribution of computations
-
distribution of data
I always like to provide this analogy: every computing device - from Turing machine to the latest iPod - contains memory and a processing unit. Think about it… memory and the processing form the foundation of our computing capabilities. And so is the ability to distribute computations and data form the foundation of distributed programming.
And just like in late 1960s we’ve had first "system on the chip" where memory and processing units were finally integrated and combined on the same chip providing for cheaper, more energy efficient and much faster overall systems - GridGain has pioneered integrated middleware that combines compute and in-memory data grids in one cohesive and integrated distributed middleware software. This has resulted in similar benefits of simplified programming model, easier applicability and unified configuration and management.
Compute and in-memory data grids are the key topics in this book and we’ll talk a lot more about these two in the following chapters.
2.3. Why High Performance and Cloud Computing?
You noticed in the previous chapter that we call GridGain as a software middleware for developing High Performance Cloud Computing application. But why we focus on High Performance and what do we mean by that? And what does it have to do with Cloud Computing?
|
|
GridGain Philosophy The term High Performance Cloud Computing really came in the 2010 after almost 5 years
of GridGain development. We believe it reflects perfectly the design goals that we
originally had. Many, if not all, of GridGain’s features, designs and approaches stem from these
goals. |
Let’s talk about Cloud Computing first.
2.3.1. Cloud Computing
Despite all the buzz about cloud computing we believe strongly that from the software development perspective the cloud computing is almost synonymous with a traditional distributed programming.
In fact, that “almost” above accounts simply for a fact that unlike the traditional data centers, grids and clusters of the last decade clouds offer more fine grained resources virtualization and more management options. In a nutshell - your application development principles remain largely the same - but you have more options and more choices in how your application is deployed and how it utilizes available computing resources. Rest of the parallel distributed programming challenges of the last 25 years remain fully intact.
|
|
10 Years Ago… While ten years ago the most you should have accounted for was a new server coming up in
your local grid - today you need to be prepared for not just a new server but an extra CPUs
or extra disk storage or extra RAM appearing for your application that is snapshot and migrated
potentially half way across the globe (just look at RackSpace Flavors, for example). |
Still, the absolutely majority of the problems and challenges you are facing today while developing distributed software systems coalesce around parallelization of computing and data high availability in the distributed context - both are which are absolutely critical for any scalable distributed software system.
So, when we say Cloud Computing we mean Distributed Computing plus a few new important details.
|
|
When We Say "Cloud Computing". Cloud Computing = Distributed Computing + Data Center Virtualization |
2.3.2. High Performance (HP)
High Performance aspect is equally interesting. When we present about GridGain on the conferences we inevitably get asked about this… why High Performance and Cloud in the same sentence?
The answer is very simple: not every cloud application needs to be high performance. In fact, most of today’s cloud applications (i.e. the application that are deployed on the clouds) are not high performance.
|
|
Cloud Applications Most of today cloud applications (i.e. application deployed in the clouds) are not high performance. |
We use the term High Performance to specifically categorize applications that use distribution as means for processing parallelization, i.e. to achieve the scalability and/or performance that is theoretically unattainable on a single processing unit.
On the other hand, distributed applications that are not High Performance use cloud deployment as more convenient or economical deployment option without much need for improved scalability or performance.
|
|
High Performance vs. Not High Performance
The distinction is very important:
Naturally, some HP applications use cloud deployment because of convenience and economy as well. |
So, tieing it all together GridGain is a High Performance Cloud Application platform because:
2.3.3. Real-Time Cloud Applications
For another take on High Performance Cloud Application look at this blog entry I wrote in the middle of 2011:
2.4. Grid and Cloud Computing
To start off this sub-chapter I’ll tell you one story that happened to me few years ago.
I was presenting about GridGain at Java User Group at Dayton, Ohio. For those of you who don’t know - Dayton, OH is essentially a "sleeping quarters" for Wright-Patterson Air Force Base, one of the largest military research facilities in the world. It employees almost 30,000 people and headquarters massive Air Force Institute of Technology and Air Force Research Laboratory just to boot and it is known to have one of the largest super-computing centers in the world too. So, I’ve had few people from the base on my presentation…
|
|
Wright-Patterson Air Force Base is rumored to be the center of UFO research or so many people believe after Nevada’s Area 51 was closed few decades ago… |
After the presentation I’ve chatted with some folks and conversation drifted into general topic of grid computing and what we all understand by that (and by relatively new back then concept of cloud computing). It was a real surprise to me (to say the least) that I got three very different answers from three guys working on the base - essentially working in one big company.
One answer was that grid is really nothing new alluding to parallel Fortran of almost 50 years old, another one was more in line with common understanding of grid computing being just a new re-tooling of traditional parallel processing, and yet another answer was that the whole thing is just a hype and multi-core CPUs will displace it all together in 5-10 years max.
I remember on my flight back I was a bit perplexed by how far apart those guys were while working side by side in albeit large organization and not just on technicality - but on fundamental view of distributed programming (grid or cloud - doesn’t matter). Starting with my next presentation onward I’ve made a rule to always state what I believe about grids and clouds to at least make sure we have common frame of reference. Whether or not you agree with - is another question…
So, here’s my take. I don’t particularly like terms grid and cloud computing. There’s nothing that resembles a "grid" in grid computing and obviously there’s nothing that is performed in the "cloud" when it comes to cloud computing. Both marketing terms "grid computing" and "cloud computing" represent slight variations of traditional distributed programming.
|
|
Put in more canonical form, if you have more than one computing resource working on the same
problem in parallel - you have a grid and you are engaged in grid computing. If these computing
resources (all or in part) are virtualized and available to you on demand - they represent a
cloud and you are doing cloud computing. In the nutshell - that’s it. |
As you can see the difference between grid computing and cloud computing is only in representation of computing resources which in most cases should be irrelevant. The lines between grids and clouds are getting blurrier by the day and we use both terms interchangeably throughout this book (as long as context is clear).
It is, however, important to know the difference and new challenges that cloud infrastructure brings to the table for you as a software developer.
2.5. IaaS, PaaS, and SaaS
Only at the pick of the hype around the cloud computing can you get a chapter named like that…
Nowadays these terms are thrown often without much regard or understanding and while IaaS and SaaS are somewhat well defined, the PaaS is something that poorly defined if at all. Let us try to define these terms and see where GridGain fits in the picture.
|
|
Understanding this nomenclature is not essential for everyday usage of GridGain and most of the concepts in GridGain stay clear from these high level marketing terms. Yet - a cursory look is worth while and it will help you navigate the plethora of marketing literature that surrounds the cloud computing today. |
Picture below provide basic overview of how GridGain related to IaaS and PaaS:
2.5.1. IaaS - Infrastructure As a Service
IaaS stands for "Infrastructure As a Service". IaaS is often (wrongly) synonymous with cloud computing. It essentially means providing virtualized computing resources as a services. Think of Amazon EC2, for example.
|
|
And who would ever thought that a nascent online book seller and one of the few dotcom survivors would spearhead the revolution that is so much bigger that just an online retailing - a revolution that is radically changing the way we think about information systems! |
Amazon had its own data center and sometime ago decided to earn extra money by renting out their often unused computing capacity. So, they put a hardware virtualization (like VMs from VMWare or Citrix) on their servers and exposed the management of these VMs via Web browser so that anyone could create an account and start managing the VM instances.
In a nutshell - that’s all there’s to it.
It is important to note that IaaS (or clouds) can be public, private or hybrid. Public clouds are based on infrastructure that is publicly available, i.e. IaaS provides generally gives access to its data center to anyone. Private clouds are built by individual organizations for their internal use. And hybrid clouds exhibit both types of behavior.
While public and private clouds are usually a physical infrastructures with difference being who gets the access to the clouds - the hybrid clouds are almost always a virtual clouds. Similar the Virtual Private Networks (VPN), the virtual clouds are created on top of one or more physical public and private clouds and provide its end users seamless cloud/IaaS transparency. These hybrid clouds are often created for business applications by either PaaS or software middleware like GridGain.
|
|
Clouds can be public, private, and hybrid. While public and private clouds are usually physical infrastructures - the hybrid clouds are always virtualized clouds built via software on top of one or more physical public or private clouds. Analogy between hybrid clouds and VPN is almost one-to-one. |
There are plenty of IaaS provides all following in steps with Amazon all providing different sets of functionality and different twists. A quick look at Amazon AWS offerings will show how complex and diverse it has become in recent years.
It is also a strong sign that hardware virtualization and services around it will be rapidly advancing. We are just few years away from being able to add an extra core to our application or acquire extra network bandwidth capacity on demand for our system’s pick time and scale it down the moment it doesn’t need it anymore.
These capabilities will undoubtedly change the way we develop our applications.
2.5.2. PaaS - Platform As a Service
PaaS stands for "Platform As a Service". PaaS essentially provides an abstraction over various IaaS providers and adds additional services. Additional services mostly consist of some set of deployment and provisioning services aiming at supporting application multi-tenancy, i.e. ability to host multiple applications in secure isolation on the same VM or a set of VMs (don’t confuse it with Java VM).
|
|
VMs vs. Java VMs
Throughout this book we’ll use terms VM to denote hardware virtualization virtual machine (VM). To denote Java Virtual Machine we’ll use term JVM. |
The problem with PaaS is that no one has a precise definition of what PaaS really is… Its definition is largely based on specific vendor capabilities. There is, however, one clear trait of PaaS: it abstracts out its users from worrying about specifics of various IaaS providers and differences in their operations and functionality.
PaaS also sprung out the notion of DevOps - a symbiosis of application development and traditional IT functions. It is often said that PaaS provides abstraction over IaaS and DevOps services.
|
|
PaaS provides abstraction over IaaS and DevOps services. |
Most of the PaaS vendors today (early 2011) concentrate mainly on providing deployment and application provisioning services. PaaS from VMWare/Spring, Google AppEngine, CloudBees and RedHat/JBoss, for example, do exactly that. They all allow you to take your whole application and through a serious of manual steps move or deploy it onto IaaS infrastructure with some limited, if any, scale out functionality.
PaaS as a technology today is in its very early stages. It is clear that PaaS as a concept and technology will likely see the most amount of changes in the coming years - and by the time you read this book some of these changes may significantly affect your understanding of what PaaS can do for you.
|
|
IaaS abstracts out data center and exposes it as a service. PaaS abstracts out IaaS providers and adds DevOps. |
2.5.3. SaaS - Software As a Service
IaaS stands for "Software As a Service". Surprisingly for most casual observers, SaaS has relatively nothing to do with cloud computing or IaaS and PaaS specifically. Essentially, if you run your "application" in the browser - it is SaaS application.
That’s it.
Historically, SaaS came from ASP (Application Service Provider) businesses and it shares almost everything with ASP (except for more catchy name). Interestingly enough SaaS was the first "as a service" abbreviation long before IaaS and PaaS came to light. But when hardware virtualization and surrounding services became popular, "as a service" moniker was a logical progression for providing computing Infrastructure and Platforms "as a service".
2.5.4. How GridGain Fits?
By looking at the picture few paragraphs above you can see that GridGain can easily work directly with IaaS (like Amazon AWS) or through the PaaS. In fact, GridGain is completely independent from either PaaS or IaaS - it can work without any specific cloud or grid or cluster infrastructure.
This ability, this lightweight approach, is one of the key design advantages of GridGain. It provides you, the developer, exactly the same services whether you run GridGain on a simple Android device, a laptop, few servers, small grid or a large cloud.
So, you can select the desired DevOps approach for your application and GridGain will happily support it!
2.6. License
GridGain is dual-licensed:
-
GridGain Community Edition is free open-sourced and licensed under GPL version 3.
-
GridGain Enterprise Edition is closed-sourced and commercially licensed.
EULA (end-used license agreement) for Enterprise Edition is available at GridGain root installation folder after you install GridGain and it is also available on download page on the http://www.gridgain.com website.
GridGain System also provides OEM, Enterprise an Academic licenses with further details available upon request at sales@gridgain.com
|
|
GridGain 1.x-2.x Note also that previous version of GridGain 1.x-2.x were licensed under LGPL. |
Note that throughout the book we don’t directly distinguish between features available in Enterprise and Community Edition. In cases where it is important we make a note.
2.7. Support
The best way to get a free support on GridGain software is to dip into our active community with wealth of information on our free support forum: http://jive.gridgain.org. This forum is closely monitored by GridGain System’s engineers and we try our best to provide free support there when applicable.
GridGain Systems, Inc., as a company behind GridGain project, also provides full spectrum of commercial services around GridGain software including:
-
Commercial subscription for Community and Enterprise editions
-
Consulting and professional services
-
OEM licensing
-
"Bronze", "Silver" and "Gold" levels of support
-
GridGain Training seminars
When it comes to training and support - we at GridGain have a very simple philosophy: we do our own heavy lifting. We believe that we are the best people to support our own software and who is it better to learn about GridGain from but the people who develop it daily?
All information about services provided by GridGain Systems can be found at http://www.gridgain.com/services.html
3. Taste of GridGain
Before we start digging into nitty-gridy details of GridGain functionality let’s quickly look at what we can accomplish in 10-15 minutes. We’ll create one application in Java and one in Scala utilizing Compute and In-Memory Data Grids.
Let’s see how quickly we can do both.
|
|
Installation
We have a whole chapter dedicated to installation. However, installation of GridGain is rather trivial - and if you haven’t done it already here is a quick 3 steps:
You are done! |
|
|
Unix
For the rest of this chapter (and most of this book) we will assume the JetBrain IDEA 10 and Unix environment like Unix, Linux or Mac OSX. Most of the steps and instructions apply almost verbatim to Eclipse, Emacs or NetBeans running on Windows with obvious changes to paths and certain project management capabilities of IDEs. |
Now - the first step when developing distributed application is to have a… grid. With GridGain you can have it anywhere but for the purpose of this example we will create one right on the same computer where you will be running the main example.
|
|
Multiple GridGain Nodes
One the coolest capability of GridGain is its ability to run multiple GridGain nodes on the same computer or even… inside the same JVM. Think about it - you can launch the entire cloud in a single JVM and enjoy local debugging of your application while it is running in the local virtualized cloud. Now - that’s pretty powerful and that’s exactly how we at GridGain to debug and test most of our complex internal distributed logic. |
To have a local grid we are going to have two GridGain nodes running standalone and the third node will be embedded into our applications that we will develop. When our application starts - it will join the grid (i.e. join the topology) making the grid of three nodes.
Open the command shell and assuming you are in GRIDGAIN_HOME folder just type this:
$ bin/ggstart.sh
If everything is fine (you set GRIDGAIN_HOME environment variable properly and you have Java installed) you will see the output similar to this:
[11:52:50] _____ _ _______ _ ____ ____
[11:52:50] / ___/____(_)___/ / ___/___ _(_)___ |_ / / __/
[11:52:50] / (_ // __/ // _ / (_ // _ `/ // _ \ _/_ <_ /__ \
[11:52:50] \___//_/ /_/ \_,_/\___/ \_,_/_//_//_/ /____(_)____/
[11:52:50]
[11:52:50] ---==++ HIGH PERFORMANCE CLOUD COMPUTING ++==---
[11:52:50] ver. x.x.x-DDMMYYYY
[11:52:50] Copyright (C) 2005-2011 GridGain Systems, Inc.
[11:52:50]
[11:52:50] Quiet mode.
[11:52:50] ^-- To disable add -DGRIDGAIN_QUIET=false or "-v" to ggstart.{sh|bat}
[11:52:50] << Enterprise Edition >>
[11:52:50] Daemon mode: off
[11:52:50] Language runtime: Java Platform API Specification ver. 1.6
[11:52:50] Remote Management [restart: on, REST: on, JMX (remote: on, port: 49113, auth: off, ssl: off)]
[11:52:50] GRIDGAIN_HOME=/Users/nivanov/svnroot/gg-trunk
[11:52:50] (!) SMTP is not configured - email notifications are off.
[11:52:50] (!) Cache is not configured - data grid is off.
[11:52:53] Topology snapshot [nodes=1, CPUs=4, hash=0xB12A5F18]
[11:52:54] License info:
[11:52:54] Licensed to 'GridGain Systems, Internal Development Only' on Feb 3, 2011
[11:52:54] License [ID=7D5CB773-225C-4165-8162-3BB67337894B, type=ENT]
[11:52:54] ^--License limits [<none>]
[11:52:54] System info:
[11:52:54] JVM: Apple Inc., Java(TM) SE Runtime Environment ver. 1.6.0_26-b03-383-11A511
[11:52:54] OS: Mac OS X 10.7.1 x86_64, nivanov
[11:52:54] VM name: 4837@NIKITA-IVANOVs-MacBook-Pro.local
[11:52:54] Local ports used [TCP:8080 TCP:47100 UDP:47200 TCP:47300]
[11:52:54] GridGain started OK
[11:52:54] ^-- [grid=default, nodeId8=a26743ce, order=1315939970778, CPUs=4, addrs=[192.168.1.103]]
[11:52:54] ZZZzz zz z...
Start another command shell and type the same command again:
$ bin/ggstart.sh
This time you get almost identical output with few important changes:
[12:34:09] _____ _ _______ _ ____ ____
[12:34:09] / ___/____(_)___/ / ___/___ _(_)___ |_ / / __/
[12:34:09] / (_ // __/ // _ / (_ // _ `/ // _ \ _/_ <_ /__ \
[12:34:09] \___//_/ /_/ \_,_/\___/ \_,_/_//_//_/ /____(_)____/
[12:34:09]
[12:34:09] ---==++ HIGH PERFORMANCE CLOUD COMPUTING ++==---
[12:34:09] ver. x.x.x-DDMMYYYY
[12:34:09] Copyright (C) 2005-2011 GridGain Systems, Inc.
[12:34:09]
[12:34:09] Quiet mode.
[12:34:09] ^-- To disable add -DGRIDGAIN_QUIET=false or "-v" to ggstart.{sh|bat}
[12:34:09] << Enterprise Edition >>
[12:34:09] Daemon mode: off
[12:34:09] Language runtime: Java Platform API Specification ver. 1.6
[12:34:09] Remote Management [restart: on, REST: on, JMX (remote: on, port: 49114, auth: off, ssl: off)]
[12:34:09] GRIDGAIN_HOME=/Users/nivanov/svnroot/gg-trunk
[12:34:09] (!) SMTP is not configured - email notifications are off.
[12:34:09] (!) Cache is not configured - data grid is off.
[12:34:10] Node JOINED [nodeId8=a26743ce, addr=[192.168.1.103], CPUs=4]
[12:34:12] Topology snapshot [nodes=2, CPUs=4, hash=0xC287D25B]
[12:34:14] (!) Jetty failed to start (retrying every 3000 ms). Another node on this host?
[12:34:14] License info:
[12:34:14] Licensed to 'GridGain Systems, Internal Development Only' on Feb 3, 2011
[12:34:14] License [ID=7D5CB773-225C-4165-8162-3BB67337894B, type=ENT]
[12:34:14] ^--License limits [<none>]
[12:34:14] System info:
[12:34:14] JVM: Apple Inc., Java(TM) SE Runtime Environment ver. 1.6.0_26-b03-383-11A511
[12:34:14] OS: Mac OS X 10.7.1 x86_64, nivanov
[12:34:14] VM name: 5227@NIKITA-IVANOVs-MacBook-Pro.local
[12:34:14] Local ports used [TCP:47101 UDP:47200 TCP:47301]
[12:34:14] GridGain started OK
[12:34:14] ^-- [grid=default, nodeId8=b8aac044, order=1315942449848, CPUs=4, addrs=[192.168.1.103]]
[12:34:14] ZZZzz zz z...
This output is a bit more interesting as it shows that both nodes discovered each other:
| Event of node joining the topology | |
| Snapshot of the topology showing total number of nodes and CPUs |
So at this point we have two nodes running (they are simply idling since we are not processing anything). Notice how didn’t have specify any configuration properties or configure anything at all. Everything works out-of-the-box as expected.
|
|
SPIs
As you will learn later in the book GridGain is composed of almost a dozen of different SPIs each providing pluggable kernel-level functionality. Two of these SPIs are discovery and communication SPIs that are responsible for maintaining distributed topology and exchanging the data between nodes. When you start GridGain with default configuration (like we just did) it starts with default SPI implementations (IP-multicast discovery and TCP/IP-based communication respectively) - and they work perfectly in our case. |
3.1. First GridGain Java App
Now that we have topology set we are going to switch to writing the actual code of our first application. Code examples below don’t have any dependencies on IDE - and you can follow up using any text editor of your choice.
The first app we are going to write would be computational MapReduce. It will calculate number of non-space characters in a given string by splitting the string into individual words, calculating word’s length on the remote nodes and aggregating results back.
We’ll use FP-based approach that GridGain natively support (even in Java):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 | import org.gridgain.grid.*;
import org.gridgain.grid.typedef.*;
import java.util.*;
import static org.gridgain.grid.GridClosureCallMode.*;
public class GridFunctionalMapReduceExample {
public static void main(final String[] args) throws GridException {
if (args.length == 1 && args[0].length() > 0)
GridFactory.in(new GridInClosureX<Grid>() {
@Override public void applyx(Grid g) throws GridException {
System.out.println("Length of input argument is " + g.reduce(
SPREAD,
GridFunc.<String, Integer>cInvoke("length"),
Arrays.asList(args[0].split(" ")),
GridFunc.sumIntReducer()
));
}
});
}
}
|
Not that there are many ways you can write this particular program in GridGain:
-
We can use direct Grid Task and Grid Jobs approach
-
We can use AOP-based grid enablement
-
We can GridGain 3.0 imperative APIs
-
Even using GridGain FP APIs there are different ways to code this program
FP-based approach above, however, yields probably the shortest program but can be initially confusing since Java isn’t really supporting FP natively. So let me explain step by step how this works.
Lines 1-4
We start by importing all necessary classes and constants into the scope.
Line 7
We define a main(…) method that will take input string.
Line 9
Method GridFactory.in(…) simply takes a closure and:
-
Starts the default GridGain node (custom configuration can be passed in as a parameter)
-
Executes passed in closure
-
Stop the GridGain node
So, essentially, it allows for a quick execution of the piece of code within context of a running GridGain node.
Line 9, 10
As a parameter to GridFactory.in(…) method call we are passing a newly created closure of
type GridInClosureX<Grid>. The body of this closure will be executed within context of a running
GridGain node.
Line 11-15
GridInClosureX<Grid> has one method applyx(Grid) which passed a Grid interface instance.
Inside of applyx(Grid) we call reduce(…) method that performs MapReduce operation. It accepts
four parameters:
-
Distribution mode. In our case we use GridClosureCallMode.SPREAD to spread the processing to all available nodes
-
A closure to execute on every remote node. We use utility method GridFunc.cInvoke(…) that creates closure via reflection based on the method name
-
A list of arguments that will be passed to each closure on the remote nodes
-
Reducing closure that takes results from the remote nodes and aggregates them into one final result. We, again, use pre-defined integer accumulator returned by GridFunc.sumIntReducer() method.
The logic of this computational MapReduce should be clear by now. We split input string by spaces into individual words, we then send every word to a remote node where a method length will be called on that word, and results of these calls will be returned back to reducer that will simply sum them up.
Now that we have a basic understanding of what is happening inside of this code let’s run this example. Depending on whether or not you are using IDE, Maven or Ant build you simply need to include gridgain.jar file that’s located in GRIDGAIN_HOME directory and all JARs under GRIDGAIN_HOME/libs folder to your classpath. Also, if you use IDEs - make sure that either environment variable GRIDGAIN_HOME is inherited by IDE process or system property with the same name GRIDGAIN_HOME is setup in your runtime configuration.
When it’s all set and done I’ve passed "GridGain is awesome" string my IDEA 10 runtime configuration and the got the following output in my IDEA output window:
/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/bin/java -ea -DGRIDGAIN_HOME=...
[12:28:11] _____ _ _______ _ ____ ____
[12:28:11] / ___/____(_)___/ / ___/___ _(_)___ |_ / / __/
[12:28:11] / (_ // __/ // _ / (_ // _ `/ // _ \ _/_ <_ /__ \
[12:28:11] \___//_/ /_/ \_,_/\___/ \_,_/_//_//_/ /____(_)____/
[12:28:11]
[12:28:11] ---==++ HIGH PERFORMANCE CLOUD COMPUTING ++==---
[12:28:11] ver. x.x.x-DDMMYYYY
[12:28:11] Copyright (C) 2005-2011 GridGain Systems, Inc.
[12:28:11]
[12:28:11] Quiet mode.
[12:28:11] ^-- To disable add -DGRIDGAIN_QUIET=false or "-v" to ggstart.{sh|bat}
[12:28:11] << Enterprise Edition >>
[12:28:11] (!) SMTP is not configured - email notifications are off.
[12:28:11] (!) Cache is not configured - data grid is off.
[12:28:11] Daemon mode: off
[12:28:11] Language runtime: Java Platform API Specification ver. 1.6
[12:28:11] Remote Management [restart: off, REST: on, JMX (remote: off)]
[12:28:11] GRIDGAIN_HOME=/Users/nivanov/svnroot/gg-trunk
[12:28:13] Node JOINED [nodeId8=b324d6c3, addr=[192.168.1.103], CPUs=4]
[12:28:15] Topology snapshot [nodes=2, CPUs=4, hash=0xCFFF5AA0]
[12:28:15] Node JOINED [nodeId8=e854d435, addr=[192.168.1.103], CPUs=4]
[12:28:15] Topology snapshot [nodes=3, CPUs=4, hash=0xF7C10287]
[12:28:15] License info:
[12:28:15] Licensed to 'GridGain Systems, Internal Development Only' on Feb 3, 2011
[12:28:15] License [ID=7D5CB773-225C-4165-8162-3BB67337894B, type=ENT]
[12:28:15] ^--License limits [<none>]
[12:28:15] New version is available at www.gridgain.com: 3.2.1c.05082011
[12:28:15] System info:
[12:28:15] JVM: Apple Inc., Java(TM) SE Runtime Environment ver. 1.6.0_26-b03-383-11A511
[12:28:15] OS: Mac OS X 10.7.1 x86_64, nivanov
[12:28:15] VM name: 9791@NIKITA-IVANOVs-MacBook-Pro.local
[12:28:15] Local ports used [TCP:8080 TCP:47102 UDP:47200 TCP:47302]
[12:28:15] GridGain started OK
[12:28:15] ^-- [grid=default, nodeId8=1cdf9436, order=1316028492427, CPUs=4, addrs=[192.168.1.103]]
[12:28:15] ZZZzz zz z...
Length of input argument is 16
[12:28:17] GridGain stopped OK [uptime=00:00:01:363]
As you can see the output is very similar to standalone nodes we’ve started few minutes ago. But in the end we have output of our MapReduce task which says:
Length of input argument is 16
correctly computing number of non-empty characters in input string "GridGain is awesome". Notice also we’ve had topology snapshot with three nodes (as expected). If you check other standalone nodes you will see the similar output to this:
[12:28:13] Node JOINED [nodeId8=1cdf9436, addr=[192.168.1.103], CPUs=4] [12:28:13] Topology snapshot [nodes=3, CPUs=4, hash=0xF7C10287] [12:28:17] Node LEFT [nodeId8=1cdf9436, addr=[192.168.1.103], CPUs=4] [12:28:17] Topology snapshot [nodes=2, CPUs=4, hash=0x71DE65CC]
indicating that when our application started we’ve had three nodes in the topology and when our application completed and stopped we were back to two nodes (i.e. two standalone nodes).
|
|
Zero Deployment
Did you notice any deployment steps? Any Ant or Maven build? Did we create any JAR files to copy to remote nodes? As you probably guessed - the answer is no. We didn’t need to do any of these awkward and expensive steps because GridGain sports pretty unique technology that allows it deploy necessary classes on-demand in a distributed fashion completely transparently to the developer. In fact - you just write the code as if it is completely local and GridGain will take care of proper distribution, versioning, class loading, etc. |
Now, let’s advance our example. As you probably noticed we don’t see any evidence on remote nodes that any processing is happening there. In fact, by default GridGain starts in QUIET mode and most of the output is suppressed (if you need to start in normal mode - use -v flag for ggstart.sh script).
Let’s modify our example so that we’ll see what is being process and where:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 | import org.gridgain.grid.*;
import org.gridgain.grid.typedef.*;
import java.util.*;
import static org.gridgain.grid.GridClosureCallMode.*;
public class GridFunctionalMapReduceExample {
public static void main(final String[] args) throws GridException {
if (args.length == 1 && args[0].length() > 0)
GridFactory.in(new GridInClosureX<Grid>() {
@Override public void applyx(Grid g) throws GridException {
System.out.println("Length of input argument is " + g.reduce(
SPREAD,
new GridClosure<String, Integer>() {
@Override public Integer apply(String s) {
System.out.println("Calculating for: " + s);
return s.length();
}
},
//GridFunc.<String, Integer>cInvoke("length"),
Arrays.asList(args[0].split(" ")),
GridFunc.sumIntReducer()
));
}
});
}
}
|
We’ve commented out the reflection-based closure and added direct closure creation that prints out what string it is working on and returns its length. If we re-run our application we’ll now get the following output on three nodes:
Remote Node 1
[14:14:27] Node JOINED [nodeId8=6e77be3c, addr=[192.168.1.103], CPUs=4] [14:14:27] Topology snapshot [nodes=3, CPUs=4, hash=0x89DEB0E3] Calculating for: is[14:14:31] Node LEFT [nodeId8=6e77be3c, addr=[192.168.1.103], CPUs=4] [14:14:31] Topology snapshot [nodes=2, CPUs=4, hash=0x71DE65CC]
Remote Node 2
[14:14:27] Node JOINED [nodeId8=6e77be3c, addr=[192.168.1.103], CPUs=4] [14:14:27] Topology snapshot [nodes=3, CPUs=4, hash=0x89DEB0E3] Calculating for: awesome[14:14:31] Node LEFT [nodeId8=6e77be3c, addr=[192.168.1.103], CPUs=4] [14:14:31] Topology snapshot [nodes=2, CPUs=4, hash=0x71DE65CC]
Local node in IDE
[14:14:29] GridGain started OK [14:14:29] ^-- [grid=default, nodeId8=6e77be3c, order=1316034866664, CPUs=4, addrs=[192.168.1.103]] [14:14:29] ZZZzz zz z... Calculating for: GridGainLength of input argument is 16
[14:14:31] GridGain stopped OK [uptime=00:00:01:237]
| - That’s the output from our closure executing on remote nodes. | |
| - That’s the output from the reduction step executing on the local (initiating) node. |
Note that local node (i.e. the node running in IDE) performs calculation as well as performing the final reduction step. If we don’t want it to participate in the actual calculation and only perform the final reduction - we can simply change this line:
System.out.println("Length of input argument is " + g.reduce(
to this
System.out.println("Length of input argument is " + g.remoteProjection().reduce(
|
|
Consider this…
Look at these 20 lines of code and consider that this application includes:
|
Pretty neat, right? And so just in about two dozens lines of code and 10 minutes we’ve got our first MapReduce application running.
3.2. First GridGain Scala App
Now - let’s move to In-Memory Data Grid application and we’ll use Scala for that, more specifically - Scalar, our Scala-based DSL for GridGain.
|
|
Scalar DSL
The idea behind Scalar is to simply adopt Java-side APIs for usage in Scala. Scalar by design does not add any additional new functionality to GridGain but adopts Java APIs to Scala. This is a very important point to understand that there’s no additional or left out functionality when you are switching between Java and Scala - 100% of GridGain is available in both languages (nad natively so). Note that GridGain also comes with Grover - Groovy++ DSL. |
Since we are going to use In-Memory Data Grid in this example we need to restart our standalone nodes with enabled data grid. Note that by default there are no caches configured (for obvious performance reasons). GridGain comes with handy example configuration that comes with three caches configurated examples/config/spring-cache.xml.
You can stop existing nodes by simply Ctrl-C and then start them again using:
bin/ggstart.sh examples/config/spring-cache.xml
The output from the nodes is almost the same as in previous example with one notable change:
... [13:28:05] Configured caches ['partitioned', 'replicated', 'local'] ...
indicating that we now have three configured caches that are named based on their type.
Now that we have standalone nodes running with necessary configuration let’s turn to writing our application that will utilize the In-Memory Data Grid (we’ll use shorter term data grid going forward) side of GridGain. We are going to create an application that will populate data grid with a set of key-value pairs and then execute the set of closures where each closure will have an affinity with the specific key in the data grid - and therefore it will be co-located with the data for that key instead of just randomly be executed on some node in the grid.
|
|
Affinity Co-Location
Affinity co-location is extremely important use case in real-time processing as it underpins the system design that can scale linearly regardless of the size of the data set. |
Scalar-based (Scalar is GridGain’s DSL based on Scala) application looks pretty simple:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 | import org.gridgain.scalar.scalar
import scalar._
import org.gridgain.grid.cache.GridCache
object ScalarCacheAffinitySimpleExample {
/** Number of keys. */
private val KEY_CNT = 20
def main(args: Array[String]) {
scalar("examples/config/spring-cache.xml") {
val c = grid$.cache[Int, String]("partitioned")
populate(c) // Comment out on subsequent runs.
colocate(c)
}
}
private def populate(c: GridCache[Int, String]) {
(0 until KEY_CNT).foreach(i => c += (i -> i.toString))
}
private def colocate(c: GridCache[Int, String]) {
(0 until KEY_CNT).foreach(i =>
grid$.affinityRun("partitioned", i,
() => println("Co-located [key= " + i + ", value=" + c.peek(i) + ']'))
)
}
}
|
Even if you are not familiar with Scala - the code looks pretty self-explanatory. Let’s go line by line like we did for Java example above.
Line 1-3
Necessary imports including import for Scalar
Line 7
Defines number of keys we’ll be storing in the data grid and number of closures we’ll be executing later.
Line 10
Initializes Scalar with the same configuration file as we used for standalone nodes
examples/config/spring-cache.xml. Note that initializing Scalar essentially means starting up
the local node.
Line 11
We are getting an instance of the cache named partitioned (which is a partitioned cache named so for
clarity). Cache is typed for Int keys and String values.
Line 13, 14
Calls functions populate() and colocate() that are defined later.
Line 18-20
Function populate() simply puts key/value pairs into data grid. Node that partitioned cache will store
particular key/value pair on one of the nodes (potentially including the local one) as well as on one back up
node. Since we have three nodes in the topology (remember - one local node and two standalone noes) - each
key/value pair will be store on two nodes.
|
|
Running Multiple Time
Note that if you run this example multiple time you need to comment out line 13 since we don’t need to override value that are already in the data grid. Don’t get confused here: even though we are stopping the local node when our application finished and all data that was stored on this node will be lost - the key/value pairs are duplicated on backup nodes (i.e. stored twice in the data grid). When we start our application again the pre-loading process will optimally reshuffle the data from two existing nodes to new three nodes topology. Note that number of backup nodes and details of pre-loading process are fully configurable. |
Line 22-26
Function colocate() executes number of closures where each closure gets affinity co-located with some key
in the data grid. Note that the closure itself simply prints the trace message and uses function peek() that
gets the value only if it’s locally available - which should be since we are co-locating closure with the node
where data is stored (so called master node).
|
|
Affinity Co-Location
The colocate() function is the key functionality here. Look how simple it is to co-locate the computational logic (a closure) with the data this logic need to process (data in cache). |
Let’s go ahead and start our application. Starting Scala application is no different than starting Java application (at least if you use IDEs). Local node running in IDEA prints out the following log (abbreviated):
[22:57:01] _____ _ _______ _ ____ ____
[22:57:01] / ___/____(_)___/ / ___/___ _(_)___ |_ / / __/
[22:57:01] / (_ // __/ // _ / (_ // _ `/ // _ \ _/_ <_ /__ \
[22:57:01] \___//_/ /_/ \_,_/\___/ \_,_/_//_//_/ /____(_)____/
[22:57:01]
[22:57:01] ---==++ HIGH PERFORMANCE CLOUD COMPUTING ++==---
[22:57:01] ver. x.x.x-DDMMYYYY
[22:57:01] Copyright (C) 2005-2011 GridGain Systems, Inc.
[22:57:01]
[22:57:01] Quiet mode.
[22:57:01] ^-- To disable add -DGRIDGAIN_QUIET=false or "-v" to ggstart.{sh|bat}
[22:57:01] << Enterprise Edition >>
...
[22:57:02] Topology snapshot [nodes=3, CPUs=4, hash=0xAB10A0C]
...
[22:57:05] ZZZzz zz z...
Co-located [key= 0, value=0]
Co-located [key= 1, value=1]
Co-located [key= 7, value=7]
Co-located [key= 10, value=10]
Co-located [key= 18, value=18]
[22:57:09] GridGain stopped OK [uptime=00:00:03:171]
and the remote nodes print:
Remote Node 1
[22:56:40] _____ _ _______ _ ____ ____
[22:56:40] / ___/____(_)___/ / ___/___ _(_)___ |_ / / __/
[22:56:40] / (_ // __/ // _ / (_ // _ `/ // _ \ _/_ <_ /__ \
[22:56:40] \___//_/ /_/ \_,_/\___/ \_,_/_//_//_/ /____(_)____/
[22:56:40]
[22:56:40] ---==++ HIGH PERFORMANCE CLOUD COMPUTING ++==---
[22:56:40] ver. x.x.x-DDMMYYYY
[22:56:40] Copyright (C) 2005-2011 GridGain Systems, Inc.
[22:56:40]
[22:56:40] Quiet mode.
[22:56:40] ^-- To disable add -DGRIDGAIN_QUIET=false or "-v" to ggstart.{sh|bat}
[22:56:40] << Enterprise Edition >>
...
[22:56:42] Topology snapshot [nodes=2, CPUs=4, hash=0x99D66AF5]
...
[22:57:02] Node JOINED [nodeId8=3a890c51, addr=[127.0.0.1], CPUs=4]
[22:57:02] Topology snapshot [nodes=3, CPUs=4, hash=0xAB10A0C]
Co-located [key= 2, value=2]
Co-located [key= 3, value=3]
Co-located [key= 4, value=4]
Co-located [key= 5, value=5]
Co-located [key= 11, value=11]
Co-located [key= 12, value=12]
Co-located [key= 13, value=13]
Co-located [key= 16, value=16]
[22:57:08] Node LEFT [nodeId8=3a890c51, addr=[127.0.0.1], CPUs=4]
[22:57:08] Topology snapshot [nodes=2, CPUs=4, hash=0x99D66AF5]
Remote Node 2
[22:56:40] _____ _ _______ _ ____ ____
[22:56:40] / ___/____(_)___/ / ___/___ _(_)___ |_ / / __/
[22:56:40] / (_ // __/ // _ / (_ // _ `/ // _ \ _/_ <_ /__ \
[22:56:40] \___//_/ /_/ \_,_/\___/ \_,_/_//_//_/ /____(_)____/
[22:56:40]
[22:56:40] ---==++ HIGH PERFORMANCE CLOUD COMPUTING ++==---
[22:56:40] ver. x.x.x-DDMMYYYY
[22:56:40] Copyright (C) 2005-2011 GridGain Systems, Inc.
[22:56:40]
[22:56:40] Quiet mode.
[22:56:40] ^-- To disable add -DGRIDGAIN_QUIET=false or "-v" to ggstart.{sh|bat}
[22:56:40] << Enterprise Edition >>
...
[22:56:39] Topology snapshot [nodes=1, CPUs=4, hash=0xF15A46A8]
...
[22:56:42] Node JOINED [nodeId8=de5c9bb0, addr=[127.0.0.1], CPUs=4]
[22:56:42] Topology snapshot [nodes=2, CPUs=4, hash=0x99D66AF5]
[22:57:02] Node JOINED [nodeId8=3a890c51, addr=[127.0.0.1], CPUs=4]
[22:57:02] Topology snapshot [nodes=3, CPUs=4, hash=0xAB10A0C]
Co-located [key= 6, value=6]
Co-located [key= 8, value=8]
Co-located [key= 9, value=9]
Co-located [key= 14, value=14]
Co-located [key= 15, value=15]
Co-located [key= 17, value=17]
Co-located [key= 19, value=19]
[22:57:08] Node LEFT [nodeId8=3a890c51, addr=[127.0.0.1], CPUs=4]
[22:57:08] Topology snapshot [nodes=2, CPUs=4, hash=0x99D66AF5]
As you can see the key/value pairs got distributed roughly equal (the more keys the better the distribution will be obviously). What’s also important to note is that we didn’t get any null as values proving the fact that co-location work (remember: we’ve used function peek() that only return locally stored value or null if value for given key is not stored locally).
All in all - these were two quick examples of Java and Scala based applications that demonstrate some of the basics of GridGain functionality. The following chapters in the book will explain why these are just a scratch on the surface…
4. Getting and Installing
Now that we’ve looked briefly at what GridGain can do let’s start… from the beginning: how to get GridGain software and how to install it.
|
|
Screenshots Note that most of the website screenshots will change by the time you read this book. However, you can
easily navigate the website as it is right now as its main parts remained relatively the same. |
4.1. Download
There are three ways how you can get GridGain:
-
you can download it from http://www.gridgain.com,
-
or you can use Maven2 repository to get it,
-
or you can get Community Edition from GitHub
We highly recommend to use the first method and simply download ZIP file from http://www.gridgain.com website. To do so - simply open http://www.gridgain.com in your favorite browser and locate the download link that usually on the right side:
Once you clicked on the download link you’ll be on download page and you’ll need to enter your name and email:
Keep in mind several things:
-
There are two editions available for the download - enterprise and community
-
Community Edition is licensed under GPLv3 and Enterprise Edition comes with evaluation license
-
There is a link on top of the page for past downloads that contains selected previous releases of GridGain
-
In the download table (see above) you can see date of the build, its version, and the link to Release Notes
There are six downloads (as of version 3.0.2):
-
Enterprise and Community for Windows
-
Enterprise and Community for Linux/Unix/Mac OS.
-
Amazon AMI images for Enterprise and Community editions
All downloads are simple ZIP files. ZIP files are versioned and clearly named to indicate for what OS family they are intended to.
4.1.1. Maven2
Maven repository available only for version 3.0.0c-RC1 and up.
If you decide to use Maven please keep in mind:
-
Only Maven2 repository currently available (as of version 3.0.2)
-
Maven3 is not supported yet.
-
-
Only community edition is available in our public repository
-
Enterprise edition can only be downloaded directly from http://www.gridgain.com website
-
Maven2 POM file is included with distribution
-
-
Only the main GridGain JAR file is available in Maven repository
-
Depending on your usage of GridGain you may need configuration files, working directly, etc. that won’t be created when using Maven to get GridGain.
-
To utilize our Maven repository you’ll need to make the following changes. In your POM file you need to add dependency for GridGain:
<dependencies>
.
.
.
<dependency>
<groupId>org.gridgain</groupId>
<artifactId>gridgain</artifactId>
<version>3.0.0c-rc1</version> <!-- CHANGE IT! -->
</dependency>
</dependencies>
Make sure to properly change the version of the GridGain.
You will need to add GridGain repository to your POM file as well:
<repositories>
.
.
.
<repository>
<id>gridgain</id>
<url>http://www.gridgainsystems.com/maven2/</url>
</repository>
</repositories>
Once you have it done - you are ready for Maven-based usage of GridGain.
|
|
Internal Repository We recommend to use internal Maven repositories for your projects if Maven is something you like
to use. You can download GridGain as usual through www.gridgain.com website and deploy necessary
files to your local repository for the rest of the team to use. This way you have full control
on how GridGain is available via Maven for your particular project. |
4.1.2. Versions
GridGain follows traditional rules on versioning and what specific version number means:
| Version | Description |
|---|---|
X.X.1…9 |
Point release. |
X.1…9.X |
Mid-point release. |
1…9.X.X |
Major release. |
In general, we target one major release every 12 months and mid-point release every 6 months. Point releases are being cut as we see need to patch issues or provide hot bug fixes.
4.1.3. Supported Operating Systems
GridGain is actively developed and tested on three major operating systems:
|
Mac OS X |
|
Windows 7 |
|
Linux (Ubuntu & Fedora) |
Being JVM-based software GridGain has minimal dependency on particular operating system (as long as Java is available for it). Most of the dependencies are in scripts. With every release of GridGain we thoroughly testing the software against the following version of operating systems:
-
Mac OS X 10.x
-
Linux Ubuntu (current active release)
-
Linux Fedora (current active release)
-
Windows XP/Vista/2007 (as of 3.0.2 version)
Note that we do not actively test against the following operating system but verified independently that GridGain 3.0 or later works stable and correct on them:
-
Solaris (current release)
-
HP-UX (current release)
-
Window 2003, Windows 2000
|
|
Less Tested GridGain is less tested on: Solaris, HP-UX, Window 2003, and Windows 2000. |
In general, with extremely rare exceptions, GridGain will work out-of-the-box on any Windows or Linux/Unix-based system as long as Java 6 (and Scala 2.9 or later) is available on it.
4.1.4. Java, Scala and Groovy
As of version 3.0 GridGain requires Java 6. Note that GridGain 3.0 has not been tested with upcoming Java 7 as of May 2011.
Starting with version 3.0.9 GridGain requires Scala 2.9 or later (if Scala is used which is optional). Note that original release of GridGain 3.0.0 came before Scala 2.8 GA was released and was compatible only with Scala 2.7.
Starting with version 3.1.1 GridGain requires Groovy 1.8 and corresponding version of Groovy++ (if Groovy is used which is optional).
Keep in mind that you can develop with either Java, Scala, Groovy or any combination of thereof. Specifically, Scala is not required to develop with GridGain but some of the tools, like GridGain Visor - monitoring and interpreting tool in Enterprise Edition, use Scala REPL and therefore Scala is required for its usage.
Note also that as of GridGain 3.0.2 - none of the functionality in community edition explicitly require Scala or Groovy.
|
|
Java, Scala and Groovy As of version 3.1.1 GridGain requires Java 6, Scala 2.9 and Groovy 1.8. |
As of November 2010 you can download both Java, Scala, and Groovy from:
-
Latest Java: http://www.oracle.com/technetwork/java/javase/downloads/index.html
-
Latest Scala: http://www.scala-lang.org/node/165
-
Latest Groovy: http://groovy.codehaus.org/Download
-
Latest Groovy++: http://code.google.com/p/groovypptest/downloads/listp
|
|
Java on Mac OS X Note that Java download for Mac OSX may change its location as its development is shifting from
Apple to Oracle as of November 2010. |
4.2. Installation
Once you download whatever ZIP file your have selected - the installation process is rather trivial:
-
Unzip it to any location you prefer.
-
Set up GRIDGAIN_HOME environment variable pointing to installation folder
Note that installation does not perform any new-line translations and text files may have wrong new-lines depending on what OS installation is performed.
Unix/Linux/Mac OSX ZIP file has all Shell scripts with executable flag set so that they can be called directly.
|
|
GRIDGAIN_HOME Note that strictly speaking GRIDGAIN_HOME is not required for GridGain operation - and if
you know that your setup won’t require it (explained later in the book) - you can skip it.
If you are new to GridGain - it’s very advisable to set GRIDGAIN_HOME right after the
unzipping the downloaded file. |
|
|
Trailing Spaces Make sure there is no trailing \ in GRIDGAIN_HOME path. |
4.2.1. Installing On Shared Location
One good practice for testing, staging or production setups is to install GridGain into shared location like a network share or shared hard drive. This way multiple grid nodes can share single configuration, libraries and working directory. This significantly simplifies management of GridGain installation in a distributed environment.
4.3. Uninstallation
Uninstalling GridGain is even simple than installing - you simply remove the GRIDGAIN_HOME folder where GridGain was installed. If it was configured to use paths outside of GRIDGAIN_HOME you will need to delete them too (if necessary).
4.4. Upgrading
Due to complexity of GridGain (mostly due to its distributed nature) we have decided not to provide incremental upgrade (or patching) capabilities. We recommend upgrading GridGain by cleanly uninstalling and installing a new upgrade version.
5. Configuration
5.1. Overview
GridConfigurationdoc interface defines grid runtime configuration. This configuration is passed to GridFactory.start(GridConfiguration)doc method. It defines all configuration parameters required to start a grid instance. Usually, a special class called "loader" will create an instance of this interface and call GridFactory.start(GridConfiguration)doc method to initialize GridGain instance.
Note, that absolutely every configuration property in GridConfigurationdoc is optional. You can simply create a new instance of GridConfigurationAdapterdoc, for example, and pass it to GridFactory.start(GridConfiguration)doc as is to start grid with default configuration. See GridFactorydoc documentation for information about default configuration properties used and more information on how to start grid.
The following configuration parameters can be used to configure grid node with GridConfigurationAdapter:
| Setter Method | Description | Optional | Default |
|---|---|---|---|
setGridName(String)doc |
Grid name. |
Yes |
null |
setGridGainHome(String)doc |
GridGain installation folder. |
Yes |
GRIDGAIN_HOME system property or environment variable. |
setLocalHost(String)doc |
System-wide local address or host for all GridGain components to bind to. |
Yes |
null |
setNodeId(UUID)doc |
Unique identifier for local node. |
Yes |
Random UUID. |
setNetworkTimeout(long)doc |
Maximum timeout in milliseconds for network requests. |
Yes |
5000ms |
setLicenseUrl(String)doc |
License URL different from the default location of the license file. |
Yes |
GRIDGAIN_HOME/gridgain-license.xml |
setUserAttributes(Map<String,? extends Serializable>)doc |
User specific attributes to attach to this node. Available via GridNode.getAttribute(String)doc method. Very useful for segmenting grid nodes into subgroups or identifying nodes based on certain property. |
Yes |
All System Properties and Environment Variables are set as node attributes automatically by GridGain. |
setDaemon(boolean)doc |
Daemon flag. |
Yes |
false |
setIncludeProperties(String…)doc |
Array of system or environment property names to include into node attributes. |
Yes |
All properties are included by default. |
setIncludeEventTypes(int…)doc |
Array of event types, which will be recorded by GridEventStorageManager. Note, that either the include event types or the exclude event types can be established. |
Yes |
All events are recorded by default. |
setExcludeEventTypes(int…)doc |
Array of event types, which will not be recorded by GridEventStorageManager. Note, that either the include event types or the exclude event types can be established. |
Yes |
All events are recorded by default. |
setLifecycleBeans(GridLifecycleBean…)doc |
Collection of lifecycle beans. |
Yes |
null |
setLifeCycleEmailNotification(boolean)doc |
Whether or not to enable lifecycle email notifications. |
Yes |
false |
setDiscoveryStartupDelay(long)doc |
Time in milliseconds after which a certain metric value is considered expired. |
Yes |
1 minute |
setGridLogger(GridLogger)doc |
Logger to use within grid. |
Yes |
GridLog4jLoggerdoc |
setMarshaller(GridMarshaller)doc |
Marshaller to use for serialization/deserialization of objects (available from ver. 2.1). |
Yes |
GridOptimizedMarshallerdoc |
setDeploymentMode(GridDeploymentMode)doc |
Deployment mode for task/query requests initiated from this node (available from ver. 2.1). |
Yes |
SHAREDdoc |
setPeerClassLoadingEnabled(boolean)doc |
Enables/disables peer class loading. |
Yes |
true |
setPeerClassLoadingMissedResourcesCacheSize(int)doc |
Specifies internal cache size for missed resources. If attempt to load a resource failed, then it will be cached, and following attempts will not make remote calls (available from ver. 2.1). |
Yes |
100 |
setP2PLocalClassPathExclude(List<String>)doc |
List of packages in a system class path that should be to P2P loaded even if they exist locally. |
Yes |
null |
setMetricsExpireTime(long)doc |
Time in milliseconds after which a certain metric value is considered expired. |
Yes |
600000ms |
setMetricsHistorySize(int)doc |
Number of metrics kept in history to compute totals and averages. |
Yes |
10000 |
setMetricsLogFrequency(int)doc |
Frequency of metrics log print out. |
Yes |
0, which means that metrics print out is disabled. |
setExecutorService(ExecutorService)doc |
Thread pool to use mainly for task and job execution. |
Yes |
GridThreadPoolExecutordoc with 100 threads. |
setExecutorServiceShutdown(boolean)doc |
Executor service shutdown flag. |
Yes |
true |
setSystemExecutorService(ExecutorService)doc |
Thread pool to use for processing job and task session asynchronous responses (available from ver. 2.1). |
Yes |
GridThreadPoolExecutordoc with 100 threads. |
setSystemExecutorServiceShutdown(boolean)doc |
System executor service shutdown flag. |
Yes |
true |
setPeerClassLoadingExecutorService(ExecutorService)doc |
Thread pool to use for processing peer class loading requests and responses (available from ver. 2.1). |
Yes |
GridThreadPoolExecutordoc with 20 threads. |
setPeerClassLoadingExecutorServiceShutdown(boolean)doc |
Peer class loading executor service shutdown flag. |
Yes |
true |
setMBeanServer(MBeanServer)doc |
MBean server for exposing GridGain MBeans. |
Yes |
The default MBean Server provided by JDK. |
setSegmentationPolicy(GridSegmentationPolicy)doc |
Segmentation policy. |
Yes |
STOPdoc |
setSegmentationResolvers(GridSegmentationResolver…)doc |
Segmentation resolvers. |
Yes |
null |
setSegmentCheckFrequency(int)doc |
Network segment check frequency. |
Yes |
10000ms |
setWaitForSegOnStart(boolean)doc |
Wait for segment on start flag. |
Yes |
true |
setAllSegmentationResolversPassRequired(boolean)doc |
All segmentation resolvers pass required flag. |
Yes |
true |
setRestEnabled(boolean)doc |
Flag indicating whether external REST access is enabled or not. |
Yes |
true |
setRestJettyPath(String)doc |
Path, either absolute or relative to GRIDGAIN_HOME, to JETTY XML configuration file. |
Yes |
null |
setRestSecretKey(String)doc |
Secret key to authenticate REST requests. |
Yes |
null, which means that authentication is disabled. |
setSmtpHost(String)doc |
SMTP host. |
Yes |
null, which disables sending emails. |
setSmtpPort(int)doc |
SMTP port. |
Yes |
25 |
setSmtpUsername(String)doc |
SMTP username. |
Yes |
null |
setSmtpPassword(String)doc |
SMTP password. |
Yes |
null |
setAdminEmails(String[]) |
Set of admin emails where email notifications will be set. |
Yes |
null |
setSmtpFromEmail(String)doc |
FROM email address for email notifications. |
Yes |
info@gridgain.com |
setSmtpSsl(boolean)doc |
Whether or not SMTP uses SSL. |
Yes |
false |
setSmtpStartTls(boolean)doc |
Whether or not SMTP uses STARTTLS. |
Yes |
false |
setLocalEventListeners(Map<GridLocalEventListener, int[]>)doc |
Pre-configured local event listeners. |
Yes |
null |
setLoadBalancingSpi(GridLoadBalancingSpi…)doc |
Fully configured instances of GridLoadBalancingSpi. Starting with GridGain 2.1 you can provide multiple instances of Load Balancing SPIs and then specify which one to use on per-task level via @GridTaskSpisdoc annotation attached to your GridTask implementation. |
Yes |
GridRoundRobinLoadBalancingSpidoc |
setCheckpointSpi(GridCheckpointSpi…)doc |
Fully configured instances of GridCheckpointSpi. Starting with GridGain 2.1 you can provide multiple instances of Checkpoint SPIs and then specify which one to use on per-task level via @GridTaskSpisdoc annotation attached to your GridTask implementation. |
Yes |
GridSharedFsCheckpointSpidoc |
setCollisionSpi(GridCollisionSpi)doc |
Fully configured instance of GridCollisionSpi. |
Yes |
GridFifoQueueCollisionSpidoc |
setCommunicationSpi(GridCommunicationSpi)doc |
Fully configured instance of GridCommunicationSpi. |
Yes |
GridTcpCommunicationSpidoc |
setDeploymentSpi(GridDeploymentSpi)doc |
Fully configured instance of GridDeploymentSpi. |
Yes |
GridLocalDeploymentSpidoc |
setDiscoverySpi(GridDiscoverySpi)doc |
Fully configured instance of GridDiscoverySpi. |
Yes |
GridMulticastDiscoverySpidoc |
setEventStorageSpi(GridEventStorageSpi)doc |
Fully configured instance of GridEventStorageSpi. |
Yes |
GridMemoryEventStorageSpidoc |
setFailoverSpi(GridFailoverSpi…)doc |
Fully configured instances of GridFailoverSpi. Starting with GridGain 2.1 you can provide multiple instances of Failover SPIs and then specify which one to use on per-task level via @GridTaskSpisdoc annotation attached to your GridTask implementation. |
Yes |
GridAlwaysFailoverSpidoc |
setTopologySpi(GridTopologySpi…)doc |
Fully configured instances of GridTopologySpi. Starting with GridGain 2.1 you can provide multiple instances of Topology SPIs and then specify which one to use on per-task level via @GridTaskSpisdoc annotation attached to your GridTask implementation. |
Yes |
GridBasicTopologySpidoc |
setMetricsSpi(GridLocalMetricsSpi)doc |
Fully configured instance of GridLocalMetricsSpi. |
Yes |
GridJdkLocalMetricsSpidoc |
Some of the most commonly used configuration properties are explained in more detail below.
5.1.1. Grid Name
Use grid name configuration property whenever you would like to identify your grid by name. Usually, if you have only one grid node within your VM, you don’t have to configure grid name explicitly and use the default no-name grid node. However, if you start multiple grid node instances in the same VM, say for unit testing or debugging, then properly configuring grid name for every grid node instance will allow you to access multiple grid nodes by name via GridFactory.getGrid(String gridName)doc method.
5.1.2. User Attributes
User attributes allow you to attach various custom attributes to your nodes. This attributes can then be used to identify node topology for your task execution or load balancing, segmenting your grid into multiple sub-grids, etc… By default, GridGain will automatically attach or System and Environment properties to your node.
You can query node attributes practically from anywhere in your code, be that your task or job logic, or implementation of topology or load-balancing SPI’s. Simply get a handle on GridNode and check its attributes via GridNode.getAttribute(String)doc method.
5.1.3. Grid Logger
Configuring proper grid logger will allow you to integrate your logging with any environment. By default, GridLog4jLoggerdoc is used which gets its logging configuration from GRIDGAIN_HOME/config/default-log4j.xml.
Below is the list of supported loggers:
-
GridLog4jLoggerdoc - Log4j-based implementation for logging. This logger should be used by loaders that have prefer log4j-based logging. By default, GridGain will use this logger with configuration from GRIDGAIN_HOME/config/default-log4j.xml.
-
GridJavaLoggerdoc - Logger to use with Java logging. Implementation simply delegates to Java Logging.
-
GridJbossLoggerdoc - Logger to use in JBoss loaders. Implementation simply delegates to JBoss logging.
-
GridJclLoggerdoc - This logger wraps any JCL (Jakarta Commons Logging) loggers. Implementation simply delegates to underlying JCL logger. This logger should be used by loaders that have JCL-based internal logging (e.g., Websphere).
5.1.4. Grid Marshaller
Starting with GridGain 2.1 release you are able to configure different marshallers, and if needed provide your own. GridMarshallerdoc allows to marshal or unmarshal objects. It provides serialization/deserialization mechanism for all instances that are sent across network or are otherwise serialized.
GridGain provides the following GridMarshaller implementations:
-
GridJBossMarshallerdoc - this is the default marshaller used by GridGain. It used JBoss implementation of java.io.ObjectOutputStream for object serialization. All marshalled instances must implement java.io.Serializable.
-
GridJdkMarshallerdoc - this marshaller uses standard JDK java.io.ObjectOutputStream for object serialization.. All marshalled instances must implement java.io.Serializable.
-
GridXstreamMarshallerdoc - this marshaller uses Codehaus XStream for serialization of objects into XML. It does not require that marshalled instances implement java.io.Serializable, however, it performs slower than other marshaller implementations as XML is a verbose protocol.
-
GridOptimizedMarshallerdoc - Unlike GridJdkMarshaller, which is based on standard ObjectOutputStream, this marshaller does not enforce that all serialized objects implement java.io.Serializable. It is also generally much faster as it removes lots of serialization overhead that exists in default JDK implementation.
5.1.5. Executor Services
Starting with version 2.1, GridGain exposes configuration for 3 threads pools:
-
ExecutorServicedoc - Implementation of java.util.concurrent.ExecutorService to be used for task and job executions. By default, standard ThreadPoolExecutor thread pool is provided and is configured to use 100 threads. Change this configuration parameter whenever you need to change the number of threads participating in GridTask/GridJob execution.
-
SystemExecutorServicedoc - Implementation of java.util.concurrent.ExecutorService to be used for processing of asynchronous job and task session responses. By default, standard ThreadPoolExecutor thread pool is provided and is configured to use 100 threads. Change this configuration parameter whenever you set task session attributes frequently or feel that responses are not processed fast enough.
-
PeerClassLoadingExecutorServicedoc - Implementation of java.util.concurrent.ExecutorService to be used for processing of all Peer Class Loading requests. By default, standard ThreadPoolExecutor thread pool is provided and is configured to use 20 threads. Change this configuration parameter whenever you feel that class-loading requests don’t get processed fast enough.
|
|
Do not confuse executor services provided in configuration for thread pooling with grid-enabled executor service provided by GridGain. |
5.1.6. Grid Lifecycle Beans
See Grid Lifecycle Beans documentation for information on how to specify lifecycle beans and examples.
5.1.7. SPIs - Server Provider Interfaces
Server Provider Interfaces allow you to configure virtually every aspect of GridGain, such as communication, discovery, topology and failover, load-balancing, etc… in LEGO-like fashion. For information on available SPI’s and their configuration refer to SPI’s documentation.
5.2. Specifying Different SPIs Per GridTask
Starting with GridGain 2.1 you can start multiple instances of Topology SPI, Load Balancing SPI, Failover SPI and Checkpoint SPI. If you do that, you need to tell a task which SPI to use (by default it will use the first SPI in the list).
Add @GridTaskSpisdoc annotation for your task to specify what SPIs it wants to use. If this annotation is omitted, then by default GridGain will pick the first corresponding SPI implementation from the array of SPIs provided in configuration.
This example shows how to configure different SPI’s for different tasks. Let’s assume that you have two worker nodes, Node1 and Node2. Let’s also assume that you configure Node1 to belong to SegmentA and Node2 to belong to SegmentB. Here is a sample configuration for Node1:
<bean id="grid.cfg" class="org.gridgain.grid.GridConfigurationAdapter" scope="singleton">
<property name="userAttributes">
<map>
<entry key="segment" value="A"/>
</map>
</property>
</bean>
Node2 configuration looks similar to Node1 with segment attribute set to B:
<bean id="grid.cfg" class="org.gridgain.grid.GridConfigurationAdapter" scope="singleton">
<property name="userAttributes">
<map>
<entry key="segment" value="B"/>
</map>
</property>
</bean>
Now, if you have Task1 and Task2 starting from some master node NodeM, you can easily configure Task1 to only run on SegmentA and Task2 to only run on SegmentB. Here is how configuration on master node NodeM would look like:
<bean id="grid.cfg" class="org.gridgain.grid.GridConfigurationAdapter" scope="singleton">
<!--
Topology SPIs. We have two named SPIs: One picks up nodes
that have attribute "segment" set to "A" and another one sees
nodes that have attribute "segment" set to "B".
-->
<property name="topologySpi">
<list>
<bean class="org.gridgain.grid.spi.topology.nodefilter.GridNodeFilterTopologySpi">
<property name="name" value="topologyA"/>
<property name="filter">
<bean class="org.gridgain.grid.GridJexlNodeFilter">
<property name="expression" value="node.attributes['segment'] == 'A'"/>
</bean>
</property>
</bean>
<bean class="org.gridgain.grid.spi.topology.nodefilter.GridNodeFilterTopologySpi">
<property name="name" value="topologyB"/>
<property name="filter">
<bean class="org.gridgain.grid.GridJexlNodeFilter">
<property name="expression" value="node.attributes['segment'] == 'B'"/>
</bean>
</property>
</bean>
</list>
</property>
</bean>
Then your Task1 and Task2 would look as follows (note the @GridTaskSpis annotation):
@GridTaskSpis(topologySpi="topologyA")
public class GridSegmentATask extends GridTaskSplitAdapter<String, Integer> {
...
}
and
@GridTaskSpis(topologySpi="topologyB")
public class GridSegmentBTask extends GridTaskSplitAdapter<String, Integer> {
...
}
5.3. GridSpringBean
Grid Spring bean allows to bypass GridFactorydoc methods. In other words, this bean class allows to inject new grid instance from Spring configuration file directly without invoking static GridFactorydoc methods. This class can be wired directly from Spring and can be referenced from within other Spring beans. By virtue of implementing org.springframework.beans.factory.DisposableBean and org.springframework.beans.factory.InitializingBean interfaces, GridSpringBean automatically starts and stops underlying grid instance.
The following configuration parameters are optional:
-
Grid configuration (see setConfiguration(GridConfiguration)doc)
5.3.1. Spring Configuration Example
<bean id="mySpringBean" class="org.gridgain.grid.GridSpringBean" scope="singleton">
<property name="configuration">
<bean id="grid.cfg" class="org.gridgain.grid.GridConfigurationAdapter" scope="singleton">
<property name="gridName" value="mySpringGrid"/>
</bean>
</property>
</bean>
Or use default configuration:
<bean id="mySpringBean" class="org.gridgain.grid.GridSpringBean" scope="singleton"/>
5.3.2. Java Example
Here is how you may access this bean from code:
AbstractApplicationContext ctx = new FileSystemXmlApplicationContext("/path/to/spring/file");
// Register Spring hook to destroy bean automatically.
ctx.registerShutdownHook();
Grid grid = (Grid)ctx.getBean("mySpringBean");
5.4. Examples
GridConfiguration may be defined in code:
GridConfigurationAdapter cfg = new GridConfigurationAdapter();
// Override default values for grid node.
cfg.setGridName("mygrid");
...
// Start grid.
GridFactory.start(cfg);
or from Spring configuration file (default Spring configuration file can be found in GRIDGAIN_HOME/config/default-spring.xml file):
<bean id="grid.cfg" class="org.gridgain.grid.GridConfigurationAdapter" scope="singleton">
...
<property name="gridName" value="mygrid"/>
...
</bean>
5.5. AOP Configuration
In order to use annotation based @Gridifydoc AOP-based grid-enabling the following AOP configuration needs to be in place depending on which AOP implementation you choose to use. Note that you only need to pick one AOP implementation.
5.5.1. JBoss AOP
Standalone application
Note that GridGain is not shipped with JBoss and doesn’t include necessary JBoss libraries. We assume that if you choose to use JBoss AOP you would have these libraries anyways. The following configuration needs to be applied to enable JBoss byte code weaving:
-
The following JVM configuration must be present (make sure to replace com.foo.bar with your domain package):
-
-javaagent:[path to jboss-aop-jdk50-4.x.x.jar]
-
-Djboss.aop.class.path=[path to gridgain.jar]
-
-Djboss.aop.exclude=org,com -Djboss.aop.include=com.foo.bar
-
-
The following JARs should be in a classpath:
-
javassist-4.x.x.jar
-
jboss-aop-jdk50-4.x.x.jar
-
jboss-aspect-library-jdk50-4.x.x.jar
-
jboss-common-4.x.x.jar
-
trove-1.0.x.jar
-
JBoss AOP with JBoss AS
-
Install JBoss AOP deployer.
-
Remove the jboss-aop-jdk50.deployer directory of "server/your_server_name/deploy" in your JBoss AS
-
Download the latest stable version of JBoss AOP (1.5.5 GA)
-
Unzip it and make sure that all directories were unzipped case-sensitive
-
Copy the appropriate jboss-aop-jdk50.deployer directory from your JBoss AOP installation to your "server/your_server_name/deploy"
-
Edit jboss-aop-jdk50.deployer/jboss-service.xml, setting "EnableLoadTimeWeaving" with a true value, like follows:
<attribute name="EnableLoadtimeWeaving">true</attribute> <attribute name="Exclude">java,javax,org,com,net,sun,oracle,EDU,antlr</attribute> <attribute name="Include">com.foo.bar</attribute>
Make sure to replace com.foo.bar with your domain package. Also make sure to edit the exclude list if it does not have some packages that you would not like to weave.
-
Follow the instructions of the jboss-aop-jdk50.deployer/ReadMe.txt file
-
Copy pluggable-instrumentor.jar file (located in the lib-50 directory of your JBoss AOP installation) to the bin directory of your server
-
Edit your run.sh or run.bat to include -javaagent:pluggable-instrumentor.jar in the JAVA_OPTS
-
-
Deploy Gridgain as SAR.
-
Copy gridgain.sar directory from GRIDGAIN_HOME/config/jboss folder into your "server/your_server_name/deploy" folder.
-
Make sure to update classpath in gridgain.sar/META-INF/jboss-service.xml to point to all libs under GRIDGAIN_HOME and GRIDGAIN_HOME/libs.
-
|
|
JBoss AOP and JSP JBoss AOP CFLOW pointcut does not properly work JSP-compiled classes (it does not properly handle JSP
classes on the stack). The workaround is to include pre-compiled JSP classes into your WAR file. Tomcat
provides instructions on how to do that with JSPC here -
Web Application Compilation. |
5.5.2. AspectJ AOP
The following configuration needs to be applied to enable AspectJ byte code weaving:
-
JVM configuration should include: -javaagent:[GRIDGAIN_HOME]/libs/aspectjweaver-1.5.3.jar
-
Classpath should contain the [GRIDGAIN_HOME]/config/aop/aspectj folder.
5.5.3. Spring AOP
Spring AOP framework is based on dynamic proxy implementation and doesn’t require any specific runtime parameters for online weaving. All weaving is on-demand and should be performed by calling method GridifySpringEnhancer.enhance(Object) for the object that has method with Gridify annotation.
Note that since this method of weaving requires manual enhancing of participating classes, it is rather inconvenient in most cases, and AspectJ or JbossAOP are recommended over it. Spring AOP can be used in situations when code augmentation is undesired and cannot be used. It also allows for very fine grained control of what gets weaved.
BEA Weblogic AS
Weblogic application server does not support AspectJ and JBoss AOP officially and the only way to use AOP is a Spring AOP. One needs to enhance classes as described above using Spring AOP. See http://springide.org/blog/2006/05/24/implementing-jee-with-spring-and-weblogic for details.
6. Main Abstractions
This chapter will list some of the main concepts in GridGain that you need to understand to move forward. Most of them will be discussed in greater depth later on in the book - but it’s helpful to lay them out upfront so that you can follow up examples. This also gives you the bird view on GridGain architecture and API design.
|
|
Keep in mind that we don’t expect you to fully understand each topic below just yet - all of them will be discussed in-depth much later in the book. In fact, you can skim this chapter quickly - but we recommend at least that. |
GridGain has several key abstractions that are essential for understanding pretty much everything else in GridGain. We’ll begin with them first.
6.1. GridNode Interface
GridNodedoc interface defines a logical grid node in the network topology. Note that a physical node (like a computer on the network) can have multiple logical grid nodes running on it. In fact, a single JVM can run multiple logical grid nodes - note that GridGain is the only software in the world allowing this unique capability.
GridNode interface has very concise API and deals only with a notion of a logical network endpoint, a node, in the topology: it has globally unique ID, node metrics, set of static attributes provided by the user and few other parameters.
|
|
GridNode GridNode has globally unique ID, set of static attributes provided by the user, node metrics and
few other parameters. |
The unique characteristic of GridGain is that it uses Peer-To-Peer (P2P) topology meaning that all nodes in GridGain are equal. There is no master or server nodes, and there are no worker or client nodes either. All nodes are equal from GridGain’s point of view - yet all these and any other roles can be assigned logically to the nodes.
This unique design gives GridGain tremendous flexibility: not only you are not limited to the master-worker mold - you can define any application specific roles and assign them to the nodes dynamically. More over, since these roles are logical, they can change and "migrate" from node to node as topology changes or based on your application logic.
GridNode interface is used primarily by internal kernal code and by discovery and communication SPI implementations and rarely used directly. GridRichNodedoc interface, its rich counterpart, is what used instead for majority of GridGain operations. More on this below.
6.2. Local Grid Node
Local grid node is an instance of GridNode interface that is instantiated in the local JVM runtime for a specific grid. In general, JVM process that runs GridGain runtime can have zero, one or more local grid nodes, but only one local node per specific grid’s topology.
6.3. Grid Topology
As the logical extension of the network endpoint - a grid node as defined above - we use the term topology throughout the GridGain documentation to reference a set of all logical grid nodes where each node "knows" every other nodes (in other words, topology is a fully connected graph of all grid nodes including the local node). We often refer to such topology as simply a grid.
|
|
Topology is Associative Note that the key characteristic of the topology is its associative property, i.e. the fact
that each node "knows" every other node. |
Depending on configured discovery SPI implementation (discussed later) the topology can be guaranteed to be consistent on all nodes at any point of time or be eventually consistent only. The eventual consistency means that all nodes will eventually get into fully consistent view but there a short time window where nodes can have a different view on the global topology. The guaranteed consistency is expensive to implement and it is optional in GridGain.
Note, however, that for data grid, for example, the configured SPI must provide consistent topology (i.e. support guaranteed discovery discussed later).
It is important to note that a single GridGain runtime can support any number of topologies or grids in the same time. Nodes from one topology have no knowledge about the nodes in other topologies. In such cases, single GridGain runtime (JVM process) will have several local grid nodes where each local node would belong to a different grid. Again, there is an important distinction between GridGain runtime (a JVM process) and a logical grid node running inside of that runtime.
|
|
GridGain Runtime vs. Grid Node There is an important distinction between GridGain runtime (a JVM process) and a logical grid
node running inside of that runtime. |
Note that we often refer to virtual sub-grid or sub-topology which is essentially just a subset of grid nodes from one topology. More on all that later.
6.4. GridProjection Interface
One of the major addition in GridGain 3.0 was introduction of a grid projection (and corresponding GridProjectiondoc interface. The important observation that led to this addition was the fact there is a large set of GridGain operations that can be defined uniformly on any arbitrary set of grid nodes.
Put it differently, each such operation can be performed on zero, one, two or any other number of grid nodes. For example, you can send a message to one, or more nodes in the grid. Just as you can listen for messages from one, two or any other number of nodes. And so on. To use functional programming terminology - these are the monadic operations defined on a set of grid nodes (which correspondingly makes GridProjection a monad).
|
|
GridProjection is a Monad GridProjection exposes monadic set of operations defined on an arbitrary non-empty set of
grid nodes. |
To logically group such operations the GridProjection interface is introduced and it defines all major operations in the GridGain that can be performed on a arbitrary set of nodes. You can think about grid projection as a specific view on topology. Projection can be static or dynamic and there are many ways how a projection can be defined.
As you will see throughout this book the elements of functional programming or functional API design are central to GridGain. You’ll discover that even when working with Java APIs you are dealing with functional constructs most of the time - even though Java is not a functional language to being with! This is one the unique sides of GridGain and it leads to extremely elegant and simple to use APIs.
We are going to cover projections and functional programming framework in Java in much greater details in subsequent chapters.
6.5. Rich Interfaces
Once we introduced grid projection it is pretty logical to extend definition of a grid node as a grid projection with just that one node it. Similarly, one can extend grid cloud definition as a grid projection that contains all nodes belonging to that specific physical cloud. And finally, it is only logical to provide a conveniently defined global projection that contains all the nodes in the topology.
This is exactly what GridRichNodedoc, and Griddoc interfaces do. They all extend GridProjection interface and add all necessary additional operations that are specific to a grid node, grid cloud or a global topology.
The idea of rich interfaces is central in Scala and Ruby libraries, for example. By having both thin and rich interfaces (GridNode and GridRichNode) we can satisfy both types of interface usages:
-
thin interface that needs to be implemented by the end user (and therefore should be as simple as possible), and
-
rich interface that is actually used by the end user (and therefore should be as rich as possible).
|
|
Rich vs. Thin Interfaces Thin interface that needs to be implemented by the end user and therefore should be as simple
as possible. Rich interface that is actually used by the end user and therefore should be as
rich as possible. |
Historically, Grid interface - being a global all-inclusive projection, has an additional special purpose. Grid interface acts as a main entry point for entire GridGain functionality. In fact, most of the operations you perform on GridGain originate on Grid interface. That is where you can obtain the instance of data grid, get an instance of rich cloud interface for a specific cloud, create and manage grid projections, manually deploy grid tasks and perform multitude of other operations.
To get an instance of Grid interface you need to use GridFactory.
6.6. GridFactory Class
GridFactorydoc class is a life-cycle factory for Grid instances. Its purpose is to provide various ways to start and stop instances of Grid interface. Note that Grid interface - being the main entry point for GridGain APIs - has a strict life-cycle and state machine that is controlled by GridFactory. As noted before, single GridGain runtime (JVM process) can have zero or more Grid instances each providing a local view on a different grid.
The usual way to work with GridGain is to use GridFactory class to start Grid instance using specific (or default) configuration file (usually at the beginning of your application). Starting Grid instance means, among other things, starting a local grid node and have it join the topology. Once you have started Grid instance you can use any APIs provided by GridGain. When GridGain is no longer needed, you use GridFactory to stop the Grid instance and its node will leave the topology (usually at the end of your application).
6.7. GridCache Interface
GridCachedoc interface represent the main API entry point for data grid functionality. GridCache instance always refers to a single named cache. You can configure as many named caches as you like. You receive the GridCache instance from Grid instance (as anything else in GridGain). GridCache is a rich interface and represents global cache projection (see below).
6.8. GridCacheProjection Interface
GridCacheProjectiondoc interface is analogous to GridProjection but it defines a cache projection over specified set of key-value pairs. In fact, GridCache interface extends GridCacheProjection and simply represents a global cache projection, i.e. the projection over all key-value pairs in this cache.
Cache projections are extremely powerful technique in GridGain’s data grid. It provides a monadic set of operations that is defined on any arbitrary set of:
-
keys,
-
values, or
-
key-value pairs
giving data grid a distinct functional flavor and providing consistent API design between compute and data grids.
|
|
Compute and In-Memory Data Grid Design Unification This logical and design unification between compute and data grids around functional monadic
concept is one of the unique characteristics of GridGain architecture. |
6.9. MapReduce
MapReduce is a relatively new name for very old concept. In a strict terms the term MapReduce refers to patented algorithm introduced by Google in their internal distributed data processing systems and closely mirrored by Hadoop project developed by a competing Yahoo!.
We (as well as distributed programming community in general) tend to use term MapReduce in more wider sense since it was extensively publicized and we refer to any divide-and-concur design strategies or traditional parallel computing as MapReduce. In fact, if you have a long running task, you can split this task into multiple sub-tasks, execute these tasks in parallel, aggregate their results back and get you final result in a fraction of time. That’s a classic parallel programming, or compute grid - and to avoid myriads of names we call it MapReduce too.
|
|
Google and MapReduce It is important, to note, however, that specific implementation that we chose in GridGain has
relatively little, if anything, to do with algorithm used by Google (or Hadoop). Not only
Google’s algorithm is patented, but it is also very specific to Google’s needs and rarely
applicable outside of extreme big-data use cases. |
Hadoop provides one implementation of MapReduce that is closely matching Google definition as an Apache project. Note that Google granted the license to Hadoop to use patented Google algorithm.
6.10. Streaming MapReduce
Streaming MapReduce is a less-defined term but often refers to a type of processing similar to MapReduce (i.e. tasks gets split and reduced) but with input data is not finite in general. Typical example is a search in a live video feed: the obvious problem is that you can’t load the entire feed first and then partition it into small parts to be processed in parallel (like you would do in a traditional MapReduce); you need to somehow map and reduce the incoming data as it comes and in the same time keep providing failover, topology resolution, collision resolution, back pressure control, and all other necessary services.
GridGain provide several unique ways of how this type of processing can be implemented.
6.11. Real-Time Processing
GridGain is all about processing large data sets (a.k.a. BigData) in real-time. But what real-time are we talking about?
In GridGain - we are talking about perceptual real-time, or a software real-time (Java Virtual Machine doesn’t technically support hardware real-time processing). Perceptual real-time is defined by a maximum response time that a typical user will wait for the task he or she expects to be "instant" before cancelling the task. For example, when a typical user clicks "Add to Basket" button on the website anything beyond couple of seconds will probably make that user to click "Back" or otherwise cancel the task. For a online trading application the delay of few seconds on submitting the order will indicate something wrong with the system. Portfolio management application that takes 10 seconds to open last minute moving average chart is practically unusable. And so on…
6.12. Closures & Predicates
We mention closures and predicates here only because we provide their full implementation on Java side. Unlike Scala or Groovy, where closures (functions) are part of the language, in Java they are not - and we had to develop an entire state of the art distributed functional framework for Java that is included with GridGain.
Closure is a block of code that encloses its body and any outside variables used inside of it as a function object. You can then pass such function object anywhere you can pass a variable and execute it. Predicate is a special type of closure that simply returns boolean value.
Scala as a hybrid OOP and FP language offers natural advantage over Java since it provides native in-language support for closures which enables much more concise and elegant APIs provided by GridGain’s Scalar DSL. However, with GridGain’s functional framework we brought Java functional usage as close to Scala as possible.
Below is a simple example that broadcasts and prints string on all nodes in the topology. Just compare how close Java, Groovy and Scala implementations are:
...
object Test {
// Broadcast "Howdy!" string to all nodes.
def main(args: Array[String]) = scalar {
grid$ *< (BROADCAST, () => println("Howdy!")
}
}
...
...
@Typed
@Use(GroverProjectionCategory)
class Test {
// Broadcast "Howdy!" string to all nodes.
static void main(String[] args) {
grover {
Grid g -> g.run(BROADCAST) { println("Howdy!") }
}
}
}
...
...
public class Test {
// Broadcast "Howdy!" string to all nodes.
public void main(String[] args) {
G.in(null, new CIX1() {
@Override public void applyx(Grid g) throws GridException {
g.run(BROADCAST, F.println("Howdy!"));
}
};
}
}
...
Closures and predicates used extensively in GridGain APIs and at the core of our design. You will discover plenty of examples later in the book of how closures and predicates - in Java, Groovy and Scala - used throughout the GridGain.
6.13. Type Aliases & Typedefs
One of the unusual features of our Java side APIs is introduction of type aliases (also known as typedefs) in GridGain 3.0. With introduction of functional design in GridGain 3.0 we have quickly discovered that Java APIs are just way too "chatty" due to lack of any type inference by the Java compiler. The only way to combat that problem in Java is to introduce type alias - a subclass with a shorter name.
Obviously, it works best (or at all) for static, factory-type classes but it also works for object instantiations but unfortunately you can’t use type aliases in method signature since they are different types. Such as live in a Java world…
|
|
Aliases Applicability Due to Java-based implementation type aliases or typedefs are used only for static,
factory-type classes. |
We’ve introduced aliases only for key GridGain types and there’s only about a dozen of aliases in GridGain APIs. It’s pretty easy to memorize the key ones that are used most frequently. Usage of aliases is, of course, optional as you can always use full (original) names of the types. But we are pretty sure that you’ll find aliases on Java side useful to make your code more readable.
Here’s good example demonstrating how aliases can Java code slightly more readable:
GridFunc.copy(res, goods,
GridFunc.<Item>and(
GridFunc.<Item>notNull(),
GridFunc.<Item>or(
new GridPredicate<Item>() {
@Override public boolean apply(Item item) {
return item.novelty;
}
},
new GridPredicate<Item>() {
@Override public boolean apply(Item item) {
return item.price < 150;
}
}
)
)
);
GridFunc.forEach(res, GridFunc.<Item>println());
F.copy(res, goods,
F.<Item>and(
F.<Item>notNull(),
F.<Item>or(
new P1<Item>() {
@Override public boolean apply(Item item) {
return item.novelty;
}
},
new P1<Item>() {
@Override public boolean apply(Item item) {
return item.price < 150;
}
}
)
)
);
F.forEach(res, F.<Item>println());
6.14. Grid Task and Job
Grid task and job are the main abstractions in the compute grid. As you recall the compute grid is about parallelization the processing, i.e. splitting a long running task into multiple sub-tasks, executing those sub-tasks in parallel and aggregating their results into one final result.
GridTaskdoc defines the descriptor for the overall task to be processed and grid job defines a sub-tasks, i.e. the piece of code that will travel to remote nodes for execution. Essentially a grid task defines mapping and reducing logic, while GridJobdoc is very similar to java.util.Callable interface by defining a simple executable body.
|
|
GridTask and GridJob GridTask defines overall task descriptor as well as mapping and reducing logic. GridJob
defines units of work that tasks get split into and that travel to remote nodes for execution. The
result of their execution will be reduced by the task into the final result. |
Note that anything that gets executed on the compute grid should be defined as a grid task. Closures and AOP-based executions get converted to grid task automatically when needed.
6.15. Service Provider Interface (SPI)
Service Provider Interface (SPI) is at core of GridGain architecture as it provide pluggable modularization to GridGain. SPI concept is simply a component interface with multiple pluggable implementations that all share unified life cycle management. There are two key benefits in this approach:
-
Rest of GridGain (including all user application code) only uses the SPI interface and is totally independent from its specific implementation.
-
User can provide its own pluggable implementation for SPIs and therefor not limited to the one provided by GridGain itself.
The example of SPI is communication subsystem. It has simple interface GridCommunicationSPIdoc and half a dozen implementations that GridGain provides out of the box. These implementations can be set in grid configuration GridConfiguration (and TCP/IP-based implementation is always set by default so that you don’t have to set anything to get GridGain working right away).
|
|
GridGain SPI - Service Provider Implementation Experienced reader can quickly notice that SPI is very similar to OSGI. Although we looked
carefully at OSGI number of years ago we quickly concluded that we needed more custom
functionality that OSGI provided at the time. In the same time our SPI architecture and OSGI
share the same goals of componentization and modularity. |
GridGain is "sliced" into dozen of different SPIs for all major subsystems and each such subsystem can be fully replaced by user’s specific implementation - an extremely powerful feature of GridGain.
7. Functional Programming Framework
Introduction of Functional Programming (FP) into Java-based GridGain has its own unique story in GridGain that is worth repeating here.
It was anything but a straightforward decision… In early 2008 when we released GridGain 2.0 we’ve started looking for a new ways to simplify the usability of GridGain. We’ve had a pretty good story with AOP-based grid enabling and a direct API support for MapReduce type of processing - in fact we’ve been way ahead of everyone else in this department. But we felt there’s still lots of plumbing exposed for many use cases where such exposition was clearly unnecessary.
The obvious thought for us was to look at Domain Specific Language (DSL) route. We quickly realized that Java-based DSL is simply out of question (it would be just another set of Java APIs not much different from we had already). XML-based DSL (or any type of external DSL) was considered a non-started even in a hay days of DSLs of 2006-2007.
So, we started looking at other JVM-based languages that would be much more appropriate for DSL development yet let us reuse GridGain Java-based APIs. During surprisingly short evaluation period (which we’ll chronicle in Scala section later on) we quickly and decisively settled on Scala - relatively new than language that so powerfully combined OOP and FP into one cohesive and expressive language.
As we dived into Scala-based DSL development with a renewed energy we quickly realized that in order to provide truly powerful DSL in Scala (utilizing Scala’s functional core including partial functions, closures, etc.) we essentially had to re-implement most of the main APIs in Scala, i.e. duplicate GridGain in Scala. That was a pretty rude awakening for us as it was simply a no-go to have GridGain implemented in two parallel tracks: one in Java and one in Scala.
And that’s where functional story for Java begins. After some research on our part we noticed that if we could make Java side APIs functional in their design - we could largely reduce the Scala-based DSL to a collection of implicit conversions from functional Java parts to Scala parts (and back). That would also allow to have all implementation in Java (where it is originally was) and keep Scala APIs as a layer on top without any duplication of code what-so-ever.
We’ve set off to develop a first truly distributed Functional Programming framework for Java.
After a few false starts we’ve got a first working prototype (that didn’t suck) and started the refactoring process of our Java APIs into functional mold. What we started noticing along the way is that new APIs (newly added or refactored) were becoming much more powerful and elegant when used with functional constructs such as predicate-based node filters or closure-based executions. In about that time we also came up with grid and cache projections that truly revolutionized the GridGain usability. Many implementations became shorter and yet more expressive when we started using our GridFunc class as well as newly introduced typedefs. All these positive effects on our internal software development solidified our resolve to provide the same capabilities to the users of GridGain.
|
|
Scala Leads to FP in Java
All in all, the introduction of Scala support and Scala DSL into GridGain led us to develop one of the most comprehensive Functional Programming frameworks in Java that at the core of most of the functionality in GridGain. |
What is even more interesting (and exciting for us) is that FP focus on Java side made the development of Grover, our Groovy++ based DSL for GridGain, simple if not downright trivial. It took just a week to release first beta version of it and it was already mighty useful for large Groovy community.
More on that later…
7.1. Type Aliases and Typedefs
Type aliases or typedefs used in programming languages to give shorter name to existing type name to make the code that is using it more readable and easer to understand.
Many languages provide built-in support for type aliases or typedefs.
C-based languages (C/C++/Objective-C) provide direct support for it:
typedef enum {A, B, C} myEnum;
typedef int myInt, yoursInt;
Scala also provides excellent built-in support for type alias that goes beyond C-based capabilities. You can declare a type alias right during importing the original type:
import foo.bar.{Original => O} // 'O' becomes alias of 'Original'
And you can also declare the type alias in your code much the same as declaring method or a field (and similar to C/C++):
type Call[R] = () => R // A shorthand for function
type OneWayTicket = () => Unit // Another shorthand for function
You often use structural types in Scala with type aliases:
type T = { def foo: Unit }
declares T as an alias for any type that has method foo with return type Unit.
Java, unfortunately, does not provide any support for type aliases… Yet Java needs them more than any language above due to its syntactic bloat and lack of any reasonable type inference. In fact, look at this "typical" Java code:
private HashMap<Collection<String>, Set<Integer>> map =
new HashMap<Collection<String>, Set<Integer>>();
Since there is no type inference you have to repeat bloated HashMap definition twice in this definition for no reason what-so-ever - it only makes code look more busy and less readable. And if this type of HashMap is used frequently - and there’s no way to shorthand it - you have to repeat this over and over again in every place where it is used.
|
|
Java needs typedefs more than any language above due to its syntactic bloat and lack of any reasonable type inference. |
Surprisingly enough, this shortcoming of Java is often sighted as one of the major reason the Java code "feels" bloated and unnecessary verbose. This gets even more pronounced when you start using Scala that, like Java, is fully statically typed but removes most, if not all, bloat and unnecessary repetition from the code.
With introduction of Functional Programming in GridGain 3.0 (explained in the following chapter) we were faced with this very problem in Java APIs: we had plenty of new interfaces and classes that were used literally everywhere in our code and it was becoming unwieldy in many places since many of them require parametrization and code was becoming simply ridiculous in some place… Needless to say that users of GridGain APIs would have been faced with the same problems.
To solve this problem (somewhat) we have introduced typedefs to our Java APIs (Scala APIs naturally use Scala language type aliases).
|
|
Typedef Essentially, a typedef is simply a subclass with a shorter name - as there is no other
way to do that in Java. |
In package org.gridgain.grid.typedefdoc we have few dozens of typedefs defined as sub-classes with short one-two letter names for various frequently used types in GridGain. The following table shows all typedefs shipped with GridGain:
| Typedef or Type Alias | Original Type |
|---|---|
C1<E1,R>doc |
org.gridgain.grid.lang.GridClosure<E1,R>doc |
C2<E1,E2,R>doc |
org.gridgain.grid.lang.GridClosure2<E1,E2,R>doc |
C3<E1,E2,E3,R>doc |
org.gridgain.grid.lang.GridClosure3<E1,E2,E3,R>doc |
CAdoc |
org.gridgain.grid.lang.GridAbsClosuredoc |
CAXdoc |
org.gridgain.grid.lang.GridAbsClosureXdoc |
CI1<T>doc |
org.gridgain.grid.lang.GridInClosure<T>doc |
CI2<E1,E2>doc |
org.gridgain.grid.lang.GridInClosure2<E1,E2>doc |
CI3<E1,E2,E3>doc |
org.gridgain.grid.lang.GridInClosure3<E1,E2,E3>doc |
CIX1<T>doc |
org.gridgain.grid.lang.GridInClosureX<T>doc |
CIX2<E1,E2>doc |
org.gridgain.grid.lang.GridInClosure2X<E1,E2>doc |
CIX3<E1,E2,E3>doc |
org.gridgain.grid.lang.GridInClosure3X<E1,E2,E3>doc |
CO<T>doc |
org.gridgain.grid.lang.GridOutClosure<T>doc |
COX<T>doc |
org.gridgain.grid.lang.GridOutClosureX<T>doc |
CX1<E1,R>doc |
org.gridgain.grid.lang.GridClosureX<E1,R>doc |
CX2<E1,E2,R>doc |
org.gridgain.grid.lang.GridClosure2X<E1,E2,R>doc |
CX3<E1,E2,E3,R>doc |
org.gridgain.grid.lang.GridClosure3X<E1,E2,E3,R>doc |
Fdoc |
org.gridgain.grid.lang.GridLangdoc |
Gdoc |
org.gridgain.grid.GridFactorydoc |
P1<E1>doc |
org.gridgain.grid.lang.GridPredicate<E1>doc |
P2<T1,T2>doc |
org.gridgain.grid.lang.GridPredicate2<T1,T2>doc |
P3<T1,T2,T3>doc |
org.gridgain.grid.lang.GridPredicate3<T1,T2,T3>doc |
PAdoc |
org.gridgain.grid.lang.GridAbsPredicatedoc |
PCE<K,V>doc |
org.gridgain.grid.lang.GridPredicate<GridCacheEntry<K, V>> |
PEdoc |
org.gridgain.grid.lang.GridPredicate<GridEvent> |
PKV<K,V>doc |
org.gridgain.grid.lang.GridPredicate2<K, V> |
PNdoc |
org.gridgain.grid.lang.GridPredicate<GridRichNode> |
PX1<E1>doc |
org.gridgain.grid.lang.GridPredicateX<E1>doc |
PX2<T1,T2>doc |
org.gridgain.grid.lang.GridPredicate2Xdoc |
PX3<T1,T2,T3>doc |
org.gridgain.grid.lang.GridPredicate3Xdoc |
R1<E1,R>doc |
org.gridgain.grid.lang.GridReducerdoc |
R2<E1,E2,R>doc |
org.gridgain.grid.lang.GridReducer2doc |
R3<E1,E2,E3,R>doc |
org.gridgain.grid.lang.GridReducer3doc |
RX1<E1,R>doc |
org.gridgain.grid.lang.GridReducerXdoc |
RX2<E1,E2,R>doc |
org.gridgain.grid.lang.GridReducer2Xdoc |
RX3<E1,E2,E3,Rdoc |
org.gridgain.grid.lang.GridReducer3Xdoc |
As you can see typedefs defined primarily for functional classes (tuples, closures, and predicates) as well as for a few factory classes like GridFactorydoc and GridFuncdoc. Here’s a short sub-list of the most frequently used typedefs in GridGain:
| Typedef or Type Alias | Original Type |
|---|---|
C1<E1,R>doc |
org.gridgain.grid.lang.GridClosure<E1,R>doc |
CAdoc |
org.gridgain.grid.lang.GridAbsClosuredoc |
CO<T>doc |
org.gridgain.grid.lang.GridOutClosure<T>doc |
Fdoc |
org.gridgain.grid.lang.GridLangdoc |
P1<E1>doc |
org.gridgain.grid.lang.GridPredicate<E1>doc |
PAdoc |
org.gridgain.grid.lang.GridAbsPredicatedoc |
PNdoc |
org.gridgain.grid.lang.GridPredicate<GridRichNode> |
Here’s a code snipped from GridFunctionalCopyExample example that is shipped with GridGain. First version does not use typedefs and uses full names of the types:
GridFunc.copy(res, goods,
GridFunc.<Item>and(
GridFunc.<Item>notNull(),
GridFunc.<Item>or(
new GridPredicate<Item>() {
@Override public boolean apply(Item item) {
return item.novelty;
}
},
new GridPredicate<Item>() {
@Override public boolean apply(Item item) {
return item.price < 150;
}
}
)
)
);
GridFunc.forEach(res, GridFunc.<Item>println());
F.copy(res, goods,
F.<Item>and(
F.<Item>notNull(),
F.<Item>or(
new P1<Item>() {
@Override public boolean apply(Item item) {
return item.novelty;
}
},
new P1<Item>() {
@Override public boolean apply(Item item) {
return item.price < 150;
}
}
)
)
);
F.forEach(res, F.<Item>println());
I would argue that the second version gains more readability and easier to understand since we don’t have to repeat ad nauseum GridFunc and GridPredicate in every line.
Note also that we have couple of typedefs that shorten parameterized types that allows for greater brevity and more concise code:
| Typedef or Type Alias | Original Type |
|---|---|
PCE<K,V>doc |
org.gridgain.grid.lang.GridPredicate<GridCacheEntry<K, V>> |
PEdoc |
org.gridgain.grid.lang.GridPredicate<GridEvent> |
PKV<K,V>doc |
org.gridgain.grid.lang.GridPredicate2<K, V> |
PNdoc |
org.gridgain.grid.lang.GridPredicate<GridRichNode> |
|
|
Limitation
Now, this approach obviously has limitation:
|
|
|
Scala You can freely use Java-based typedefs in Scala code - but we suggest to use native
type alias support provided by Scala |
7.1.1. Typedefs vs. Factory Methods
Now, you may ask why not use factory methods, a standard Java idiom, instead? GridGain actually provides plenty of factory methods in GridFuncdoc class (that itself has F typedef). But factory methods often tend to be more verbose and sometime hide the "creation of new instance" context.
Foobar v = FoobarFactory.newFoobar(...);
or
Foobar v = new T(...); // 'T' is a typedef for 'Foobar'
In our experience working with GridGain source code we’ve found that typedefs generally provide for the most terse code without loosing context or readability.
7.1.2. Where To Use Typedefs
The answer here is simple - everywhere unless you lose in readability of your code.
|
|
Do Not Trade In Readability We strongly believe that you should never trade in few saved characters for poorer code readability. |
Once you get familiar with the some of the most frequently used typedefs - you will start using them freely and in most situation they will improve your code - make it less bloated and concentrate reader’s attention on the actual business logic and away from unnecessary repetitive declarations.
8. GridGain Basics
In this pretty long chapter we’ll cover all basic functionality available in GridGain apart from the "big two" - compute grids and data grids - which will be covered in subsequent individual chapters. Both big subsystems are fundamentally based on the functionality explained in this chapter and therefore the following material is pretty important.
|
|
What’s more interesting is that fact that some GridGain-based applications don’t even use any of the two main technologies we have - but utilize, for example, actor-based message passing, distributed functional programming, zero deployment or event-based processing provided by GridGain. |
8.1. Logging
GridGain provides pluggable logging capability by allowing the user to specify his own logging framework. This is especially convenient when GridGain runs inside of the hosting environment such as servlet container or application server.
In such case, GridGain can be easily configured to route all its logging through host’s logging framework eliminating the need to have multiple log file locations. This also dramatically simplifies the debugging since multiple log file don’t have to be line-by-line synchronized.
To provide this pluggability GridGain relies on its own interface GridLoggerdoc that provides absolute minimal API for logging. While GridGain uses this interface throughout entire product it further provides out-of-the-box complete implementations for this interface using the following popular logging frameworks:
Users, of course, are free to provide their own implementations and many often do for integration with existing log analysis or health monitoring solutions.
8.1.1. Configuration
GridGain logger could be configured either from code by modifying GridConfiguration during start of GridGain or via Spring XML. Following examples demonstrate both ways for Log4J and JCL loggers:
GridConfiguration cfg = new GridConfigurationAdapter();
...
// Log4J logger.
URL xml = U.resolveGridGainUrl("modules/tests/config/log4j-test.xml");
GridLogger log = new GridLog4jLogger(xml);
...
cfg.setGridLogger(log);
...
<property name="gridLogger">
<bean class="org.gridgain.grid.logger.jcl.GridJclLogger">
<constructor-arg type="org.apache.commons.logging.Log">
<bean class="org.apache.commons.logging.impl.Log4JLogger">
<constructor-arg type="java.lang.String" value="config/default-log4j.xml"/>
</bean>
</constructor-arg>
</bean>
</property>
...
Configuring Java Logging
Here is an example of configuring Java logger in GridGain configuration Spring file to work over Log4J implementation. Note that we use the same configuration file as we provide by default:
...
<property name="gridLogger">
<bean class="org.gridgain.grid.logger.java.GridJavaLogger">
<constructor-arg type="java.util.logging.Logger">
<bean class="java.util.logging.Logger">
<constructor-arg type="java.lang.String" value="global"/>
</bean>
</constructor-arg>
</bean>
</property>
...
or
...
<property name="gridLogger">
<bean class="org.gridgain.grid.logger.java.GridJavaLogger"/>
</property>
...
And the same configuration if you’d like to configure GridGain in your code:
GridConfiguration cfg = new GridConfigurationAdapter();
...
GridLogger log = new GridJavaLogger(Logger.global);
...
cfg.setGridLogger(log);
or which is actually the same:
GridConfiguration cfg = new GridConfigurationAdapter();
...
GridLogger log = new GridJavaLogger();
...
cfg.setGridLogger(log);
Configuring Log4j
Here is a typical example of configuring log4j logger in GridGain configuration file:
<property name="gridLogger">
<bean class="org.gridgain.grid.logger.log4j.GridLog4jLogger">
<constructor-arg type="java.lang.String" value="config/default-log4j.xml"/>
</bean>
</property>
and from your code:
GridConfiguration cfg = new GridConfigurationAdapter();
...
URL xml = U.resolveGridGainUrl("modules/tests/config/log4j-test.xml");
GridLogger log = new GridLog4jLogger(xml);
...
cfg.setGridLogger(log);
Configuring JBoss logging
Information about configuring JBoss logging with GridGain can be found at http://docs.jboss.org/process-guide/en/html/logging.html.
Configuring Tomcat logging
Please refer to http://tomcat.apache.org/tomcat-6.0-doc/logging.html for more information on how to configure GridGain with Tomcat logging.
Configuring JCL
This logger wraps any JCL - Jakarta Commons Logging loggers. Implementation simply delegates to underlying JCL logger. This logger should be used by loaders that have JCL-based internal logging (e.g., Websphere).
Here is an example of configuring JCL logger in GridGain configuration Spring file to work over Log4J implementation. Note that we use the same configuration file as we provide by default:
...
<property name="gridLogger">
<bean class="org.gridgain.grid.logger.jcl.GridJclLogger">
<constructor-arg type="org.apache.commons.logging.Log">
<bean class="org.apache.commons.logging.impl.Log4JLogger">
<constructor-arg type="java.lang.String" value="config/default-log4j.xml"/>
</bean>
</constructor-arg>
</bean>
</property>
...
If you are using system properties to configure JCL logger use following configuration:
...
<property name="gridLogger">
<bean class="org.gridgain.grid.logger.jcl.GridJclLogger"/>
</property>
...
And the same configuration if you’d like to configure GridGain in your code:
GridConfiguration cfg = new GridConfigurationAdapter();
...
GridLogger log = new GridJclLogger(new Log4JLogger("config/default-log4j.xml"));
...
cfg.setGridLogger(log);
or following for the configuration by means of system properties:
GridConfiguration cfg = new GridConfigurationAdapter();
...
GridLogger log = new GridJclLogger();
...
cfg.setGridLogger(log);
Configuring SLF4J
This logger should be used by hosts that have slf4j-based logging.
Here is an example of configuring SLF4J logger in GridGain configuration Spring file:
<property name="gridLogger">
<bean class="org.gridgain.grid.logger.slf4j.GridSlf4jLogger"/>
</property>
8.1.2. Injection vs. Instantiation
Instance of GridLogger interface can be obtain at any point via Grid.log() method. However, when logger is needed in grid task and/or jobs it is preferable to use resource injection via @GridLoggerResourcedoc annotation that annotates a field or a setter method for injection of GridLogger instance.
Logger can be injected into instances of following classes:
-
GridTask
-
GridJob
-
GridSpi (and its all implementations)
-
GridLifecycleBean
-
Any object annotated with @GridUserResource annotation
Here is how injection would typically happen:
public class MyGridJob implements GridJob {
...
@GridLoggerResource
private GridLogger log;
...
}
or
public class MyGridJob implements GridJob {
...
private GridLogger log;
...
@GridLoggerResource
public void setGridLogger(GridLogger log) {
this.log = log;
}
...
}
8.1.3. Quiet Mode
GridGain 3.0 introduced a quiet logging mode. Essentially, this mode suppresses most of the INFO and all DEBUG logging and provides very concise logging output. This mode is very useful for examples and demonstration as well as for everyday development where full output of INFO or DEBUG is not necessary.
By default starting with version 3.0 GridGain starts in quite mode suppressing INFO and DEBUG log output. If system property GRIDGAIN_QUIET is set to false than GridGain will operate in normal un-suppressed logging mode (with whatever logging back-end is configured). Note that all output in quiet mode is done through standard output (STDOUT).
Note that GridGain’s standard startup scripts $GRIDGAIN_HOME/bin/ggstart.{sh|bat} starts by default in quiet mode. Both scripts accept -v arguments to turn off quiet mode.
8.2. Loaders
Grid loaders are used to start grid in different environments. Loaders provide basic boilerplate code for starting GridGain in various environments. For example, when starting within application servers such as JBoss, Weblogic or Websphere, provided loaders will configure GridGain to use "native" logging, JMX facility, discovery and execution services (JSR-237) which makes GridGain basically blend into hosting environment. Loaders do not need to implement any interface and their sole responsibility is to configure and start grid.
8.2.1. Command Line Loader
Command line loader located in org.gridgain.grid.loaders.cmdline package.
Command line loader is used to start grid from a command line script. Grid comes with ggstart.{sh|bat} startup script located in $GRIDGAIN_HOME/bin folder. By default this script will use configuration defined in $GRIDGAIN_HOME/config/default-spring.xml. This configuration will pick default configuration for all grid internal components and SPI’s which is sufficient for running examples and doing your own development and testing.
If you wish to provide your own configuration file, simply pass its path as parameter to the script.
ggstart.sh C:\myfolder\mygrid.xml
To stop grid, simply press CTRL-C which will initiate GridGain stop routine.
|
|
Script Startup Note that in addition to starting grid nodes on separate physical machines, GridGain supports
starting multiple grid nodes on the same machine as well as in the same VM. The only requirement
for default configuration is that IP-Multicast is supported. |
|
|
Custom Jars Starting from 2.1.0 you can add your libraries to the class path without changing startup scripts.
Just put them into the $GRIDGAIN_HOME/libs/ext directory and GridGain will pick them up automatically. |
|
|
GRIDGAIN_HOME Environment Variable If you get the following error: Exception in thread "main" java.lang.NoClassDefFoundError:
org/gridgain/grid/loaders/cmdline/GridCommandLineLoader, then your $GRIDGAIN_HOME environment variable
is not set or is set incorrectly. Please set $GRIDGAIN_HOME environment variable to your GridGain
installation folder. |
8.2.2. GlassFish Loader
GlassFish loader located in org.gridgain.grid.loaders.glassfish package.
GlassFish loader is used to start GridGain within GlassFish application server. GridGain loader implemented as GlassFish life-cycle listener module. GlassFish loader should be used to provide tight integration between GridGain and GlassFish AS. Current loader implementation works on both GlassFish v1 and GlassFish v2 servers.
The following steps should be taken to configure this loader:
-
Add GridGain libraries in GlassFish common loader. See GlassFish Class Loaders
-
Create life-cycle listener module. Use command line or administration GUI:
asadmin> create-lifecycle-module --user admin --passwordfile ../adminpassword.txt --classname "org.gridgain.grid.loaders.glassfish.GridGlassfishLoader" --property cfgFilePath="config/default-spring.xml" GridGain
For more information consult GlassFish Project - Documentation Home Page.
Note that GlassFish is not shipped with GridGain. If you don’t have GlassFish, you need to download it separately. See https://glassfish.dev.java.net for more information.
8.2.3. Tomcat Loader
Tomcat loader located in org.gridgain.grid.loaders.tomcat package.
Tomcat loader is used to start GridGain within Tomcat server. GridGain loader implemented as Tomcat LifecycleListener. Tomcat loader should be used to provide tight integration between GridGain and Tomcat web container (logging, MBean server).
The following steps should be taken to configure this loader:
-
Add GridGain libraries in Tomcat common loader. Add in file $TOMCAT_HOME/conf/catalina.properties for property common.loader the following $GRIDGAIN_HOME/gridgain.jar,$GRIDGAIN_HOME/libs/*.jar (replace $GRIDGAIN_HOME with absolute path).
-
Add GridGain LifeCycle Listener in $TOMCAT_HOME/conf/server.xml.
<!-- GridGain loader -->
<Listener className="org.gridgain.grid.loaders.tomcat.GridTomcatLoader"
configurationFile="config/default-spring.xml"/>
Note that Tomcat is not shipped with GridGain. If you don’t have Tomcat, you need to download it separately. See http://tomcat.apache.org for more information.
8.2.4. JBoss Loader
JBoss loader located in org.gridgain.grid.loader.jboss package.
JBoss loader is used to start GridGain within JBoss as a JBoss service. Note that jboss-service.xml has a configuration parameter pointing to Spring XML configuration. At startup, JBoss loader will look for the Spring configuration XML file specified in jboss-service.xml.
GridGain ships with pre-built SAR directory. SAR directory located in $GRIDGAIN_HOME/config/jboss folder. You can simply deploy GridGain into JBoss into 2 steps:
-
Change the codebase in jboss-service.xml in META-INF sub-folder to point to correct location.
-
Copy entire SAR directory from $GRIDGAIN_HOME/config/jboss folder to deploy directory of the JBoss.
Here is how $GRIDGAIN_HOME/config/jboss/jboss-service.xml looks (note we use 1.5.0 version as an example):
<!DOCTYPE server PUBLIC "-//JBoss//DTD MBean Service 4.0//EN" "http://www.jboss.org/j2ee/dtd/jboss-service_4_0.dtd">
<!--
JBoss service descriptor for GridGain JBoss Loader.
Classpath should contain the following libraries:
- $GRIDGAIN_HOME/libs/*.jar
- $GRIDGAIN_HOME/gridgain_1.5.0.jar
For example, if GridGain is installed into /opt/gridgain-1.5.0 then
you can use the following classpath settings to includes all
necessary JARs:
<classpath codebase="/opt/gridgain-1.5.0/gridgain-1.5.0.jar"/>
<classpath codebase="/opt/gridgain-1.5.0/libs" archives="*"/>
-->
<server>
<classpath codebase=".." /> <!-- FIX IT BEFORE USING. -->
<mbean code="org.gridgain.grid.loaders.jboss.GridJbossLoader" name="gridgain:service=loader">
<!--
config/default-spring.xml - Default GridGain configuration.
config/jboss/ha/jboss-gridgain-ha-spring.xml - JBoss specific configuration that
will use JBoss SPIs for communication and discovery. Requires JBoss HA enabled.
-->
<attribute name="ConfigurationFile">config/default-spring.xml</attribute>
</mbean>
</server>
|
|
Currently provided JBoss loader doesn’t work with JBoss 7. Use servlet context listener loader instead. |
8.2.5. WebLogic Loader
Weblogic loader located in org.gridgain.grid.loader.weblogic package.
Weblogic loader is used to start GridGain within Weblogic application server. GridGain loader for WebLogic implemented as a pair of start and shutdown classes. Please consult WebLogic documentation on how to configure startup classes in Weblogic. Weblogic loader is used for tight integration with Weblogic AS. Specifically, Weblogic loader integrates GridGain with Weblogic logging, MBean server, and work manager (JSR-237).
The following steps should be taken to configure startup and shutdown classes:
-
Add Startup and Shutdown Class in admin console (Environment → Startup & Shutdown Classes → New).
-
Add the following parameters for startup class:
-
Name: GridWeblogicStartup
-
Classname: org.gridgain.grid.loaders.weblogic.GridWeblogicStartup
-
Arguments: cfgFilePath=config/default-spring.xml
-
-
Add the following parameters for shutdown class:
-
Name: GridWeblogicShutdown
-
Classname: org.gridgain.grid.loaders.weblogic.GridWeblogicShutdown
-
-
Change classpath for WebLogic server in startup script: CLASSPATH="$CLASSPATH:$GRIDGAIN_HOME/gridgain.jar:$GRIDGAIN_HOME/libs/"
For more information on Weblogic start/shutdown classes see http://edocs.bea.com/wls/docs100/ConsoleHelp/taskhelp/startup_shutdown/UseStartupAndShutdownClasses.html.
Note that Weblogic is not shipped with GridGain. If you don’t have Weblogic, you need to download it separately. See http://www.bea.com more information.
8.2.6. WebSphere Loader
Websphere loader located in org.gridgain.grid.loaders.websphere package.
Websphere loader is used to start GridGain within Websphere application server. This is GridGain loader implemented as Websphere custom service (MBean). Websphere loader should is used to provide tight integration between GridGain and Websphere AS. Specifically, Websphere loader integrates GridGain with Websphere logging, MBean server and work manager (JSR-237).
The following steps should be taken to configure this loader:
-
Add CustomService in admin console (Application Servers → server1 → Custom Services → New).
-
Add custom property for this service: cfgFilePath=config/default-spring.xml.
-
Add the following parameters:
-
Classname: org.gridgain.grid.loaders.websphere.GridWebsphereLoader
-
Display Name: GridGain
-
Classpath (replace $GRIDGAIN_HOME with absolute path): "$GRIDGAIN_HOME/gridgain.jar:$GRIDGAIN_HOME/libs/". Note that forward slash (/) at the end is critical.
-
For more information consult http://publib.boulder.ibm.com/infocenter/wasinfo/v6r1/index.jsp?topic=/com.ibm.websphere.base.doc/info/aes/ae/trun_customservice.html
Note that Websphere is not shipped with GridGain. If you don’t have Websphere, you need to download it separately. See http://www.ibm.com/software/websphere/ for more information.
8.2.7. Servlet context listener loader
Servlet context listener loader located in org.gridgain.grid.loaders.servlet package.
This loader can be used to startup GridGain grid inside any web container as servlet context listener. Loader must be defined in web.xml file.
<context-param>
<param-name>cfgFilePath</param-name>
<param-value>config/default-spring.xml</param-value>
</context-param>
<listener>
<listener-class>org.gridgain.grid.loaders.servlet.GridServletContextListenerLoader</listener-class>
</listener>
Servlet-based loader may be used in any web container like Tomcat, Jetty and etc. Depending on the way this loader is deployed the GridGain instance can be accessed by either all web applications or by only one. See web container class loading architecture:
|
|
To start GridGain in a web container, you have to create WAR file with the following structure: gridgain.war
|-- WEB-INF/
|-- lib/
| |-- gridgain.jar
| `-- GridGain libraries (contents of $GRIDGAIN_HOME/libs folder)
`-- web.xml (shipped with GridGain in $GRIDGAIN_HOME/config/servlet folder)
This file should be copied to deployments directory. |
8.3. Life Cycle Beans
GridLifecycleBeandoc reacts to grid lifecycle events defined in GridLifecycleEventTypedoc. Use this bean whenever you need to plug some custom logic before or after grid startup and stopping routines.
There are four events you can react to:
GridLifecycleEventType.BEFORE_GRID_START |
Invoked before grid startup routine is initiated. Note that grid is not available during this event, therefore if you injected a grid instance via GridInstanceResourcedoc annotation, you cannot use it yet. |
GridLifecycleEventType.AFTER_GRID_START |
Invoked right after grid has started. At this point, if you injected a grid instance via GridInstanceResourcedoc annotation, you can start using it. |
GridLifecycleEventType.BEFORE_GRID_STOP |
Invoked right before grid stop routine is initiated. Grid is still available at this stage, so if you injected a grid instance via GridInstanceResourcedoc annotation, you can use it. |
GridLifecycleEventType.AFTER_GRID_STOP |
Invoked right after grid has stopped. Note that grid is not available during this event. |
8.3.1. Resource Injection
Lifecycle beans can be injected using IoC (dependency injection) with grid resources. Both, field and method based injection are supported. The following grid resources can be injected:
8.3.2. Usage
If you need to tie your application logic into GridGain lifecycle, you can configure lifecycle beans via standard grid configuration, add your application library dependencies into GRIDGAIN_HOME/libs/ext folder, and simply start GRIDGAIN_HOME/ggstart.(sh|bat) scripts.
8.3.3. Configuration
Grid lifecycle beans can be configured programmatically as follows:
GridConfigurationAdapter cfg = new GridConfigurationAdapter();
cfg.setLifecycleBeans(new FooBarLifecycleBean1(), new FooBarLifecycleBean2());
GridFactory.start(cfg);
or from Spring XML configuration file as follows:
<bean id="grid.cfg" class="org.gridgain.grid.GridConfigurationAdapter" scope="singleton">
...
<property name="lifecycleBeans">
<list>
<bean class="foo.bar.FooBarLifecycleBean1"/>
<bean class="foo.bar.FooBarLifecycleBean2"/>
</list>
</property>
...
</bean>
8.4. Metadata & Meta Programming
TODO
8.5. Marshaling
8.6. Messaging
Messaging - an exchange of the messages between grid nodes - is one of the main functional areas that often used standalone in GridGain (without using main Compute and Data Grid capabilities). Given GridGain’s sophisticated topology management and auto-discovery it just makes sense for many applications to simply piggy-back on this functionality and use GridGain as an intelligent message bus.
|
|
Intelligent Message Bus GridGain messaging support provides unique features that makes it an advanced message bus:
|
8.7. Events
TODO
8.8. Grid-Enabled Executor Service
Grid.executor()doc method creates ExecutorService which will execute all submitted Callable and Runnable tasks on the grid. User may run Callable and Runnable tasks just like normally with java.util.ExecutorService, but these tasks must implement Serializable interface.
The execution will happen either locally or remotely, depending on configuration of Load Balancing SPI and Topology SPI. Distributed ExecutorService delegates commands execution to already started Grid instance. Every submitted task will be serialized and transfered to any node in grid.
Here is an example of an ExecutorService to show how it can be used.
public static void main(String[] args) throws GridException {
GridFactory.start();
try {
Grid grid = GridFactory.grid();
ExecutorService srvc = grid.executor();
List<Callable<String>> cmds = new ArrayList<Callable<String>>(2);
String testVal1 = "test-value-1";
String testVal2 = "test-value-2";
cmds.add(new FooCallable<String>(testVal1));
cmds.add(new FooCallable<String>(testVal2));
List<Future<String>> futures = srvc.invokeAll(cmds);
// Wait for command completion.
String res1 = futures.get(0).get();
String res2 = futures.get(1).get();
// Print out results.
System.out.println("Results [res1=" + res1 + ", res2=" + res2 + ']');
}
finally {
GridFactory.stop(true);
}
}
where simple FooCallable is:
private static class FooCallable<T> implements Callable<T>, Serializable {
/** */
private T data = null;
/**
* @param data Some data.
*/
FooCallable(T data) {
this.data = data;
}
/**
* {@inheritDoc}
*/
public T call() throws Exception {
System.out.println("Message: " + data);
return data;
}
}
8.9. Segmenting Grid Nodes
8.9.1. Why Segment Nodes?
Often in deployments you need to segment your grid nodes into several groups, having each group perform one or more subsets of jobs only. For example, let’s say you have a scenario where you have some nodes only submitting jobs to grid (masters), and other groups of nodes only executing these jobs (workers). Then you would segment your grid into 2 groups, masters and workers, and have each group do only what it is supposed to do.
Multiple Sub-Grids
Node segmentation allows you to create multiple sub-grids within your grid. Every sub-grid may have it’s own static physical characteristics and logical responsibilities. All node characteristics, physical or logical, if they are static, can be specified in Spring configuration and used in your Topology SPI or GridTask.map(..)doc logic to implement the segmentation (this is shown in example below).
Note, that based on its attributes, every node can participate in one or multiple segments.
Dynamic Sub-Grids
You may also wish to segment your grid based on dynamic characteristics, not static. For example, what if you only want to include nodes that have less than 50% CPU utilization. In GridGain you can achieve this by using dynamic GridNodeMetricsdoc provided by GridNodedoc. All you would have to do is grab current CPU utilization from node metrics and in your GridTask.map(..)doc method only pick the nodes with CPU’s loaded under 50%.
8.9.2. Node Segmentation Example
This example shows how you can segment your grid into static segments using GridGain. In GridGain such segmentation can be easily achieved with node attributes (see GridNode.getAttribute(String)doc). Let’s say that you want to segment your grid into 3 segments: master, worker1, and worker2.
Every node at startup should get a certain number of attributes assigned to it. Here is how this can be done from Spring XML configuration:
<bean id="grid.cfg" class="org.gridgain.grid.GridConfigurationAdapter" scope="singleton">
...
<property name="userAttributes">
<map>
<!--
In our example, segment value can be either
'master', 'worker1', or 'worker2'.
-->
<entry key="segment" value="worker1"/>
</map>
</property>
...
</bean>
Then you can restrict the topology passed into GridTask.map(..)doc method by properly configuring GridNodeFilterTopologySpi to only include nodes from segments worker1 and worker2 and always exclude nodes belonging to master segment. Here is an example:
<bean id="grid.custom.cfg" class="org.gridgain.grid.GridConfigurationAdapter" singleton="true">
...
<property name="topologySpi">
<bean class="org.gridgain.grid.spi.topology.nodefilter.GridNodeFilterTopologySpi">
<property name="filter">
<bean class="org.gridgain.grid.lang.GridJexlPredicate2">
<constructor-arg index="0">
<value>
<![CDATA[
node.attributes().get('segment') == 'worker1' ||
node.attributes().get('segment') == 'worker2'
]]>
</value>
</constructor-arg>
<constructor-arg index="1" value="node"/>
</bean>
</property>
</bean>
</property>
...
</bean>
Alternatively, you can also implement your GridTask.map(..)doc method to map your jobs only to worker node segments. You can check which node segment a node belongs to by checking its attributes via GridNode.getAttribute(String)doc method. Here is an example:
public class FooBarGridTask extends GridTaskAdapter<String, String> {
...
public Map<GridJob, GridNode> map(List<GridNode> topology, String arg) {
Map<GridJob, GridNode> jobs = new HashMap<GridJob, GridNode>(topology.size());
for (GridNode node : topology) {
String segment = node.attribute("segment");
if (segment != null) {
if (segment.equals("worker1"))
// This type of job should only execute on 'worker1' segment.
jobs.put(new FooBarWorker1Job(arg), node);
else if (segment.equals("worker2"))
// This type of job should only execute on 'worker2' segment.
jobs.put(new FooBarWorker2Job(arg), node);
}
else
throw new GridException("Node does not belong to any segment.");
}
return jobs;
}
...
}
8.9.3. Grid Node Filters
You are able to filter nodes by providing your implementation of GridPredicate<? super GridRichNode>doc interface. Instances of classes that implement this interface are used to filter grid nodes. These instances are used to filter nodes in method GridProjection.nodes(GridPredicate<? super GridRichNode>…)doc. They are also used by GridNodeFilterTopologySpi to provide task topology based on user-defined node filters.
GridGain also comes with GridJexlPredicatedoc implementation which allows you to conveniently filter nodes based on Apache JEXL expression language. For information about specifics of JEXL expression language refer to Apache JEXL documentation.
Together with GridNodeFilterTopologySpi, GridJexlPredicate2doc allows for a fairly simple way to provide complex SLA-based task topology specifications. For example, expression below shows how the SPI can be configured with GridJexlPredicate2doc to include all Windows XP nodes with more than one processor or core and that are not loaded over 50%.
GridNodeFilterTopologySpi topSpi = new GridNodeFilterTopologySpi();
GridJexlPredicate2<GridRichNode> filter = new GridJexlPredicate2<GridRichNode>(
"node.metrics().availableProcessors > 1 && " +
"node.metrics().averageCpuLoad < 0.5 && " +
"node.attributes().get('os.name') == 'Windows XP'",
"node");
// Add filter.
topSpi.setFilter(filter);
GridConfigurationAdapter cfg = new GridConfigurationAdapter();
// Override topology SPI.
cfg.setTopologySpi(topSpi);
// Start grid.
GridFactory.start(cfg);
or from Spring configuration file:
<bean id="grid.custom.cfg" class="org.gridgain.grid.GridConfigurationAdapter" singleton="true">
...
<property name="topologySpi">
<bean class="org.gridgain.grid.spi.topology.nodefilter.GridNodeFilterTopologySpi">
<property name="filter">
<bean class="org.gridgain.grid.lang.GridJexlPredicate2">
<constructor-arg index="0">
<value>
<![CDATA[
node.metrics().availableProcessors > 1 &&
node.metrics().averageCpuLoad < 0.5 &&
node.attributes().get('os.name') == 'Windows XP'
]]>
</value>
</constructor-arg>
<constructor-arg index="1" value="node"/>
</bean>
</property>
</bean>
</property>
...
</bean>
8.9.4. GridProjection-based Segmentation
GridGain 3.0 introduced another way to segment topology by using GridProjection. Described later in more details GridProjection represents dynamic view on global topology filtered by a predicate. GridProjection is also a monad providing monadic set of operations for any arbitrary set of nodes in the projection.
8.10. Resources Injection
Resource is a GridGain internal object or user defined one (either by Spring or set up manually) that is relevant to the context like task session for the task and job, current node id or grid instance. There is fixed numbers of GridGain internal resources that can be injected into the job, task or SPIs. Prior to their initialization and availability all resources that have corresponding annotations will be injected into the task/job/SPI. Both, field and method based injection are supported. The following grid resources can be injected:
-
Executor Service Resource
-
GridGain Home Path Resource
-
Grid Instance Resource
-
Job Id Resource
-
Job Context Resource
-
Load Balancer Resource
-
Local Node Id Resource
-
Logger Resource
-
Marshaller Resource
-
MBean Server Resource
-
Spring Application Context Resource
-
Spring Bean Resource
-
Task Session Resource
-
User Resource
8.10.1. Executor Service Resource
Executor service is a Java java.util.concurrent.ExecutorService that GridGain uses to pool threads and execute jobs, process incoming messages and P2P requests. It can be injected through the @GridExecutorServiceResourcedoc annotation. It can be injected into the tasks, jobs and SPIs. Here is how injection would typically happen:
public class MyGridJob implements GridJob {
...
@GridExecutorServiceResource
private ExecutorService execSvc;
...
}
or
public class MyGridJob implements GridJob {
...
private ExecutorService execSvc = null;
...
@GridExecutorServiceResource
public void setExecutor(GridExecutorService execSvc) {
this.execSvc = execSvc;
}
...
}
8.10.2. GridGain Home Path Resource
GridGain home path is a Java String that points to the installation directory (gives user value of environment variable named GRIDGAIN_HOME or Java system property with the same name). One should use @GridHomeResourcedoc annotation to inject this resource. It can be injected into the tasks, jobs and SPIs. Here is how injection would typically happen:
public class MyGridJob implements GridJob {
...
@GridHomeResource
private String home;
...
}
or
public class MyGridJob implements GridJob {
...
private String home = null;
...
@GridHomeResource
public void setGridGainHome(String home) {
this.home = home;
}
...
}
8.10.3. Grid Instance Resource
Grid instance is a Griddoc for executing and deploying tasks, sending messages, etc… Prior to task/job execution resources will be injected into it and one could use grid to obtain local node or remote nodes for example. Use @GridInstanceResourcedoc annotation to inject Grid resource. It can be injected into grid tasks and grid jobs. Here is how injection would typically happen:
public class MyGridJob implements GridJob {
...
@GridInstanceResource
private Grid grid;
...
}
or
public class MyGridJob implements GridJob {
...
private Grid grid = null;
...
@GridInstanceResource
public void setGrid(Grid grid) {
this.grid = grid;
}
...
}
8.10.4. Job Context Resource
Context attached to every job executed on the grid. Note that unlike GridTaskSessiondoc, which distributes all attributes to all jobs in the task including the task itself, job context attributes belong to a particular job only and do not get sent over network unless a job moves from one node to another. This resource can only be injected into Grid Jobs and not Grid Tasks or SPIs. @GridJobContextResourcedoc annotation can be used to inject this resource. Here is how injection would typically happen:
public class MyGridJob implements GridJob {
...
@GridJobContextResource
private GridJobContext jobCtx;
...
}
or
public class MyGridJob implements GridJob {
...
private GridJobContext jobCtx = null;
...
@GridJobContextResource
public void setJobContext(GridJobContext jobCtx) {
this.jobCtx = jobCtx;
}
...
}
8.10.5. Load Balancer Resource
Load balancer can be injected into grid tasks only. Specific implementation for grid load balancer is defined by GridLoadBalancingSpidoc which is provided to grid via GridConfigurationdoc. @GridLoadBalancerResourcedoc annotation can be used to inject this resource. Here is how injection would typically happen:
public class MyGridJob implements GridJob {
...
@GridLoadBalancerResource
private GridLoadBalancer balancer = null;
...
}
or
public class MyGridJob implements GridJob {
...
private GridLoadBalancer balancer = null;
...
@GridLoadBalancerResource
public void setLoadBalancer(GridLoadBalancer balancer) {
this.balancer = balancer;
}
...
}
8.10.6. Local Node Id Resource
This resource injects local node ID of type java.util.UUID into an instance of Grid Job, task or SPI. Node that is is the ID of a node your code is executed on which is not necessarily ID of a node that started the execution. @GridLocalNodeIdResourcedoc annotation can be used to inject this resource. Here is how injection would typically happen:
public class MyGridJob implements GridJob {
...
@GridLocalNodeIdResource
private UUID locNodeId = null;
...
}
or
public class MyGridJob implements GridJob {
...
private UUID locId = null;
...
@GridLocalNodeIdResource
public void setLocNodeId(UUID locNodeId) {
this.locNodeId = locNodeId;
}
...
}
8.10.7. Logger Resource
Grid logger is provided to grid via GridConfigurationdoc. It can be injected into grid tasks, grid jobs, and SPI’s. Use @GridLoggerResourcedoc annotation to inject this resource. Here is how injection would typically happen:
public class MyGridJob implements GridJob {
...
@GridLoggerResource
private GridLogger log = null;
...
}
or
public class MyGridJob implements GridJob {
...
private GridLogger log = null;
...
@GridLoggerResource
public void setLogger(GridLogger log) {
this.log = log;
}
...
}
8.10.8. Marshaller Resource
Injects marshaller to the task/job/SPI. Marshaller allows to serialize/deserialize data the same way as Grid does. Marshaller can be configured via GridConfigurationdoc. Use @GridMarshallerResourcedoc annotation to inject this resource. Here is how injection would typically happen:
public class MyGridJob implements GridJob {
...
@GridMarshallerResource
private GridMarshaller marshaller = null;
...
}
or
public class MyGridJob implements GridJob {
...
private GridMarshaller marshaller = null;
...
@GridMarshallerResource
public void setMarshaller(GridMarshaller marshaller) {
this.marshaller = marshaller;
}
...
}
8.10.9. MBean Server Resource
Injects MBean server into the task, job or SPI. MBean server is the same as Grid uses to register its own MBeans. Use @GridMBeanServerResourcedoc annotation to inject this resource. Here is how injection would typically happen:
public class MyGridJob implements GridJob {
...
@GridMBeanServerResource
private MBeanServer mbeanSrv = null;
...
}
or
public class MyGridJob implements GridJob {
...
private MBeanServer mbeanSrv = null;
...
@GridMBeanServerResource
public void setMBeanServer(MBeanServer srv) {
this.srv = srv;
}
...
}
8.10.10. Spring Application Context Resource
Injects Spring ApplicationContext resource. When GridGain starts using Spring configuration, the Application Context for Spring Configuration is either created or passed to the startup routine. It can be injected into grid tasks, grid jobs, and SPI’s. @GridSpringApplicationContextResourcedoc annotation is to inject this resource. Here is how injection would typically happen:
public class MyGridJob implements GridJob {
...
@GridSpringApplicationContextResource
private ApplicationContext springCtx = null;
...
}
or
public class MyGridJob implements GridJob {
...
private ApplicationContext springCtx = null;
...
@GridSpringApplicationContextResource
public void setApplicationContext(MBeanServer springCtx) {
this.springCtx = springCtx;
}
...
}
8.10.11. Spring Bean Resource
Injects any custom resources declared in provided Spring ApplicationContext. It can be injected into grid tasks and grid jobs. Use it when you would like, for example, to inject something like JDBC connection pool into tasks or jobs - this way your connection pool will be instantiated only once per task and reused for all executions of this task. You can inject other resources into your user resource. User resources may contain fields or setters with other resource annotations . The resource will be picked up from provided Spring ApplicationContext by name value. Note, that injected spring bean must be declared in Spring ApplicationContext on every grid node where they get accessed. Use @GridSpringResourcedoc annotation to inject this resource. Here is how injection would typically happen:
public class MyGridJob implements GridJob {
...
@GridSpringResource(resourceName = "bean-name")
private transient MyUserBean rsrc = null;
...
}
or
public class MyGridJob implements GridJob {
...
private transient MyUserBean rsrc = null;
...
@GridSpringResource(resourceName = "bean-name")
public void setMyUserBean(MyUserBean rsrc) {
this.rsrc = rsrc;
}
...
}
on Spring side it would look like following:
<bean id="bean-name" class="my.foo.MyUserBean" singleton="true">
...
</bean>
8.10.12. Task Session Resource
Injects Distributed Grid Task Session into the job or task, but not in SPI. Task session gives a simple way to set task/job attributes. Use @GridTaskSessionResourcedoc annotation to inject this resource. Here is how injection would typically happen:
public class MyGridJob implements GridJob {
...
@GridTaskSessionResource
private GridTaskSession taskSes = null;
...
}
or
public class MyGridJob implements GridJob {
...
private GridTaskSession taskSes = null;
...
@GridTaskSessionResource
public void setTaskSession(GridTaskSession taskSes) {
this.taskSes = taskSes;
}
...
}
8.10.13. User Resource
@GridUserResourcedoc injects any custom user resource into grid tasks or grid jobs. Use it when you would like, for example, to use something like JDBC connection pool from your tasks or jobs - this way your connection pool will be instantiated only once per task and reused for all executions of this task.
The resource will be created based on the resourceClass value. If resourceClass is not specified, then the field type or setter parameter type will be used to infer the class type of the resource. Set resourceClass to a specific value if the class of resource cannot be inferred from field or setter declaration (for example, if field is an interface).
User resource will be instantiated once on every node where task is deployed. Basically there will always be only one instance of resource on any grid node for any given task class. Every node will instantiate it’s own copy of user resources used for every deployed task (see GridUserResourceOnDeployeddoc and GridUserResourceOnUndeployeddoc annotation for resource deployment and undeployment callbacks).
|
|
User resources are never serialized (they get instantiated) and should always be declared as transient. |
|
|
The scope of user resource depends on Deployment Mode used. You can configure your user resources to be deployed on per-task, per-class-loader, or per-grid basis. Take a look at Deployment Mode documentation for more information. |
Use @GridUserResourcedoc annotation to inject this resource. Here is how injection would typically happen:
public class MyGridJob implements GridJob {
...
@GridUserResource
private transient MyUserResource rsrc = null;
...
}
or
public class MyGridJob implements GridJob {
...
private transient MyUserResource rsrc = null;
...
@GridUserResource
public void setMyUserResource(MyUserResource rsrc) {
this.rsrc = rsrc;
}
...
}
where resource class can look like this:
public class MyUserResource {
...
// Inject logger (or any other resource).
@GridLoggerResource
private GridLogger log = null;
// Inject grid instance (or any other resource).
@GridInstanceResource
private Grid grid = null;
// Deployment callback.
@GridUserResourceOnDeployed
public void deploy() {
// Some initialization logic.
...
}
// Undeployment callback.
@GridUserResourceOnUndeployed
public void undeploy() {
// Some clean up logic.
...
}
}
9. Deployment
Prior to being used, a Grid Task needs to be deployed:
-
If peer class loading is enabled (see property GridConfiguration.isPeerClassLoadingEnabled() in Grid Configuration):
-
Task class loaded from local class path if it is not defined as Local P2P Exclude
-
If there is no task class in local class path or task class needs to be peer loaded it is downloaded from task originating node.
-
-
If peer class loading is disabled:
-
Check that task class was deployed. If you are using GAR Deployment, then your task will be implicitly deployed every time GAR file or directory is changed. Otherwise, task can be deployed explicitly in code via Grid.deployTask(Class<? extends GridTask<?>>)doc method.
-
If task class was not deployed then we try to find it in local class path by task name. If you are not using @GridTaskNamedoc annotation to provide a custom task name, then your task name will default to the actual class name of the task and the task will be auto-deployed first time it’s executed (no explicit deployment step is required in this case).
-
If task has custom name (that does not correspond task class name), and this task was not deployed before, then exception will be thrown.
-
9.1. Peer Class Loading
Peer class loading (P2P) is turned on by default. To turn it off set GridConfiguration.isPeerClassLoadingEnabled()doc property to false in Grid Configuration.
Although internals of peer class loading are rather complex, what it means in a nutshell is that when a JVM on the remote node needs to find a certain class as part of the grid task execution, it will check the local class loader first and if such class cannot be found, it will ask the node that originated grid task execution (one that should have this class by design) to provide it. In an essence, GridGain class loading becomes grid-aware.
This technique is invaluable during grid application development. It allows for absolutely grid-transparent development cycle: you write your application as you normally do (in a single node environment of Eclipse, IDEA, etc.), compile and run it - and it seamlessly runs on the grid without any extra deployment or build steps what-so-ever. More over, GridGain supports hot-redeployment so you don’t have to restart GridGain every time you change the grid task code - again, just modify the code, compile and run and all your changes will be picked up on the grid.
Peer class loading sequence works as follows:
-
GridGain will check if class was loaded at system startup, and if it was, it will be returned. No class loading from a peer node will take place in this case.
-
If class is not locally loaded, then a request will be sent to task originating node to provide class definition. Originating node will send class byte code definition and the class will be loaded on a peer node.
Peer class loading should be used in most situations, especially during development with Java IDEs. It allows to dramatically reduce overhead of grid-enabled application development effectively making it as quick and productive as local application development. You simply change code and run - and your modified application seamlessly runs on the grid.
|
|
When utilizing peer class loading, you should be aware of the libraries that get loaded from peer nodes vs. libraries that are already available locally in the class path. Our suggestion is to include all 3rd party libraries into class path of every node. This way you will not transfer megabytes of 3rd party classes to remote nodes every time you change a line of code. |
|
|
Error Messages Some frameworks like Spring or CGLib ask for the certain classes to identify whether another frameworks are
available or not. For example Spring being started looks for the Groovy framework and thus when peer-to-peer
feature is on this class/resource request might be sent to remote node. If there is no such class/resource
available then you may get message "Requested resource not found" on remote (task originating node) and
"Failed to get resource due to remote failure" on local node. They are printed out for your information only. |
9.1.1. Local P2P Exclude
Note that giving preference to local deployment (as GridGain does by default) does not always work. For example, GridGain utilizes Spring for its own implementation, so Spring is always loaded locally by system class loader at startup. This may create a problem if user also utilizes Spring to load some beans reflectively. For example, Spring Hibernate support will attempt to load Hibernate classes with its own class loader (system class loader in this case) and if Hibernate is not in local class path, class definitions will not be found.
There are 2 ways to solve this problem:
-
Include Hibernate jars into class path on every node. This will perform better, as Hibernate jars will not have to be loaded with every task deployment.
-
If above does not work, you can make sure that Spring and Hibernate classes will always be loaded from a peer node by specifying their packages in GridConfiguration.getP2PLocalClassPathExclude()doc configuration property in Grid Configuration.
9.2. Deployment Modes
Deployment mode is specified at grid startup via GridConfiguration.getDeploymentMode()doc configuration property (it can also be specified in Spring XML configuration file). The main difference between all deployment modes is how classes and user resources are loaded on remote nodes via peer-class-loading mechanism. User resources can be instances of caches, databased connections, or any other class specified by user with @GridUserResourcedoc annotation.
The following deployment modes are supported:
| Mode | Description |
|---|---|
GridDeploymentMode.PRIVATEdoc |
In this mode deployed classes do not share user resources (see @GridUserResourcedoc). Basically, user resources are created once per deployed task class and then get reused for all executions. Note that classes deployed within the same class loader on master node, will still share the same class loader remotely on worker nodes. However, tasks deployed from different master nodes will not share the same class loader on worker nodes, which is useful in development when different developers can be working on different versions of the same classes. Also note that resources are associated with task deployment, not task execution. If the same deployed task gets executed multiple times, then it will keep reusing the same user resources every time. |
GridDeploymentMode.ISOLATEDdoc |
Unlike PRIVATE mode, where different deployed tasks will never use the same instance of user resources, in ISOLATED mode, tasks or classes deployed within the same class loader will share the same instances of user resources (see @GridUserResourcedoc). This means that if multiple tasks classes are loaded by the same class loader on master node, then they will share instances of user resources on worker nodes. In other words, user resources get initialized once per class loader and then get reused for all consequent executions. Note that classes deployed within the same class loader on master node, will still share the same class loader remotely on worker nodes. However, tasks deployed from different master nodes will not share the same class loader on worker nodes, which is especially useful when different developers can be working on different versions of the same classes. |
GridDeploymentMode.SHAREDdoc |
Same as GridDeploymentMode.ISOLATED, but now tasks from different master nodes with the same user version and same class loader will share the same class loader on remote nodes. Classes will be undeployed whenever all master nodes leave grid or user version changes. The advantage of this approach is that it allows tasks coming from different master nodes share the same instances of user resources (see @GridUserResourcedoc) on worker nodes. This allows for all tasks executing on remote nodes to reuse, for example, the same instances of connection pools or caches. When using this mode, you can startup multiple stand-alone GridGain worker nodes, define user resources on master nodes and have them initialize once on worker nodes regardless of which master node they came from. This method is specifically useful in production as, in comparison to GridDeploymentMode.ISOLATED deployment mode, which has a scope of single class loader on a single master node, GridDeploymentMode.SHARED mode broadens the deployment scope to all master nodes. Note that classes deployed in GridDeploymentMode.SHARED mode will be undeployed if all master nodes left grid or if user version changed. User version can be specified in META-INF/gridgain.xml file as a Spring bean property with name userVersion. This file has to be in the class path of the class used for task execution. SHARED deployment mode is default mode used by the grid. |
GridDeploymentMode.CONTINUOUSdoc |
Same as SHARED deployment mode, but user resources (see @GridUserResourcedoc) will not be undeployed even after all master nodes left grid. Tasks from different master nodes with the same user version and same class loader will share the same class loader on remote worker nodes. Classes will be undeployed whenever user version changes. The advantage of this approach is that it allows tasks coming for different master nodes share the same instances of user resources (see @GridUserResourcedoc) on worker nodes. This allows for all tasks executing on remote nodes to reuse, for example, the same instances of connection pools or caches. When using this mode, you can startup multiple stand-alone GridGain worker nodes, define user resources on master nodes and have them initialize once on worker nodes regardless of which master node they came from. This method is specifically useful in production as, in comparison to ISOLATED deployment mode, which has a scope of single class loader on a single master node, CONTINUOUS mode broadens the deployment scope to all master nodes. Note that classes deployed in CONTINUOUS mode will be undeployed only if user version changes. User version can be specified in META-INF/gridgain.xml file as a Spring bean property with name userVersion. This file has to be in the class path of the class used for task execution. |
9.2.1. User Version
User version comes into play whenever you would like to redeploy tasks deployed in SHARED or CONTINUOUS modes. By default, GridGain will automatically detect if class-loader changed or a node is restarted. However, if you would like to change and redeploy code on a subset of nodes, or in case of CONTINUOUS mode to kill the ever living deployment, you should change the user version.
User version is specified in META-INF/gridgain.xml file as follows:
<!-- User version. -->
<bean id="userVersion" class="java.lang.String">
<constructor-arg value="0"/>
</bean>
By default, all gridgain startup scripts (ggstart.sh or ggstart.bat) pick up user version from GRIDGAIN_HOME/config/userversion folder. Usually, it is just enough to update user version under that folder, however, in case of GAR or JAR deployment, you should remember to provide META-INF/gridgain.xml file with desired user version in it.
9.2.2. Always-Local Development
GridGain deployment (regardless of mode) allows you to develop everything as you would locally. You never need to specifically write any kind of code for remote nodes. For example, if you need to use a distributed cache from your GridJobdoc, then you can the following:
-
Simply startup stand-alone GridGain nodes by executing GRIDGAIN_HOME/ggstart.{sh|bat} scripts.
-
Inject your cache instance into your jobs via @GridUserResourcedoc annotation. The cache can be initialized and destroyed with @GridUserResourceOnDeployeddoc and @GridUserResourceOnUndeployeddoc annotations.
-
Now, all jobs executing locally or remotely can have a single instance of cache on every node, and all jobs can access instances stored by any other job without any need for explicit deployment.
9.3. JEE Deployment
When deploying grid tasks into JEE container, you can keep using standard JEE deployment artifacts. For example, if you are deploying a WAR file into JEE container, simply add your grid task classes to the WAR file and that’s it.
9.4. GAR Deployment
GAR deployment is a traditional deployment model, similar to JAR/WAR/EAR deployment in JEE, where you create the *G*rid *AR*chive file that contains all necessary classes for the grid task and deploy it. GridGain comes with URL-based GridDeploymentSpidoc implementation so that you can deploy your GAR files on any URLs accessible via FTP, HTTP(S), POP3 or FILE protocols.
For example, when properly configured, you can just drop your GARs into certain folder on your web server and they will be deployed on the grid.
9.4.1. GAR File
GAR file is a deployable unit used by GridUriDeploymentSpidoc. GAR file is based on ZLIB compression format like simple JAR file and its structure is similar to WAR archive. GAR file has .gar extension.
GAR file structure (file or directory ending with .gar):
META-INF/
|
- gridgain.xml
- ...
lib/
|
-some-lib.jar
- ...
xyz.class
...
-
META-INF entry may contain gridgain.xml file which is a task descriptor XML file. The purpose of task descriptor XML file is to specify all tasks to be deployed. This file is a regular Spring XML definition file. META-INF entry may also contain any other file specified by JAR format.
-
lib entry contains all library dependencies.
-
Compiled Java classes must be placed in the root of a GAR file.
GAR file may be deployed without descriptor file. If there is no descriptor file, GridDeploymentSpidoc will scan all classes in archive and instantiate those that implement GridTaskdoc interface. In that case, all grid task classes must have a public no-argument constructor (you can always use GridTaskAdapterdoc adapter for convenience when creating grid tasks).
|
|
gridgain.xml GAR Descriptor gridgain.xml is optional. If not provided -
GridDeploymentSpidoc will scan all classes in GAR archive. |
By default, all downloaded GAR files that have digital signature in META-INF folder will be verified and deployed only if signature is valid.
gridgain.xml
gridgain.xml GAR descriptor file is a standard Spring XML that should contain zero or more java.util.List beans. Each list should contain fully qualified class names for grid tasks. Here’s an example of gridgain.xml:
<?xml version="1.0" encoding="UTF-8"?>
<!--
Spring configuration file for test classes in gar-file.
-->
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:util="http://www.springframework.org/schema/util"
xsi:schemaLocation="
http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.0.xsd
http://www.springframework.org/schema/util http://www.springframework.org/schema/util/spring-util-2.0.xsd">
<description>Gridgain Spring configuration file in gar-file.</description>
<!--
Test tasks specification.
-->
<util:list id="tasks">
<value>foo.bar.SomeGridTask1</value>
<value>foo.bar.SomeGridTask2</value>
</util:list>
</beans>
9.4.2. Ant GAR Task
GridGain is shipped with GAR Ant task: GridGarAntTaskdoc. This task extends zip Ant task and can be used exactly like standard jar Ant task. GAR Ant task allows to archive class files and necessary dependencies (like resource files and libraries) with optional descriptor file (gridgain.xml). GridGain comes with an example of using GAr deployment including build.xml. Here’s an example of how to use GAR Ant task in typical build.xml:
<!--
Special task for creating GAR files.
-->
<taskdef name="gar" classname="org.gridgain.grid.tools.ant.gar.GridGarAntTask"
classpathref="gg.libs.path"/>
<!-- Create GAR file. -->
<gar destfile="${examples.gar.deploy.dir}/${gar.name}"
descrdir="${examples.gar.dir}/META-INF"
basedir="${examples.gar.deploy.dir}/tmpgar"/>
10. Zero Deployment
TODO
11. Compute Grid
11.1. MapReduce
11.1.1. Overview
MapReduce is a distributed computing paradigm which allows to map your task into smaller jobs based on some key, execute these jobs on Grid nodes, and reduce multiple job results into one task result.
Here is a diagram that explains how MapReduce works based on Shape Counter example. Given a collection of Shapes we split this collection into 2 parts and send every part to a grid node. Each node will count number of Shapes provided and will return it back to caller. The caller then will add results received from remote nodes and provide the reduced result back to the user (the counts are displayed next to every shape).
In GridGain, MapReduce paradigm is implemented via GridTaskdoc interface.
Map Operation
Result Operation
Upon completion of any job, GridTask.result(..) method is invoked which is responsible to tell GridGain whether to Wait for more job results, Reduce now, or Failover this job to another node.
Reduce Operation
This operation is responsible for taking multiple results from remote jobs and reducing them into one aggregate result. This aggregated result will be returned to the user.
11.1.2. Pull vs. Push MapReduce
One of the fundamental differences between GridGain’s implementation of MapReduce and the ones in the existing or legacy systems like Sun GridEngine, GigaSpaces, Hadoop and Globus is the cardinality or the type of the mapping operation. In conventional approach the worker nodes pull the sub-tasks for execution. In GridGain, sub-tasks are pushed to the worker nodes and this process is initially controlled by the task. The latter has fundamental advantage that was largely missing in grid computing frameworks before GridGain.
|
|
GridGain approach of giving task the control of sub-task distribution enables early and late load balancing algorithms. This effectively helps to adapt task execution to non-deterministic nature of execution on the grid. Not having this capability significantly narrows deployment options where optimal performance and scalability can be achieved. |
This unique property of GridGain’s MapReduce implementation has profound effect on ability to develop grid applications with the advanced load balancing, failover and collision resolution logic.
See Early And Late Load Balancing for more information.
11.2. GridTask and GridJob
11.2.1. GridTask And GridJob Interfaces
To create a grid task you need to implement GridTaskdoc interface. When implementing this interface you will also need to be aware of GridJobdoc interface. Basically, both of these interfaces define practically everything you need to know to create a grid task. In a nutshell, GridTask is responsible for splitting business logic into multiple grid jobs, receiving results from individual grid jobs executing on remote nodes, and reducing (aggregating) received jobs' results into final grid task result.
Grid task gets split into jobs when GridTask.map(List, Object)doc method is called. This method returns all jobs for the task mapped to their corresponding grid nodes for execution. Grid will then serialize this jobs and send them to requested nodes for execution.
11.2.2. Executing Grid Tasks
Grid-enabling is a process of making a piece of Java code to execute on the grid. In GridGain, there are two ways to do grid-enabling: API-based and annotation-based.
|
|
Direct Execution vs. Annotation-Based AOP There is no better or worse between these two methods. They both have their areas of applicability.
When creating grid task you basically have the same programming and development model as in JEE: you
create a component, deploy it and execute it. With annotation-based grid-enabling you have an extra option
of transparently attaching grid-enabling logic to existing code without modifying it (except for
additional annotation). |
API-Based Grid Task Execution
This method allows to grid-enable any arbitrary Java code. You have a full control on split and aggregate logic and all other aspects of grid task execution. Here is an example of direct grid task execution:
public static void main(String[] args) throws GridException {
GridFactory.start();
try {
Grid grid = GridFactory.getGrid();
// Execute task.
GridTaskFuture<String> future = grid.execute(FooBarTask.class, "Argument");
// Wait for task completion.
String result = fugure.get();
// Print out task result.
System.out.println("Task result: " + result);
}
finally {
GridFactory.stop(true);
}
}
Annotate Existing Method With Gridify Annotation
The only difference of this method vs. directly executing grid task is that you can annotate a regular Java method and it will become grid-enabled. Using this technique you can still have custom grid task that will handle annotation-based grid-enabling (including split & aggregate logic or passing state to remote jobs) but you will be limited to the boundaries of the method you are grid-enabling. Here is an example of such usage:
@Gridify(taskClass = FooBarTask.class, timeout = 3000)
public void sayIt(String arg) {
// Some business logic.
}
For information on how to configure AOP, refer to AOP Configuration section.
|
|
Serializable State Note that when using @Gridifydoc annotation on non-static methods
without specifying explicit grid task, the state of the whole instance will be serialized and sent
out to remote node. Therefore the class must implement java.io.Serializable interface. If you cannot
make the class Serializable, then you must implement custom grid task which will take care of proper
state initialization. In either case, GridGain must be able to serialize the state passed to remote node. |
11.2.3. Configuring Grid Tasks
Starting with GridGain 2.1 you can start multiple instances of Topology SPI, Load Balancing SPI, Failover SPI and Checkpoint SPI. If you do that, you need to tell a task which SPI to use (by default it will use the fist SPI in the list).
Add @GridTaskSpisdoc annotation to your task to specify which SPIs it wants to use. If this annotation is omitted, then by default GridGain will pick the first corresponding SPI implementation from the array provided in configuration.
For more information and examples refer to Specifying Different SPIs Per GridTask documentation.
11.2.4. Grid Task Execution Sequence
The sequence of task execution can be described as following:
-
Upon request to execute a grid task with given task name system will find deployed task with given name.
-
System will create new Distributed Grid Task Session. Also see GridTaskSessiondoc.
-
System will inject all annotated resources (including Distributed Grid Task Session) into grid task instance. See Resources Injection for more information.
-
System will call method map(…) on GridTaskdoc interface. These method is basically responsible for splitting business logic of grid task into multiple grid jobs (units of execution) and mapping them to grid nodes. Method map(…) returns a map of grid jobs keyed by the grid nodes. Consider using @GridLoadBalancerResourcedoc to inject load balancer into task for assigning jobs to the best available nodes.
-
System will start sending grid jobs to their respective nodes.
-
Upon arrival to remote node, grid job gets put on waiting list which is passed to underlying GridCollisionSpidoc SPI.
-
The Collision SPI on remote node will decide one of the following scheduling policies:
Policy Description WAIT
Grid Job will be kept on waiting list. In this case, job will not get a chance to execute until next time the Collision SPI is called. Collision SPI gets called every time a new job arrives or an active one completes.
EXECUTE
Grid Job will be moved to active list (i.e. activated). In this case system will proceed with job execution.
REJECT
Job on the waiting list can be rejected before they get a chance to start executing. In this case the GridJobResultdoc passed into GridTask.result(GridJobResult, List)doc method will contain GridExecutionRejectedExceptiondoc exception. If you are using any of the task adapters shipped with GridGain, then job will be failed over automatically for execution on another node.
CANCEL
If GridJob is on the active list and is currently executing, then it can be canceled by calling GridJob.cancel()doc method. Note that in this case job will still complete and return a result from GridJob.execute()doc method.
-
For activated jobs on remote nodes, system will inject all annotated resources (including Distributed Grid Task Session) into grid job instance. See Resources Injection for more information.
-
Remote nodes will execute the jobs by calling GridJob.execute()doc method.
-
If job gets canceled while executing on remote node, then GridJob.cancel()doc method will be called. Note that just like with Thread.interrupt() method, grid job cancellation serves as a hint that a job should stop executing or exhibit some other user defined behavior. Generally it is up to a job to decide whether it wants to react to cancellation or ignore it. Job cancellation can happen for several reasons:
-
Collision SPI has canceled an active job.
-
Parent task has completed without waiting for this job’s result.
-
User canceled task by calling GridTaskFuture.cancel()doc method.
-
-
Once job execution is complete, the return value will be sent back to parent task and will be passed into GridTask.result(GridJobResult, List)doc method. If job execution resulted in a checked exception, then GridJobResult.getException()doc method will contain that exception. If job execution threw a runtime exception or error, then it will be wrapped into GridUserUndeclaredExceptiondoc exception. # Method GridTask.result(GridJobResult, List)doc is called for each job result and decides whether or not to continue waiting for the remaining results, failover current result or reduce immediately based on returned policy.
Policy Description GridJobResultPolicy.WAITdoc
If this policy is returned, then Grid Task will continue to wait for other job results. If this result is the last job result, then GridTask.reduce(List)doc method will be called.
GridJobResultPolicy.REDUCEdoc
If this policy is returned, then method GridTask.reduce(List)doc will be called right away without waiting for other jobs' completion (all remaining jobs will receive a cancel request).
GridJobResultPolicy.FAILOVERdoc
If this policy is returned, then job will be failed over to another node for execution. The node to which job will get failed over to is decided by GridFailoverSpidoc SPI implementation. Note that if you use any of task adapters then they will automatically fail-over jobs to ther nodes for 2 known failure cases: node crash and job rejection.
-
When enough results are received, method GridTask.reduce(List)doc is called to aggregate (reduce) these results into one final grid task result.
-
After reduce(…) is complete - the result is returned to user as grid task result and can be retrieved from GridTaskFuture.get()doc method.
-
System will clean up all task session resources (such as checkpoints with session scope). Execution of the grid task is considered finished at this point.
11.2.5. Grid Task Coding Guidelines
There are certain known patterns and anti-patterns to be aware of when developing grid task and jobs.
Serialization and Deserialization
Jobs created by task are moved from one grid node to another. Before sending they are serialized into the byte stream and thus need to implement java.io.Serializable interface. On remote node every job is deserialized with a class loader that depends on deployment method (see Grid Deployment).
Prior to GridGain 2.1, every grid job class member (including super classes) except for static members need to implement java.io.Serializable. Static class members will not be sent to remote node and should be initialized on remote node. Note also that task parameters passed into GridJob.execute()doc method are sent to remote nodes and need to implement java.io.Serializable as well.
Starting with GridGain 2.1, you can configure different Grid Marshallers and depending on a marshaller, serialization may either be required or not.
Inner and Anonymous Classes
Any kind of inner classes or anonymous classes are allowed. Write your code as you usually do and GridGain will distribute it. You can implement your job as anonymous class within grid task class and use task class members inside your job. Here is an example of anonymous job:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 | import java.io.*;
import java.util.*;
import org.gridgain.grid.*;
/**
* Test task with anonymous job which uses method scope variable.
*/
public class TestGridTask extends GridTaskSplitAdapter<String> {
/** Dummy multiplier. */
private int multiplier = 3;
/**
* This method is responsible for splitting a task into multiple jobs.
*/
@Override
protected Collection<? extends GridJob> split(int gridSize, final String arg) throws GridException {
List<GridJobAdapter<String>> jobs = new ArrayList<GridJobAdapter<String>>(gridSize);
for (int i = 0; i < gridSize; i++) {
jobs.add(new GridJobAdapter<String>() {
/**
* Every job simply multiplies number of characters in the argument by some multiplier.
*/
public Serializable execute() throws GridException {
return multiplier * arg.length();
}
});
}
return jobs;
}
/**
* Reduces multiple job results into one task result.
*/
public Object reduce(List<GridJobResult> results) throws GridException {
int sum = 0;
// For the sake of this example, let's sum all results.
for (GridJobResult res : results) {
sum += (Integer)res.getData();
}
return sum;
}
}
|
Here we have anonymous job class created at line 20 which uses method-scope variable arg of task class declared in method signature at line 16 and used in job at line 25 as well as task class member multiplier declared at line 10 and used at line 25.
Overriding Methods with Gridify Annotation
If you have following code:
public class A {
@Gridify
protected methodA() {
...
}
}
public class B extends A {
@Override
protected methodA() {
...
super.methodA();
...
}
}
and use aspects you should get B.methodA() called twice, first on your local node and second time on remote node regardless of class or method modifiers. This is a feature of aspects implementation and we don’t recommend to use @Gridify in parent classes.
Here is step by step explanation:
-
You create object of class B.
-
You make a call to B.methodA() and since this method does not have annotation in class B aspects will not work.
-
Your B.methodA() executes and it calls super.methodA()
-
A.methodA() has annotation and thus aspect will call GridGain and distribute your object of class B and method call to a grid node.
-
On the grid node (local or remote) B.methodA() will be called (note that you have object of class B) again.
-
Your B.methodA() executes and it calls super.methodA()
-
Method A.methodA() has annotation but GridGain will catch this situation and it won’t be distributed twice but instead will be just called.
As you can see we have 2 executions of B.methodA() and only one A.methodA().
11.2.6. Resources Injection
GridTaskdoc and GridJobdoc implementations can be injected using IoC (dependency injection) with grid resources. Both, field and method based injection are supported.
The following grid resources can be injected:
| Resource | Description | ||
|---|---|---|---|
@GridTaskSessionResourcedoc |
Injects Distributed Grid Task Session. |
||
@GridInstanceResourcedoc |
Injects the actual instance of Griddoc this task is executed on. |
||
@GridLoggerResourcedoc |
Injects an instance of GridLoggerdoc logger used by this grid instance. |
||
@GridHomeResourcedoc |
Injects a path to GridGain installation home. |
||
@GridExecutorServiceResourcedoc |
Injects an instance of java.util.concurrent.ExecutorService used by this grid. |
||
@GridLocalNodeIdResourcedoc |
Injects local grid node ID of type java.util.UUID. |
||
@GridMBeanServerResourcedoc |
Injects an instance of javax.management.MBeanServer used by this grid node. |
||
@GridJobIdResourcedoc |
This resource can only be injected into Grid Jobs and not Grid Tasks. It injects unique job execution ID of type java.util.UUID into an instance of Grid Job. |
||
@GridSpringApplicationContextResourcedoc |
This resource injects the Spring application context into tasks and jobs. You can use it for accessing Spring beans or any other information available in Spring application context. By default, this application context is the same as the one used for configuring GridGain, but you can pass a custom one by calling GridFactory.start(GridConfiguration, ApplicationContext)doc method.
|
||
@GridUserResourcedoc |
Use this annotation to inject custom resources into tasks and jobs. The scope of this resource is per-task, so it will be initialized once the task is deployed and de-initialized once task is undeployed. Also see @GridUserResourceOnDeployeddoc and @GridUserResourceOnUndeployeddoc for controlling resource life cycle. |
||
@GridMarshallerResourcedoc |
Resource can be injected into the task, job or SPI and gives you simple way of marshalling/unmarshalling data or objects (since 2.1.0). |
||
@GridSpringBeanResourcedoc |
Injects any custom resources declared in provided Spring ApplicationContext. It can be injected into grid tasks and grid jobs. The resource will be picked up from provided Spring ApplicationContext by name value. Note, that injected spring bean must be declared in Spring ApplicationContext on every grid node where they get accessed (since 2.1.0). |
Refer to Resources Injection for more information.
11.2.7. Convenience Adapters
| Adapter | Description |
|---|---|
GridTaskAdapterdoc |
Grid Task adapter that provides default implementation for GridTask.result(GridJobResult, List)doc method which implements automatic fail-over to another node if remote job has failed due to a node crash (detected by GridTopologyExceptiondoc exception) or due to job execution rejection (detected by GridExecutionRejectedExceptiondoc exception). |
GridTaskSplitAdapterdoc |
Grid Task adapter that hides the job-to-node mapping logic from user and provides convenient GridTaskSplitAdapter.split(int, Object)doc method for splitting task into sub-jobs in homogeneous environments. |
GridJobAdapterdoc |
Grid Job adapter that provides default empty implementation for GridJob.cancel()doc method and also allows user to set and get job argument, if there is one. |
Refer to corresponding adapter documentation for more information.
11.2.8. Distributed Session Attributes And Checkpoints
Both, Grid Tasks and Grid Jobs can utilize Distributed Grid Task Session for coordination with each other via session attributes and checkpoints.
Session Attributes
Jobs can communicate with parent task and with other job siblings from the same task by setting session attributes (see GridTaskSessiondoc). Other jobs can wait for an attribute to be set either synchronously or asynchronously. Such functionality allows jobs to synchronize their execution with other jobs at any point and can be useful when other jobs within task need to be made aware of certain event or state change that occurred during job execution.
Saving Checkpoints
Long running jobs may wish to save intermediate checkpoints to protect themselves from failures. There are three checkpoint management methods available on Grid Task Session which allow user to save, load, and remove checkpoints.
Jobs that utilize checkpoint functionality should attempt to load a check point at the beginning of execution. If a non-null value is returned, then job can continue from where it failed last time, otherwise it would start from scratch. Throughout it’s execution job should periodically save its intermediate state to avoid starting from scratch in case of a failure.
Refer to Distributed Grid Task Session documentation for more information.
11.2.9. MapReduce Paradigm
The design of GridTaskdoc is heavily influenced by Google MapReduce paradigm. For more information about MapReduce paradigm, refer to MapReduce: Simplified Data Processing on Large Clusters article from Google.
11.2.10. Example
Below is a grid task implementation that is responsible for split and aggregate (a.k.a map/reduce) logic. Note that this implementation uses GridTaskSplitAdapterdoc that simplifies API for grid tasks in homogeneous grids (which is often the case). Main two methods that are implemented here are split and reduce. Method reduce aggregates result (number of characters in the string) returned from every node.
package org.gridgain.examples.helloworld.api;
import org.gridgain.grid.*;
import java.util.*;
import java.io.*;
public class GridHelloWorldTask extends GridTaskSplitAdapter<String, Integer> {
/** Auto-injected grid logger. */
@GridLoggerResource
private GridLogger log = null;
@Override
public Collection<? extends GridJob> split(int gridSize, String phrase) throws GridException {
// Split the passed in phrase into multiple words separated by spaces.
String[] words = phrase.split(" ");
List<GridJob> jobs = new ArrayList<GridJob>(words.length);
for (String word : words) {
// Every job gets its own word as an argument.
jobs.add(new GridJobAdapter<String>(word) {
/*
* Simply prints the word passed into the job and
* returns number of letters in that word.
*/
public Serializable execute() {
String word = getArgument();
if (log.isInfoEnabled() == true) {
log.info(">>>");
log.info(">>> Printing '" + word + "' on this node from grid job.");
log.info(">>>");
}
// Return number of letters in the word.
return word.length();
}
});
}
return jobs;
}
/**
* Sums up all letters from all jobs and returns a
* total number of letters in the phrase.
*
* @param results Job results.
* @return Number of letters for the phrase passed into
* <tt>split(gridSize, phrase)</tt> method above.
* @throws GridException If reduce failed.
*/
public Integer reduce(List<GridJobResult> results) throws GridException {
int totalCharCnt = 0;
for (GridJobResult res : results) {
// Every job returned a number of letters
// for the word it was responsible for.
Integer charCnt = res.getData();
totalCharCnt += charCnt;
}
// Account for spaces. For simplicity we assume one space between words.
totalCharCnt += results.size() - 1;
// Total number of characters in the phrase
// passed into task execution.
return totalCharCnt;
}
}
11.3. GridProjection
TODO
11.4. GridTaskSession
11.4.1. Overview
Distributed task session is created for every task execution. It is defined by GridTaskSessiondoc interface. Task session is distributed across the parent task and all grid jobs spawned by it, so attributes set on a task or on a job can be viewed on other jobs. Correspondingly attributes set on any of the jobs can also be viewed on a task.
Session has 2 main features: attribute and checkpoint (see Checkpoint SPI for more details) management. Both, attributes and checkpoints, can be used from task itself and from the jobs belonging to this task. Session attributes and checkpoints can be set from any task or job methods. Session attribute and checkpoint consistency is fault tolerant and is preserved whenever a job gets failed over to another node for execution. Whenever task execution ends, all checkpoints saved within session with GridTaskSessionScope.SESSION_SCOPEdoc scope will be removed from checkpoint storage. Checkpoints saved with GridTaskSessionScope.GLOBAL_SCOPEdoc will outlive the session and can be viewed by other tasks.
The sequence in which session attributes are set is consistent across the task and all job siblings within it. There will never be a case when one job sees attribute A before attribute B, and another job sees attribute B before A. Attribute order is identical across all session participants. Attribute order is also fault tolerant and is preserved whenever a job gets failed over to another node.
11.4.2. Connected Tasks
Note that apart from setting and getting session attributes, tasks or jobs can choose to wait for a certain attribute to be set using any of the GridTaskSession.waitForAttribute(..) methods. Tasks and jobs can also receive asynchronous notifications about a certain attribute being set through GridTaskSessionAttributeListenerdoc listener. Such feature allows grid jobs and tasks remain connected in order to synchronize their execution with each other and opens a solution for a whole new range of problems.
Imagine for example that you need to compress a very large file (let’s say terabytes in size). To do that in grid environment you would split such file into multiple sections and assign every section to a remote job for execution. Every job would have to scan its section to look for repetition patterns. Once this scan is done by all jobs in parallel, jobs would need to synchronize their results with their siblings so compression would happen consistently across the whole file. This can be achieved by setting repetition patterns discovered by every job into the session. Once all patterns are synchronized, all jobs can proceed with compressing their designated file sections in parallel, taking into account repetition patterns found by all the jobs in the split. Grid task would then reduce (aggregate) all compressed sections into one compressed file. Without session attribute synchronization step this problem would be much harder to solve.
11.4.3. Session Injection
Session can be injected into a task or a job using IoC (dependency injection). See [Resources Injection] page for additional details.
11.4.4. Example
Below is a grid task implementation that is responsible for split and aggregate (a.k.a map/reduce) logic. Note that this implementation uses GridifyTaskSplitAdapterdoc that simplifies API for grid tasks in homogeneous grids (which is often the case). Main two methods that are implemented here are split and reduce. Method reduce aggregates results (sums up all numbers returned by jobs) to calculate length of initial string.
This task will split passed in string into separate word and then pass each word into its own job for execution on different nodes. Every job will do the following:
-
Execute grid-enabled method with argument passed in.
-
Add its argument to the session.
-
Wait for other jobs to add their arguments to the session.
-
Execute grid-enabled method with all session attributes concatenated into one string as an argument.
package org.gridgain.examples.helloworld.gridify.session;
import java.io.*;
import java.util.*;
import org.gridgain.grid.*;
import org.gridgain.grid.gridify.*;
import org.gridgain.grid.resources.*;
/**
* Grid task for {@link GridifyHelloWorldSessionExample} example. It handles spiting
* this example into multiple jobs for execution on remote nodes.
* <p>
* Every job will do the following:
* <ol>
* <li>Execute grid-enabled method with argument passed in.</li>
* <li>Add its argument to the session.</li>
* <li>Wait for other jobs to add their arguments to the session.</li>
* <li>Execute grid-enabled method with all session attributes concatenated into one string as an argument.</li>
* </ol>
*/
public class GridifyHelloWorldSessionTask extends GridifyTaskSplitAdapter<Integer> {
/** Grid task session will be injected. */
@GridTaskSessionResource
private GridTaskSession ses = null;
/**
* {@inheritDoc}
*/
@Override
protected Collection<? extends GridJob> split(int gridSize, GridifyArgument arg) throws GridException {
String[] words = ((String)arg.getMethodParameters()[0]).split(" ");
List<GridJobAdapter<String>> jobs = new ArrayList<GridJobAdapter<String>>(words.length);
for (String word : words) {
jobs.add(new GridJobAdapter<String>(word) {
/** Job context will be injected. */
@GridJobContextResource
private GridJobContext jobCtx = null;
/**
* Executes grid-enabled method once with all
* session attributes concatenated into string
* as an argument and again with passed in argument.
*/
public Serializable execute() throws GridException {
String word = getArgument();
// Set session attribute with value of this job's word.
ses.setAttribute(jobCtx.getJobId(), word);
try {
// Wait for all other jobs within this task to set their attributes on
// the session.
for (GridJobSibling sibling : ses.getJobSiblings()) {
// Waits for attribute with sibling's job ID as a key.
if (ses.waitForAttribute(sibling.getJobId()) == null) {
throw new GridException("Failed to get session attribute from job: " +
sibling.getJobId());
}
}
}
catch (InterruptedException e) {
throw new GridException("Got interrupted while waiting for session attributes.", e);
}
// Create a string containing all attributes set by all jobs
// within this task (in this case an argument from every job).
StringBuilder msg = new StringBuilder();
// Formatting.
msg.append("All session attributes [ ");
for (Serializable jobArg : ses.getAttributes().values()) {
msg.append(jobArg).append(' ');
}
// Formatting.
msg.append(']');
// For the purpose of example, we simply log session attributes.
log.info(msg.toString());
// Execute gridified method again and return the number
// characters in the passed in word.
return GridifyHelloWorldSessionExample.sayIt(word);
}
});
}
return jobs;
}
/**
* Sums up all characters from all jobs and returns a
* total number of characters in the initial phrase.
*
* @param results Job results.
* @return Number of letters for the word passed into
* {@link GridifyHelloWorldSessionExample#sayIt(String)} method.
* @throws GridException If reduce failed.
*/
public Integer reduce(List<GridJobResult> results) throws GridException {
int totalCharCnt = 0;
for (GridJobResult res : results) {
// Every job returned a number of letters
// for the phrase it was responsible for.
Integer charCnt = res.getData();
totalCharCnt += charCnt;
}
// Account for spaces. For simplicity we assume one space between words.
totalCharCnt += results.size() - 1;
// Total number of characters in the phrase
// passed into task execution.
return totalCharCnt;
}
}
11.5. Zero Deployment
TODO
11.6. Task Resources
TODO
11.7. GridNodeLocal
When working in distributed environment often you need to have a consistent local state per grid node that is reused between various job executions. For example, what if multiple jobs require database connection pool for their execution - how do they get this connection pool to be initialized once and then reused by all jobs running on the same grid node? Essentially you can think about it as a per-grid-node singleton service, but the idea is not limited to services only, it can be just a regular Java bean that holds some state to be shared by all jobs running on the same grid node.
Before GridGain 3.0 this approach was handled by using @GridUserResourcedoc annotation to annotate fields within GridTaskdoc or GridJobdoc classes to specify singleton beans. However, this approach was dependent on GridDeploymentModedoc configuration and, for ISOLATED or PRIVATE deployment modes, resource could be initialized multiple times, once per GridTask. This forced users to use various hacks in their logic and generally was not very convenient to use.
Starting with GridGain 3.0 GridNodeLocaldoc per-grid-node local cache was introduced. The name was borrowed from ThreadLocal class in Java, because just like ThreadLocal provides unique space per-thread in Java, GridNodeLocal provides unique space per-grid-node in GridGain. GridNodeLocal implements java.util.concurrent.ConcurrentMap interface and is absolutely lock-free. In fact, it simply extends java.util.concurrent.ConcurrentHashMap implementation and, therefore, inherits all the methods available there.
Here is an example of how GridNodeLocal could be used to create some user specific singleton connection pool from a simple GridGain job:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | final Grid grid = G.start(..);
...
// Execute runnable job on some remote grid node.
grid.run(GridClosureCallMode.BALANCE, new Runnable() {
public void run() {
GridNodeLocal<String, MySingletonConnectionPool> nodeLocal = grid.nodeLocal();
// 1. First see if someone already stored connection pool in node-local storage.
MySingletonConnectionPool pool = nodeLocal.get("connPool");
if (pool == null) {
// 2. Create new connection pool and store it in node-local storage.
MySingletonConnectionPool old = pool.putIfAbsent("connPool", pool = new MySingletonConnectionPool(..));
if (old != null)
pool = old;
}
// Perform operations with connection pool.
...
}
});
|
11.8. Failover
TODO
11.9. Collision Resolution
TODO
11.10. Topology Management
TODO
11.11. Load Balancing
11.11.1. Overview
In MapReduce pattern the mapping is a process of splitting the initial task into sub-tasks and assigning them to the grid nodes. Mapping generally involves the splitting logic itself, mapping sub-tasks to the nodes including load balancing, and potential failover and collision resolution. In conventional approach the worker nodes pull the sub-tasks for execution. In GridGain, sub-tasks are pushed to the worker nodes and this process is initially controlled by the task. The later has fundamental advantage that was largely missing in grid computing frameworks before GridGain.
|
|
GridGain approach of giving task the control of sub-task distribution enables early and late load balancing algorithms. This effectively helps to adapt task execution to non-deterministic nature of execution on the grid. Not having this capability significantly narrows deployment options where optimal performance and scalability can be achieved. |
11.11.2. Early And Late Load Balancing
The sequence of steps described below shows when Early and Late load balancing policies come into play:
-
Someone calls one of GridProjection.execute(..) methods passing grid task and its argument to initiate grid task execution in the system.
-
Method GridTask.map(..) will be called on the task to perform the initial mapping. This method is responsible for taking a task, splitting it into number of sub-tasks and mapping every sub-task with one or more grid nodes. This method returns set of sub-task:node pairs. This is what we call Early Load Balancing as it is done right during initial mapping operation and with only information available at the execution initiation time (see Load Balancing SPI documentation).
-
Once mapping is done the sub-tasks will travel to respective remote nodes for execution.
-
When sub-task arrives to the destination grid node it will be subject for collision (scheduling) resolution via Collision SPI. This SPI is called every time when new sub-task arrived, existing sub-task finished its execution or a metrics update is received (with every heartbeat). Collision SPI looks into the queue of its sub-tasks (including a newly received one, if any) and can either cancel sub-task, leave it waiting in the queue, transfer it to another node for execution, or start its execution locally. This is what we call Late Load Balancing. This load balancing happens later in the process of execution and it happens on destination node right where sub-task is about to get executed.
The important characteristic of the late load balancing is that there can be a significant time difference between mapping (early load balancing) and actual time when execution of the sub-task commences on the remote node - and late load balancing allows to account for this non-deterministic aspect of grid execution and potentially re-balance the sub-task on the grid.
For example, our Job Stealing Collision SPI does exactly that. It monitors number of queued sub-tasks on each node and preemptively moves waiting sub-tasks from "busy" node to the "idle" node for execution.
Load balancing capabilities in GridGain are more of the advanced features and not everyone would need them. For example, in homogeneous grid with homogeneous tasks load balancing achieved naturally. However, in many other cases when conditions are more real-life - sophisticated load balancing capabilities are about the only way to get the most out of your grid.
For more information on MapReduce refer to Map/Reduce: Simplified Data Processing on Large Clusters article from Google.
Early Load Balancing
Load balancing is a simple process of the optimal assignment of jobs to the nodes where these jobs to be executed. As almost all kernel level functionality in GridGain the load balancing is designed as SPI (Service Provider Interface). It consists of the public SPI and several implementations. Number of pre-built implementations are shipped with GridGain and user can develop one easily.
Load balancing SPI provides the next best balanced node for job execution. This SPI is used either implicitly or explicitly whenever a job gets mapped to a node during GridTask.map(..) invocation
This load balancing is usually referred as early load balancing as it happens early in the process of the grid task execution during mapping phase of MapReduce process. Note that late load balancing happens during collision resolution and is handled by Collision SPI.
Late Load Balancing
Grid jobs are said to be in collision when a job arrives onto node that already has one or more jobs either waiting or executing on it. Job collision resolution provides means to resolve this collision by basically allowing to:
-
put newly arrived job into the waiting queue
-
schedule it for immediate execution
-
cancel it (and preempt it by failing it over to another node)
-
wake up already waiting job from the queue and schedule it for immediate execution
As almost any kernel level functionality in GridGain collision is designed as SPI (Service Provider Interface). It consists of the public API and several implementations. As always, several pre-built implementations are shipped with GridGain and available for the developer - and custom ones can be easily built.
Collision SPI allows to regulate how grid jobs get executed when they arrive on a destination node for execution. In general a grid node will have multiple jobs arriving to it for execution and potentially multiple jobs that are already executing or waiting for execution on it. There are multiple possible strategies dealing with this situation: all jobs can proceed in parallel, or jobs can be serialized i.e., only one job can execute in any given point of time, or only certain number or types of grid jobs can proceed in parallel, etc.
Collision SPI doesn’t expose any public APIs and works implicitly behind the scenes. As with any SPI, developer can provide its own implementation and plug it into GridGain.
Collision is generally referred as late load balancing as it happens late in the execution process when job has already arrived onto destination node. In fact, it allows to load balance jobs in the context of the given node. Note that early load balancing handled by Load Balancing SPI and occurs during initial mapping phase of MapReduce process.
11.12. AOP-Based Grid-Enabling
TODO
11.13. Closure Execution
TODO
11.14. Executor Service
TODO
11.15. Cron-Based Scheduling
TODO
11.16. Remote Actors vs. GridGain and Concurrency Unification
This is a bit of off-topic chapter discussing the differences between popular Actors concept (remote actors specifically) and functionality provided by GridGain. I was convinced to write about it after I got questions on Actors vs. GridGain Scalar at almost every conference I spoke about our Scalar DSL.
When talking about Actors there is always a bigger topic of Concurrency Unification trend that aims to combine principles of local multithreading concurrency and distributed programming. We at GridGain are strong supporters of concurrency unification. The example later on in this chapter will show some of our current work in this direction.
|
|
actor vs. Actor
We use lowercase actor to denote an instance of actor class or type, and uppercase Actor to denote the concept of actors. |
Back to Actors… After my presentation at GeeCON in the spring of 2011 I got the email asking for GridGain version of Pi-calculation example from Akka, a very popular and deservingly so, Actor framework in Scala. The sender was asking for help to compare Akka/Scala actors approach to basic distributed programming and GridGain Scalar’s approach.
I haven’t seen the Akka’s example before so I first downloaded the Akka 1.1 and looked at it…
Now, before we get to it I want to re-iterate few points related to Actor-based concurrency (note that these points are implementation-agnostic and apply equally to Scala actors or Akka actors).
I believe that Actors is an important "new" abstraction for elegantly resolving multithreading concurrency. I’m not, however, subscribing to an idealistic view that they are drop-in replacement for threads and java.util.concurrent utilities. Most of the real-life examples and applications that I’ve seen use all of these mechanisms together with actors.
|
|
Actors
I believe that Actors is an important "new" abstraction for elegantly resolving multithreading concurrency. |
It is often repeated that Actors do work best when they are used throughout the application - not just for solving one particular synchronization problem - but to build the entire subsystem or a module based on Actors. I tend to agree. Mixing and matching shared-state concurrency with actors produce rather awkward combination and significantly negates any advantages Actors bring.
There are, of course, many use cases where Actors simply don’t work well. Anytime you need a fine grain control, general performance fine-tuning or determinism on threading, or when you need more sophisticated locking algorithms (read/write, counting critical sections, etc.), or shared state is unavoidable - Actors tend to produce more verbose and less flexible solution. For example, I’ve seen several times attempts to implement pseudo-semaphore synchronization on the group of actors - and this was rather ugly.
Now, despite my positive outlook about general Actors for better concurrency management I have more reservations about Remote Actors - i.e. applying the same Actors concept in the distributed context.
11.16.1. Remote Actors and Concurrency Unification
Remote Actors basically allow to exchange messages between actors in different JVMs.
One of the major appeal of remote actors is that they attempt to bridge local JVM multithreading and distributed concurrency. At the first glance it seems rather elegant to extend the share-nothing message passing metaphor into distributed context to provide long sought-after Concurrency Unification.
The obvious contention is that in the distributed systems:
-
State is by default not shared by the same JVM
-
Not shared state exposed to parallel access from multiple JVMs
-
Data is already passed as serialized messages
|
|
Actors In Distributed Context
The key features of actors in local JVM multithreading are already present in distributed systems. |
Yet, distributed systems introduce the host of their own challenges comparatively to local JVM multithreading (JVM-M):
-
much larger latencies
-
cost of message passing is not negligible anymore and can easily exceed the processing time
-
resource starvation and deadlocking due to conditions not present in JVM-M
-
topology management & discovery
-
heterogenous environment (different CPUs, number of cores, memory sizes, OSes, networking, language runtime, etc.)
-
failover is very different conceptually from JVM-M
-
distributed load balancing not present in JVM-M
-
data sharing (a.k.a In-Memory Data Grid) is fundamentally different from sharing data in JVM-M
-
compute sharing (a.k.a. Compute Grid) is fundamentally different from sharing computations in JVM-M
-
deployment and provisioning of the code is dramatically different from JVM-M
It should be pretty obvious that challenges of distributed systems is far wider and more complex in nature than local JVM multithreading and therefore it makes more sense to adopt distributed practices to local JVM multithreading (and not vice verse) to gain true Concurrency Unification. In fact, you need to design from more general to more specific APIs when attempting to unify two related concepts.
11.16.2. Back To Example
So, here is the example of Pi calculation verbatim from the Akka 1.1 tutorial. I took the liberty to remove some excessive comments to make the code shorter:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 | package akka.tutorial.first.scala
import akka.actor.{Actor, PoisonPill}
import Actor._
import akka.routing.{Routing, CyclicIterator}
import Routing._
import java.util.concurrent.CountDownLatch
object Pi extends App {
calculate(nrOfWorkers = 4, nrOfElements = 10000, nrOfMessages = 10000)
sealed trait PiMessage
case object Calculate extends PiMessage
case class Work(start: Int, nrOfElements: Int) extends PiMessage
case class Result(value: Double) extends PiMessage
class Worker extends Actor {
// define the work
def calculatePiFor(start: Int, nrOfElements: Int): Double = {
var acc = 0.0
for (i <- start until (start + nrOfElements))
acc += 4.0 * (1 - (i % 2) * 2) / (2 * i + 1)
acc
}
def receive = {
case Work(start, nrOfElements) =>
self reply Result(calculatePiFor(start, nrOfElements)) // perform the work
}
}
class Master(nrOfWorkers: Int, nrOfMessages: Int, nrOfElements: Int, latch: CountDownLatch)
extends Actor {
var pi: Double = _
var nrOfResults: Int = _
var start: Long = _
// create the workers
val workers = Vector.fill(nrOfWorkers)(actorOf[Worker].start())
// wrap them with a load-balancing router
val router = Routing.loadBalancerActor(CyclicIterator(workers)).start()
// message handler
def receive = {
case Calculate =>
// schedule work
for (i <- 0 until nrOfMessages) router ! Work(i * nrOfElements, nrOfElements)
// send a PoisonPill to all workers telling them to shut down themselves
router ! Broadcast(PoisonPill)
// send a PoisonPill to the router, telling him to shut himself down
router ! PoisonPill
case Result(value) =>
// handle result from the worker
pi += value
nrOfResults += 1
if (nrOfResults == nrOfMessages) self.stop()
}
override def preStart() {
start = System.currentTimeMillis
}
override def postStop() {
// tell the world that the calculation is complete
pri
|