
Computational Grids

Grid computing is usually divided into two broad categories: data grids and computational grids (sometimes also called compute grids). In a nutshell, data grids distribute data and computational grids distribute processing. Both types aim to improve the performance and/or scalability of the application running on the grid.

GridGain is a computational grid framework.

Computational grids are sometimes mischaracterized as applicable only to calculation or computational tasks. In fact, you can simply replace the word "computational" with the word "processing" to get a clearer meaning. The word "computational" stuck to this type of grid computing because of the historical context in which grid computing originated, namely High Performance Computing (HPC) and clustered computing, which were initially used in academic quantitative research.


Split/Aggregate

As in traditional HPC systems (such as MPI), the trademark of computational grids is the ability to split a process into a set of sub-processes, execute them in parallel, and aggregate the sub-results into one final result.

The Map/Reduce (a.k.a. split and aggregate) design lets you parallelize task execution to gain performance and/or scalability: with two nodes in the grid you can achieve up to a 2x performance increase, with three nodes up to 3x, and so on. For example, with a simple and inexpensive 10-node grid you can often achieve "an order of magnitude" performance increase for your applications.

In GridGain, a grid task is a deployable unit of work that is responsible for the initial splitting (mapping) and the final aggregation (reducing), and a grid job is an executable unit of work that runs on a remote node.

The following diagram illustrates the basics of split and aggregate logic:

  1. Task execution request
  2. Task splits (maps) into jobs
  3. Result of job execution
  4. Aggregation (reduction) of job results into task result
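
The four steps above can be sketched in plain Java. This is only an illustration of the split and aggregate idea, not GridGain's API: the class and method names are hypothetical, and a local thread pool stands in for remote grid nodes. The task here counts the letters in a phrase by turning each word into a separate job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: a thread pool plays the role of the grid nodes.
public class SplitAggregateSketch {

    // Step 2: split (map) the task input into independent jobs, one per word.
    public static List<Callable<Integer>> split(String phrase) {
        List<Callable<Integer>> jobs = new ArrayList<>();
        for (String word : phrase.trim().split("\\s+")) {
            // In a real grid each job would execute on a remote node.
            jobs.add(() -> word.length());
        }
        return jobs;
    }

    // Step 4: aggregate (reduce) the job results into one task result.
    public static int aggregate(List<Integer> jobResults) {
        int total = 0;
        for (int result : jobResults) {
            total += result;
        }
        return total;
    }

    // Step 1: the task execution request enters here.
    public static int execute(String phrase, int nodes) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(nodes);
        try {
            List<Integer> results = new ArrayList<>();
            // invokeAll runs the jobs in parallel and waits for all of them.
            for (Future<Integer> future : pool.invokeAll(split(phrase))) {
                results.add(future.get()); // Step 3: result of one job execution
            }
            return aggregate(results);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // 5 + 4 + 5 letters
        System.out.println(execute("Hello Grid World", 3)); // prints 14
    }
}
```

Note that the splitting and aggregation logic live together in one place (the task), while each job knows nothing about its siblings. That separation is what lets the jobs run on different nodes.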


Scalability

There is widespread confusion about how grid computing, and specifically the split/aggregate technique, can help the scalability and performance of enterprise applications. Intuitively we all feel that grid computing improves scalability and reduces latency, but few of us can clearly articulate how.

Let's first define what scalability and performance generally mean. Scalability is the ability to process more transactions without increasing overall system response time. Performance, or transaction latency, is how fast the system can process a single transaction under a given load.

It's important to note that low latency doesn't imply better scalability, and highly scalable systems can have horrible performance. Each particular system exhibits its own pattern of how these two metrics correlate.

At the same time, there is significant confusion over the difference between Master/Worker and Split/Aggregate. Although the difference may seem minor, I think it's important nonetheless. The Split/Aggregate design pattern, unlike Master/Worker, concentrates on the notion of splitting data or computation into multiple sub-elements rather than on the notion of remote execution.

Recall the two major types of grid computing: data grids split (partition) data, and computational grids split (parallelize) processing. With data partitioning alone you can usually achieve only better scalability. Why? Because partitioning lets each grid node process its own subset of the data in parallel, so you can process more transactions in the same period of time. But if the processing logic remains unchanged, you can't significantly improve performance, i.e. how fast a single transaction gets processed.

That's where computational grids come into play. Computational grids provide the capability to split, or parallelize, the execution of a single transaction, which can dramatically improve performance and reduce latency. What's more, computational grids usually lead to better scalability as well, since you can now fine-tune data partitioning and resource utilization.
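
To make the distinction concrete, here is a small back-of-the-envelope model in Java. It is an idealized sketch, not a measurement: it assumes perfectly even partitioning and ignores network, splitting, and aggregation overhead, so the numbers are best-case bounds. The class and method names are hypothetical.

```java
// Idealized model (assumption): no network, split, or aggregation overhead.
public class GridModelSketch {

    // Data partitioning: each node handles whole transactions on its own
    // subset of data, so throughput grows with the number of nodes.
    public static double partitionedThroughput(double secondsPerTransaction, int nodes) {
        return nodes / secondsPerTransaction;
    }

    // Data partitioning alone leaves the processing logic unchanged,
    // so a single transaction still takes just as long.
    public static double partitionedLatency(double secondsPerTransaction, int nodes) {
        return secondsPerTransaction;
    }

    // Computational grid: one transaction is split into parallel jobs,
    // so its best-case latency shrinks with the number of nodes.
    public static double splitLatency(double secondsPerTransaction, int nodes) {
        return secondsPerTransaction / nodes;
    }

    public static void main(String[] args) {
        double seconds = 2.0; // time to process one transaction on one node
        int nodes = 10;
        System.out.println(partitionedThroughput(seconds, nodes)); // prints 5.0
        System.out.println(partitionedLatency(seconds, nodes));    // prints 2.0
        System.out.println(splitLatency(seconds, nodes));          // prints 0.2
    }
}
```

In this model a 10-node data grid raises throughput from 0.5 to 5 transactions per second while each transaction still takes 2 seconds, whereas splitting the transaction itself brings its latency down to at best 0.2 seconds. Real systems will fall short of these figures because of the overhead the model leaves out.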

Data and computational grids are very different in how they approach their tasks, and their designs usually vary dramatically. Yet you will need both if you want to achieve the real grid effect for your application.

In GridGain we provide what we believe is the best computational grid framework for Java. We've chosen not to re-implement data grids but rather to integrate with the leading solutions, and we do so with Coherence, GigaSpaces and JBoss Cache. Most of the time no integration is really required, as data grids are substantially different in design and can easily be used alongside computational grids.
