In-Memory HPC key innovations:
High Performance Computing (HPC) systems exist to provide means for parallel processing of various CPU or otherwise resource intensive tasks. A common approach is to place any number of dedicated multi-core servers in close proximity to form a computing cluster.
Traditionally the HPC approach involves splitting tasks into smaller work units and distributing such work units within cluster for parallel execution. This approach generally results in linear scalability, as with addition of new processing power, new work units can be easily created and the overall task execution will become proportionally faster.
A good example scenario would be the use of Monte Carlo Simulations across a wide spectrum of industries. In the financial sector, for example, the Monte Carlo method is used to assess the value of companies, perform risk analysis, or calculate financial derivatives. The method relies on repeated random sampling by running simulations multiple times in order to calculate the same probabilities heuristically. Problems that fit the Monte Carlo method can be easily split into multiple sample ranges, with each sample distributed across the computational cluster for parallel execution.
GridGain In-Memory HPC is a real-time high performance distributed computation framework.
The goal of any HPC framework is to support as many different execution models as possible, providing the end-user with the widest set of options on how a particular algorithm can be implemented and ultimately executed in the distributed environment.
GridGain In-Memory HPC supports four distinct and yet highly integrated distributed execution models:
In-Memory HPC provides general distributed map-reduce type of processing optimized for in-memory. GridGain MapReduce type processing defines the method of splitting original compute task into multiple sub-tasks, executing these sub-tasks in parallel on any managed infrastructure and aggregating (a.k.a. reducing) results back to one final result.
It is worth mentioning that GridGain MapReduce is different from Hadoop MapReduce. If Hadoop MapReduce task takes input from disk, produces intermediate results on disk, and outputs result onto disk, GridGain does everything Hadoop does in memory – it takes input from memory via direct API calls, produces intermediate results in memory and then creates result in-memory as well.
GridGain also provides native support for classic MPP (Massively Parallel Processing) which largely refers to utilizing processors and their local memory in distributed environment whenever they become available. MPP systems are well balanced and usually scale very well, as all processors should be utilized equally and no node should bottleneck.
GridGain also provides native support RPC (Remote Procedure Call) type of processing including direct remote closure execution, unicast/broadcast/reduce execution semantic, shared distribution sessions and many other advanced features.
GridGain’s high performance distributed messaging provides MPI-style (i.e. message passing based distribution) processing capabilities. Built on proprietary asynchronous IO and one of the fastest marshaling algorithms, GridGain provides synchronous and asynchronous semantic, distributed events, and publish-subscribe messaging in a distributed environment.
Here is a list of some of the key execution services provided by GridGain In-Memory HPC edition:
To build truly scalable applications with real-time capabilities that are using large amounts of data, the integration of In-Memory Database with In-Memory HPC edition is crucial. This is one of the main ideas in distributed high performance systems dealing with large datasets. Such integration mainly involves collocation of computations and data and is one of the key concepts behind GridGain In-Memory Computing Platform.
The idea is pretty simple: if computation and data are not co-located, then it will arrive on some remote node and will have to fetch the necessary data from yet another node where the data is stored. Once processed this data most likely will have to be discarded (since it’s already stored and backed up elsewhere). This process induces an expensive network trip plus all associated marshaling and demarshaling routines. At scale – this behavior can bring almost any system to a halt.
Affinity co-location solves this problem by co-locating the job with its necessary data set. We say that there is an affinity between processing and the data that this processing requires – and therefore we can route the job based on this affinity to a node where data is stored to avoid unnecessary network trips and extra marshaling and demarshaling. GridGain provides advanced capabilities for affinity co-location: from a simple method call to sophisticated APIs supporting complex affinity keys and nontrivial topologies.
GridGain provides one of the most sophisticated clustering technologies on Java Virtual Machine (JVM) called Hyper Clustering™. This technology has been independently tested in production environments of over 2000 nodes. Its ability to connect and manage a heterogenous set of computing devices is at the core GridGain’s distributed processing capabilities. In-Memory HPC clustering capabilities are fully exposed to the developers that have full control with the following advanced features:
- Pluggable cluster topology management and various consistency strategies
- Pluggable automatic discovery on LAN, WAN, and AWS
- Pluggable “split-brain” cluster segmentation resolution
- Pluggable unicast, broadcast, and Actor-based cluster-wide message exchange
- Pluggable event storage
- Cluster-aware versioning
- Support for complex leader election algorithms
- On-demand and direct deployment
- Support for virtual clusters and node groupings
The zero deployment feature means that you don’t have to deploy anything on the grid – all code together with resources gets deployed automatically. This feature is especially useful during development as it removes lengthy Ant or Maven rebuild routines or copying of ZIP/JAR files. The philosophy is very simple: write your code, hit a run button in the IDE or text editor of your choice and the code will be automatically be deployed on all running grid nodes. Note that you can change existing code as well, in which case old code will be undeployed and new code will be deployed while maintaining proper versioning.
A secure grid is often a strong requirement in enterprises. In-Memory HPC provides two levels by which security is enforced: cluster topology and client connectivity. When cluster-level security is turned on, unauthenticated nodes are not allowed to join the cluster. When client security is turned on, remote clients will not be able to connect to the grid unless they have been authenticated.
Failover management and resulting fault tolerance is a key property of any HPC infrastructure. Based on its SPI-based architecture, GridGain provides totally pluggable failover logic with several popular implementations available out-of-the-box. Unlike other HPC frameworks GridGain allows failover of the logic and not only the data.
With grid task being the atomic unit of execution on the grid the fully customizable and pluggable failover logic enables developers to choose specific policy much the same way as one would choose concurrency policy in RDBMS transactions.
Moreover, GridGain allows you to customize the failover logic for all tasks, whether a group of tasks or every individual task. Using meta-programming techniques the developer can even customize the failover logic for each task execution.
This allows you to fine-tune how grid task reacts to the failure, for example:
- Fail entire task immediately upon failure of any of its jobs (fail-fast approach)
- Failover failed job to other nodes until topology is exhausted (fail-slow approach)
GridGain provides the ability to either directly or automatically select a subset of grid nodes (i.e. a topology) on which MapReduce tasks will be executed. This ability gives tremendous flexibility to the developer in deciding where a task will be executed. The decision can be based on any arbitrary user or system information. For example, time of day or day of the week, type of task, available resources on the grid, current or average stats from a given node or aggregate from a subset of nodes, network latencies, predefined SLAs, etc.
For cases when some grid nodes are more powerful or have more resources than others you can run into scenarios where nodes are not fully utilized or over-utilized. Under-utilization and over-utilization are both equally bad for a grid. Ideally, all grid nodes in the grid should be equally utilized. GridGain provides several ways to achieve equal utilization across the grid including:
Weighted Load Balancing
If you know in advance that some nodes are, for example, 2 times more powerful than others, you can attach proportional weights to the nodes. For example, part of your grid nodes would get weight of 1 and the other part would get weight of 2. In this case job distribution will be proportional to node weights and nodes with heavier weights will proportionally get more jobs assigned to them.
Adaptive Load Balancing
For cases where nodes are not equal and you don’t know exactly how different they are, GridGain will automatically adapt to differences in load and processing power and will send more jobs to more powerful nodes and less jobs to weaker nodes. GridGain achieves that by listening to various metrics on various nodes and constantly adapting its load balancing policy to the differences in load.
Early and Late Load Balancing
GridGain provides both early and late load balancing for HPC load distribution, effectively enabling full customization of the entire load balancing process. Early and late load balancing allows adapting the grid task execution to non-deterministic nature of execution on the grid.
Early load balancing is supported via mapping operation of the MapReduce process. The mapping – the process of mapping jobs to nodes in the resolved topology – happens right at the beginning of task execution and therefore it is considered to be an early load balancing stage. Once jobs are scheduled and have arrived on the remote nodes for execution they get queued up on the remote node. How long this job will stay in the queue and when it’s going to get executed is controlled by the job collision resolution – that effectively defines the late load balancing stage.
One implementation of the load balancing orchestrations provided out-of-the-box is a job stealing algorithm. This detects imbalances at a late stage and sends jobs from busy nodes to the nodes that are considered free right before the actual execution.
Grid and cloud environments are often heterogeneous and non-static, tasks can change their complexity profiles dynamically at runtime and external resources can effect execution of the task at any point. All these factors underscore the need for proactive load balancing during initial mapping operation as well as on destination nodes where jobs can be in waiting queues.
ollision resolution allows you to regulate how grid jobs get executed when they arrive on a destination node for execution. Its functionality is similar to task management via customizable GCD (Great Central Dispatch) on Mac OS X as it allows developers and administrators to provide custom job dispatching on a single node. In general, a grid node will have multiple jobs arriving to it for execution and potentially multiple jobs that are already executing or waiting for execution. There are multiple possible strategies dealing with this situation, such as, all jobs can proceed in parallel, jobs can be serialized, one job can execute in any given point of time, or only a certain number or types of grid jobs can proceed in parallel.
A distributed task session is created for every task execution and allows for sharing state between different jobs within the task. Jobs can add, get, and wait for various attributes to be set, which allows grid jobs and tasks to remain connected in order to synchronize their execution with each other and opens a solution to a whole new range of problems.
Imagine, for example, that you need to compress a very large file. To do that in a grid environment you would split such file into multiple sections and assign every section to a remote job for execution. Every job would have to scan its section to look for any repetition. Once this scan is done by all jobs in parallel, jobs would need to synchronize their results with their siblings so compression would happen consistently across the whole file. This can be achieved by setting repetition patterns discovered by every job into the session.
When working in a distributed environment often you need to have a consistent local state per grid node that is reused between various job executions. For example, what if multiple jobs require a database connection pool for their execution – how do they get this connection pool to be initialized once and then reused by all jobs running on the same grid node? Essentially you can think about it as a per-grid-node singleton service, but the idea is not limited to services only, it can be just a regular Java bean that holds some state to be shared by all jobs running on the same grid node.
In addition to running direct MapReduce tasks on the whole grid or any user-defined portion of the grid, you can schedule your tasks to run repetitively as often as you need. GridGain supports Cron-based scheduling syntax for the tasks, so you can schedule your tasks to run using the familiar standard Cron syntax.
Checkpointing a job provides the ability to periodically save its state. This becomes especially useful in combination with fail-over functionality. Imagine a job that may take 5 minutes to execute, but after the 4th minute the node on which it was running has crashed. The job will be failed over to another node, but it would usually have to be restarted from scratch and would take another 5 minutes.
However, if the job was check-pointed every minute, then the most amount of work that could be lost is the last minute of execution and upon failover the job would restart from the last saved checkpoint. GridGain allows you to easily checkpoint jobs to better control overall execution time of your jobs and tasks.
Continuations are useful for cases when jobs need to be suspended and their resources need to be released. For example, if you spawn a new task from within a running job, it would be wrong to wait for that task completion synchronously because the job thread will remain occupied while waiting, and therefore your grid may run out of threads. The proper approach is to suspend the job so it can be continued later, for example, whenever the newly spawned task completes.
This is where GridGain continuations become really helpful. GridGain allows users to suspend and restart their jobs at any point. So in our example, when a remote job needs to spawn another task and wait for the result, our job would spawn the task execution and then suspend itself. Then, whenever the new task completes, our job would wake up and resume its execution. Such approach allows for easy task nesting and recursive task execution. It also allows you to have a lot more cross-dependent jobs and tasks in the system than there are available threads.
Service Provider Interface (SPI) architecture is at the core of GridGain including In-Memory HPC. It allows you to abstract various system level implementations from their common reusable interfaces. Essentially, instead of hard coding every decision about internal implementation about the product, GridGain instead exposes a set of interfaces that define the GridGain’s internal view on its various subsystem. Users then can use either provided built-in implementations or roll out their own when they need different functionality.
GridGain provides SPIs for 14 different subsystems all of which can be freely customized:
Having ability to change the implementation of each of these subsystems provides tremendous flexibility to how GridGain can be used in a real-world environment. Instead of being “King of the Hill” and demand that other software would accommodate GridGain, GridGain software blends naturally in almost any environment and integrates easily with practically any host eco-system.
GridGain In-Memory HPC (as well as GridGain In-Memory Database) comes with a number of Remote Client APIs that allow you to remotely connect to the GridGain cluster. Remote Clients come for multiple programming languages including Java, C++, REST and .NET C#. These Clients provide a rich set of functionality that can be used without a client runtime being part of the GridGain cluster: run computational tasks, access clustering features, perform affinity-aware routing of tasks, or access In-Memory Database directly or via Memcached protocol.
GridGain In-Memory HPC, as any other GridGain platform product, comes with a comprehensive and unified GUI-based management and monitoring tool called GridGain Visor. It provides deep operations, management and monitoring capabilities.
A starting point of the Visor management console is the Dashboard tab which provides an overview on grid topology, many relevant graphs and metrics, as well as event panel displaying all relevant grid events:
To manage and monitor grid tasks and jobs there is a Compute tab which displays various metrics and routing information for tasks and jobs currently running in the grid:
Click here for more information about Visor.