I'm pleased to announce that today we released GridGain 4.0 - the latest edition of our platform for Real Time Big Data processing. I'm proud that our team set this final deadline almost 5 months ago and that we were able to hit it without a single delay.
I'm especially proud of this fact because of the enormous complexity of the development process involved in making software like GridGain - dozens of production clients, testing on serious massively distributed environments, a set of new features, and the usual array of setbacks that we had to go through to get here.
Needless to say, we have also grown significantly as a business in the last 6 months, including more than doubling our team headcount and rolling out a new website, new branding, a sales team, messaging, press and analyst relationships, investment, and a whole range of other business activities.
But... GridGain Systems is an engineering company first and foremost, so I'll talk about the technology in GridGain 4.0:
Visor Management & Monitoring
Enterprise and OEM Editions of GridGain come standard with GridGain Visor, a GUI-based and scriptable environment for managing and monitoring GridGain distributed installations.
Visor GUI lets you perform all major management and monitoring operations for GridGain installations:
Various Node Actions
Topology View with Metrics
Metrics For Any Projection
Comprehensive Historical Charts
Advanced Grid-Wide Events
... and plenty of other cool stuff!
Affinity-Aware Native Clients
In GridGain 4.0 we are finally introducing native clients for various languages. The 4.0 release includes native Java and Android clients with rich APIs to support our Compute Grid and Data Grid connectivity. Our native .NET, C++, Groovy, and Scala clients are already in the testing stage and will be coming out shortly as well. After that we will be adding Objective-C, Ruby, PHP, Python, and Node.js native clients.
All of our clients natively support essentially the same APIs, each adapted to its language. You can execute MapReduce tasks and perform a bunch of data operations, like storing and retrieving values to/from remote caches, or atomic compare-and-set, replace, and put-if-absent operations. You can also subscribe to topology updates and get very creative with partitioning a remote data grid into logical subgrids.
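To illustrate what put-if-absent and compare-and-set style operations buy you, here is a minimal sketch. It uses a local `java.util.concurrent.ConcurrentHashMap` as a stand-in for a remote cache; it is not the GridGain client API, and the class and method names are made up for this example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Illustration only: a local map stands in for a remote cache. */
public class AtomicCacheOpsSketch {

    /** Atomically increments a counter using put-if-absent plus a compare-and-set retry loop. */
    static int increment(ConcurrentMap<String, Integer> cache, String key) {
        while (true) {
            // put-if-absent: returns null if we created the entry.
            Integer current = cache.putIfAbsent(key, 1);
            if (current == null) {
                return 1;
            }
            int next = current + 1;
            // compare-and-set: succeeds only if nobody changed the value in between.
            if (cache.replace(key, current, next)) {
                return next;
            }
            // Another writer won the race; re-read and try again.
        }
    }

    public static void main(String[] args) {
        ConcurrentMap<String, Integer> cache = new ConcurrentHashMap<>();
        increment(cache, "hits");
        increment(cache, "hits");
        System.out.println("hits = " + cache.get("hits"));
    }
}
```

The point of the retry loop is that no lock is held across the read and the write, which matters even more when each operation is a network round trip.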
But one of the coolest features in our clients is affinity-awareness. This basically means that when working with data grids, GridGain will automatically figure out on which node the data is stored and will route client requests to that node. Imagine the number of network trips you can save by retrieving data directly from the node responsible for storing it (the same goes for updates). This feature is available for all of our native clients, not only for Java, which makes GridGain the only native cross-language distributed Real Time Big Data platform.
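The routing idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not GridGain's actual affinity function: keys are hashed to a fixed number of partitions, and partitions are assigned to nodes round-robin. The class name, node names, and partition count are all hypothetical.

```java
import java.util.List;

/** Hypothetical client-side router; not the GridGain API. */
public class AffinityRouterSketch {
    private final List<String> nodes;
    private final int partitions;

    public AffinityRouterSketch(List<String> nodes, int partitions) {
        if (nodes.isEmpty() || partitions <= 0) {
            throw new IllegalArgumentException("need at least one node and one partition");
        }
        this.nodes = List.copyOf(nodes);
        this.partitions = partitions;
    }

    /** Maps a key to a partition; floorMod keeps the result non-negative. */
    public int partition(Object key) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    /** Picks the node that owns the key's partition (round-robin assignment, assumed). */
    public String nodeFor(Object key) {
        return nodes.get(partition(key) % nodes.size());
    }

    public static void main(String[] args) {
        AffinityRouterSketch router =
            new AffinityRouterSketch(List.of("node-a", "node-b", "node-c"), 1024);
        for (String key : List.of("user:1", "user:2", "order:77")) {
            // The client would send the request for this key straight to the owning node.
            System.out.println(key + " -> " + router.nodeFor(key));
        }
    }
}
```

Because every client computes the same mapping, a read or update goes to the owning node in one hop instead of being forwarded by whichever node happened to receive it.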
Memcached Binary Protocol Support
In GridGain 4.0 we significantly enhanced our REST support for HTTP(S) and added binary protocol support as well. What's even cooler is that our binary protocol is fully Memcached-compliant. As a matter of fact, during our testing we have been connecting to GridGain using available open source Memcached clients and executing commands on the GridGain data grid.
Having said that, the GridGain 4.0 binary connectivity protocol supports a lot more than Memcached does. Essentially, we have taken the Memcached protocol as our starting point and significantly enhanced it with our own commands and features. For example, you can configure security with proper authentication and secure sessions for remote clients, execute MapReduce tasks, get the remote node topology, and more.
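For readers who have not looked at the wire format, here is a minimal sketch of what a standard Memcached binary GET request looks like: a 24-byte big-endian header followed by the key. It covers only the stock Memcached layout; the GridGain-specific commands mentioned above are not shown, and the class and method names are made up for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Encodes a standard Memcached binary-protocol GET request (illustration only). */
public class MemcachedBinaryGetSketch {
    static final byte MAGIC_REQUEST = (byte) 0x80;
    static final byte OPCODE_GET = 0x00;
    static final int HEADER_LENGTH = 24;

    static byte[] encodeGet(String key, int opaque) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        if (keyBytes.length == 0 || keyBytes.length > 250) {
            throw new IllegalArgumentException("Memcached keys are 1..250 bytes");
        }
        // ByteBuffer is big-endian by default, which matches network byte order.
        ByteBuffer buf = ByteBuffer.allocate(HEADER_LENGTH + keyBytes.length);
        buf.put(MAGIC_REQUEST);                 // byte 0: magic
        buf.put(OPCODE_GET);                    // byte 1: opcode
        buf.putShort((short) keyBytes.length);  // bytes 2-3: key length
        buf.put((byte) 0);                      // byte 4: extras length (none for GET)
        buf.put((byte) 0);                      // byte 5: data type
        buf.putShort((short) 0);                // bytes 6-7: reserved / vbucket id
        buf.putInt(keyBytes.length);            // bytes 8-11: total body = extras + key + value
        buf.putInt(opaque);                     // bytes 12-15: opaque, echoed back in the response
        buf.putLong(0L);                        // bytes 16-23: CAS
        buf.put(keyBytes);                      // body: the key itself
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] packet = encodeGet("user:1", 42);
        System.out.println("request is " + packet.length + " bytes");
    }
}
```

Any client that can produce packets like this should be able to talk to a Memcached-compliant endpoint, which is why off-the-shelf Memcached clients work.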
In GridGain 4.0 we added a notion of secure grids. Grids can now require nodes to be authenticated before they join the topology. The authentication implementation is fully pluggable through our SPI-based architecture and comes with several implementations out of the box, such as Passcode or JAAS-based authentication.
Additionally, remote clients can also be required to authenticate themselves. Once authenticated, they establish a secure session with the server.
Both the authentication and secure-session SPIs are available in the GridGain Enterprise Edition only.
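To give a feel for what "pluggable" means here, below is a minimal sketch of a passcode-style check behind a small interface. The `Authenticator` interface, the `PasscodeAuthenticator` class, and the `admit` method are hypothetical names invented for this illustration; they are not the GridGain SPI.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

/** Hypothetical pluggable authentication sketch; not the GridGain SPI. */
public class PasscodeAuthSketch {

    /** The plug-in point: any implementation can be swapped in. */
    interface Authenticator {
        boolean authenticate(String subjectId, byte[] credentials);
    }

    /** Accepts a subject only if it presents the shared passcode. */
    static final class PasscodeAuthenticator implements Authenticator {
        private final byte[] expected;

        PasscodeAuthenticator(String passcode) {
            this.expected = passcode.getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public boolean authenticate(String subjectId, byte[] credentials) {
            // MessageDigest.isEqual avoids a naive early-exit byte comparison.
            return credentials != null && MessageDigest.isEqual(expected, credentials);
        }
    }

    /** Decides whether a node (or client) may proceed, using whichever authenticator is configured. */
    static boolean admit(Authenticator auth, String nodeId, String passcode) {
        return auth.authenticate(nodeId, passcode.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        Authenticator auth = new PasscodeAuthenticator("s3cret");
        System.out.println("node-1 admitted: " + admit(auth, "node-1", "s3cret"));
        System.out.println("node-2 admitted: " + admit(auth, "node-2", "guess"));
    }
}
```

A JAAS-based implementation would plug in behind the same kind of interface without the calling code changing.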
Data Loaders + Hadoop HDFS Support
We have also added support for efficient concurrent data loading for our data grid. There are plenty of ways to load data into data grids, including using basic cache APIs or our support for bulk-loading of data from data stores. Data loaders make it easy to externally load data into the grid by adding collocation with data nodes, sending concurrent data loading jobs, and properly controlling the amount of memory consumed by the data loading process.
As a good use case, you can use a data loader to preload data from Hadoop HDFS in order to process it in Real Time on GridGain. In fact, GridGain 4.0 comes with an HDFS data loading example, GridCacheHdfsLoaderExample, which reads data from HDFS and then uses the GridGain data loader to load it into the data grid.
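The two ideas that matter most in a loader, concurrent batches and a cap on memory in flight, can be sketched with plain JDK classes. This is an illustration under stated assumptions, not the GridGain data loader: the target is any thread-safe `Map`, collocation with data nodes is not modeled, and the class, method, and parameter names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical bounded concurrent loader; the target map must be thread-safe. */
public class BoundedLoaderSketch {

    static <K, V> void load(Iterable<Map.Entry<K, V>> source, Map<K, V> target,
                            int batchSize, int maxInFlight, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        // At most maxInFlight batches are buffered at once, which bounds loader memory.
        Semaphore inFlight = new Semaphore(maxInFlight);
        try {
            List<Map.Entry<K, V>> batch = new ArrayList<>(batchSize);
            for (Map.Entry<K, V> entry : source) {
                batch.add(entry);
                if (batch.size() == batchSize) {
                    submit(pool, inFlight, batch, target);
                    batch = new ArrayList<>(batchSize);
                }
            }
            if (!batch.isEmpty()) {
                submit(pool, inFlight, batch, target);
            }
        } finally {
            pool.shutdown();
            try {
                // Wait for the remaining batches to be written.
                pool.awaitTermination(1, TimeUnit.MINUTES);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    private static <K, V> void submit(ExecutorService pool, Semaphore inFlight,
                                      List<Map.Entry<K, V>> batch, Map<K, V> target) {
        // Blocks the reader when too many batches are pending (back-pressure).
        inFlight.acquireUninterruptibly();
        pool.execute(() -> {
            try {
                for (Map.Entry<K, V> e : batch) {
                    target.put(e.getKey(), e.getValue());
                }
            } finally {
                inFlight.release();
            }
        });
    }
}
```

With a source that reads records lazily (for example from a file), the reader can never get more than `maxInFlight * batchSize` entries ahead of the writers.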
1000+ Nodes Guaranteed Discovery
In this release we enhanced our TCP-based discovery protocol with enterprise-proven support for network segmentation and half-connected sockets. Our discovery protocol was tested on thousands of grid nodes on the Amazon EC2 cloud, by us and by our customers, to make sure there are no cluster or data inconsistencies. This protocol is already running quite successfully in several deployments at customer sites.
LevelDB Swap Space Implementation
In GridGain 4.0 you can load terabytes of data into cache. GridGain will try to fit as much of the data in memory as possible - the more grid nodes you have, the more memory is available for caching data. However, if you have more data than fits into the whole memory of the grid, you can use the LevelDB swap implementation, which is based on Google LevelDB storage, to swap infrequently used data to disk. We found that LevelDB can efficiently store large amounts of data with a fairly small disk footprint (using compression). We have also enhanced it with our swap eviction policy to prevent infinite disk growth.
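The swap idea itself is easy to picture with a small sketch: keep the most recently used entries in memory and move the least recently used ones to a secondary store. In the illustration below a plain `HashMap` stands in for the on-disk store, so no LevelDB API is used, and the swap eviction policy that limits disk growth is not modeled. The class name and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU cache that spills cold entries to a swap store (not thread-safe). */
public class SwapCacheSketch<K, V> {
    // Stand-in for disk-backed swap space; a real implementation would persist entries.
    private final Map<K, V> swap = new HashMap<>();
    private final LinkedHashMap<K, V> memory;

    public SwapCacheSketch(final int maxInMemory) {
        // accessOrder = true makes iteration order least-recently-used first.
        this.memory = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > maxInMemory) {
                    swap.put(eldest.getKey(), eldest.getValue());
                    return true; // drop from memory after copying to swap
                }
                return false;
            }
        };
    }

    public void put(K key, V value) {
        swap.remove(key); // avoid keeping a stale copy in swap
        memory.put(key, value);
    }

    public V get(K key) {
        V value = memory.get(key);
        if (value == null) {
            value = swap.remove(key);
            if (value != null) {
                memory.put(key, value); // swap back in; may push another entry out
            }
        }
        return value;
    }

    public int inMemory() { return memory.size(); }

    public int swapped() { return swap.size(); }
}
```

For example, with a capacity of 2, putting three keys leaves two in memory and one in swap, and reading the swapped key brings it back while the coldest in-memory entry takes its place.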