The GridGain Systems In-Memory Computing Blog

Researchers at the University of California, Berkeley have released an excellent paper. They analyzed data from the Hadoop installation at Facebook (one of the largest such installations in the world), looking at various metrics for Hadoop jobs running in a Facebook datacenter that has over 3,000 computers dedicated to Hadoop-based processing. They have come up with very interesting insights. I advise everyone…
In this article I’ll introduce the concept of Streaming MapReduce processing using GridGain and Scala. I chose Scala simply because it provides a very concise notation and GridGain provides a very effective DSL for Scala. Rest assured, you can follow this post just as well in Java or Groovy. The concept of streaming processing (and Streaming MapReduce in particular…
In-memory processing is becoming a business necessity, in much the same way that collecting and processing ever-increasing data sets (a.k.a. Big Data) has become a business "must have" rather than just a simple technology over the last five years. Both of these trends are intertwined in interesting ways. Let me explain... 1. Storing Necessitates Processing The initial foray into Big Data for many…
A few days ago I blogged about how GridGain easily supports starting many GridGain nodes in a single JVM - which is a huge productivity boost during development. I've received a lot of requests to show the code - so here it is. This is an example that we are shipping with the upcoming 4.3 release (entire source code): [source lang="java" title="Micro cloud example"] import org.gridgain.grid.*; import…
One of the features in GridGain's In-Memory Data Platform that often goes unmentioned is the ability to launch multiple GridGain nodes in a single JVM. Now, as trivial as it sounds... can you start multiple 100% independent JBoss or WebLogic or Infinispan or GigaSpaces or Coherence or (gulp) Hadoop runtimes in a single JVM? The answer is no. Even for a simple test run you'll have to start…
Over the last 12 months I’ve accumulated plenty of “conversations” in which we’ve discussed big data analytics and BI strategies with our customers and potential users. The 5 points below represent some of the key takeaways about the current state of the analytics/BI field, why it is by and large a sorry state of affairs, and what some of the obvious telltale signs of the decay are. Beware: some…
Recently, at a customer meeting, I was asked whether GridGain comes with its own database. Naturally, my reaction was - why? GridGain easily integrates with pretty much any persistent store you wish, including any RDBMS, NoSQL, or HDFS store. However, then I thought, why not? We already have cache swap space (disk overflow) storage based on Google's LevelDB key-value database…
Lately there has been a lot of noise about "Real Time" Big Data. Many of the companies that associate themselves with this term are in the analytics space, and to them it really means "low-latency" analytical processing of data that is usually stored in some warehouse, like Hadoop Distributed File System (HDFS). To achieve this they usually create in-memory indexes over HDFS data which…
GridGain 4.0.1 was released this Monday. This is a point release that includes several bug fixes as well as a number of new features. With 4.0.1 we are introducing native support for .NET with our C# Client. The C# Client provides native .NET/C# APIs for accessing both GridGain's In-Memory Data Grid and Compute Grid from outside of the GridGain topology context. Internally it's deferring to…
We promised a while back to publish the code from the live-coding GridGain presentation we did at QCon London earlier this year. Since the presentation was in Scala, the code we will be posting here is in Scala. First, a brief intro. We all know Hadoop's word-counting example, which takes a file with words and then produces another file with the number of occurrences next to each word.…