
A few days ago I blogged about how GridGain easily supports starting many GridGain nodes in a single JVM, which is a huge productivity boost during development. I’ve had a lot of requests to show the code, so here it is. This is an example that we are shipping with the upcoming 4.3 release (entire source code):

import org.gridgain.grid.*;
import org.gridgain.grid.spi.discovery.tcp.*;
import org.gridgain.grid.spi.discovery.tcp.ipfinder.*;
import org.gridgain.grid.spi.discovery.tcp.ipfinder.vm.*;
import org.gridgain.grid.typedef.*;

import javax.swing.*;
import java.util.concurrent.*;

public class GridJvmCloudExample {
  /** Number of nodes to start. */
  private static final int NODE_COUNT = 5;

  /**
   * Starts multiple nodes in the same JVM.
   */
  public static void main(String[] args) throws Exception {
    try {
      ExecutorService exe = new ThreadPoolExecutor(
        NODE_COUNT, 
        NODE_COUNT, 
        0, 
        TimeUnit.MILLISECONDS,
        new LinkedBlockingQueue<Runnable>()
      );

      // Shared IP finder for in-VM node discovery.
      final GridTcpDiscoveryIpFinder ipFinder = 
        new GridTcpDiscoveryVmIpFinder(true);

      for (int i = 0; i < NODE_COUNT; i++) {
        final String nodeName = "jvm-node-" + i;

        // Start nodes concurrently (it's faster).
        exe.submit(new Callable<Object>() {
          @Override public Object call() throws Exception {
            // All defaults.
            GridConfigurationAdapter cfg = new GridConfigurationAdapter();

            cfg.setGridName(nodeName);

            // Configure in-VM TCP discovery so we don't
            // interfere with other grids running on the same network.
            GridTcpDiscoverySpi discoSpi = new GridTcpDiscoverySpi();

            discoSpi.setIpFinder(ipFinder);

            cfg.setDiscoverySpi(discoSpi);

            G.start(cfg);

            return null;
          }
        });
      }

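      exe.shutdown();

      // Wait for all node-start tasks to finish; bail out if they take too long.
      if (!exe.awaitTermination(20, TimeUnit.SECONDS))
        throw new IllegalStateException("Nodes did not start within 20 seconds.");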

      // Get first node.
      Grid g = G.grid("jvm-node-0");

      // Print out number of nodes in topology.
      X.println("Number of nodes in the grid: " + g.nodes().size());

      // Wait until Ok is pressed.
      JOptionPane.showMessageDialog(
        null,
        new JComponent[] {
          new JLabel("GridGain JVM cloud started."),
          new JLabel("Press OK to stop all nodes.")
        },
        "GridGain",
        JOptionPane.INFORMATION_MESSAGE);
    }
    finally {
      G.stopAll(true);
    }
  }
}

That’s all there is to it. Enjoy your in-JVM micro cloud!
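One detail worth knowing about the example: it submits each node start to an executor but never inspects the returned futures, so an exception thrown while starting a node is silently swallowed. The sketch below shows the same concurrent-start pattern in plain Java while surfacing such failures through `Future.get()`. It runs without GridGain on the classpath: `startNode` is a hypothetical stand-in for building a `GridConfigurationAdapter` and calling `G.start(cfg)`, and `ConcurrentStartSketch` is an illustrative name, not part of any GridGain API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConcurrentStartSketch {
  /** Hypothetical stand-in for "configure a node named nodeName and call G.start(cfg)". */
  static String startNode(String nodeName) {
    if (nodeName.isEmpty())
      throw new IllegalArgumentException("Node name must not be empty.");
    return nodeName;
  }

  /**
   * Starts all nodes concurrently. If any start fails, the first failure
   * (in submission order) is rethrown wrapped in an ExecutionException.
   */
  static List<String> startAll(List<String> nodeNames)
      throws InterruptedException, ExecutionException {
    ExecutorService exe = Executors.newFixedThreadPool(Math.max(1, nodeNames.size()));
    try {
      List<Future<String>> futures = new ArrayList<>();
      for (String name : nodeNames)
        futures.add(exe.submit(() -> startNode(name)));

      // Future.get() blocks until its task is done and wraps any exception
      // the task threw in an ExecutionException, so failures are not lost.
      List<String> started = new ArrayList<>();
      for (Future<String> f : futures)
        started.add(f.get());
      return started;
    }
    finally {
      exe.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    List<String> names = new ArrayList<>();
    for (int i = 0; i < 5; i++)
      names.add("jvm-node-" + i);

    // Prints: Started: [jvm-node-0, jvm-node-1, jvm-node-2, jvm-node-3, jvm-node-4]
    System.out.println("Started: " + startAll(names));
  }
}
```

Collecting futures keeps the fast concurrent start while making a broken node configuration fail loudly at startup rather than showing up later as a smaller-than-expected topology.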

One of the features of GridGain’s In-Memory Data Platform that often goes unmentioned is the ability to launch multiple GridGain nodes in a single JVM.

Now, as trivial as it sounds… can you start multiple 100% independent JBoss, WebLogic, Infinispan, GigaSpaces, Coherence or (gulp) Hadoop runtimes in a single JVM? The answer is no. Even for a simple test run you’ll have to start multiple instances on your computer (or on multiple computers) and debug them via a remotely connected debugger, separate log windows, separate configurations, etc. In a word: awkward…

Not so with GridGain. As I mentioned, you can launch as many GridGain nodes as you need in a single JVM. As a matter of fact, this is exactly how we debug complex internal algorithms here at GridGain Systems when we develop our product. We launch an entire micro cloud of 5-10 nodes in a single JVM (right from our JUnit tests), put breakpoints in different nodes and walk through complex distributed execution paths in the debugger… never leaving the convenience of the local IDE’s debugger.
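What lets several nodes in one JVM behave like a cluster is that each node keeps its own name and state, while the nodes meant to form one grid share a discovery mechanism; in the example code earlier on this page, that is the single `GridTcpDiscoveryVmIpFinder` instance handed to every `GridTcpDiscoverySpi`. The toy model below illustrates that idea only and is not GridGain’s implementation (as we understand it, the real finder supplies addresses to TCP-based discovery rather than being the topology itself). `SharedFinder` and `ToyNode` are hypothetical classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class InVmDiscoverySketch {
  /** Hypothetical in-memory "IP finder": a thread-safe registry of node names. */
  static final class SharedFinder {
    private final List<String> registered = new CopyOnWriteArrayList<>();

    void register(String nodeName) { registered.add(nodeName); }

    int size() { return registered.size(); }
  }

  /** Hypothetical node: its own name, plus the finder it uses to discover peers. */
  static final class ToyNode {
    final String name;
    private final SharedFinder finder;

    ToyNode(String name, SharedFinder finder) {
      this.name = name;
      this.finder = finder;
      finder.register(name); // In this toy model, "joining" means registering with the finder.
    }

    /** Topology size as seen by this node: all nodes registered with the same finder. */
    int topologySize() { return finder.size(); }
  }

  public static void main(String[] args) {
    SharedFinder shared = new SharedFinder();
    List<ToyNode> grid = new ArrayList<>();
    for (int i = 0; i < 5; i++)
      grid.add(new ToyNode("jvm-node-" + i, shared));

    // A node with its own finder forms a separate one-node grid in the same JVM.
    ToyNode loner = new ToyNode("other-grid-node", new SharedFinder());

    System.out.println("Shared-finder topology: " + grid.get(0).topologySize());   // 5
    System.out.println("Separate-finder topology: " + loner.topologySize());       // 1
  }
}
```

The second finder mirrors the comment in the example about not interfering with other grids: nodes that do not share a finder never see each other, even inside the same JVM.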

Naturally, all functionality (your tasks, access to the data grid, clustering, messaging, CEP, etc.) works exactly the same way in a single JVM as it would on separate physical computers. You don’t have to adjust or reconfigure anything!

Now that’s a productivity feature most of us can appreciate.

GridGain will be hosting a webinar on Thursday, July 26, 2012 at 3pm EST / 12pm PST, during which GridGain’s CTO, Dmitriy Setrakyan, will be live coding GridGain examples in Java.

Dmitriy will cover the following examples:

  • How to distribute simple units of work to the grid.
  • Collocation of computation and data.
  • A full Streaming MapReduce example that performs SQL queries on streaming in-memory data.

This is a fantastic opportunity to see how easy it is to get started with GridGain, so register now and join us for “Live Coding In-Memory Big Data with GridGain.”

Over the last 12 months I’ve accumulated plenty of “conversations” where we’ve discussed big data analytics and BI strategies with our customers and potential users. The five points below represent some of the key takeaways about the current state of the analytics/BI field, why it is by and large a sorry state of affairs, and what some of the obvious telltale signs of the decay are.

Beware: some measure of hyperbole is used below to make the points stand out…

“Batch”

This is probably obvious to most industry insiders but still worth mentioning. If you have a “batch” process in your big data analytics, you are not processing live data and you are not processing it in a real-time context. Period.

That means you are analyzing stale data, and your more agile and smarter competitors are running circles around you, since they CAN analyze and process live (streaming) data in real time and make appropriate operational BI decisions based on real-time analytics.

Having “batch” in your system design is like running your database off a tape drive. Would you do that when everyone around you is using disk?

“Data Scientist”

A bit controversial. But… if you need one, your analytics/BI is probably not driving your business, since you need a human body between your data and your business. Having humans (who sadly need to eat and sleep) in the loop gives any process huge latency and non-real-time characteristics.

In most cases it simply means:

  • The data you are collecting and the system that is collecting it are so messed up that you need a Data Scientist (i.e. a Statistician/Engineer under 30) to clean up the mess
  • Your process is hopelessly slow and clunky for real automation
  • Your analytics/BI is outdated by definition (i.e. analyzing stale data with no meaningful BI impact on daily operations)

Now, sometimes you need a domain expert to understand the data and come up with some modeling, but I’ve yet to see a case so complex that an engineer with a 4-year CS degree could not solve it. Most of the time it is overreaction/overhiring resulting from not understanding the problem in the first place.

“Overnight”

It’s the little brother of “Batch”. It is essentially built-in failure for any analytics or BI. In the world of hyperlocal advertising, geolocation, and up-to-the-second updates on Twitter, Facebook or LinkedIn, you are the proverbial grandma driving a 1966 Buick down the highway with her turn signal blinking while everyone speeds past you…

There’s simply no excuse today for any type of overnight processing (except for some rare legacy financial applications). Overnight processing is not only technical laziness but often a built-in organizational tenet, and that’s what makes it even more appalling.

“ETL”

This is the little brother of “Overnight”. ETL is what many people blame for overnight processing… “Look, we’ve got to move this Oracle into Hadoop and it takes 6 hours, and we can only do it at night when no one is online”.

Well, I can count only two or three clients of ours where no one is online during the night. This is 2012 for god’s sake!!! Most businesses (even smallish startups) are 24/7 operations these days.

ETL is the clearest sign of significant accumulated technical debt. It is, for the most part, a manifestation of a defensive and lazy approach to system design. It is especially troubling to see this approach in newer, younger companies that don’t have 25 years of legacy to deal with.

And it is equally invigorating to see it being steadily removed in companies with 50 years of IT history.

“Petabyte”

This is a bit controversial again… But I’m getting a bit tired of hearing the “We must design to process Petabytes of data” line from 20-person companies.

Let me break it down:

  • 99.99% of the companies will NEVER need Petabytes scale
  • If your business “needs” to process Petabytes of data for its operations – you are likely doing something really wrong
  • Most of the “working sets” that we’ve seen, i.e. the data you really need to process, measure in the low teens of terabytes for the absolute majority of use cases
  • Given how frequently data changes (in its structure, content, usefulness, freshness, etc.), I don’t expect the “working set” size to grow nearly as fast (if at all): the overall data amount will grow, but not the actual “window” that we need to process…

Yes, there are some companies and government organizations that will need to store petabytes and exabytes of data, but in all of those rare cases it’s for historical, archival and backup reasons, and likely never for frequent processing.

GridGain will be presenting at the Boston Area Scala Meetup on “BigData Distributed HPC with GridGain and Scala”. Expect a good discussion about in-memory data platforms in general, BigData, and the role in-memory technology plays in BigData. As always, there will be plenty of live coding, developing live MapReduce apps in front of the audience.

All details are on the Boston Area Scala Meetup website.

Hope to see you there tomorrow!

Join GridGain for a 30-minute webinar, Friday, June 29, 2012, 11am to 11:45am Pacific (2pm to 2:45pm Eastern): “Achieving Fast Data with GridGain’s In-Memory Big Data Technology”

Do you need to process live data streams in real time? Are you looking for a way to more quickly fuse, query and run computations across static or warehoused data you already have in Hadoop and other databases? Would your organization gain competitive advantage by getting answers from your data within seconds or less, as opposed to minutes, hours or days?

Big Data means much more than the ability to store lots of information. Hadoop and numerous new and “NOSQL” data technologies are already addressing this requirement. The challenge for industry leading companies and new ventures today, however, is how quickly they can achieve insights, decisions and intelligence from Big Data.

Join GridGain this Friday, June 29, 2012, as we discuss this topic. Jon Webster, GridGain’s senior director of business development, and Dmitriy Setrakyan, GridGain co-founder and CTO, will share how GridGain can help your organization turn Big Data into Fast Data. The 30-minute presentation will include ample time for Q&A (15 to 30 minutes).

GridGain technologies provide the leading solution to this challenge, helping organizations add in-memory data and compute grids as a layer above their existing infrastructure, enabling queries and advanced computational tasks on all their important data – orders of magnitude faster than cache and disk-based operations.

Global industry analyst firm Gartner describes in-memory platforms as a “top 10” disruptive technology for 2012 and beyond. Are these technologies right for your organization? Spend 30 minutes with GridGain and see for yourself.

Click here to register now! Spaces are limited.


GridGain In-Memory Application Architecture

Foster City, CA, Monday, June 25, 2012: As increasing numbers of industry-leading companies look to new technologies to speed up their real-time services and Big Data applications, GridGain Systems, a leading developer of scalable, distributed in-memory compute and data grid technologies for real-time data processing, is strongly positioned to serve the demand.

This week, GridGain announced the release of an entirely new suite of solution-focused products built around the company’s leading in-memory and world’s fastest MapReduce technologies, expanded enterprise consulting and other professional services, advanced DevOps management and monitoring capabilities now available with all products, and out-of-the-box integration with virtually any existing data source, Hadoop-based systems, and Java, C++, .NET, Android and iOS applications.

GridGain now offers three new products built around its in-memory data and compute grid technologies. GridGain’s “Data Grid” provides distributed in-memory data caching and storage for exponentially faster access to any information. Its “Compute Grid” enables highly distributed and massively parallel computations and processing on virtually any amount of data, and also leverages the company’s data grid technology. The company’s “In-Memory Big Data Edition” includes both the Data Grid and Compute Grid products, along with Hadoop (HDFS, ZooKeeper, HBase) integration and high-performance data loading components. All GridGain products include comprehensive enterprise DevOps management tools and “zero deployment” options compatible with all major cloud platforms.

GridGain has already helped more than 500 enterprise customers achieve entirely new levels of value for their end users, and thousands of other organizations have downloaded GridGain’s free, open source Community Edition. GridGain technologies are used in applications where data and computation must be highly available, distributed and scalable, with extremely low latency. The company offers Java-based middleware used for transactional in-memory processing of terabytes to petabytes of data, while enabling on-demand scaling of data storage and computational power from one server to thousands of machines, on virtually any cloud or grid-enabled infrastructure, commodity hardware or high-speed flash arrays.

GridGain’s founder and CEO, Nikita Ivanov, says “GridGain is being used to build systems that perform better, much better than traditionally architected systems – and not by percentages, but by significant orders of magnitude. Our customers are achieving far greater value as a direct result of this faster performance.”

Ivanov continues, “GridGain is helping companies gain significant advantage in their respective markets, enabling them to get faster answers from their data. We are very excited about this new line of products, expanded capabilities and consulting services – all of which strongly support the vision and the objectives of our enterprise and leading customers.”

About GridGain Systems

GridGain develops scalable, distributed, in-memory compute and data grid technologies for real-time data processing. The company’s Java-based middleware products enable development of applications and services that can instantly access terabytes to petabytes of information from any data source or file system, distribute computational tasks across any number of machines, and produce results orders of magnitude faster than traditionally architected systems. GridGain’s customers include innovative web and mobile businesses, leading Fortune 500 companies, and top government agencies. The company is headquartered in Foster City, California. Learn more at www.gridgain.com and follow GridGain on Twitter @gridgain.

# # #

Note to editors: All company, organization, product or alliance names mentioned herein remain the property of their respective owners.

Come to beautiful Boulder, CO to listen to “In-Memory Big Data Processing on the JVM”, presented at the Frontier Developers’ Meetup on Tuesday, June 26.

As always (for the last 6 years with no exception!) – plenty of live coding and building MapReduce apps in less than 5 minutes from scratch!

All details are here.

Meet the GridGain team at QCon NYC 2012, June 18-20. Attend our presentation or visit our booth to talk about In-Memory Compute and Data Grid technologies and see all the new features of GridGain 4.1.

Hope to see you there!

NYC Scala Meetup

Tuesday, June 5, 2012 6:30 PM

Meetup HQ – 9th Floor
632 Broadway Suite 901, New York, NY

Dmitriy Setrakyan, CTO at GridGain Systems, will present at the NY Scala Meetup, talking about using GridGain for highly distributed HPC programming in Scala.

As one of the main examples, we will walk through a real-time word counting program with constantly changing text and compare it with a Hadoop word counting example. You will see some cool GridGain features such as Auto-Discovery, Streaming MapReduce, Zero Deployment, Distributed Data Partitioning, and In-Memory SQL Queries, all coded live during the presentation. Towards the end we will have an extensive Q&A session and cover some interesting real-life use cases.
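The difference between a real-time word count over constantly arriving text and a batch, Hadoop-style recount can be sketched in plain Java: instead of recounting the whole corpus on every run, each incoming chunk of text updates running totals in a concurrent map. This is an illustrative sketch only, not the code from the talk and not GridGain’s Streaming MapReduce API; `StreamingWordCountSketch`, `onText` and `count` are hypothetical names.

```java
import java.util.Locale;
import java.util.concurrent.ConcurrentHashMap;

public class StreamingWordCountSketch {
  /** Running word counts, safe to update from many threads at once. */
  private final ConcurrentHashMap<String, Long> counts = new ConcurrentHashMap<>();

  /** Folds one newly arrived chunk of text into the running totals. */
  public void onText(String chunk) {
    for (String word : chunk.toLowerCase(Locale.ROOT).split("\\W+")) {
      if (!word.isEmpty())
        counts.merge(word, 1L, Long::sum); // atomic per-key increment
    }
  }

  /** Current count for a word, reflecting all text seen so far without a batch re-run. */
  public long count(String word) {
    return counts.getOrDefault(word.toLowerCase(Locale.ROOT), 0L);
  }

  public static void main(String[] args) {
    StreamingWordCountSketch wc = new StreamingWordCountSketch();
    wc.onText("to be or not to be");
    System.out.println("to = " + wc.count("to")); // to = 2

    wc.onText("To see or not to see"); // new text arrives; totals update immediately
    System.out.println("to = " + wc.count("to")); // to = 4
  }
}
```

This sketch only handles appended text; supporting edits to existing text or a sliding time window would also require decrementing counts, which it deliberately omits.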
