GridGain will be presenting at the JAX London 2012 conference in London on October 16th, 2012. We’ll be talking about “Improving MapReduce: GridGain & Scala To The Rescue!”. If you are interested in some pretty cool live Scala coding of a Streaming MapReduce application – come to our talk and watch it live :)
Hope to see you there! All information about the conference is here.
GridGain 4.3 incorporates many optimizations, based directly on feedback from our largest deployments, that significantly improve GridGain’s performance and memory footprint. It is 100% backwards compatible with prior 4.x versions.
The new features and performance numbers speak for themselves:
- Up to 20x better serialization performance compared to standard Java.
- 3x better performance for collocating computations and data.
- 3x better query indexing from an improved cache locking implementation.
- 50% reduction in cache overhead and significantly reduced garbage collection overhead.
- Dramatically improved BigMemory support provides a high-performance off-heap data store that eliminates lengthy garbage collection pauses when working with hundreds of gigabytes of memory.
- New Data-Affinity-Aware Router supports client communication through the firewall.
- Significant performance enhancements for Java, C++, and .NET clients.
We had this discussion in the office the other day, and I couldn’t figure out why popular NoSQL products like MongoDB, CouchDB, Cassandra, etc. are generally years behind established In-Memory Data Grid vendors like Coherence, GridGain, GigaSpaces and GemFire on some of the core scalability and performance technologies.
It’s not a feature-by-feature comparison that I’m talking about – all of these products have plenty of unique features. It’s just the fact that while IMDGs have been scaling to 100s and 1000s of nodes and working in mission critical systems for years now – NoSQL products are simply not there and are moving there rather slowly (if at all).
As an illustration of what I’m talking about: I can’t remember ever seeing an IMDG product with a global-lock based implementation (like the one MongoDB “engineered”).
Then it dawned on me while here at the HPC for Wall Street conference (where not a single NoSQL vendor was present, btw): it’s the customers…
Look, 90% of NoSQL usage comes from the same crowd as typical memcached users: non-critical, “moms-n-pops” websites. 90% of IMDG/IMCG usage comes from mission critical systems.
Different customers => different requirements => different products…
GridGain is a gold sponsor and presenter at the 2012 HPC for Wall Street conference, September 19th, NYC. Come say hi at our booth and at our talk about “In-Memory Big Data Processing”. Hope to see some of you there.
All information and registration details are available here.
Two conferences the same week, same dates! Apart from being in Boston for “Big Data Innovation”, we’ll be presenting at and sponsoring a new NYC-based conference called “DataGotham” on Sep 13 & 14. We are very excited about this conference, and we are in great company there.
Hope to see you there as well – stop by our presentation and talk to us!
Registration and general conference information is available here.
GridGain Systems is sponsoring “Big Data Innovation”, which will be held in Boston this week, September 13th and 14th. It’s a pretty exciting conference with great companies participating.
Hope to see you there – stop by our booth and talk to us! We have some pretty cool demos to show and very exciting news coming up!
Registration and general conference information is available here.
Boulder, CO, August 22, 2012 – Precog, the market-leading provider of analytics as a service, today announced that it has partnered with GridGain Systems, a real time big data software company, to offer a real-time data analysis solution. Labcoat by GridGain is a breakthrough analytics platform created from the fusion of two best-in-breed technologies: the data analysis technology of Precog, and the in-memory big data technology of GridGain.
With an interactive environment for data analysis, Labcoat empowers data analysts, data scientists, and business users to integrate, analyze, and visualize massive volumes of structured or semi-structured data. Powered by GridGain’s highly-scalable, in-memory data platform, Labcoat helps businesses answer their deepest questions with real-time performance and Enterprise-class scalability.
“GridGain has proven itself the platform of choice for real-time big data applications. More Fortune 500 businesses rely on GridGain’s in-memory big data platform than any other vendor,” said Nikita Ivanov, Founder & CEO of GridGain. “Now with Labcoat, GridGain has the unprecedented ability to perform sophisticated analytics with a simple user-interface and no coding.”
The Labcoat environment with the Quirrel evaluation and execution engine is sourced from Precog, while the in-memory scalable data grid is sourced from GridGain. The fusion of the two technologies is seamless and the entire package is available as a single download (no installation is required for Labcoat, as the software is web-based).
“For decades, companies have used legacy SQL systems to analyze data. But legacy systems can’t keep up with today’s exploding volumes of data, and they can only answer simple questions about data,” says John A. De Goes, CEO of Precog. “With Labcoat by GridGain, companies can now mine any amount of data to find answers to the deepest of questions, with blisteringly fast performance. It’s a revolution in data science.”
To learn more about Labcoat by GridGain, please visit the Precog website at http://www.precog.com and contact us to receive more information. To experiment with the Labcoat application, please go to labcoat.precog.com
Precog is the leading provider of analytics as a service, helping developers incorporate intelligence and insights into their applications. Precog employs the original authors of the Quirrel language, and has developed the industry’s most complete and heavily-optimized execution stack for the Quirrel language. In addition, Precog develops Labcoat, a highly-interactive, collaborative data analysis environment. For more information, please visit http://www.precog.com and follow Precog on Twitter @precogio.
GridGain develops scalable, distributed, in-memory data platform technology for real time data processing. The company’s Java-based middleware products enable development of applications and services that can instantly access terabytes to petabytes of information from any data source or file system, distribute computational tasks across any number of machines, and produce results orders of magnitude faster than traditionally architected systems. GridGain’s customers include innovative web and mobile businesses, leading Fortune 500 companies, and top government agencies. The company is headquartered in Foster City, California. Learn more at http://www.gridgain.com.
Tasha Danko, 720-936-1131
GridGain Systems, Inc.
Jon Webster, 650-241-2281, ext. 2011
An excellent paper was released by researchers at the University of California, Berkeley. They analyzed data from the Hadoop installation at Facebook (one of the largest in the world), looking at various metrics for Hadoop jobs running in a Facebook datacenter that has over 3,000 computers dedicated to Hadoop-based processing.
They have come up with very interesting insights. I advise everyone to read it firsthand, but I will list some of the interesting bits.
The traditional quest for disk locality (a.k.a. affinity between the Hadoop task and the disk that contains the input data for that task) was based on two key assumptions:
- Local disk access is significantly faster than network access to remote disk
- Hadoop tasks spend a significant amount of their processing time in disk IO reading input data
Through careful analysis of the Hadoop system at Facebook (as their prime testbed), the authors claim that both of these assumptions are rapidly losing hold:
- With new full-bisection topologies in modern data centers, local disk access is almost identical in performance to network access, even across racks (the performance difference between the two is currently less than 10%).
- Greater parallelization and data compression lead to lower disk IO demand on individual tasks; in fact, Hadoop jobs at Facebook deal mostly with text-based data that can be compressed dramatically.
The authors then argue that memory locality (i.e. keeping input data in memory and maintaining affinity between a Hadoop task and its in-memory input data) produces much greater performance advantages because:
- RAM access is up to three orders of magnitude faster than a local disk access
- Even though memory size is significantly less than disk capacity it is large enough for most cases (see below)
Consider this: even though 75% of all HDFS blocks are accessed only once, 64% of Hadoop jobs at Facebook achieve full memory locality for all their tasks (!). In the case of Hadoop, full locality means that there is no outlier task that has to access disk and delay the entire job. And all of this is achieved with a rather primitive LFU caching policy and basic pre-fetching of input data.
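For readers who have not met LFU (least frequently used) before, here is a minimal sketch of the idea in Python. It is only an illustration of the policy, not the caching code used at Facebook or in the paper: the `LFUCache` class, its method names and the block ids are all made up for this example, and eviction is a simple linear scan.

```python
class LFUCache:
    """Toy least-frequently-used cache (hypothetical, for illustration only).

    When the cache is full, the key with the smallest access count is evicted.
    Ties are broken in favor of evicting the key that was inserted earliest.
    """

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.values = {}  # key -> cached value
        self.counts = {}  # key -> number of accesses (puts and hits)

    def get(self, key):
        """Return the cached value, or None on a cache miss."""
        if key not in self.values:
            return None
        self.counts[key] += 1
        return self.values[key]

    def put(self, key, value):
        """Insert or update a value, evicting the least-used key if needed."""
        if key not in self.values and len(self.values) >= self.capacity:
            # Linear scan for the least frequently used key (O(n) per eviction).
            victim = min(self.counts, key=self.counts.get)
            del self.values[victim]
            del self.counts[victim]
        self.values[key] = value
        self.counts[key] = self.counts.get(key, 0) + 1


# Usage: a two-slot cache holding hypothetical input blocks.
cache = LFUCache(capacity=2)
cache.put("block-a", b"aaa")
cache.put("block-b", b"bbb")
cache.get("block-a")           # block-a is now used more often than block-b
cache.put("block-c", b"ccc")   # cache is full, so block-b is evicted
```

A production cache would avoid the linear scan (for example with frequency buckets), but the eviction rule itself is the same.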
With these facts, the authors conclude that disk locality is no longer worth pursuing, and that in-memory co-location is the way forward for high-performance big data processing, as it yields far greater returns.
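To see why full locality (rather than "most tasks in memory") is what matters, here is a toy model in Python. The assumptions are mine and not from the paper: all tasks of a job run in parallel in a single wave, the job finishes when its slowest task finishes, and the read times are made-up numbers chosen only to reflect the "three orders of magnitude" gap between RAM and disk.

```python
def job_time(task_in_memory, mem_read_s=0.01, disk_read_s=10.0):
    """Completion time of a job whose tasks all run in parallel.

    task_in_memory: one boolean per task, True if its input is cached in RAM.
    mem_read_s / disk_read_s: hypothetical per-task input read times, picked
    so that disk is 1000x slower than memory.
    The job is only done when its slowest task is done, hence max().
    """
    return max(
        (mem_read_s if in_memory else disk_read_s for in_memory in task_in_memory),
        default=0.0,
    )


full_locality = job_time([True] * 100)          # every task reads from memory
one_outlier = job_time([True] * 99 + [False])   # a single task goes to disk
no_locality = job_time([False] * 100)           # every task goes to disk

print(full_locality, one_outlier, no_locality)
```

In this model, a job with 99 out of 100 tasks in memory finishes no sooner than a job with none of its input in memory: the single outlier sets the pace. That is why the share of jobs with all of their tasks reading from memory is the number to watch.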
Facebook’s case is a solid proof of this technology, and GridGain’s In-Memory Data Platform is a solid platform for the rest of us.
NEW YORK – August 15, 2012 – DataArt, a custom software development firm that builds advanced solutions for select industries, today announced a strategic partnership with GridGain, a real time Big Data software company. The collaboration will give DataArt clients the expert real time business data processing, data mining and MapReduce processing technology needed to address mounting data volumes.
Enterprises continue to seek new ways to effectively manage real time analytics, reporting and processing applications which uncover hidden correlations, earmark industry insights and drive competitive advantage. The collaboration with GridGain allows DataArt to provide its enterprise clients in financial services, travel and hospitality and green energy sectors with a combination of data analysis disciplines to recognize and respond to changing market dynamics. DataArt clients can tap into GridGain’s flexible JVM-based in-memory data platform for high performance computing to support both current and future real time data sets.
“We are very pleased to be partnering with DataArt,” said Jon Webster, Vice President of Business Development at GridGain. “DataArt’s domain expertise in building custom software solutions across capital markets, travel and hospitality, healthcare and other industries complements our products which are geared towards processing large volumes of high velocity data at very low latencies.”
The partnership opens new horizons for companies seeking cost-effective, agile custom software solutions with a short time to market. Clients across industries can customize analysis for risk modeling, recommendation engines, ad targeting, customer behavior, trade surveillance, point of sale, real-time sentiment and business intelligence. With GridGain’s unique in-memory data platform financial services offerings, implementation of pre- and post-trade compliance, trade capturing, scalable risk computation, and middle-tier data access is drastically simplified.
“Our clients constantly face exploding volumes of financial and business data but are hampered by unstructured data repositories and real-time computation scalability,” said Oleg Komissarov, SVP of Enterprise Solutions at DataArt. “GridGain’s sheer speed and ease of implementation means our clients can now prioritize and act on crucial market opportunities. Together we will work to define foundational necessities and architectures to sustain long-term Big Data solutions.”
DataArt (www.dataart.com) is a custom software development firm that builds advanced solutions for the financial services, healthcare, hospitality and other industries. Combining domain knowledge with offshore cost advantages and resource flexibility, DataArt develops industry-defining applications, helping clients optimize time-to-market and minimize software development risks in mission-critical systems. With an unrivaled talent pool of highly skilled software engineers in New York, London, Russia and Ukraine, DataArt provides the technical skill, accountability and industry knowledge needed to deliver custom applications on time and on budget.
DataArt clients include Standard & Poor’s, Harmonic Fund Services, Ogilvy, artnet, Panasonic, Cancer Research, Ocado, Charles River Laboratories, Betfair, Misys, leading asset management firms and three of the world’s top ten investment banks. http://www.twitter.com/DataArt
GridGain develops scalable, distributed, in-memory data platform technology for real time data processing. The company’s Java-based middleware products enable development of applications and services that can instantly access terabytes to petabytes of information from any data source or file system, distribute computational tasks across any number of machines, and produce results orders of magnitude faster than traditionally architected systems. GridGain’s customers include innovative web and mobile businesses, leading Fortune 500 companies, and top government agencies. The company is headquartered in Foster City, California. Learn more at www.gridgain.com and follow GridGain on Twitter @gridgain.
+1 (212) 255.0080, ext. 11
GridGain Systems, Inc.
+1 (650) 241-2281, ext. 2011
Live coding, a good introduction to in-memory computing and data processing, and plenty of… Scala. Come and stop by for good pizza too! All information is on the JUG’s website.