After attending the latest Spark Summit, I came away with a few impressions worth sharing. As with every Spark Summit in recent years, the sheer influx of data science and big data content makes it genuinely challenging to pick the right talks to attend. As a speaker and booth representative I was unfortunately unable to attend many talks, so my view is certainly biased. Still, a few things grabbed my attention.
To my surprise, the exhibition floor featured a select few vendors that were highly relevant to the domain, and it was easy to navigate around without being harassed by salespeople.
During the breaks between breakout sessions, the conversations with attendees were highly relevant and interesting, but people did seem to share a common problem: "How do I speed up my existing Hadoop deployment?" or "How do I make Spark™ faster?" As a GridGain architect working on Apache® Ignite™, these were exactly the questions I was there to answer.
During my session I covered, at a high level, the basics and building blocks of Apache Ignite, focusing on the in-memory data grid functionality. I demonstrated how data can be stored in memory in a resilient manner, providing real-time scalability with no sacrifice in uptime. I covered the compute grid only briefly, since the audience was Spark-oriented, and spent more time on our Ignite-Spark integration.
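To make the data grid part more concrete, here is a minimal sketch of the kind of resilient in-memory storage I talked about. It is not the exact code from my demo; the cache name, the key/value types, and the single backup copy are illustrative assumptions.

```scala
import org.apache.ignite.Ignition
import org.apache.ignite.cache.CacheMode
import org.apache.ignite.configuration.CacheConfiguration

object DataGridSketch extends App {
  // Start (or join) an Ignite node in this JVM with default settings.
  val ignite = Ignition.start()

  // A partitioned cache with one backup copy per entry: if a node dies,
  // its partitions are served from the backups, so uptime is preserved.
  val cfg = new CacheConfiguration[Int, String]("accounts") // name is illustrative
    .setCacheMode(CacheMode.PARTITIONED)
    .setBackups(1)

  val cache = ignite.getOrCreateCache(cfg)
  cache.put(1, "alice")
  println(cache.get(1)) // prints "alice", served from the grid

  ignite.close() // stop the local node
}
```

Adding more backups trades memory for resilience; the grid rebalances partitions automatically as nodes join or leave, which is what makes the real-time scalability possible.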
One common challenge I had to address was that Spark users keep looking for workarounds that let them use Spark as a data source aggregation layer, extracting data from various sources and holding it as RDDs in Spark memory. That is not really what Spark was designed for: Spark is an in-memory processing platform with no direct storage abilities of its own.
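This storage gap is exactly what the Ignite-Spark integration targets: an IgniteRDD is a live view over an Ignite cache, so state written by one Spark job outlives that job and can be read by the next. Below is a hedged sketch assuming a recent release of the ignite-spark module; the cache name and the data are made up.

```scala
import org.apache.ignite.configuration.IgniteConfiguration
import org.apache.ignite.spark.IgniteContext
import org.apache.spark.{SparkConf, SparkContext}

object SharedRddSketch extends App {
  val sc = new SparkContext(
    new SparkConf().setAppName("ignite-shared-rdd").setMaster("local[*]"))

  // IgniteContext starts (or connects to) Ignite nodes alongside the Spark workers.
  val ic = new IgniteContext(sc, () => new IgniteConfiguration())

  // A live, mutable view over the Ignite cache "sharedState" (name is illustrative).
  val shared = ic.fromCache[Int, Double]("sharedState")

  // Writes go into the grid, not just into this job's executor memory...
  shared.savePairs(sc.parallelize(1 to 100).map(i => (i, i * 1.5)))

  // ...so this count could just as well run in a completely separate Spark job.
  println(shared.filter(_._2 > 100).count())

  ic.close() // detach from the grid; the cached data stays on the Ignite nodes
}
```

The design point is that Ignite, not Spark, owns the storage: Spark remains a processing engine, while the data grid provides the durable, shared layer underneath it.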