Meetup Report: Apache Spark Maryland 2018 KICKOFF!

The fourth and final meetup that I attended on my trip to Washington DC was held on Thursday 5 April at The Hotel at Arundel Preserve, 7795 Arundel Mills Boulevard, Hanover, Maryland. The meetup was organized by Apache Spark and Distributed Computing Maryland.

The location of the meetup was quite far from where I was staying, so I had to journey out of Washington using on-demand car transportation. It was, by far, the longest journey for any of the meetups on this trip to Washington. I left early, as I had been told that rush hour starts early in and around Washington. The advice paid off, and I arrived with plenty of time to spare.

The venue was very nice, and the meetup took place in one of the meeting rooms on the second floor of the hotel. My colleague Ted Carroll assisted me this time and took some pictures. Figure 1 shows the attendees after the meetup had started.

Figure 1. Meetup audience

First up was Dr Ameet Kini, a Resident Solutions Architect at Databricks. Ameet’s presentation covered new and upcoming features in Apache Spark 2.3 and Databricks Delta. He showed some streaming demos using notebooks. There was good audience interaction, and he did a nice job with his talk, wrapping up exactly on time.

I was up next, and I began my talk with a brief introduction to my background, my work experience and the role that GridGain hired me for. I then dove into the main presentation, titled "Apache Ignite and Apache Spark: Where Fast Data Meets the IoT."

I started with an overview of the demands that IoT places on software, such as real-time processing, high availability, simple scalability, and so on. This helped establish the context for the rest of the presentation.

Next, I presented an IoT software stack. Then I suggested suitable technologies that could be used in this stack, as shown in Figure 2.

Figure 2. Apache IoT Software Stack

Diving into more detail, I then discussed the suitable technologies in turn. For the NewSQL database, I focussed on Apache Ignite and spent a little time describing its various components and some of the practical uses of these components.

In succession, I then discussed Ignite’s data grid, distributed SQL and compute grid. I then focussed on Ignite’s integration with Spark, as shown in Figure 3.

Figure 3. Ignite and Spark Integration

Finally, I focussed on the machine learning grid, describing its architecture and some of its key strengths, such as large-scale parallelization and no ETL. I also quickly mentioned some of the key machine learning algorithms provided by Ignite and highlighted the fact that the machine learning libraries provide both local and distributed processing. Distributed processing enables the algorithms to run on very large data sets and across many hundreds of nodes.

I wrapped up my presentation by showing several demos. The first one simulated an IoT application with streaming data using Spark and Ignite. Output was delivered to the IDE console. The goal of this demo was to show the ease of integration between Ignite and Spark and the ability to process streaming data quickly. The second demo showed a shared data structure that Spark viewed as an RDD and Ignite viewed as a cache. Running several programs from the command line and from an IDE, I showed the ease of sharing state between applications, even after Spark had been shut down. I also quickly demoed the Ignite Web Console, which provides information about Ignite clusters.

The audience was very attentive, and some great questions were asked after I had finished my presentation. For example, there was strong interest in the Shared RDD, as well as the machine learning capabilities. Ted and I also did several prize draws.

In summary, it was a great audience with great offline interaction. This is also the end of this tour to Washington. I'd like to thank all the meetup organizers on this tour, my colleague Tom Diederich for helping to make these meetups happen, and my colleagues Chris Cook and Ted Carroll for their support at the actual meetup events. Time to go home now.

If you are interested in having me speak at a meetup or conference, please see Tom's blog post and reach out to him.

Until next time.