Meetup Report: Apache Ignite: the in-memory hammer in your data science toolkit

The third meetup that I attended on my trip to Washington, D.C. was held on Wednesday 4 April at REI Systems, 45335 Vintage Park Plaza, Sterling, Virginia. The meetup was organized by Nova Data Science.

The location of the meetup was quite far from where my colleague Chris Cook and I were staying, so we had to journey out of Washington taking the Metro and then on-demand car transportation. Jokingly, Chris and I agreed that the journey felt rather like the movie "Planes, Trains and Automobiles."

Chris and I arrived early at the venue, and attendees began arriving soon afterwards. As we were unfamiliar with the AV system, several attendees kindly helped out and eventually rigged a good temporary solution that let my laptop project the slides onto a screen. I could see some humor here and was reminded of the joke about how many engineers it takes to change a light bulb. Still, it was a great job by all to ensure that the presentation could be delivered.

Once we were ready to start, there was a short introduction and some meetup business presented by Dr. Pragyansmita Nayak, one of the organizers of the meetup group.

Chris took a photo and Figure 1 shows some of the audience.

Figure 1. Meetup audience

I began the meetup with a brief introduction of my background, work experience and the role that GridGain hired me for. I then dove into the main presentation titled "Apache Ignite: The In-Memory Hammer in your Data Science Toolkit."

I began with a high-level overview of Ignite and its architecture. I then described the major components and some of the practical uses of these components.

I then described Ignite’s distributed storage, including its read-through and write-through capabilities with third-party storage, and its distributed partitioned hash map architecture.
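To give a rough feel for what a distributed partitioned hash map means, here is a minimal Python sketch of the general idea: a key is hashed to a fixed partition, and each partition is assigned to one node. This is not Ignite code. The function names, the node names, the partition count of 1024 and the rendezvous-style (highest hash wins) assignment are all assumptions chosen for illustration.

```python
import hashlib


def stable_hash(value):
    """Deterministic 64-bit hash (Python's built-in hash() varies per process)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def partition_for(key, partitions=1024):
    """Map a key to one of a fixed number of partitions (count is an assumption)."""
    return stable_hash(key) % partitions


def owner_for(partition, nodes):
    """Pick the node with the highest hash for this partition (rendezvous style)."""
    return max(nodes, key=lambda node: stable_hash(f"{node}:{partition}"))


def locate(key, nodes, partitions=1024):
    """Return the (partition, owning node) pair for a key."""
    partition = partition_for(key, partitions)
    return partition, owner_for(partition, nodes)


if __name__ == "__main__":
    # Hypothetical node names, for illustration only.
    cluster = ["node-a", "node-b", "node-c"]
    for key in ["patient:42", "sensor:7", "order:1001"]:
        print(key, locate(key, cluster))
```

One property of this style of assignment is that a partition only changes owner when its current owner leaves or a newly joined node happens to win it, so most data stays where it is when the cluster changes.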

Next, I covered the durable memory and also mentioned the write-ahead log and checkpointing and the ability to quickly restart a cluster in the event of a major failure.

Then I covered the importance of SQL today and, in the next part, Ignite’s distributed SQL, its cross-platform support and support for JDBC and ODBC.

Next, I covered the compute grid and its strengths in supporting data science use cases. I also described peer class loading, a very useful feature that can provide significant developer productivity.

Finally, I focussed on the machine learning grid, describing its architecture and some of its key strengths, such as large-scale parallelization and no ETL, as highlighted in Figure 2.

Figure 2. Machine Learning Grid

I also quickly described some of the key machine learning algorithms provided by Ignite and highlighted the fact that the machine learning libraries provide both local and distributed processing. The distributed processing enables the algorithms to run on very large data sets and across many hundreds of nodes. I wrapped up with a quick demo that launched two Ignite nodes in an IDE and then ran some example code demonstrating a machine learning algorithm in action.

The audience was very attentive, and some great questions were asked at the end of the presentation, covering topics such as the ability to handle cluster failures, use cases in healthcare, IoT applications, and managing heterogeneous data sources, such as relational DBMSs and NoSQL DBMSs. Chris and I also did a prize draw.

In summary, it was a great audience with great interaction. There was also plenty of pizza available at the meetup, sponsored by GridGain.