Meetup Report: Big Data, Washington DC v 2.0

On Sunday 1 April, I flew from London to Washington DC for several days of meetups in and around the DC area. The first of these was held on Monday 2 April at 1776 on the 12th floor of 1133 15th Street NW, Washington DC. 1776 is a startup incubator and was a great local venue not very far from where I was staying in DC.

My colleague Chris Cook assisted me at the meetup, and attendees started filling the room fairly early. Nearly 140 people had registered for the event, so we soon had a full house and were ready to begin. Chris took several photos, and Figure 1 shows the audience.

Figure 1. Meetup audience

The event was organised by Dataconomy. Elena Poughia, the company's MD, was unable to get to the venue because of bad weather, so she joined over Skype to give an introduction and a presentation on aspects of Artificial Intelligence. I was up next with a presentation titled "Apache Ignite: The In-Memory Hammer in your Data Science Toolkit."

I began with a short introduction of my background, work experience and the role that GridGain hired me for. Then, I dove straight into the main Ignite presentation.

I started the presentation with a high-level overview of Ignite and its architecture, as shown in Figure 2, describing the major components and some of their practical uses.

Figure 2. Apache Ignite

Next, I described Ignite’s distributed storage, mentioning its read-through and write-through capabilities with third-party storage and its architecture as a distributed, partitioned hash map.
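To make the read-through and write-through idea concrete, here is a minimal sketch in plain Python. It is not Ignite's API: the class name, the `store` dictionary standing in for third-party storage, and the partition count are all assumptions made for illustration, and everything runs in a single process rather than across a cluster.

```python
class ReadThroughWriteThroughCache:
    """Toy partitioned cache in front of a backing store (hypothetical, not Ignite's API)."""

    def __init__(self, store, partitions=4):
        # 'store' is any dict-like object standing in for third-party storage.
        self.store = store
        # One in-memory dict per partition; a key's hash selects its partition.
        self.partitions = [dict() for _ in range(partitions)]

    def _partition_for(self, key):
        return self.partitions[hash(key) % len(self.partitions)]

    def get(self, key):
        part = self._partition_for(key)
        if key not in part and key in self.store:
            # Read-through: on a cache miss, load the value from the backing store.
            part[key] = self.store[key]
        return part.get(key)

    def put(self, key, value):
        # Write-through: update the backing store as part of the same operation.
        self.store[key] = value
        self._partition_for(key)[key] = value


backing_store = {"a": 1}
cache = ReadThroughWriteThroughCache(backing_store)
first_read = cache.get("a")   # miss in memory, loaded from backing_store
cache.put("b", 2)             # written to memory and to backing_store
```

In a real cluster the partitions would be spread across nodes. Here they are just separate dictionaries, which is enough to show how a key is mapped to a partition.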

I then covered durable memory, highlighting Ignite’s off-heap memory management and some of the Java garbage collection problems that it solves. I also mentioned the write-ahead log and checkpointing, and the ability to quickly restart a cluster in the event of a major failure.
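The general technique behind a write-ahead log with checkpointing can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions and not how Ignite implements it: the `DurableMap` class is hypothetical, and an in-memory list and a JSON string stand in for the on-disk log and checkpoint.

```python
import json


class DurableMap:
    """Toy key-value map with a write-ahead log and checkpoints (hypothetical sketch)."""

    def __init__(self):
        self.data = {}        # the in-memory state
        self.wal = []         # JSON records standing in for an on-disk log
        self.snapshot = "{}"  # JSON snapshot standing in for the last checkpoint

    def put(self, key, value):
        # Log the change first, then apply it to memory.
        self.wal.append(json.dumps({"key": key, "value": value}))
        self.data[key] = value

    def checkpoint(self):
        # Persist the full state; log records before this point are no longer needed.
        self.snapshot = json.dumps(self.data)
        self.wal.clear()

    def recover(self):
        # Restart: load the last checkpoint, then replay the log written after it.
        self.data = json.loads(self.snapshot)
        for line in self.wal:
            record = json.loads(line)
            self.data[record["key"]] = record["value"]


m = DurableMap()
m.put("a", 1)
m.checkpoint()
m.put("b", 2)
m.data = {}   # simulate losing the in-memory state in a crash
m.recover()   # state is rebuilt from the checkpoint plus the log
```

Because recovery only replays the records written since the last checkpoint, restart time depends on how much has changed since then and not on the full history of updates.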

SQL is still extremely important today due to its availability in most major database management systems and BI tools. So, in the next part, I covered Ignite’s distributed SQL, its cross-platform support and support for JDBC and ODBC.

Next, I covered the compute grid and its strengths in supporting data science use cases. I also described peer class loading, a very useful feature that can significantly improve developer productivity by removing the need to manually deploy code to every node.

Finally, I focussed on the machine learning grid, describing its architecture and some of its key strengths, such as large-scale parallelization. I also briefly described some of the key machine learning algorithms provided by Ignite and highlighted the fact that the machine learning libraries provide both local and distributed processing. Distributed processing enables the algorithms to run on very large data sets across hundreds of nodes. I wrapped up with a quick demo that launched an Ignite node in an IDE and then ran some example code demonstrating a machine learning algorithm in action.

The presentation was followed by some Q&A, and then Chris and I held a prize draw.

Two other presentations followed. The first was by Kirk Borne, a very experienced Data Science professional at Booz Allen Hamilton with an extensive background in working with large data sets. Kirk covered a range of topics in his presentation titled "Advances in Predictive Analytics beyond Time Series Forecasting." The final presentation was by Robert Wensley, CEO of Databook. Robert's presentation was titled "Preventing The Next Cambridge Analytica" and he described a new initiative to give people greater control of their data.

In summary, we had a great audience and three presentations covering a wide range of topics. In discussions at the end of the meetup, attendees said they valued the breadth of the topics presented.