Join me April 23 at 1740 Broadway in Manhattan as I host the next NYC In-Memory Computing Meetup! Topics include facial recognition, drug discovery, NoSQL databases and ingesting streaming data! Sponsored by GridGain Systems. Free, but you must RSVP!
> George Williams [Director of Data Science and Chief Evangelist, Embedded AI at GSI Technology]
> Dave Rubin [Senior Director for Oracle NoSQL and Embedded Database]
> Pat Patterson [Director of Evangelism at StreamSets]
* 5:45 p.m. -- Pizza, drinks & networking
* 6 p.m. -- Talk 1 (George): "Searching large databases with a billion objects or more!"
* 6:40 p.m. -- Talk 2 (Dave): "Oracle NoSQL Database - The most flexible NoSQL Data Store you’ve NEVER heard of!"
* 7:20 p.m. -- Talk 3 (Pat): "Ingesting Streaming Data for Analysis in Apache Ignite!"
* 7:50 p.m. -- Closing remarks
* 8 p.m. -- Finis!
>> Talk 1 (George): What do face recognition, visual e-commerce, and drug discovery have in common? These days they involve searching large databases with a billion objects or more. Unlike traditional database search, an exact match to a query is not the goal. Instead, similarity search is used to retrieve the closest items. One way to scale similarity search is to use specialized hardware, and I’ll talk about GSI Technology’s custom chip that accelerates large-scale similarity search.
GSI Technology designs, develops and markets a broad range of high-performance memory products for networking, military, medical, automotive and other applications.
>> Talk 2 (Dave): Oracle, NoSQL, really? You’re all aware of the Oracle RDBMS, but did you know that Oracle has been working on a distributed, shared-nothing, horizontally scalable NoSQL database for ten years? The Oracle NoSQL Database has been in production since 2011, solving large-scale, low-latency, distributed database problems. Come hear about how flexible the product is, our experience in developing it and the internal design tradeoffs, and explore some code with us.
>> Talk 3 (Pat): Apache Ignite provides a distributed platform for a wide variety of workloads, but often the issue is simply in getting data into the database in the first place. The wide variety of data sources and formats presents a challenge to any data engineer; in addition, 'data drift', the constant and inevitable mutation of the incoming data's structure and semantics, can break even the most well-engineered integration.
This session, aimed at data architects, data engineers and developers, will explore how we can use the open source StreamSets Data Collector to build robust data pipelines. Attendees will learn how to collect data from cloud platforms such as Amazon and Salesforce, devices, relational databases and other sources, continuously stream it to Ignite, and then use features such as Ignite's continuous queries to perform streaming analysis.
We'll start by covering the basics of reading files from disk, move on to relational databases, then look at more challenging sources such as APIs and message queues. You will learn how to:
* Build data pipelines to ingest a wide variety of data into Apache Ignite
* Anticipate and manage data drift to ensure that data keeps flowing
* Perform simple and complex ad-hoc queries in Ignite via SQL
* Write applications using Ignite to run continuous queries, combining data from multiple sources
See you April 23! Please RSVP because space will be limited!