NYC In-Memory Computing Meetup
Denis Magda, vice president of the Apache Ignite PMC and director of product management at GridGain Systems, will be
one of the featured speakers at the June 26 NYC In-Memory Computing Meetup. Venue: Spaces, 1740 Broadway, NY, NY
(15th floor).
In all there will be three talks.
> Talk 1: A “how-to” presentation for building a real-time alerting, analytics and reporting system (at scale).
With Denis Magda, vice president of the Apache Ignite PMC and director of product management at GridGain Systems,
and Viktor Gamov, developer advocate at Confluent.
> Talk 2: Using In-Memory technology for real time analytics. With Andy Rivenes, product manager at Oracle for
Database In-Memory.
> Talk 3: Feeding data to the Kubernetes beast: bringing data locality to your containerized big data workloads.
With Bin Fan, founding engineer of Alluxio, Inc. and PMC member of the Alluxio open source project.
Talk details
>> Talk 1 (Denis & Viktor): Are you tasked with building a system, or upgrading an existing architecture, to
handle unbounded streams of data, do real-time alerting, store ever-growing terabytes and petabytes of data and,
finally, act on the data within millisecond SLAs?
By integrating Apache Kafka with Apache Ignite you can meet all of these requirements faster and more easily. The
battle-tested recipe is simple: take Kafka Connect and have your data stream through Kafka pipelines, add a pinch of
KSQL to act on the streams with SQL in real time, then rinse and flush the pre-processed data into Ignite as an
in-memory database and get further insights by analyzing your hot and cold datasets.
Denis and Viktor will demonstrate how to implement the solution in practice. They’ll explain the architectural
reasoning and the benefits of real-time integration, and share common usage patterns.
>> Talk 2 (Andy):
Analytical queries typically scan large amounts of data using aggregations to find patterns or trends in the data.
In a traditional row-based database this can be slow because each row must be examined to access the columns in a
query. Columnar-formatted data does not have this problem because only the columns referenced in the query need to
be accessed. In addition, columnar-formatted data tends to compress well and work well with vectorized processing
like Single Instruction Multiple Data (SIMD). These columnar-formatted objects can be queried orders of magnitude
faster than the equivalent row format.
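To make the row-versus-column distinction concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the sample data, the `rows` and `columns` layouts and both function names are hypothetical, and it does not represent how Oracle Database In-Memory is implemented.

```python
from array import array

# Hypothetical sample data: (order_id, customer, amount) rows.
rows = [
    (1, "alice", 10.0),
    (2, "bob", 25.5),
    (3, "carol", 4.5),
]


def sum_amount_row_store(rows):
    """Row layout: every full row is visited just to read one column."""
    total = 0.0
    for row in rows:
        total += row[2]
    return total


# Column layout: each column is stored on its own, contiguously.
columns = {
    "order_id": array("q", [r[0] for r in rows]),
    "customer": [r[1] for r in rows],
    "amount": array("d", [r[2] for r in rows]),
}


def sum_amount_column_store(columns):
    """Column layout: only the 'amount' column is touched."""
    return sum(columns["amount"])


print(sum_amount_row_store(rows))
print(sum_amount_column_store(columns))
```

Both functions return the same total. The difference is in what has to be read: the columnar version never touches `order_id` or `customer`, and the contiguous, single-type `amount` array is the kind of layout that lends itself to compression and SIMD-style processing.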
Andy will explain how this columnar format provides such a dramatic performance improvement for analytic queries,
and how it can be used in mixed-workload environments without impacting OLTP and without requiring application
changes.
>> Talk 3 (Bin): The latest advances in container orchestration by Kubernetes bring cost savings and flexibility to
compute workloads in public, hybrid and multi-cloud environments. Such architectures typically lead to physically
separated compute and storage services, where S3, Azure Data Lake or Google Cloud Storage are commonly used to
provide data persistence.
Open source Alluxio approaches this problem in a new way. It helps elastic compute workloads realize the true
benefits of the cloud, while bringing data locality and data accessibility to workloads orchestrated by Kubernetes.
Alluxio can orchestrate data locality from any persistent storage, including object stores such as Ceph and cloud
storage such as AWS S3 or GCS, and make the data accessible to compute running in Kubernetes pods. As a stateless
data access layer (as opposed to a long-running data storage daemon service), Alluxio runs as a native service,
making data-intensive compute workloads Kubernetes friendly.
Bin will explain how this new approach brings data locality to data-intensive compute workloads in Kubernetes
environments. He'll also cover the Alluxio architecture, deployment with Kubernetes and real-world production use
cases.
Former VP, Developer Relations in R&D at GridGain; Apache Ignite committer and PMC member