Denis Magda, vice president of the Apache Ignite PMC and director of product management at GridGain Systems, will be one of the featured speakers at the June 26 NYC In-Memory Computing Meetup. Venue: Spaces, 1740 Broadway, NY, NY (15th floor).
In all, there will be three talks.
> Talk 1: A “how-to” presentation on building a real-time alerting, analytics and reporting system (at scale), with Denis Magda, vice president of the Apache Ignite PMC and director of product management at GridGain Systems, and Viktor Gamov, developer advocate at Confluent.
> Talk 2: Using in-memory technology for real-time analytics, with Andy Rivenes, product manager for Database In-Memory at Oracle.
> Talk 3: Feeding data to the Kubernetes beast: bringing data locality to your containerized big data workloads, with Bin Fan, founding engineer of Alluxio, Inc. and PMC member of the Alluxio open source project.
>> Talk 1 (Denis & Viktor): Are you tasked with building a system, or upgrading an existing architecture, into a solution capable of handling unbounded streams of data, doing real-time alerting, storing ever-growing terabytes and petabytes of data and, finally, acting on the data within millisecond SLAs?
By integrating Apache Kafka with Apache Ignite, you can meet all of these requirements faster and more easily. The battle-tested recipe is simple: take Kafka Connect and have your data stream through Kafka pipelines, add a pinch of KSQL to act on the streams with SQL in real time with zero delays, rinse and flush the pre-processed data into Ignite as an in-memory database, and get further insights by analyzing your hot and cold datasets.
Denis and Viktor will demonstrate how to implement the solution in practice. They’ll explain architectural reasoning and the benefits of real-time integration and share common usage patterns.
>> Talk 2 (Andy): Analytical queries typically scan large amounts of data using aggregations to find patterns or trends in the data. In a traditional row-based database, this can be slow because each row must be examined to access the columns in a query. Columnar-formatted data does not have this problem because only the columns in the query need to be accessed.
In addition, columnar-formatted data tends to compress well and to work well with vectorized processing such as Single Instruction Multiple Data (SIMD). These columnar-formatted objects can be queried orders of magnitude faster than the equivalent row format.
Andy will explain how this columnar format provides such a dramatic performance improvement for analytic queries, and how it can be used in mixed-workload environments without impacting OLTP and without requiring application changes.
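To make the row-versus-column distinction in the Talk 2 abstract concrete, here is a minimal Python sketch. It is a toy illustration only, not how Oracle Database In-Memory is implemented; the sample records and the function names are hypothetical. It shows that an aggregate over a row layout walks every full record, while the same aggregate over a columnar layout reads only the one column it needs.

```python
from array import array

# Hypothetical sample data: (order_id, region, amount) records.
rows = [
    (1, "east", 120.0),
    (2, "west", 75.5),
    (3, "east", 30.25),
    (4, "north", 210.0),
]


def total_amount_row_store(records):
    """Row layout: every full record is visited to reach a single field."""
    return sum(record[2] for record in records)


def to_column_store(records):
    """Pivot the records into one contiguous sequence per column."""
    order_ids, regions, amounts = zip(*records)
    return {
        "order_id": array("q", order_ids),  # packed 64-bit integers
        "region": list(regions),
        "amount": array("d", amounts),      # packed 64-bit floats
    }


def total_amount_column_store(columns):
    """Column layout: only the 'amount' column is read."""
    return sum(columns["amount"])


columns = to_column_store(rows)
row_total = total_amount_row_store(rows)
column_total = total_amount_column_store(columns)
```

Both functions return the same total; the difference is in how much data each one has to touch. Packing a column into a contiguous typed array is also the kind of layout that lends itself to compression and SIMD-style processing, although this sketch does neither.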
>> Talk 3 (Bin): The latest advances in container orchestration by Kubernetes bring cost savings and flexibility to compute workloads in public, hybrid and multi-cloud environments. Such architectures typically lead to physically separated compute and storage services, where S3, Azure Data Lake or Google Cloud Storage are commonly used to provide data persistence.
Open source Alluxio approaches this problem in a new way. It helps elastic compute workloads realize the true benefits of the cloud, while bringing data locality and data accessibility to workloads orchestrated by Kubernetes. Alluxio can orchestrate data locality from any persistent storage, including object stores such as Ceph and cloud storage such as AWS S3 or GCS, and make that data accessible to compute running in Kubernetes pods. As a stateless data access layer (as opposed to a long-running data storage daemon service), Alluxio runs as a native service, making data-intensive compute workloads Kubernetes friendly.
Bin will explain how this new approach brings data locality to data-intensive compute workloads in Kubernetes environments. He'll also cover the Alluxio architecture, deployment with Kubernetes and real-world production use cases.
VP, Developer Relations in R&D at GridGain; Apache Ignite committer and PMC member