Bangalore Streams meetup - May 2024 [Kafka Summit Special]

Hello Bengaluru

Platformatory is excited to host a special edition of the Bengaluru Streams meetup on May 4, right after Kafka Summit. Join us for an evening of exciting talks from Warpstream, Redpanda, RisingWave and OneHouse. This is a great opportunity to talk to, meet and learn from some of the newest innovators in the data infra space.

Please note: Since there are limited seats, we can only accommodate people who register . Only people completing the form and receiving a confirmation email will be able to attend.

Schedule

Name	Speaker	Start Time	End Time	Presentation	Recording
Welcome and registration		04:00 PM	04:10 PM
Beyond Tiered Storage: Serverless Kafka with No Local Disks	Richard Artoul, WarpStream	04:20 PM	05:00 PM	Slides	YouTube
Real-Time Predictions with Machine Learning & Redpanda Streaming Data Transforms	Christina Lin, Redpanda	05:00 PM	05:40 PM	Slideshare	YouTube
Networking break with snacks		05:40 PM	06:30 PM
Stream Processing in SQL: One Approach	Noel Kwan, RisingWave	06:30 PM	07:10 PM	Slides	YouTube
Unlocking Seamless Streaming Ingestion with Apache Hudi and Kafka	Sagar Sumit & Vinish Reddy, Onehouse	07:10 PM	07:50 PM	Slides	YouTube

Talks

Beyond Tiered Storage: Serverless Kafka with No Local Disks

Speaker: Richard Artoul, Co-Founder @ WarpStream Labs

About the talk: Separation of compute and storage has become the de-facto standard in the data industry for batch processing. The addition of tiered storage to open source Apache Kafka is the first step in bringing true separation of compute and storage to the streaming world. In this talk, we’ll discuss in technical detail how to take the concept of tiered storage to its logical extreme by building an Apache Kafka protocol compatible system that has zero local disks. Eliminating all local disks in the system requires not only separating storage from compute, but also separating data from metadata. This is a monumental task that requires reimagining Kafka’s architecture from the ground up, but the benefits are worth it. This approach enables a stateless, elastic, and serverless deployment model that minimizes operational overhead and also drives inter-zone networking costs to almost zero.

Real-Time Predictions with Machine Learning & Redpanda Streaming Data Transforms

Speaker: Christina Lin, Developer Advocate, Redpanda

About the talk: In this session, we’ll address how to simplify data structures in AI applications, emphasizing the importance of not overcomplicating data architecture while constructing stateless pipelines for real-time analytics.
We’ll cover the creation of an efficient data platform using Redpanda data transforms powered by WebAssembly (WASM), particularly tailored for dynamic industries (we will use food delivery as an example). We’ll show how to simplify your data stack and demonstrate with a lab how complex data structures often hinder the agility and performance of AI systems. The lab will focus on stateless pipelines, where each data item is processed independently, and showcases how to build scalable and robust AI applications without the burden of cumbersome data frameworks. Attendees will see how Redpanda’s integration facilitates seamless real-time data processing and instant transformations that are crucial to responsive and accurate AI-driven predictions.
We will cover:
• Streamlined data ingestion and transformation
• Real-time machine learning
• Simplified infrastructure setup
Participants will learn how to avoid common pitfalls associated with complex data structures and data stacks, and will gain insights into creating more effective, agile, and responsive applications – especially for AI. The methodologies are applicable across various industries and use cases.

Stream Processing in SQL: One Approach

Speaker: Noel Kwan, Software Engineer at RisingWave Labs

About the talk: Join us for an exploration of real-time streaming data processing, where we’ll delve into RisingWave’s Stream Processing Model and interact with data streams using SQL. In this session, we will begin by demonstrating the difference between batch and streaming data processing. We will then cover the internals of RisingWave’s architecture, such as decoupled compute and storage, and discuss how each RisingWave service operates.
We will further dive into stateful and stateless streaming computations, examining aspects like the internal state of stateful computations and how it can be observed from RisingWave. We will also explore RisingWave’s handling of batch queries and discuss the serving scenarios in which RisingWave excels.
Next, we will cover data delivery and ingestion with external systems like Kafka, seamlessly integrating them to showcase how different systems can collaborate to provide various features.
Finally, we will review a simple dashboard application for ride-hailing data and demonstrate these concepts.
After this talk, developers should be more familiar with the batteries-included capabilities of RisingWave and understand how to simplify their stream processing workflows with RisingWave.

Unlocking Seamless Streaming Ingestion with Apache Hudi and Kafka

Speaker: Sagar Sumit, Software Engineer at Onehouse && Vinish Reddy,Data Platform Engineer at Onehouse

About the talk: Join us as we explore Apache Hudi and its transformative utility, Hudi Streamer, for seamless data ingestion from various sources, including Apache Kafka. Learn how Hudi Streamer simplifies workflows with pluggable interfaces for extraction, key generation, and schema provision. We’ll also showcase real-time database replication using CDC, bridging Confluent Cloud platform and Onehouse managed lakehouse. Throughout the session, we’ll highlight the synergy between Hudi and Kafka, empowering organizations to streamline data workflows and drive innovation. Whether you’re a seasoned Kafka user, data engineer, or analytics enthusiast, this session equips you with the tools to harness Hudi’s full potential in your Kafka ecosystem.

Speakers

Richard Artoul
Co-Founder @ WarpStream Labs
Christina Lin
Christina has 20+ years of experience in software development. She has worked as a developer, consultant, architect. She is an advocate for making innovative solutions down to earth and making them easily accessible for everyone. Skilled in Open Source technology such as Redpanda, Apache Camel, Kafka and Kubernetes, Ansible. https://www.linkedin.com/in/christina-lin-14a4b655/
Noel Kwan
Noel works as a Database Kernel Engineer at RisingWave.He focuses on performance and reliability and has worked on fuzzing tools such as SqlSmith and performance-related features such as Backfill. He studied at NUS, with an emphasis on programming languages.
Sagar Sumit
Sagar Sumit is a Software Engineer at Onehouse and an Apache Hudi PMC member. Apart from Hudi, he is also a contributor to the Presto and Trino projects. In the past, he has worked on the team that built Amazon Aurora, a relational database built for the cloud, that now powers mission-critical applications for AWS customers.
Vinish Reddy
Vinish Reddy is a Data Platform Engineer at Onehouse. He works on the Onehouse Data Plane services, and is a PMC member for Apache XTable. He previously worked in the payments team at Uber. He is passionate about solving challenging engineering problems and making an impact.

Bangalore Streams meetup - May 2024 [Kafka Summit Special]

Saturday 04 May 2024 4:00pm - 8:00pm

Venue

Address

Location

Schedule

Talks

Speakers

Platformatory Labs