Name | Speaker | Start Time | End Time | Presentation | Recording |
---|---|---|---|---|---|
Welcome and registration | | 04:00 PM | 04:10 PM | | |
Beyond Tiered Storage: Serverless Kafka with No Local Disks | Richard Artoul, WarpStream | 04:20 PM | 05:00 PM | Slides | YouTube |
Real-Time Predictions with Machine Learning & Redpanda Streaming Data Transforms | Christina Lin, Redpanda | 05:00 PM | 05:40 PM | Slideshare | YouTube |
Networking break with snacks | | 05:40 PM | 06:30 PM | | |
Stream Processing in SQL: One Approach | Noel Kwan, RisingWave | 06:30 PM | 07:10 PM | Slides | YouTube |
Unlocking Seamless Streaming Ingestion with Apache Hudi and Kafka | Sagar Sumit & Vinish Reddy, Onehouse | 07:10 PM | 07:50 PM | Slides | YouTube |
Speaker: Richard Artoul, Co-Founder at WarpStream Labs
About the talk: Separation of compute and storage has become the de facto standard in the data industry for batch processing. The addition of tiered storage to open source Apache Kafka is the first step in bringing true separation of compute and storage to the streaming world. In this talk, we’ll discuss in technical detail how to take the concept of tiered storage to its logical extreme by building an Apache Kafka protocol-compatible system that has zero local disks. Eliminating all local disks in the system requires not only separating storage from compute, but also separating data from metadata. This is a monumental task that requires reimagining Kafka’s architecture from the ground up, but the benefits are worth it. This approach enables a stateless, elastic, and serverless deployment model that minimizes operational overhead and also drives inter-zone networking costs to almost zero.
Speaker: Christina Lin, Developer Advocate, Redpanda
About the talk: In this session, we’ll address how to simplify data structures in AI applications, emphasizing the importance of not overcomplicating data architecture while constructing stateless pipelines for real-time analytics.
We’ll cover the creation of an efficient data platform using Redpanda data transforms powered by WebAssembly (Wasm), particularly tailored for dynamic industries (we will use food delivery as an example). We’ll show how to simplify your data stack and demonstrate with a lab how complex data structures often hinder the agility and performance of AI systems. The lab will focus on stateless pipelines, where each data item is processed independently, and showcase how to build scalable and robust AI applications without the burden of cumbersome data frameworks. Attendees will see how Redpanda’s integration facilitates seamless real-time data processing and instant transformations that are crucial to responsive and accurate AI-driven predictions.
We will cover:
• Streamlined data ingestion and transformation
• Real-time machine learning
• Simplified infrastructure setup
Participants will learn how to avoid common pitfalls associated with complex data structures and data stacks, and will gain insights into creating more effective, agile, and responsive applications – especially for AI. The methodologies are applicable across various industries and use cases.
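As a rough illustration of the stateless pipeline idea described above, where each data item is processed independently, here is a minimal sketch in plain Python. It does not use the Redpanda data transforms SDK; the record fields (`distance_km`, `item_count`), the function name, and the linear "model" weights are all hypothetical stand-ins for a real trained model and a real food delivery schema.

```python
from typing import Any

# Hypothetical model weights; a real deployment would load a trained model.
BASE_MINUTES = 10.0
MINUTES_PER_KM = 3.0
MINUTES_PER_ITEM = 1.5


def predict_delivery_minutes(order: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the order enriched with an estimated delivery time.

    The function reads only its input record: no shared state and no lookups
    against earlier records, so records can be processed in any order and
    in parallel.
    """
    eta = (
        BASE_MINUTES
        + MINUTES_PER_KM * order["distance_km"]
        + MINUTES_PER_ITEM * order["item_count"]
    )
    # Build a new dict so the input record is left untouched.
    return {**order, "eta_minutes": round(eta, 1)}


# Hypothetical sample records standing in for messages on a topic.
orders = [
    {"order_id": "a1", "distance_km": 2.0, "item_count": 3},
    {"order_id": "b2", "distance_km": 5.5, "item_count": 1},
]
enriched = [predict_delivery_minutes(o) for o in orders]
```

Because the transform depends only on the record it receives, scaling out amounts to running more copies of the same function.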
Speaker: Noel Kwan, Software Engineer at RisingWave Labs
About the talk: Join us for an exploration of real-time streaming data processing, where we’ll delve into RisingWave’s Stream Processing Model and interact with data streams using SQL.
In this session, we will begin by demonstrating the difference between batch and streaming data processing. We will then cover the internals of RisingWave’s architecture, such as decoupled compute and storage, and discuss how each RisingWave service operates.
We will further dive into stateful and stateless streaming computations, examining aspects like the internal state of stateful computations and how it can be observed from RisingWave.
We will also explore RisingWave’s handling of batch queries and discuss the serving scenarios in which RisingWave excels.
Next, we will cover data delivery and ingestion with external systems like Kafka, seamlessly integrating them to showcase how different systems can collaborate to provide various features.
Finally, we will review a simple dashboard application for ride-hailing data and demonstrate these concepts.
After this talk, developers should be more familiar with the batteries-included capabilities of RisingWave and understand how to simplify their stream processing workflows with RisingWave.
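The distinction between stateless and stateful streaming computations mentioned in the abstract can be sketched in a few lines of plain Python. This is not RisingWave code or SQL; the event shape (`city`, `fare`) and both function names are assumptions chosen to echo the ride-hailing example.

```python
from collections import defaultdict
from typing import Iterable, Iterator

Ride = dict  # hypothetical event shape: {"city": str, "fare": float}


def high_fare_rides(events: Iterable[Ride], min_fare: float) -> Iterator[Ride]:
    """Stateless: each event is kept or dropped on its own."""
    for event in events:
        if event["fare"] >= min_fare:
            yield event


def running_fare_by_city(events: Iterable[Ride]) -> Iterator[tuple[str, float]]:
    """Stateful: the per-city totals persist across events."""
    totals: dict[str, float] = defaultdict(float)  # internal state
    for event in events:
        totals[event["city"]] += event["fare"]
        # Emit the updated aggregate as soon as the event arrives,
        # instead of waiting for the whole dataset as a batch job would.
        yield event["city"], totals[event["city"]]


rides = [
    {"city": "SF", "fare": 12.0},
    {"city": "NYC", "fare": 30.0},
    {"city": "SF", "fare": 8.0},
]
updates = list(running_fare_by_city(rides))
expensive = list(high_fare_rides(rides, min_fare=10.0))
```

The `totals` dictionary plays the role of the internal state that a stateful operator has to maintain; the filter needs none.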
Speakers: Sagar Sumit, Software Engineer at Onehouse, and Vinish Reddy, Data Platform Engineer at Onehouse
About the talk: Join us as we explore Apache Hudi and its transformative utility, Hudi Streamer, for seamless data ingestion from various sources, including Apache Kafka. Learn how Hudi Streamer simplifies workflows with pluggable interfaces for extraction, key generation, and schema provision. We’ll also showcase real-time database replication using CDC, bridging the Confluent Cloud platform and the Onehouse managed lakehouse. Throughout the session, we’ll highlight the synergy between Hudi and Kafka, empowering organizations to streamline data workflows and drive innovation. Whether you’re a seasoned Kafka user, data engineer, or analytics enthusiast, this session equips you with the tools to harness Hudi’s full potential in your Kafka ecosystem.