Affluence, No. 72/1, St. Mark's Road, Bangalore 560001
Name | Speaker | Start Time | End Time | Presentation | Recording |
---|---|---|---|---|---|
Welcome and registration | | 10:00 AM | 10:20 AM | | |
Real time Analytics with Apache Kafka and Apache Druid | Tijo Thomas, Imply | 10:30 AM | 11:15 AM | | YouTube |
Real time data lake for CDC with Apache Paimon and Flink | Avinash Upadhyaya, Platformatory | 11:20 AM | 12:00 PM | Slides | YouTube |
Networking break | | 12:00 PM | 12:15 PM | | |
Time Series Text Indexing in Apache Pinot | Atri Sharma, Atlassian | 12:15 PM | 01:00 PM | | YouTube |
Lunch and Networking | | 01:00 PM | 02:30 PM | | |
Speaker: Tijo Thomas, Manager, Solutions Architect @ Imply
About the talk: In the digital era, the ability to process and analyze data in real time is becoming increasingly vital for organizations aiming to stay competitive and make informed decisions swiftly. Kafka, a distributed event streaming platform, has emerged as a powerful tool for businesses looking to capitalize on real-time data analytics. This presentation will guide you through the essentials of generating and managing large volumes of events with Kafka, and how to leverage these capabilities for real-time analytics using Druid.
We will explore the architectural underpinnings and Druid internals that facilitate the construction of robust real-time analytics applications. The discussion will extend to practical strategies for navigating the complexities of data streaming, focusing on how to effectively use Kafka alongside other cutting-edge tools to build scalable, efficient, and high-performing real-time analytics solutions.
Speaker: Avinash Upadhyaya, Platform engineer at Platformatory
About the talk: Apache Paimon is a streaming data lake platform with high-speed data ingestion, changelog tracking and efficient real-time analytics. In this discussion, we aim to explain how Paimon addresses the problem of bringing change data capture (CDC) data into the data lake. This covers the whole process: CDC ingestion, updating parts of the data, and reading the changelog as a stream. Paimon simplifies handling CDC data and integrates closely with Flink. Dealing with CDC in the data lake poses challenges such as syncing CDC data with schema changes, using a partial-update merge engine, and tracking changes in the data stream. To sum up, we’ll provide an overview and discuss what’s on the horizon for Paimon.
Speaker: Atri Sharma, Senior Principal Engineer at Atlassian
About the talk: Time series engines such as Apache Pinot are great at streaming aggregations - but text search is a different beast. This talk will focus on the native text index and engine built for Apache Pinot, which works well for time series engines and maintains the invariant that the latest data is the most valuable.