Name | Speaker | Start Time | End Time | Presentation | Recording |
---|---|---|---|---|---|
Welcome and registration | | 10:00 AM | 10:30 AM | | |
Keynote | CRED | 10:30 AM | 10:45 AM | | |
Apache Pulsar - The anatomy of Pub & Sub | Shivji Kumar Jha, Nutanix | 10:45 AM | 11:15 AM | Slideshare | YouTube |
Empowering Interactive Queries on Lakehouse: Integrating Apache Hudi with Presto & Trino | Sagar Sumit, Onehouse | 11:25 AM | 12:00 PM | Slides | YouTube |
Networking break | | 12:00 PM | 12:20 PM | | |
Stream joins using Aerospike and RTP (Realtime Processing Platform) | Vikas Sharma, CRED | 12:20 PM | 1:00 PM | Slides | YouTube |
A primer on streaming databases: next gen tooling for streaming and stream processing | Pavan Keshavamurthy & Avinash Upadhyaya, Platformatory | 1:00 PM | 1:40 PM | Slides | YouTube |
Lunch and Networking | | 1:45 PM | 2:30 PM | | |

Speaker: Shivji Kumar Jha - Staff Engineer at Nutanix
About the talk: This is a deep technical presentation on what Apache Pulsar (a great choice for streaming and messaging) does internally to store a message and return it in single-digit milliseconds at high throughput. We will look at the internal data structures, APIs, and read and write paths that make a streaming engine efficient given the expectations and constraints. This talk will give you a clear perspective on the what and how of designing your apps to make the best use of Apache Pulsar (or any other streaming framework). We push 100 MB messages per second at peak, and it is one boring technology that just works for us with close to zero maintenance.
Speaker: Sagar Sumit, Apache Committer and Software Engineer at Onehouse
About the talk: Apache Hudi is revolutionizing big data analytics by optimizing lakehouse architecture for efficient upserts, deletes, and incremental processing. As query engines like Presto and Trino become increasingly prominent in data ecosystems, the need for swift querying and on-the-fly data exploration becomes paramount. This talk dives deep into the integration between Hudi and these engines, showcasing how it yields unparalleled query performance on petabyte-scale datasets while ensuring data freshness. Attendees will gain insights into the architectural intricacies, the challenges of integration, and the manifold benefits stemming from this collaboration.
Speaker: Rahul Gulati - Principal Engineer at Saxo Bank
About the talk: Designing a Data Mesh with Kafka
“Data Mesh objective is to create a foundation for getting value from analytical data and historical facts at scale” [Dehghani, Data Mesh founder]
If the central concern of a Data Mesh is enabling analytics, then how is Kafka relevant? In this talk we will describe how we applied the founding principles of Data Mesh to our operational plane, which is based on Kafka. As a result, we have gained value from these principles more broadly than in analytics alone. One example is treating data as a product, i.e. making data discoverable, addressable, trustworthy and self-describing. We will then describe our implementation, which includes deep dives into Cluster Linking, Connectors and SMTs. Finally, we discuss the dramatic simplification of our analytical plane and consuming personas.
Agenda:
• Saxo Bank’s implementation of the Data Mesh
• Cluster Linking - Why? How?
• Data lake connectors – configuration and auto-deployment strategy
• Mapping Kafka to the data lake infrastructure
• Enabling analytics, data warehousing and production support
Speaker: Vikas Sharma, Principal Engineer at CRED
About the talk: Stream joins are integral to real-time data processing, enabling the fusion of information from different data streams for more insightful analytics and product decisioning. Leveraging Aerospike, a high-performance NoSQL database, in conjunction with a Realtime Processing Platform (RTP), we will explore how Aerospike’s hybrid in-memory and disk storage architecture and RTP’s real-time computation abilities can work together to perform efficient stream joins. The talk will delve into various techniques, such as parallel stream processing, stream state management, sharding, async I/O and optimisation of stream join algorithms for eventual consistency.
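For readers new to the idea, a stream join over keyed state can be sketched in a few lines of Python. This is only an illustrative sketch and not the approach presented in the talk: the `KeyedStateStore` class is a hypothetical in-memory stand-in for an external key-value store such as Aerospike, and the stream names, event shape and `symmetric_join` function are assumptions made for the example.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

# Assumed event shape: (stream name, join key, payload)
Event = Tuple[str, str, Dict[str, Any]]


class KeyedStateStore:
    """Hypothetical in-memory stand-in for an external key-value store."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[str, str], List[Dict[str, Any]]] = defaultdict(list)

    def append(self, stream: str, key: str, payload: Dict[str, Any]) -> None:
        # Remember every event seen so far, per stream and join key.
        self._data[(stream, key)].append(payload)

    def get(self, stream: str, key: str) -> List[Dict[str, Any]]:
        # Return a copy so callers cannot mutate the stored state.
        return list(self._data.get((stream, key), []))


def symmetric_join(
    events: Iterable[Event], left: str, right: str, store: KeyedStateStore
) -> List[Dict[str, Any]]:
    """Join two interleaved streams on their key, in arrival order.

    Each arriving event is saved to its own side's state, then matched
    against everything already stored for the same key on the other side.
    """
    joined: List[Dict[str, Any]] = []
    for stream, key, payload in events:
        if stream not in (left, right):
            continue  # ignore events from unrelated streams
        other = right if stream == left else left
        store.append(stream, key, payload)
        for match in store.get(other, key):
            l, r = (payload, match) if stream == left else (match, payload)
            joined.append({"key": key, left: l, right: r})
    return joined


if __name__ == "__main__":
    sample = [
        ("orders", "u1", {"amount": 10}),
        ("users", "u1", {"name": "A"}),
        ("orders", "u1", {"amount": 20}),
        ("orders", "u2", {"amount": 5}),
    ]
    for row in symmetric_join(sample, "orders", "users", KeyedStateStore()):
        print(row)
```

Because both sides keep state, a match is emitted whichever stream delivers its event first. A production system would also need state expiry, sharding by key and asynchronous store access, which this sketch leaves out.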
Speakers: Pavan Keshavamurthy, CTO & Co-Founder at Platformatory.io, and Avinash Upadhyaya, Platform Engineer at Platformatory.io
About the talk: In this talk, we will explore why streaming databases are all the rage and what makes them the hottest thing in the world of streaming.
In particular, we will cover how they converge with and diverge from the world views of both traditional OLAP and stream processing.
As a final part of this talk, we will also take a deep dive into the important characteristics of a few streaming DBs through a comparative lens.
Time permitting, a small demo of capabilities will also follow.