Bangalore Streams meetup - August 2023

Saturday 26 Aug 2023 9:30am - 2:30pm

Discover Our Venue

Find Us Here: Our Event Address

769 and 770, 100 Feet Road, 12th Main, HAL 2nd Stage, Indiranagar, Bengaluru 560038

Where We Are: Navigate to Our Event Hub

CRED Aero, Bengaluru

On-Demand Talks: Access Our Recorded Talks

Shivji Kumar Jha, Nutanix

Apache Pulsar - The anatomy of Pub & Sub

Sagar Sumit, Onehouse

Empowering Interactive Queries on Lakehouse: Integrating Apache Hudi with Presto & Trino

Vikas Sharma, CRED

Stream joins using Aerospike and RTP (Realtime Processing Platform)

Pavan Keshavamurthy & Avinash Upadhyaya, Platformatory

A primer on streaming databases: next gen tooling for streaming and stream processing

Hello Bengaluru

We're excited to bring to you a new series of events in Bengaluru focused on data streaming and adjacent technologies. Our objective is to share knowledge and provide a platform for thought leadership around:

Event Streaming Technologies (Apache Kafka and more)
Event Driven Architecture
Stream Processing
Streaming Databases
Real-time analytics
Data Mesh
...and more.

We're hosting our next in-person event on August 26. Join us for exciting discussions in the streaming world with opportunities to network with peers and leaders in the industry.

Schedule

Session	Speaker	Start Time	End Time	Presentation	Recording
Welcome and registration		10:00 AM	10:30 AM
Keynote	CRED	10:30 AM	10:45 AM
Apache Pulsar - The anatomy of Pub & Sub	Shivji Kumar Jha, Nutanix	10:45 AM	11:15 AM	Slideshare	YouTube
Empowering Interactive Queries on Lakehouse: Integrating Apache Hudi with Presto & Trino	Sagar Sumit, Onehouse	11:25 AM	12:00 PM	Slides	YouTube
Networking break		12:00 PM	12:20 PM
Stream joins using Aerospike and RTP (Realtime Processing Platform)	Vikas Sharma, CRED	12:20 PM	1:00 PM	Slides	YouTube
A primer on streaming databases: next gen tooling for streaming and stream processing	Pavan Keshavamurthy & Avinash Upadhyaya, Platformatory	1:00 PM	1:40 PM	Slides	YouTube
Lunch and Networking		1:45 PM	2:30 PM

Talks

Apache Pulsar - The anatomy of Pub & Sub

Speaker: Shivji Kumar Jha - Staff Engineer at Nutanix

About the talk: This is a deep tech presentation on what Apache Pulsar (a great choice for streaming and messaging) does internally to store a message and give it back in single-digit milliseconds at high throughput. We will look at the internal data structures, APIs, and read and write paths that make a streaming engine efficient given the expectations and constraints. This talk will give you a great perspective on the what and how of designing your apps to make the best use of Apache Pulsar (or any other streaming framework). We push 100 MB messages per second at peak, and it is one boring technology that just works for us with close to zero maintenance.

Empowering Interactive Queries on Lakehouse: Integrating Apache Hudi with Presto & Trino

Speaker: Sagar Sumit, Apache Committer and Software Engineer at Onehouse

About the talk: Apache Hudi is revolutionizing big data analytics by optimizing lakehouse architecture for efficient upserts, deletes, and incremental processes. As query engines like Presto and Trino become increasingly prominent in data ecosystems, the need for swift querying and on-the-fly data exploration becomes paramount. This talk dives deep into the harmonization between Hudi and these engines, showcasing how their integration yields unparalleled query performance on petabyte-scale datasets, all while ensuring data freshness. Attendees will gather insights into the architectural intricacies, the challenges of integration, and the manifold benefits stemming from this collaboration.

Designing a Data Mesh with Kafka

Speaker: Rahul Gulati - Principal Engineer at Saxo Bank

About the talk: “Data Mesh objective is to create a foundation for getting value from analytical data and historical facts at scale” [Dehghani, Data Mesh founder]. If the central concern of a Data Mesh is enabling analytics, then how is Kafka relevant? In this talk we will describe how we applied the founding principles of Data Mesh to our operational plane, which is based on Kafka. As a result, we have gained value from these principles more broadly than just in analytics. An example of this is treating data as a product, i.e. ensuring that data is discoverable, addressable, trustworthy and self-describing. We will then describe our implementation, which includes deep dives into Cluster Linking, Connectors and SMTs. Finally, we discuss the dramatic simplification of our analytical plane and consuming personas.

Agenda

• Saxo Bank’s implementation of the Data Mesh
• Cluster Linking - Why? How?
• Data lake connectors – configuration and auto-deployment strategy
• Mapping Kafka to the data lake infrastructure
• Enabling analytics, data warehousing and production support

Stream joins using Aerospike and RTP (Realtime Processing Platform)

Speaker: Vikas Sharma, Principal Engineer at CRED

About the talk: Stream joins are integral to real-time data processing, enabling the fusion of information from different data streams for more insightful analytics and product decisioning. Leveraging Aerospike, a high-performance NoSQL database, in conjunction with a Realtime Processing Platform (RTP), we will explore how Aerospike’s hybrid in-memory and disk storage architecture and RTP’s real-time computation abilities can work together to perform efficient stream joins. The talk will delve into various techniques, such as parallel stream processing, stream state management, sharding, async I/O and optimisation of stream join algorithms for eventual consistency.

A primer on streaming databases: next gen tooling for streaming and stream processing

Speaker: Pavan Keshavamurthy, CTO & Co-Founder @ Platformatory.io & Avinash Upadhyaya, Platform Engineer @ Platformatory.io

About the talk: In this talk, we will explore why streaming databases are the rage and what makes them the hottest thing in the world of streaming.

In particular, we will cover how they converge into and diverge from the world views of both traditional OLAP and stream processing.

As a final part of this talk, we will also deep dive into important characteristics of a few streaming DBs with a comparative lens.

Time permitting, a small demo of capabilities will also follow.

Speakers

Pavan Keshavamurthy
CTO & Co-founder @ Platformatory.io
Avinash Upadhyaya
Platform Engineer @ Platformatory.io
Vikas Sharma
Principal Engineer at CRED
Shivji Kumar Jha
Shivji has been working on databases and streaming systems for over a decade. Shivji currently heads the distributed data org at Nutanix. Previously, he has worked at organisations big and small: as a software engineer on MySQL Replication, as the 5th engineer at a startup working on a host of things including microservices, observability infrastructure, cloud and databases, and at Swiggy, where he designed and implemented the help chat bot and the metadata catalogue for all data. Very active in the open source software community and a regular speaker at open source conferences, Shivji has contributed over 25 patches to the MySQL and Apache Pulsar open source code bases.
Sagar Sumit
I am an Apache Committer and a Software Engineer at Onehouse, a fully-managed lakehouse platform. My primary contributions are to Apache Hudi’s core transactional engine and its integration with different query engines such as Presto and Trino.
Rahul Gulati
Principal Data Platforms Engineer, working on building a Data Mesh (operational and analytical planes) using Confluent Kafka