Bangalore Streams meetup - July 2024

Saturday 06 Jul 2024 9:30am - 2:30pm

RSVP

Venue

Address
ShareChat Mohalla Tech Private Limited, North Tower Smartworks, Vaishnavi Tech Park, Survey No 16/1 & No 17/2 Ambalipura Village, Varthur Hobli, Bangalore East Taluk, Karnataka – 560103
Location
ShareChat, Bengaluru

Hello Bengaluru

We're excited to continue our new series of events in Bengaluru focused on data streaming and adjacent technologies. Our objective is to share knowledge and provide a platform for thought leadership around:

Event Streaming Technologies (Apache Kafka and more)
Event Driven Architecture
Stream Processing
Streaming Databases
Real-time analytics
Data Mesh
..and more.

We're hosting our next in-person event on July 6. Join us for exciting discussions in the streaming world with opportunities to network with peers and leaders in the industry.

Schedule

Name | Speaker | Start Time | End Time | Presentation | Recording
Welcome and registration | | 09:30 AM | 10:00 AM | |
Realtime Triggers in Kestra | Shruti Mantri, Moveworks | 10:00 AM | 10:30 AM | Slides | YouTube
Moving from Batch to Realtime : Inspired by a true production incident | Shivji Jha, Nutanix | 10:35 AM | 11:10 AM | |
Challenges & Learnings with using Kafka/Redpanda at a huge scale | Anuraj Jain, Sharechat | 11:15 AM | 11:50 AM | Slideshare | YouTube
Networking break | | 11:50 AM | 12:10 PM | Slides | YouTube
Unleashing Data Powerhouses - Benthos | Shivam Yadav & Shubham Dhal, Sharechat | 12:10 PM | 12:45 PM | Slides | YouTube
Push Query Layer for Stream Processing Systems (Apache Flink) | Avinash Upadhyaya & Pavan Keshavamurthy, Platformatory | 12:45 PM | 01:20 PM | Slides | YouTube
Lunch and Networking | | 01:30 PM | 02:30 PM | |

Talks

  • Realtime Triggers in Kestra

Speaker: Shruti Mantri, Software Engineer at Moveworks

About the talk:

  1. Introduction to Kestra
  2. Capabilities of Kestra
  3. Introduction to Realtime Triggers in Kestra
  4. How can realtime triggers be leveraged for event driven architecture

Kestra can serve as a central tool to handle event driven architecture, along with being an orchestration tool.

  • Moving from Batch to Realtime : Inspired by a true production incident

Speaker: Shivji Jha, Staff Engineer at Nutanix

About the talk: This is a real-world account from an Apache Druid cluster in production: a story of 48 hours of debugging, learning to understand batch versus stream better, filing a couple of issues in Druid open source projects, and finally reaching a stable production pipeline again thanks to the Druid community. We will discuss which parts of your design could be impacted and how you should change the related systems so that cascading failures don’t bring down your entire production availability. As examples, we will discuss the bottlenecks we had in the Overlord, slot issues for Peons in MiddleManagers, Coordinator bottlenecks, how we mitigated task and segment flooding, and which configs we changed, sprinkled with real-world numbers and snapshots from our Grafana dashboards.

Finally, we will list all the learnings and how we made sure we never repeat the same mistakes in production systems.

A real-world account of a production incident showing:

  1. How batch and realtime systems differ
  2. Kinds of failures to anticipate
  3. How to be antifragile in the future

  • Challenges & Learnings with using Kafka/Redpanda at a huge scale

Speaker: Anuraj Jain, Software Engineer at Sharechat

About the talk: Kafka is an industry-standard data streaming platform, and many data-intensive companies already use it for their streaming use cases. In this talk I will cover ShareChat’s challenges and learnings from operating Kafka/Redpanda at heavy scale (GBs per second): the problems we hit, their causes, and how we overcame them. It should be a useful learnings and best-practices session for developers who want to avoid mistakes in production at scale. ShareChat already operates large Kafka/Redpanda clusters in production at heavy scale. I have done a lot of firefighting around various issues and have recently onboarded and migrated our services and jobs to the Kafka protocol.

Developers watching this presentation will learn which issues can come up in production when operating Kafka at high scale, both in the Kafka system itself and in the services and jobs that use it, why those issues arise, how to resolve them, and how to avoid mistakes in production.

  • Unleashing Data Powerhouses - Benthos

Speakers: Shivam Yadav and Shubham Dhal, Software Engineers at Sharechat

About the talk: In this session, we’ll unravel the story of ShareChat’s transition from Java to Benthos for crafting efficient ETL pipelines. With a single configuration file, Benthos connects diverse sources and sinks, transforming the way we handle data. Discover how, armed with “at least once” guarantees, Benthos emerged as the go-to solution for our stateless pipelines.

Focused on real-world use cases, we’ll explore how Benthos, paired with Kafka and Kafka Streams, became the linchpin of our operations. From sending notifications and triggering events for millions of users to routing posts for review, Benthos proved its mettle in simplifying complex tasks.

Learn how we platformized Benthos at ShareChat, deploying over 20 jobs from a single repository. Delve into the specifics of how Benthos with Kafka and Kafka Streams powers our data pipelines, handling tasks such as dumping data to databases, making API calls, and routing events to different message queues. The session will highlight the simplicity and performance benefits we’ve achieved, with jobs processing anywhere from 2K events/sec to 45K events/sec without the need for extensive developer code.
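
For readers unfamiliar with the term, the "at least once" guarantee mentioned above can be illustrated with a small generic sketch. It is not Benthos configuration or any Benthos API; the function `run_at_least_once`, the `flaky_sink` helper, and the simulated failure are all hypothetical. The idea shown is that a message is acknowledged only after the sink accepts it, so a failure leads to redelivery and possibly a duplicate write, never a silent loss.

```python
def run_at_least_once(messages, sink, max_attempts=3):
    """Deliver each message to `sink`, acknowledging only after success."""
    acked = []
    for offset, message in enumerate(messages):
        for _attempt in range(max_attempts):
            try:
                sink(message)
            except OSError:
                continue  # not acknowledged, so the message is redelivered
            acked.append(offset)  # acknowledge only after the sink succeeded
            break
        else:
            raise RuntimeError(f"giving up on offset {offset}")
    return acked


delivered = []        # everything the sink actually wrote
failures = {"b": 1}   # hypothetical: the sink fails once right after writing "b"


def flaky_sink(message):
    delivered.append(message)
    if failures.get(message, 0) > 0:
        failures[message] -= 1
        raise OSError("connection dropped after write")


acked_offsets = run_at_least_once(["a", "b", "c"], flaky_sink)
print(delivered)      # "b" appears twice: duplicated, not lost
print(acked_offsets)
```

Because duplicates are possible, sinks in such pipelines are usually made idempotent (for example, by upserting on a message key).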

  • Push Query Layer for Stream Processing Systems

Speakers: Avinash Upadhyaya and Pavan Keshavamurthy, Platformatory

About the talk: One of the problems with stream processing systems is that they exist primarily in the streaming plane. These systems are only capable of processing data, which means that, unlike databases, they cannot hold an infinite amount of state. Some stream processing systems work around this, usually by providing some kind of interface to query the state store directly. But this is usually a difficult problem to solve, mainly because the state stores are local and involve the internals of the system, which aren’t necessarily meant to be exposed. Nonetheless, many stream processing systems do provide a workaround. For example, Kafka Streams offers interactive queries, which can query state across different state stores.

Making this work with Apache Flink is an interesting problem to solve. In this talk, we will discuss how to make state stores queryable, thus providing a query layer on top of Flink and embedding Flink into the operational plane.
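
As a rough illustration of the idea of a queryable state store, here is a toy sketch in plain Python. It does not use the Flink or Kafka Streams APIs, and it is not the approach presented in the talk; the class `CountingProcessor` and its methods are hypothetical names. It shows a processor that updates local keyed state as events arrive (the streaming plane) and exposes a read-only lookup on that state (the operational plane).

```python
from collections import defaultdict
from types import MappingProxyType


class CountingProcessor:
    """Toy keyed stream processor that keeps per-key counts in local state."""

    def __init__(self):
        self._state = defaultdict(int)  # local state store, normally internal

    def process(self, key):
        # Streaming plane: update state as each event arrives.
        self._state[key] += 1

    def query(self, key):
        # Operational plane: point lookup that never mutates the store.
        return self._state.get(key, 0)

    def snapshot(self):
        # Read-only view over a copy of the current state.
        return MappingProxyType(dict(self._state))


processor = CountingProcessor()
for event_key in ["page_a", "page_b", "page_a"]:
    processor.process(event_key)

print(processor.query("page_a"))
print(processor.query("missing"))
```

In a real distributed system the state is partitioned across workers, so a query layer also has to route each lookup to the worker that owns the key, which is part of what makes the problem hard.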

Speakers

  • Shruti Mantri
    Software Engineer at Moveworks. Shruti is passionate about Data Engineering. She has contributed to multiple open source data engineering technologies, has Udemy courses related to Data Engineering, and has been a mentor to many who want to excel in this domain.
  • Shivji Jha
    Staff Engineer at Nutanix. Shiv is a distributed systems enthusiast and open source lover with deep expertise in OLTP, OLAP and streaming systems and usage patterns. Very active in the community, Shiv is a contributor to multiple open source projects and a regular speaker with more than 25 talks at meetups and conferences.
  • Anuraj Jain
    I am a software engineer at ShareChat, working on projects involving streaming data technologies such as Kafka, Flink, and ClickHouse. I have worked on evolving ShareChat’s event streaming architecture around the Kafka protocol, migrated the entire org onto the Kafka protocol, and done a lot of research and firefighting on issues and challenges of scale and use cases at ShareChat with Kafka/Redpanda.
  • Shivam Yadav
    Shivam Yadav is a software engineer in the Livestream team at Sharechat. He has experience working with Kafka, Kafka Streams, and Redpanda, with a focus on message queues, stream processing, and ETL pipelines. Currently, he is working with Redpanda, Benthos and stream processing at scale.
  • Shubham Dhal
    Shubham Dhal is a software engineer at Sharechat, working in the Livestream team. Earlier he was part of the Platform team, focusing on stream processing and realtime communication. His interests lie in realtime data, and he has a hidden agenda to move everything he can get his hands on to streaming. Before Sharechat he was an equity derivatives quant at JP Morgan. Shubham holds a bachelor’s degree in engineering from the Indian Institute of Technology, Kharagpur.
  • Pavan Keshavamurthy
    CTO & Co-founder @ Platformatory.io
  • Avinash Upadhyaya
    Platform Engineer at Platformatory.io