Unlocking The Potential of Local Kafka

Introduction

Apache Kafka is a distributed streaming platform widely used for building real-time data pipelines and streaming applications. While cloud-based Kafka services are available, setting up Kafka locally offers a range of benefits, especially for development and testing. In this blog, we will explore why a local setup is beneficial, examine the various methods to achieve it, and provide a step-by-step guide for those methods. We will also introduce a few resources for setting up Kafka locally, designed to simplify your testing and development processes and enable you to deploy, manage, and monitor Kafka clusters efficiently on your local machine.

Problem Statement

When working on small tasks and testing in cloud environments, developers often face significant challenges.

Initial Setup Time: Provisioning a Kafka cluster can take considerable time, especially for initial setups. For example, deploying a managed Kafka service (Amazon MSK) using AWS CDK can take around 20 minutes on the first run (GitHub). Setup times are comparable across other major cloud providers such as Azure and Google Cloud Platform (GCP).

Cluster Creation: Provisioning a Kafka cluster typically takes far longer than producing or consuming messages. Creating a Kafka cluster on Confluent Cloud involves provisioning the necessary infrastructure and configuring settings; according to Confluent, this process takes approximately 10 minutes for a standard three-broker cluster.

Produce/Consume Messages: These involve the actual use of the resources, such as producing and consuming messages, and typically have very low latency, often under 10 milliseconds for individual operations.

Message Latency: The 99th percentile end-to-end latency (p99 E2E latency) for Confluent Cloud is generally low, with measurements such as 42 milliseconds at 10 MBps throughput and up to 73 milliseconds at 1.4 GBps throughput (Confluent).

Frequency of Occurrence: The significant time investment is primarily during the initial setup. Once the infrastructure is in place, making minor adjustments or running tests is faster, although any change that requires re-provisioning resources will still take time.

These timeframes for cluster creation versus producing and consuming messages highlight the difference between setting up resources and using them, illustrating why developers might prefer local setups for rapid iteration and testing.

Why Set Up Kafka Locally?

  • Development and Testing: Local setups allow for rapid development cycles and immediate testing without the latency or cost implications of cloud services.

  • Platform Independence: Using Docker and Docker-Compose, a local Kafka setup can be made platform-independent, allowing you to develop and test on any machine without worrying about compatibility issues.

  • Greater Control: A local setup provides direct access to Kafka’s components, enabling finer-grained control over configuration, logging, and monitoring, which is essential for complex event-driven architectures.

  • Cost-Effective: Eliminates the need for cloud resources, which is especially valuable for small teams and individual developers, and avoids surprise bills when resources are accidentally left running.

Methods for Setting Up Kafka Locally

  • Manual Installation: Downloading and configuring Kafka and Zookeeper manually on your machine. This method provides deep insights into Kafka’s workings but is time-consuming.

    After downloading and configuring Kafka and Zookeeper (deprecated since Kafka 3.5 and removed in 4.0) manually on your machine, you can create a cluster by starting multiple instances of Kafka and Zookeeper on different ports; see the sketch after this list.

    Refer to the official Apache Kafka documentation for a step-by-step guide: Apache Kafka Quickstart

  • Using Package Managers: Tools like Homebrew on macOS, apt on Linux, and Chocolatey (choco) on Windows can simplify the installation process. After installation, creating a Kafka cluster works just as in the manual approach: you start multiple Kafka brokers with different configurations.

  • Docker: Containerizing Kafka and Zookeeper with Docker makes the setup process quicker and more consistent. Docker provides a portable and lightweight environment that ensures your Kafka setup works the same way across different systems.

  • Docker Compose: Docker Compose further simplifies this process by allowing you to define and manage multi-container Docker applications. With a single YAML file, you can specify the services, networks, and volumes required for Kafka and Zookeeper, making it easy to deploy and manage your Kafka environment consistently across different development and testing setups.
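
For the manual and package-manager routes, the startup looks roughly like the sketch below, assuming you have unpacked an Apache Kafka distribution that still bundles Zookeeper and are running the commands from its root directory. The server-1/server-2 file names simply follow the quickstart's convention for extra broker configs:

# Terminal 1: start Zookeeper with the bundled default config
bin/zookeeper-server-start.sh config/zookeeper.properties

# Terminal 2: start the first broker
bin/kafka-server-start.sh config/server.properties

# For a multi-broker cluster, copy server.properties once per broker and
# give each copy a unique broker.id, listener port, and log.dirs, then:
bin/kafka-server-start.sh config/server-1.properties
bin/kafka-server-start.sh config/server-2.properties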

Preferred Method for Local Installation

Using Docker Compose to set up Kafka locally is the best of these approaches due to its consistency, ease of use, and scalability, i.e. the ability to easily adjust the number of Kafka and Zookeeper instances, allocate appropriate resources, manage deployments efficiently, and ensure consistent performance and isolation across different environments.

Manual installation and installation via package managers are largely equivalent, as a package manager merely eliminates the step of downloading the installation files by hand. Both methods lack the portability and consistency that Docker offers, making them less ideal for replicating environments across different machines, and setting up and configuring each instance by hand is time-consuming and error-prone.

Hence, Docker Compose's ability to provide a consistent setup across different machines, along with powerful tools for monitoring and troubleshooting, makes it the ideal solution for running Kafka locally.
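
To make this concrete, here is a minimal single-broker sketch, written as a shell heredoc so it can be pasted straight into a terminal. The image tag, ports, and replication settings are assumptions for a toy setup, not the sandbox configuration discussed below:

# write a minimal docker-compose.yml (illustrative values only)
cat > docker-compose.yml <<'EOF'
version: "3"
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.5.0    # tag is an assumption
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:7.5.0        # tag is an assumption
    depends_on: [zookeeper]
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      # single broker, so internal topics cannot be replicated
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
EOF
docker-compose up -d

Scaling this to three brokers is largely a matter of duplicating the kafka service with distinct broker IDs, ports, and advertised listeners.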

So how do you make use of a local Kafka setup?

At Platformatory, we leverage a Kafka sandbox that is set up locally and use it for testing and troubleshooting. The sandbox is an isolated, Docker Compose based environment that mimics a cloud setup, with Prometheus and Grafana deployed for observability and debugging, thereby achieving all of this without the overhead of cloud infrastructure.

What are the benefits of using the Sandbox?

Realistic Testing and Performance Tuning: With a setup that resembles the production environment, you can simulate, to a reasonable degree, how Kafka and associated services will behave under real-world conditions, and potential issues and bugs can be identified and resolved before they impact production.

We can conduct performance and load testing to understand how the system scales and performs under various conditions, and fine-tuning the configurations and resource allocations in the sandbox based on the results can help optimize the production setup.
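
For instance, Kafka's bundled perf-test CLIs can drive synthetic load against the sandbox. The topic name and bootstrap address below are assumptions for a local setup; this assumes the Kafka CLI tools are on your PATH (use the .sh-suffixed scripts on the Apache distribution), and since the sandbox runs with security enabled you may additionally need the client properties documented in the repository:

# create a test topic spread across the 3 brokers
kafka-topics --bootstrap-server localhost:9092 --create --topic perf-test \
  --partitions 3 --replication-factor 3

# produce 1M 100-byte records at max speed and report throughput/latency
kafka-producer-perf-test --topic perf-test --num-records 1000000 \
  --record-size 100 --throughput -1 \
  --producer-props bootstrap.servers=localhost:9092

# consume the same records and report consumer throughput
kafka-consumer-perf-test --bootstrap-server localhost:9092 \
  --topic perf-test --messages 1000000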

Monitoring and Metrics: By integrating Prometheus and Grafana, we continuously monitor Kafka's performance and health, observe cluster behavior while experimenting with configuration changes, and proactively identify and address potential bottlenecks or issues.
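
A quick way to confirm the wiring, assuming Prometheus is exposed on its default port 9090 (the sandbox's actual port mapping is defined in its compose file):

# list scrape targets and their health as seen by Prometheus
curl -s http://localhost:9090/api/v1/targets | grep -o '"health":"[a-z]*"'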

Configuration Management: The sandbox uses properties files mounted as volumes, which helps replicate certain aspects of a VM or bare-metal server environment. This includes persistent configuration management and network settings that provide isolation and resource allocation similar to traditional server setups. This allows us to manage configurations easily and consistently across different environments. However, it is essential to note that containers do not completely replicate the behavior of VMs or bare-metal servers, particularly regarding hardware interaction and persistent storage.
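
Because the properties files are bind-mounted, you can verify from inside a broker container that it sees exactly the file you edited on the host. The service name and container path below are assumptions; check the compose file for the real ones:

# print the broker config as the container sees it
docker-compose exec kafka1 cat /etc/kafka/server.properties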

Using the Confluent Platform (CP) Sandbox

The CP sandbox is built with Docker Compose, which lets us define and run multi-container Docker applications. The Compose file for Confluent Kafka, with configuration mounted as properties files, brings up Kafka and its components with JMX metrics exposed, visualized using Prometheus and Grafana. The environment simulates running Confluent Platform on VMs or bare-metal servers, using properties files, but inside Docker containers.

Access the CP-Sandbox repository at github.com/Platformatory/cp-sandbox.

The cluster contains the following resources deployed:

  • 1 Zookeeper
  • 3 Kafka Brokers
  • LDAP Server
  • Control Center
  • Kafka Client Container
  • Prometheus and Grafana

Check the Kafka server.properties files for more details about the setup's configuration.
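
For a quick look at the settings that matter most, something like the following works from the repository root; the file path is an assumption, the repository layout is authoritative:

# surface the key broker settings from a mounted properties file
grep -E '^(broker\.id|listeners|advertised\.listeners|zookeeper\.connect)' \
  kafka1/server.properties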

Running the CP Sandbox

Clone the Repository: Start by cloning the repository containing the Docker Compose file and configuration files.

git clone git@github.com:Platformatory/cp-sandbox.git
cd cp-sandbox

Start the Services: Use Docker Compose to start all the services defined in the docker-compose.yml file.

docker-compose up -d

This command will bring up a three-node Kafka cluster with security enabled, along with other components like Zookeeper, Schema Registry, Kafka Connect, Control Center, Prometheus, Grafana, and OpenLDAP. There is a main branch plus 12 branches, each containing a troubleshooting scenario for you to switch to and attempt.
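
To try a scenario, check out its branch and re-create the environment; the branch name below is a placeholder, so list the branches first:

# list the remote scenario branches
git branch -r

# switch to a scenario branch (placeholder name) and restart the stack
git checkout <scenario-branch>
docker-compose up -d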

Check Service Status: Verify that all services are up and running.

docker-compose ps -a

Ensure no services have exited and all containers show the status Up.
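
Once everything is Up, a quick smoke test from the bundled client container confirms the brokers are reachable. The container name, bootstrap address, and client config path below are assumptions, and since the cluster runs with security enabled you will likely need the client properties documented in the repository:

# open a shell in the Kafka client container (name is an assumption)
docker-compose exec kfkclient bash

# inside the container: list topics, passing the documented client config
kafka-topics --bootstrap-server kafka1:19092 --list \
  --command-config /opt/client/client.properties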

Conclusion

Setting up Kafka locally unlocks immense potential for developers, offering a swift, cost-effective, and controlled environment for development, testing, and troubleshooting. Whether you choose manual installation, package managers, or Docker Compose, a local Kafka setup can significantly enhance your development workflow. Among these methods, Docker Compose stands out for its consistency, scalability, and ease of use, making it the ideal solution for running Kafka on your local machine.

By using the CP Sandbox, you can efficiently set up, manage, and gain insights into troubleshooting Kafka clusters. The troubleshooting scenarios in the repository allow you to practice and resolve potential production issues in a controlled environment.

Sounds interesting? If you wish to know more, take a look at our next blog, “Learning by Debugging: Kafka Local Setup”, where we dive deeper into troubleshooting scenarios and show how to set up and use the monitoring tools, like Grafana, deployed alongside the sandbox. In that post we also look at the CP Ansible Sandbox, a setup that can be used for automated deployment and management of Confluent Platform clusters.

About The Authors

Subramanya Madhyastha
DevOps Engineer at Platformatory specializing in Site Reliability Engineering and Infrastructure.