Architecture & Core Concepts
This setup uses Kafka 4.1 in KRaft mode — which means no ZooKeeper at all. Each broker also acts as a controller, forming a self-managed quorum of 3 nodes. This is the modern, production-ready way to run Kafka.
⚡ KRaft Mode
Kafka Raft — replaces ZooKeeper entirely. Each broker here also acts as a controller, forming a 3-node quorum. Simpler operations, fewer moving parts.
🖥️ 3 Brokers
Dedicated Ubuntu VMs — one Kafka process per VM. Replication factor 3 means every topic partition has a copy on each broker. No single point of failure.
🔌 Kafka Connect
A distributed integration layer for moving data between Kafka and external systems (PostgreSQL, S3, Elasticsearch, etc.) using pluggable connectors.
🖼️ Kafka UI
Provectus Kafka UI — a web dashboard for browsing topics, consumer groups, connector status, and producing/consuming test messages.
Listener Architecture
Each broker exposes three listeners on different ports with different roles:
| Listener | Port | Role | Used By |
|---|---|---|---|
| INTERNAL | 19092 | Inter-broker replication | Other brokers, Kafka Connect, Kafka UI (inside network) |
| EXTERNAL | 9092 | External client access | Producers / Consumers outside the broker network |
| CONTROLLER | 9093 | KRaft quorum votes | Controller-to-controller communication only |
Prerequisites — VM Sizing
- Network reachability — all broker VMs must reach each other on ports 9092, 9093, and 19092
- Docker VM must reach all broker VMs on port 19092
- SSH access to all VMs with sudo privileges
- Static IPs or DNS names for all broker VMs — these go into
controller.quorum.votersandadvertised.listeners
Port Reference
Install Java & Kafka on All 3 Broker VMs
SSH into each broker VM and run the following. Kafka 4.1 requires Java 21.
# Update package list and install Java 21
sudo apt update
sudo apt install -y openjdk-21-jdk wget
# Verify Java version
java -version # should show: openjdk version "21..."
# Download Kafka 4.1.1 into /opt
cd /opt
sudo wget https://downloads.apache.org/kafka/4.1.1/kafka_2.13-4.1.1.tgz
# Extract and create a stable symlink
sudo tar -xzf kafka_2.13-4.1.1.tgz -C /opt
sudo ln -s /opt/kafka_2.13-4.1.1 /opt/kafka
# Verify the directory
ls /opt/kafka/bin/ # should list: kafka-server-start.sh, kafka-topics.sh, etc.
Edit server.properties — Common Settings
Open /opt/kafka/config/server.properties on each VM and update the following settings. The values here are identical on all three brokers — only node.id and advertised.listeners differ (covered in the next step).
vim /opt/kafka/config/server.properties
Find and update each of these settings in the file. If a line is missing, add it. If controller.quorum.bootstrap.servers exists, comment it out.
# ─── KRaft mode: this node is both broker and controller ──────
process.roles=broker,controller
controller.listener.names=CONTROLLER
# ─── Listener definitions ─────────────────────────────────────
listeners=INTERNAL://:19092,EXTERNAL://:9092,CONTROLLER://:9093
listener.security.protocol.map=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL,INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
# Use INTERNAL listener for broker-to-broker replication
inter.broker.listener.name=INTERNAL
# ─── KRaft quorum: list all 3 broker controller endpoints ─────
# Replace with your actual broker IPs
controller.quorum.voters=1@<Broker1_IP>:9093,2@<Broker2_IP>:9093,3@<Broker3_IP>:9093
# Comment out this line if it exists (bootstrap.servers not used in static KRaft)
#controller.quorum.bootstrap.servers=localhost:9093
# ─── Storage ──────────────────────────────────────────────────
log.dirs=/var/lib/kafka/logs
# ─── Replication & partitioning ───────────────────────────────
num.partitions=3
default.replication.factor=3
min.insync.replicas=2
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=2
share.coordinator.state.topic.replication.factor=3
share.coordinator.state.topic.min.isr=2
replication.factor=3 and min.insync.replicas=2, Kafka requires at least 2 brokers to acknowledge a write. This means you can lose 1 broker and still accept writes. Never set min.insync.replicas equal to or greater than replication.factor — all brokers would need to be up for any write to succeed.
Per-Server Config — node.id & advertised.listeners
These two settings are unique per broker. Set them after the common config above. Replace the IP placeholders with each broker's actual IP address.
node.id=1
advertised.listeners=INTERNAL://<Broker1_IP>:19092,EXTERNAL://<Broker1_IP>:9092
node.id=2
advertised.listeners=INTERNAL://<Broker2_IP>:19092,EXTERNAL://<Broker2_IP>:9092
node.id=3
advertised.listeners=INTERNAL://<Broker3_IP>:19092,EXTERNAL://<Broker3_IP>:9092
controller.quorum.voters=1@IP:9093,2@IP:9093,3@IP:9093 must exactly match the node.id on each respective broker. Mismatch causes the quorum to fail to form.
Create Directories & Set Permissions
Run on all 3 broker VMs. Replace $USER with your actual Linux username if the variable doesn't expand correctly.
# Create Kafka log data directory (log.dirs in server.properties)
sudo mkdir -p /var/lib/kafka/logs
sudo chown -R $USER:$USER /var/lib/kafka/logs
# Create Kafka application log directory (for stdout / stderr)
sudo mkdir -p /opt/kafka/logs
sudo chown -R $USER:$USER /opt/kafka/logs
# Give your user ownership of the Kafka installation
sudo chown -R $USER:$USER /opt/kafka
sudo chown -R $USER:$USER /opt/kafka_2.13-4.1.1
KRaft Storage Format — UUID Generation
This is the most critical step. All three brokers must be formatted with the exact same UUID. The UUID ties the cluster together — brokers with different UUIDs will refuse to join the same quorum.
Generate UUID on Broker 1 Only
Run this only on Broker 1. Copy the output UUID — you will use it on all three brokers.
cd /opt/kafka/bin
./kafka-storage.sh random-uuid
# Example output — yours will be different:
AbCdEfGhIjKlMnOpQrStUv
# Copy this UUID — you need it in the next step on ALL brokers
Format Storage on All 3 Brokers with the Same UUID
Run this command on Broker 1, Broker 2, and Broker 3 — using the same UUID generated above.
cd /opt/kafka/bin
# Replace <UUID> with the value from step 01
./kafka-storage.sh format \
-t <UUID> \
-c /opt/kafka/config/server.properties
Verify the Format Created the Metadata Files
ls /var/lib/kafka/logs/
# Expected output — you should see:
bootstrap.checkpoint meta.properties __cluster_metadata-0/
kafka-storage.sh format again on a broker that already has data will wipe all topic data on that broker. Only run it once, during initial setup.
Firewall — Open Required Ports
Brokers must be able to reach each other. Run on all 3 broker VMs. Also open ports from the Docker VM IP so Kafka Connect and Kafka UI can connect.
# Check current firewall status
sudo ufw status
# Allow XMPP client connections (external producers/consumers)
sudo ufw allow 9092/tcp
# Allow inter-broker + Connect/UI listener
sudo ufw allow 19092/tcp
# Allow KRaft controller quorum
sudo ufw allow 9093/tcp
# Reload and confirm
sudo ufw reload
sudo ufw status numbered
# Optional: restrict 19092 to Docker VM IP only (more secure)
sudo ufw allow from <DockerVM_IP> to any port 19092
nc -zv <Broker2_IP> 9093. If it hangs, the firewall is blocking the controller port. All 3 brokers must reach all others on 9092, 9093, and 19092.
Set Up as a Systemd Service
Running Kafka as a systemd service means it starts on boot and auto-restarts on failure. Create the same service file on all 3 brokers. Update the User= field to match your Linux username.
sudo vim /etc/systemd/system/kafka.service
[Unit]
Description=Apache Kafka (KRaft)
After=network.target
[Service]
Type=simple
User=your_linux_user # ← replace with your actual username
ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
ExecStop=/opt/kafka/bin/kafka-server-stop.sh
Restart=on-failure
RestartSec=5
LimitNOFILE=100000
# Application logs go here (separate from topic data)
StandardOutput=append:/opt/kafka/logs/kafka.out
StandardError=append:/opt/kafka/logs/kafka.err
[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl start kafka
sudo systemctl enable kafka # auto-start on reboot
sudo systemctl status kafka # should show: active (running)
Verify the Broker Cluster
# Tail the Kafka log — look for "KRaft state is LEADER" or "Quorum leader"
tail -f /opt/kafka/logs/kafka.out
# List all brokers in the cluster metadata
/opt/kafka/bin/kafka-metadata-quorum.sh \
--bootstrap-server localhost:9092 \
describe --status
# Create a test topic across all 3 brokers
/opt/kafka/bin/kafka-topics.sh \
--bootstrap-server localhost:9092 \
--create --topic test-cluster \
--partitions 3 \
--replication-factor 3
# Describe the topic — confirm replicas are spread across all 3 nodes
/opt/kafka/bin/kafka-topics.sh \
--bootstrap-server localhost:9092 \
--describe --topic test-cluster
kafka-metadata-quorum.sh describe --status should show LeaderId: 1 (or 2 or 3) and all 3 voters as active. The topic describe should show each partition with Replicas: 1,2,3 and Isr: 1,2,3.
Install Docker & Docker Compose on the Docker VM
SSH into the Docker VM (the 4th machine) and run the following to install Docker Engine and the Compose plugin.
# Remove any old Docker packages
sudo apt remove -y docker docker-engine docker.io containerd runc
# Install dependencies
sudo apt update
sudo apt install -y ca-certificates curl gnupg lsb-release
# Add Docker's official GPG key
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
| sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
# Add Docker's repository
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" \
| sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# Install Docker Engine + Compose plugin
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
# Add your user to the docker group (avoids needing sudo)
sudo usermod -aG docker $USER
newgrp docker
# Verify
docker --version
docker compose version
Create Directories & Download Dependencies
All commands run from the Docker VM. Set up the working directory, the JMX exporter, and the Kafka Connect JDBC plugin + PostgreSQL driver.
# Create working directory
mkdir ~/kafka && cd ~/kafka
# ── JMX Exporter ───────────────────────────────────────────
mkdir jmx-exporter && cd jmx-exporter
wget https://github.com/prometheus/jmx_exporter/releases/download/1.5.0/jmx_prometheus_javaagent-1.5.0.jar
# Create the JMX scrape rules config (see next step)
vim kafka-connect.yml
# ── Kafka Connect JDBC Plugin ───────────────────────────────
cd ~/kafka
mkdir connect-plugins && cd connect-plugins
# JDBC connector (for DB source/sink connectors)
wget https://packages.confluent.io/maven/io/confluent/kafka-connect-jdbc/10.7.4/kafka-connect-jdbc-10.7.4.jar
# PostgreSQL JDBC driver
wget https://repo1.maven.org/maven2/org/postgresql/postgresql/42.7.3/postgresql-42.7.3.jar
# Confirm both jars are present
ls ~/kafka/connect-plugins/
~/kafka/connect-plugins/ before starting Docker Compose. The directory is mounted to /etc/kafka-connect/jars inside the container. Restart the connect container after adding new JARs: docker compose restart kafka-connect.
JMX Exporter Config — kafka-connect.yml
This file tells the JMX exporter which Kafka Connect metrics to expose to Prometheus. Place it in ~/kafka/jmx-exporter/kafka-connect.yml.
lowercaseOutputName: true
rules:
# App info — start time and version
- pattern: 'kafka.(.+)<type=app-info, client-id=(.+)><>start-time-ms'
name: kafka_$1_start_time_seconds
labels:
clientId: "$2"
help: "Kafka $1 JMX metric start time seconds"
type: GAUGE
valueFactor: 0.001
- pattern: 'kafka.(.+)<type=app-info, client-id=(.+)><>(commit-id|version): (.+)'
name: kafka_$1_$3_info
value: 1
labels:
clientId: "$2"
$3: "$4"
help: "Kafka $1 JMX metric info version and commit-id"
type: GAUGE
# Producer/consumer topic + partition metrics
- pattern: kafka.(.+)<type=(.+)-metrics, client-id=(.+), topic=(.+), partition=(.+)><>(.+-total|compression-rate|.+-avg|.+-replica|.+-lag|.+-lead)
name: kafka_$2_$6
labels:
clientId: "$3"
topic: "$4"
partition: "$5"
type: GAUGE
- pattern: kafka.(.+)<type=(.+)-metrics, client-id=(.+), topic=(.+)><>(.+-total|compression-rate|.+-avg)
name: kafka_$2_$5
labels:
clientId: "$3"
topic: "$4"
type: GAUGE
# Node-level metrics
- pattern: kafka.(.+)<type=(.+)-metrics, client-id=(.+), node-id=(.+)><>(.+-total|.+-avg)
name: kafka_$2_$5
labels:
clientId: "$3"
nodeId: "$4"
type: UNTYPED
# General client metrics
- pattern: kafka.(.+)<type=(.+)-metrics, client-id=(.*)><>(.+-total|.+-avg|.+-bytes|.+-count|.+-ratio|.+-age|.+-flight|.+-threads|.+-connectors|.+-tasks|.+-ago)
name: kafka_$2_$4
labels:
clientId: "$3"
type: GAUGE
# Connector task status
- pattern: 'kafka.connect<type=connector-task-metrics, connector=(.+), task=(.+)><>status: ([a-z-]+)'
name: kafka_connect_connector_status
value: 1
labels:
connector: "$1"
task: "$2"
status: "$3"
type: GAUGE
# Connector task metrics (errors, requests, retries)
- pattern: kafka.connect<type=(.+)-metrics, connector=(.+), task=(.+)><>(.+-total|.+-count|.+-ms|.+-ratio|.+-avg|.+-failures|.+-requests|.+-timestamp|.+-logged|.+-errors|.+-retries|.+-skipped)
name: kafka_connect_$1_$4
labels:
connector: "$2"
task: "$3"
type: GAUGE
# Worker connector metrics
- pattern: kafka.connect<type=connect-worker-metrics, connector=(.+)><>([a-z-]+)
name: kafka_connect_worker_$2
labels:
connector: "$1"
type: GAUGE
# Worker global metrics
- pattern: kafka.connect<type=connect-worker-metrics><>([a-z-]+)
name: kafka_connect_worker_$1
type: GAUGE
# Worker rebalance metrics
- pattern: kafka.connect<type=connect-worker-rebalance-metrics><>([a-z-]+)
name: kafka_connect_worker_rebalance_$1
type: GAUGE
docker-compose.yaml
Create this file at ~/kafka/docker-compose.yaml. Replace all <BrokerN_IP> placeholders with actual IPs. Three containers: Kafka UI, Kafka Connect, and Kafka Exporter.
version: "3.8"
services:
# ── Kafka UI — Web management dashboard ────────────────────
kafka-ui:
image: provectuslabs/kafka-ui:latest
container_name: kafka-ui
ports:
- "8190:8080"
environment:
KAFKA_CLUSTERS_0_NAME: kafka-cluster
# Use INTERNAL listener port 19092 for UI ↔ broker traffic
KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS: <Broker1_IP>:19092,<Broker2_IP>:19092,<Broker3_IP>:19092
# Login form auth — change password before production use
AUTH_TYPE: LOGIN_FORM
SPRING_SECURITY_USER_NAME: admin
SPRING_SECURITY_USER_PASSWORD: password@123
KAFKA_CLUSTERS_0_PROPERTIES_ALLOW_DELETE_TOPICS: "true"
KAFKA_CLUSTERS_0_PROPERTIES_ALLOW_CREATE_TOPICS: "true"
KAFKA_CLUSTERS_0_PROPERTIES_ALLOW_EDIT_CONFIGS: "true"
JAVA_OPTS: "-Xmx512m"
restart: unless-stopped
# ── Kafka Connect — Connector framework ────────────────────
kafka-connect:
image: confluentinc/cp-kafka-connect:7.7.8
container_name: kafka-connect
ports:
- "8083:8083" # REST API
- "7072:7072" # JMX metrics for Prometheus
restart: unless-stopped
volumes:
- ./connect-plugins:/etc/kafka-connect/jars # JDBC + PG driver
- ./jmx-exporter:/opt/jmx # JMX agent + config
environment:
CONNECT_BOOTSTRAP_SERVERS: <Broker1_IP>:19092,<Broker2_IP>:19092,<Broker3_IP>:19092
CONNECT_GROUP_ID: kafka-connect-group
# Internal Kafka topics that Connect uses to store its state
CONNECT_CONFIG_STORAGE_TOPIC: connect-configs
CONNECT_OFFSET_STORAGE_TOPIC: connect-offsets
CONNECT_STATUS_STORAGE_TOPIC: connect-status
CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: "3"
CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: "3"
CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: "3"
CONNECT_REST_ADVERTISED_HOST_NAME: kafka-connect
CONNECT_REST_PORT: 8083
# JSON converters — no schema registry needed
CONNECT_KEY_CONVERTER: org.apache.kafka.connect.json.JsonConverter
CONNECT_VALUE_CONVERTER: org.apache.kafka.connect.json.JsonConverter
CONNECT_INTERNAL_KEY_CONVERTER: org.apache.kafka.connect.json.JsonConverter
CONNECT_INTERNAL_VALUE_CONVERTER: org.apache.kafka.connect.json.JsonConverter
CONNECT_KEY_CONVERTER_SCHEMAS_ENABLE: "false"
CONNECT_VALUE_CONVERTER_SCHEMAS_ENABLE: "false"
# Plugin path — includes mounted jars and built-in plugins
CONNECT_PLUGIN_PATH: /usr/share/java,/etc/kafka-connect/jars
# Reduce noisy logs from network and reflection libraries
CONNECT_LOG4J_LOGGERS: org.apache.kafka.clients.NetworkClient=ERROR,org.reflections=ERROR
# JMX Prometheus agent — exposes Connect JVM metrics on port 7072
# Remove KAFKA_OPTS line below if you don't want monitoring
KAFKA_OPTS: "-javaagent:/opt/jmx/jmx_prometheus_javaagent-1.5.0.jar=7072:/opt/jmx/kafka-connect.yml"
# Heap: 4GB min/max — adjust based on number of active connectors
KAFKA_HEAP_OPTS: "-Xms4g -Xmx4g"
# ── Kafka Exporter — Broker-level Prometheus metrics ───────
# Optional: remove this service if you don't need broker monitoring
kafka-exporter:
image: danielqsj/kafka-exporter
container_name: kafka-exporter
ports:
- "9308:9308"
command:
- "--kafka.server=<Broker1_IP>:19092"
- "--kafka.server=<Broker2_IP>:19092"
- "--kafka.server=<Broker3_IP>:19092"
restart: unless-stopped
Broker1_IP:19092, Broker2_IP:19092, Broker3_IP:19092. Using the same IP twice means one broker gets no traffic from Connect / UI.
Start & Verify Docker Containers
# Start all containers in detached mode
cd ~/kafka
docker compose up -d
# Watch startup logs
docker compose logs -f
# Check all containers are running
docker compose ps
# Test Kafka Connect REST API is responding
curl http://localhost:8083/
# List installed connector plugins
curl http://localhost:8083/connector-plugins | python3 -m json.tool
# Test Kafka UI is accessible
curl -I http://localhost:8190/
curl http://localhost:8083 fails immediately, wait 60 seconds and retry.
Access the Kafka UI
Open a browser and go to http://<DockerVM_IP>:8190. Log in with the credentials set in docker-compose.yaml: username admin, password password@123. You should see all 3 brokers listed and the cluster health overview.
Prometheus Scrape Endpoints
Two exporters are already running from the Docker VM. Configure Prometheus to scrape both.
| Exporter | Endpoint | Metrics |
|---|---|---|
| kafka-exporter | http://<DockerVM_IP>:9308/metrics | Broker-level: topic lag, partition count, consumer group offsets, leader election rate |
| jmx-exporter | http://<DockerVM_IP>:7072/metrics | Kafka Connect JVM: connector status, task errors, throughput, rebalances |
scrape_configs:
# Broker-level metrics (topics, consumer lag, partition leadership)
- job_name: 'kafka-brokers'
static_configs:
- targets: ['<DockerVM_IP>:9308']
scrape_interval: 15s
# Kafka Connect JVM metrics (connector tasks, errors, throughput)
- job_name: 'kafka-connect'
static_configs:
- targets: ['<DockerVM_IP>:7072']
scrape_interval: 15s
Grafana Dashboards
Import these community dashboards directly in Grafana (Dashboards → Import → Grafana.com ID):
| Dashboard | Grafana ID | Data Source |
|---|---|---|
| Kafka Exporter Overview | 7589 | Prometheus → kafka-brokers job |
| Kafka Consumer Lag | 12460 | Prometheus → kafka-brokers job |
| Kafka Connect Dashboard | 11962 | Prometheus → kafka-connect job |
What You Have Now
📨 Kafka Cluster
🐳 Docker Services
Key Takeaways
- Same UUID on all 3 brokers — generated once on Broker 1, applied to all via
kafka-storage.sh format - Comment out bootstrap.servers — the
controller.quorum.bootstrap.serversline conflicts with the static voters config - Start brokers close together — quorum needs 2 of 3 to be reachable; large time gaps cause leader election delays
- INTERNAL listener (19092) for Connect + UI — don't use the EXTERNAL port for internal services
- Kafka Connect takes ~60s to initialise — it must create its internal topics on first start
- Add more connectors by dropping JARs into
connect-plugins/and restarting the container