
BROKER_ID not being set properly

Summary

When multiple brokers are enabled through the Bitnami Helm chart, this container leaves KAFKA_CFG_BROKER_ID at 0 on every node instead of assigning each one a distinct value. This breaks multi-broker/replica creation.

Steps to reproduce

Deploy the Bitnami Helm chart with replicaCount set to 3 in the values, using this image instead of the upstream one.

What is the current bug behavior?

Every pod comes up with BROKER_ID set to 0, so the brokers conflict when registering in ZooKeeper: each one tries to create /brokers/ids/0. Whichever node comes up first stays up; the rest crash.

When I switch back to the upstream container, this runs fine with no changes to the chart or values.

What is the expected correct behavior?

KAFKA_CFG_BROKER_ID should be set to each pod's replica ID (its index within the replicaSet) plus minBrokerID, so that every broker registers under a unique ID.
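
To make the expected behavior concrete, here is a minimal sketch of the calculation in Python. It assumes each pod's name ends in its replica index (for example "kafka-2"). The pod names and the function name are hypothetical illustrations, not taken from the chart or the container.

```python
import re


def broker_id_from_pod_name(pod_name: str, min_broker_id: int = 0) -> int:
    """Return min_broker_id plus the trailing index of the pod name.

    Assumption: pod names end in "-<index>", e.g. "kafka-2".
    """
    match = re.search(r"-(\d+)$", pod_name)
    if match is None:
        raise ValueError(f"no trailing index in pod name: {pod_name!r}")
    return min_broker_id + int(match.group(1))


if __name__ == "__main__":
    # Hypothetical pod names for a deployment with replicaCount = 3.
    for name in ("kafka-0", "kafka-1", "kafka-2"):
        print(name, "->", broker_id_from_pod_name(name, min_broker_id=0))
```

With replicaCount set to 3 and minBrokerID 0, this would give the three pods IDs 0, 1, and 2 rather than 0 for all of them, which is what avoids the /brokers/ids/0 collision seen in the logs.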

Relevant logs and/or screenshots

[2021-07-20 17:48:28,282] INFO Creating /brokers/ids/0 (is it secure? false) (kafka.zk.KafkaZkClient)
[2021-07-20 17:48:28,302] ERROR Error while creating ephemeral at /brokers/ids/0, node already exists and owner '144308170190422021' does not match current session '72286449525522437' (kafka.zk.KafkaZkClient$CheckedEphemeral)
[2021-07-20 17:48:28,322] ERROR [KafkaServer id=0] Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:126)
	at kafka.zk.KafkaZkClient$CheckedEphemeral.getAfterNodeExists(KafkaZkClient.scala:1837)
	at kafka.zk.KafkaZkClient$CheckedEphemeral.create(KafkaZkClient.scala:1775)
	at kafka.zk.KafkaZkClient.checkedEphemeralCreate(KafkaZkClient.scala:1742)
	at kafka.zk.KafkaZkClient.registerBroker(KafkaZkClient.scala:95)
	at kafka.server.KafkaServer.startup(KafkaServer.scala:312)
	at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:44)
	at kafka.Kafka$.main(Kafka.scala:82)
	at kafka.Kafka.main(Kafka.scala)
[2021-07-20 17:48:28,324] INFO [KafkaServer id=0] shutting down (kafka.server.KafkaServer)
[2021-07-20 17:48:28,326] INFO [SocketServer brokerId=0] Stopping socket server request processors (kafka.network.SocketServer)
[2021-07-20 17:48:28,330] INFO [SocketServer brokerId=0] Stopped socket server request processors (kafka.network.SocketServer)
[2021-07-20 17:48:28,336] INFO [ReplicaManager broker=0] Shutting down (kafka.server.ReplicaManager)
[2021-07-20 17:48:28,337] INFO [LogDirFailureHandler]: Shutting down (kafka.server.ReplicaManager$LogDirFailureHandler)
[2021-07-20 17:48:28,338] INFO [LogDirFailureHandler]: Shutdown completed (kafka.server.ReplicaManager$LogDirFailureHandler)

Possible fixes

TODO: Still investigating the root cause.

Definition of Done

  • Bug has been identified and corrected within the container