BROKER_ID not being set properly
Summary
When multiple brokers are enabled in a replicaset via the Helm chart (also Bitnami), this container leaves KAFKA_CFG_BROKER_ID at 0 on every node instead of assigning each node a distinct value. This breaks multi-broker/replica creation.
Steps to reproduce
Use the Bitnami Helm chart and set replicaCount to 3 in values, deploying with this image instead of the upstream one.
What is the current bug behavior?
Each pod that comes up gets BROKER_ID set to 0, so every broker tries to register the same ID (/brokers/ids/0) in ZooKeeper. Whichever node comes up first stays up; the rest crash.
When I switch back to the upstream container, this runs fine with no changes to the chart or values.
What is the expected correct behavior?
KAFKA_CFG_BROKER_ID should be set from the replicaSet ID of the system plus minBrokerID, so that each broker gets a unique ID.
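For illustration, the expected calculation can be sketched as below. This is only a minimal sketch, not the chart's or the container's actual code: it assumes the replica ID is the trailing ordinal of the pod hostname (for example `kafka-2`), and the function name `broker_id_from_hostname` is hypothetical.

```python
import re


def broker_id_from_hostname(hostname, min_broker_id=0):
    """Return min_broker_id plus the trailing ordinal of a pod hostname.

    Assumption: hostnames look like '<name>-<ordinal>', e.g. 'kafka-2'.
    """
    match = re.search(r"-(\d+)$", hostname)
    if match is None:
        raise ValueError("no trailing ordinal in hostname: %r" % hostname)
    return min_broker_id + int(match.group(1))


if __name__ == "__main__":
    # With replicaCount=3, each pod should end up with a distinct broker ID.
    for host in ("kafka-0", "kafka-1", "kafka-2"):
        print(host, broker_id_from_hostname(host, min_broker_id=0))
```

With three replicas this yields three different IDs, whereas the behavior reported here is that every pod ends up with 0.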
Relevant logs and/or screenshots
[2021-07-20 17:48:28,282] INFO Creating /brokers/ids/0 (is it secure? false) (kafka.zk.KafkaZkClient)
[2021-07-20 17:48:28,302] ERROR Error while creating ephemeral at /brokers/ids/0, node already exists and owner '144308170190422021' does not match current session '72286449525522437' (kafka.zk.KafkaZkClient$CheckedEphemeral)
[2021-07-20 17:48:28,322] ERROR [KafkaServer id=0] Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:126)
    at kafka.zk.KafkaZkClient$CheckedEphemeral.getAfterNodeExists(KafkaZkClient.scala:1837)
    at kafka.zk.KafkaZkClient$CheckedEphemeral.create(KafkaZkClient.scala:1775)
    at kafka.zk.KafkaZkClient.checkedEphemeralCreate(KafkaZkClient.scala:1742)
    at kafka.zk.KafkaZkClient.registerBroker(KafkaZkClient.scala:95)
    at kafka.server.KafkaServer.startup(KafkaServer.scala:312)
    at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:44)
    at kafka.Kafka$.main(Kafka.scala:82)
    at kafka.Kafka.main(Kafka.scala)
[2021-07-20 17:48:28,324] INFO [KafkaServer id=0] shutting down (kafka.server.KafkaServer)
[2021-07-20 17:48:28,326] INFO [SocketServer brokerId=0] Stopping socket server request processors (kafka.network.SocketServer)
[2021-07-20 17:48:28,330] INFO [SocketServer brokerId=0] Stopped socket server request processors (kafka.network.SocketServer)
[2021-07-20 17:48:28,336] INFO [ReplicaManager broker=0] Shutting down (kafka.server.ReplicaManager)
[2021-07-20 17:48:28,337] INFO [LogDirFailureHandler]: Shutting down (kafka.server.ReplicaManager$LogDirFailureHandler)
[2021-07-20 17:48:28,338] INFO [LogDirFailureHandler]: Shutdown completed (kafka.server.ReplicaManager$LogDirFailureHandler)
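To confirm which broker ID each pod tried to register, the `Creating /brokers/ids/<id>` lines in the logs above can be extracted per pod. The following is a rough diagnostic sketch only; the helper name `registered_broker_ids` is hypothetical, and it assumes the log text has already been collected from each pod.

```python
import re

# Matches the registration line Kafka logs before creating its ZooKeeper node.
CREATE_RE = re.compile(r"Creating /brokers/ids/(\d+)")


def registered_broker_ids(log_text):
    """Return every broker ID the given log text shows being registered."""
    return [int(m.group(1)) for m in CREATE_RE.finditer(log_text)]


if __name__ == "__main__":
    sample = (
        "[2021-07-20 17:48:28,282] INFO Creating /brokers/ids/0 "
        "(is it secure? false) (kafka.zk.KafkaZkClient)"
    )
    print(registered_broker_ids(sample))
```

If every pod's log reports the same ID, the brokers are colliding on a single ZooKeeper node, which matches the NodeExists error shown above.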
Possible fixes
TODO: Still investigating the actual root cause.
Definition of Done
- Bug has been identified and corrected within the container