netstat command not found
While debugging an issue related to Consul clustering, we noticed log messages complaining that netstat is missing:
[pod/neuvector-controller-pod-5cc9d6fd7b-8t6xl/neuvector-controller-pod] 2023-02-23T22:16:35.644Z [INFO] agent.server.serf.lan: serf: EventMemberLeave: 172.21.1.42 172.21.1.42
[pod/neuvector-controller-pod-5cc9d6fd7b-8t6xl/neuvector-controller-pod] 2023-02-23T22:16:36.67 |INFO|CTL|utils.SetReady: - value=ctrl init done
[pod/neuvector-controller-pod-5cc9d6fd7b-8t6xl/neuvector-controller-pod] 2023-02-23T22:16:36.67 |ERRO|CTL|main.clusterStart: Cluster failed - error=Failed to elect leader waited=5m0s
[pod/neuvector-controller-pod-5cc9d6fd7b-8t6xl/neuvector-controller-pod] 2023-02-23T22:16:36|MON|Process ctrl exit status 255, pid=7
[pod/neuvector-controller-pod-5cc9d6fd7b-8t6xl/neuvector-controller-pod] sh: netstat: command not found
I believe netstat is invoked here: https://github.com/neuvector/neuvector/blob/main/monitor/monitor.c#L580
In our case, I believe this contributes to an unrecoverable state. If the controller pods fail to cluster correctly on startup, the monitor process crashes but apparently does not kill the Consul agent. The agent then restarts indefinitely, complaining that the port is already in use:
[pod/neuvector-controller-pod-5cc9d6fd7b-8t6xl/neuvector-controller-pod] 2023-02-23T22:16:43.727Z [ERROR] agent: Error starting agent: error="Failed to start Consul server: Failed to start RPC layer: listen tcp 172.21.1.43:18300: bind: address already in use"
[pod/neuvector-controller-pod-5cc9d6fd7b-8t6xl/neuvector-controller-pod] 2023-02-23T22:16:43.728Z [INFO] agent: Exit code: code=1
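For illustration only, here is a minimal C sketch of how a process could check whether a port such as 18300 is still held without shelling out to netstat: it simply tries to bind() the address and inspects errno. The helper name port_in_use is hypothetical and not part of the NeuVector codebase; I am not claiming this is what monitor.c does at L580, only that a check like this does not depend on netstat being present in the image.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not part of NeuVector): returns 1 if the TCP port on
 * the given local IPv4 address cannot be bound because it is already in use,
 * 0 if it is free, and -1 on any other error (bad address, address not
 * assigned to this host, socket failure). Needs no external binary. */
static int port_in_use(const char *ip, unsigned short port)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1; /* not a valid dotted-quad IPv4 address */

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* SO_REUSEADDR is deliberately not set, so a socket lingering in
     * TIME_WAIT is also reported as "in use". */
    int rc = bind(fd, (struct sockaddr *)&addr, sizeof(addr));
    int saved_errno = errno; /* save before close() can overwrite errno */
    close(fd);               /* release the probe socket immediately */

    if (rc == 0)
        return 0;
    return saved_errno == EADDRINUSE ? 1 : -1;
}
```

For example, port_in_use("172.21.1.43", 18300) would return 1 while the orphaned agent from the log above still holds the RPC port. A probe like this is inherently racy (the port can change state right after the check), so it would only help with diagnostics; the underlying problem is still that the Consul agent survives the monitor crash.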
Relates to: