When deploying a fresh BigBang 1.11.0, all of the logging-ek-* pods fail to come up. I found this error in the elasticsearch container logs:
BindTransportException[Failed to resolve publish address]; nested: UnknownHostException[logging-ek-es-data-0.logging-ek-es-data.logging.svc: Name or service not known];
Likely root cause: java.net.UnknownHostException: logging-ek-es-data-0.logging-ek-es-data.logging.svc: Name or service not known
OpenShift version: 4.6.4
I found that removing all network policies in the logging namespace allows all of the pods to come up successfully.
BigBang Version
1.11.0
@dyoung we'll need some more info: could you exec into the pods and run some curl commands, and let us know which OpenShift version you're running?
If you can, exec into one of the logging-ek-es-data-# pods, try to curl one of the endpoints that it reports "Name or service not known" for, and verify whether in-cluster DNS resolution is truly blocked.
We've seen that error before, but it was fixed in the elasticsearch-kibana package a few weeks ago.
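If curl isn't handy, a name lookup can also be checked directly. The sketch below is a minimal, hypothetical helper (the script and the `resolves` function are not part of BigBang). It assumes a Python 3 interpreter is available in the pod or in a debug pod in the same namespace, which the Elasticsearch image may not provide. It asks the pod's configured resolver to resolve each hostname given on the command line; a blocked DNS path should fail with an error similar to the one in the Elasticsearch logs.

```python
import socket
import sys

# Pod FQDN taken from the Elasticsearch error in this issue; shown as a usage hint only.
EXAMPLE_HOST = "logging-ek-es-data-0.logging-ek-es-data.logging.svc"


def resolves(hostname: str) -> bool:
    """Return True if the resolver configured for this pod can resolve hostname."""
    try:
        socket.getaddrinfo(hostname, None)
    except socket.gaierror as exc:
        # On glibc this is typically "[Errno -2] Name or service not known".
        print(f"{hostname}: lookup failed ({exc})")
        return False
    return True


if __name__ == "__main__":
    names = sys.argv[1:]
    if not names:
        print(f"usage: python check_dns.py <hostname> ...  (e.g. {EXAMPLE_HOST})")
    for name in names:
        print(f"{name}: {'resolved' if resolves(name) else 'NOT resolved'}")
```

Running it once with the network policies in place and once without should show whether the policies are what block resolution.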
We've identified the issue: OpenShift uses port 5353 for DNS instead of 53. Updating the two DNS rules to also allow 5353 lets all of the EK pods come up successfully.
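The actual BigBang template for the two DNS rules isn't shown here, so the sketch below is a hypothetical stand-in, not the package's policy. As we understand it, OpenShift's DNS service is exposed on port 53 but the DNS pods behind it listen on 5353. NetworkPolicy port matching applies to the destination pod port, so an egress rule that only allows 53 drops the traffic. The policy name `allow-dns-egress` and the `dns_egress_policy` function are illustrative assumptions. The sketch allows egress on both ports over UDP and TCP for every pod in the namespace.

```python
import json


def dns_egress_policy(namespace: str, name: str = "allow-dns-egress") -> dict:
    """Build a NetworkPolicy letting every pod in `namespace` reach DNS on 53 and 5353.

    The default policy name is hypothetical; it is not BigBang's actual policy name.
    """
    ports = [
        {"protocol": proto, "port": port}
        for port in (53, 5353)  # 53: upstream default; 5353: OpenShift DNS pod port
        for proto in ("UDP", "TCP")  # DNS uses UDP, falling back to TCP for large answers
    ]
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {},  # empty selector: applies to all pods in the namespace
            "policyTypes": ["Egress"],
            # No "to" field: these ports are allowed to any destination.
            "egress": [{"ports": ports}],
        },
    }


if __name__ == "__main__":
    # kubectl/oc accept JSON manifests, e.g.: python dns_netpol.py | oc apply -f -
    print(json.dumps(dns_egress_policy("logging"), indent=2))
```

A stricter variant would add a `to` selector limited to the cluster DNS pods. That is left out here because the exact labels depend on the OpenShift install.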
Ryan Garcia changed title from eks pods having trouble with network policies on BB 1.11.0 with openshift to EK DNS netpol does not allow port 5353 for Openshift