monitoring HelmRelease never becomes healthy
Bug
Description
After installing BigBang, the monitoring HelmRelease never reaches a healthy state: its pre-install job fails with BackoffLimitExceeded, which also blocks jaeger and kiali.
BigBang Version
1.12.0 (possibly earlier as well)
kubectl get hr -n bigbang
NAME             READY   STATUS                                                                      AGE
bigbang          True    Release reconciliation succeeded                                            66m
eck-operator     True    Release reconciliation succeeded                                            57m
gatekeeper       True    Release reconciliation succeeded                                            57m
istio            True    Release reconciliation succeeded                                            57m
istio-operator   True    Release reconciliation succeeded                                            57m
jaeger           False   dependency 'bigbang/monitoring' is not ready                                46m
kiali            False   dependency 'bigbang/monitoring' is not ready                                46m
monitoring       False   Helm install failed: failed pre-install: job failed: BackoffLimitExceeded   21m
Error Logs
kubectl get events -n monitoring
LAST SEEN   TYPE      REASON             OBJECT                                                  MESSAGE
4m27s       Normal    Scheduled          pod/monitoring-monitoring-kube-admission-create-zbd5c   Successfully assigned monitoring/monitoring-monitoring-kube-admission-create-zbd5c to ip-10-0-81-89.us-gov-east-1.compute.internal
3m7s        Normal    Pulled             pod/monitoring-monitoring-kube-admission-create-zbd5c   Container image "registry1.dso.mil/ironbank/opensource/jet/kube-webhook-certgen:v1.5.1" already present on machine
3m7s        Normal    Created            pod/monitoring-monitoring-kube-admission-create-zbd5c   Created container create
3m6s        Warning   Failed             pod/monitoring-monitoring-kube-admission-create-zbd5c   Error: failed to create containerd task: OCI runtime create failed: container_linux.go:367: starting container process caused: chdir to cwd ("/home/nonroot") set in config.json failed: permission denied: unknown
2m41s       Warning   BackOff            pod/monitoring-monitoring-kube-admission-create-zbd5c   Back-off restarting failed container
4m28s       Normal    SuccessfulCreate   job/monitoring-monitoring-kube-admission-create         Created pod: monitoring-monitoring-kube-admission-create-zbd5c
Theories on Issue
The pods created in the monitoring namespace appear to have a securityContext that sets runAsUser to UID 2000. However, in the image in question (registry1.dso.mil/ironbank/opensource/jet/kube-webhook-certgen:v1.5.1), the /home and /home/nonroot directories appear to be owned by UID 65532 with permission mode 0700. A process running as UID 2000 would therefore have no permission to chdir into /home/nonroot, and I think this is what causes the permission denied error shown above.
Either the image is broken, or the runAsUser UID should be changed to match the directory ownership in the image.
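To sanity-check the theory, here is a minimal Python sketch of the usual Unix search-permission check on a directory. It is a simplified model for illustration, not the actual kernel or containerd logic. The function name can_enter and the GID values are assumptions (the group ownership in the image was not checked), although with mode 0700 the group does not change the result.

```python
import stat


def can_enter(uid, gids, owner_uid, owner_gid, mode):
    """Return True if a process may chdir into a directory.

    Simplified model of the Unix execute (search) bit check:
    owner bits apply to the owner, group bits to group members,
    and "other" bits to everyone else. Capabilities other than
    root's bypass are ignored.
    """
    if uid == 0:
        return True  # root bypasses directory search checks
    if uid == owner_uid:
        return bool(mode & stat.S_IXUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IXGRP)
    return bool(mode & stat.S_IXOTH)


# Values from the report: directory owned by UID 65532, mode 0700.
# The GIDs (2000 and 65532) are assumed for illustration only.
print(can_enter(2000, {2000}, 65532, 65532, 0o700))    # False
print(can_enter(65532, {65532}, 65532, 65532, 0o700))  # True
```

Under this model, UID 2000 falls into the "other" class, which has no bits set in 0700, so the chdir to /home/nonroot is denied. Running as UID 65532 would pass the check.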
Edited by Tim Martin