UNCLASSIFIED - NO CUI

monitoring HelmRelease never becomes healthy

Bug

Description

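The monitoring namespace never reaches a healthy state after installing BigBang. The monitoring HelmRelease fails its pre-install hook job, and the jaeger and kiali releases, which depend on monitoring, stay not ready as a result.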

BigBang Version

1.12.0 (possibly earlier as well)

kubectl get hr -n bigbang

NAME             READY   STATUS                                                                      AGE
bigbang          True    Release reconciliation succeeded                                            66m
eck-operator     True    Release reconciliation succeeded                                            57m
gatekeeper       True    Release reconciliation succeeded                                            57m
istio            True    Release reconciliation succeeded                                            57m
istio-operator   True    Release reconciliation succeeded                                            57m
jaeger           False   dependency 'bigbang/monitoring' is not ready                                46m
kiali            False   dependency 'bigbang/monitoring' is not ready                                46m
monitoring       False   Helm install failed: failed pre-install: job failed: BackoffLimitExceeded   21m

Error Logs


kubectl get events -n monitoring

LAST SEEN   TYPE      REASON             OBJECT                                                  MESSAGE
4m27s       Normal    Scheduled          pod/monitoring-monitoring-kube-admission-create-zbd5c   Successfully assigned monitoring/monitoring-monitoring-kube-admission-create-zbd5c to ip-10-0-81-89.us-gov-east-1.compute.internal
3m7s        Normal    Pulled             pod/monitoring-monitoring-kube-admission-create-zbd5c   Container image "registry1.dso.mil/ironbank/opensource/jet/kube-webhook-certgen:v1.5.1" already present on machine
3m7s        Normal    Created            pod/monitoring-monitoring-kube-admission-create-zbd5c   Created container create
3m6s        Warning   Failed             pod/monitoring-monitoring-kube-admission-create-zbd5c   Error: failed to create containerd task: OCI runtime create failed: container_linux.go:367: starting container process caused: chdir to cwd ("/home/nonroot") set in config.json failed: permission denied: unknown
2m41s       Warning   BackOff            pod/monitoring-monitoring-kube-admission-create-zbd5c   Back-off restarting failed container
4m28s       Normal    SuccessfulCreate   job/monitoring-monitoring-kube-admission-create         Created pod: monitoring-monitoring-kube-admission-create-zbd5c

Theories on Issue

The pods created in the monitoring namespace have a securityContext that sets runAsUser to UID 2000. However, in the image in question (registry1.dso.mil/ironbank/opensource/jet/kube-webhook-certgen:v1.5.1), the /home and /home/nonroot directories appear to be owned by UID 65532 with permission mode 0700. The runtime config (config.json) sets the container's working directory to /home/nonroot, so the runtime has to chdir into it before starting the process. UID 2000 has no search permission on a 0700 directory owned by another user, and I think this is what causes the "permission denied" error shown above.
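To make the theory above concrete, here is a minimal Python sketch of the simplified POSIX rule for entering a directory. The kernel checks only the owner bits when the caller's UID matches the owner, the group bits when one of its groups matches, and the "other" bits otherwise, and chdir needs search (x) permission on every directory along the path. DirInfo, can_chdir and can_reach are hypothetical names for illustration only. The group ownership (65532) and the process group (2000) are assumptions, since the issue only reports UIDs; with mode 0700 the group bits are zero, so they do not change the result. The sketch deliberately ignores root, capabilities such as CAP_DAC_OVERRIDE, and ACLs.

```python
import stat
from dataclasses import dataclass


@dataclass
class DirInfo:
    owner_uid: int
    owner_gid: int
    mode: int  # permission bits only, e.g. 0o700


def can_chdir(uid: int, gids: set[int], d: DirInfo) -> bool:
    """Simplified POSIX search-permission check for one directory.

    Only one class of bits applies: owner if the UID matches, else group
    if a GID matches, else other. Root, capabilities and ACLs are ignored.
    """
    if uid == d.owner_uid:
        return bool(d.mode & stat.S_IXUSR)
    if d.owner_gid in gids:
        return bool(d.mode & stat.S_IXGRP)
    return bool(d.mode & stat.S_IXOTH)


def can_reach(uid: int, gids: set[int], path: list[DirInfo]) -> bool:
    """chdir succeeds only if every directory on the path is searchable."""
    return all(can_chdir(uid, gids, d) for d in path)


# /home and /home/nonroot as described in the issue.
# ASSUMPTION: group owner 65532 (the issue only reports the owning UID).
CWD_PATH = [DirInfo(65532, 65532, 0o700), DirInfo(65532, 65532, 0o700)]

if __name__ == "__main__":
    # ASSUMPTION: the pod runs with group 2000 alongside runAsUser 2000.
    print(can_reach(2000, {2000}, CWD_PATH))    # False: matches "permission denied"
    print(can_reach(65532, {65532}, CWD_PATH))  # True: UID matches the owner
```

Under these assumptions, either running the job as UID 65532 or relaxing the directory mode in the image (for example to 0755) would let the chdir succeed.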

Either the image is broken, or the pod's runAsUser should be changed to 65532 to match the directory ownership in the image.

Edited by Tim Martin