SKIP UPGRADE CHECK #37: Fix resource ordering to correct a hidden test failure that was giving false confidence

Andrew Kesterson requested to merge 37_resource_ordering into main

General MR

Summary

While working on #37 (closed), we discovered that our helm tests for ESO are not returning valid results.

Specifically, resources that we could create as part of the helm release could not be created after the helm installation. On investigation, we found that the validating webhook was blocking resource creation because the helm release was unhealthy, despite appearing healthy. The helm test resources bypassed this code path because they were created before the validating webhook was created.

This MR fixes the resource ordering for the helm test resources, so that they are not created until the validating webhook has been created. This reveals the true behavior of the external-secrets operator: it is broken until the cert-controller (for whatever reason, on whatever timeline it decides) fixes the certificates on the webhook.

While work continues on the webhook issue, we need to fix the fact that our tests are giving us a false sense of confidence. As everything is written right now, our tests are invalid and do not report the truth. Accepting this MR fixes the resource ordering so that the tests report the truth.

Relevant logs/screenshots

Release "external-secrets" does not exist. Installing it now.
Error: failed post-install: warning: Hook post-install external-secrets/templates/tests/cluster-secret-store.yaml failed: 1 error occurred:
        * Internal error occurred: failed calling webhook "validate.secretstore.external-secrets.io": failed to call webhook: Post "https://external-secrets-webhook.external-secrets.svc:443/validate-external-secrets-io-v1beta1-secretstore?timeout=5s": proxy error from 127.0.0.1:6443 while dialing 10.42.0.5:10250, code 502: 502 Bad Gateway
$ kubectl get pods -n external-secrets
NAME                                                READY   STATUS    RESTARTS   AGE
external-secrets-69bdb4f646-hmpjm                   1/1     Running   0          3m36s
external-secrets-cert-controller-7fd4d6957b-rsrvg   1/1     Running   0          3m36s
external-secrets-webhook-8578569665-slcmj           1/1     Running   0          3m36s

It's worth pointing out that the cert-controller takes an unpredictable amount of time to settle the certificates on the webhook pod: sometimes hours, sometimes seconds. In some cases the cert-controller actually appears to come up healthy. So if the pipelines are passing clean install and that's surprising, then all I can say is, well .... ¯\_(ツ)_/¯
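
For anyone who wants to watch how long the webhook takes to start answering, here is a rough diagnostic sketch. It is not part of this MR and not a fix. The `retry` and `probe_webhook` helpers, the manifest path, and the timings are all hypothetical. It assumes that a server-side dry run (`kubectl apply --dry-run=server`) is sent through the validating webhook without persisting anything, so it should fail the same way the post-install hook does while the webhook is unreachable.

```shell
#!/bin/sh

# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0 or until ATTEMPTS runs have failed.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      echo "succeeded on attempt $i"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "gave up after $attempts attempts" >&2
  return 1
}

# probe_webhook MANIFEST
# Assumption: the server-side dry run passes through the validating
# webhook, so it only succeeds once the webhook is actually serving.
probe_webhook() {
  kubectl apply --dry-run=server -f "$1"
}

# Example (hypothetical path and timings), not run automatically:
#   retry 30 10 probe_webhook ./cluster-secret-store.yaml
```

Seeing the probe fail while all three pods report `1/1 Running` would match the behavior shown in the logs above.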

Linked Issue

#37 (closed)

Upgrade Notices

After this merge is accepted, clean install will no longer succeed, because the test resources will fail to install. The helm release itself is installed, but the helm install step will report failure because the resources created in the post-install step fail.

Edited by Andrew Kesterson
