Create more robust solution to deploy Constraints after ConstraintTemplates
Problem:
Gatekeeper handles creating, updating, and deleting CRDs from the ConstraintTemplate resource. Constraints use the CRD generated from the ConstraintTemplate and are validated at install using its OpenAPI Schema. If a new field is added to the ConstraintTemplate, Gatekeeper may not have installed the updated CRD before Helm installs the Constraint that uses the new field. This results in an error, and Flux will roll back the upgrade. The current implementation sets all the Constraints as post-upgrade hooks so the ConstraintTemplates are installed first. However, there is a race condition in Gatekeeper: there is no guarantee that the Constraints are deployed after the CRDs are created.
Solution:
Moving the Constraints to a new Helm chart does not fix the problem, since Helm installs CRs in a nondeterministic order. The resources for the Constraints Flux HelmRelease may therefore be updated before those of the Gatekeeper Flux HelmRelease, resulting in the same error.
One possible way to fix this is to move the Constraints into ConfigMaps. Then, on upgrades, add a job that checks each Constraint against the CRD schema before deploying it. The job would have to wait an appropriate amount of time before failing the Constraint. The job would also need to block helm upgrade --wait until it completes, so the user is aware of the failure.
Alternative solutions to the job approach can be used if a simpler approach is found.
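
A minimal Python sketch of the check the job could perform, to make the wait-then-fail behaviour concrete. Everything here is an assumption for illustration, not an existing implementation: the function names schema_problems and wait_for_schema, the default timeout and interval, and the idea that checking only the names under spec.parameters is sufficient. fetch_schema is a hypothetical callable that returns the CRD's OpenAPI v3 schema as a dict, or None while the CRD does not exist yet.

```python
import time
from typing import Callable, List, Optional


def schema_problems(constraint: dict, schema: Optional[dict]) -> List[str]:
    """List reasons the Constraint cannot be applied against the given CRD schema."""
    if schema is None:
        return ["CRD not created yet"]
    # Assumed layout: parameters are declared under spec.parameters.properties.
    declared = (
        schema.get("properties", {})
        .get("spec", {})
        .get("properties", {})
        .get("parameters", {})
        .get("properties", {})
    )
    used = constraint.get("spec", {}).get("parameters", {})
    return [f"unknown parameter: {name}" for name in sorted(used) if name not in declared]


def wait_for_schema(
    constraint: dict,
    fetch_schema: Callable[[], Optional[dict]],
    timeout: float = 120.0,   # assumed default, tune per cluster
    interval: float = 5.0,    # assumed default
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll until the CRD schema accepts the Constraint or the timeout expires.

    Returns an empty list on success, otherwise the problems seen on the last poll.
    """
    deadline = time.monotonic() + timeout
    while True:
        problems = schema_problems(constraint, fetch_schema())
        if not problems:
            return []
        if time.monotonic() >= deadline:
            return problems
        sleep(interval)
```

In a real job, fetch_schema would read the CRD from the API server, and a non-empty result would make the job exit non-zero so that helm upgrade --wait reports the failure. The Constraint would only be applied from its ConfigMap once wait_for_schema returns an empty list.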