During our latest mattermost Renovate upgrade, upgrading the minio dependency started throwing errors because minio's wait job is not in the correct namespace. Disabling the wait job instead causes the kyverno policies to throw errors, because all the templates gluon creates are then empty.
Solution:
Wrap the contents of test-wait-job.yaml in an if statement that checks whether the wait job is enabled:
{{- if .Values.minio.waitJob.enabled }} ... {{ end }}
The issue here lies in how the gluon wait job template is toggled on and off and executed, and in how kiali processes that output. To use the gluon wait job, you have to implement a template that conditionally creates some resources. If you disable the wait job, helm template produces a set of empty YAML files. Helm doesn't care about this, and neither does flux; both will deploy the product just fine. However, kiali processes the helm template output as a stream of YAML documents and doesn't know how to handle an empty document. It expects every document to have content.
There are two other options I can think of to try to resolve this:
1. Place a shim in the pipeline between the helm template command and the kiali command that finds and removes empty YAML documents from the stream.
2. Add some kind of dummy resource to the YAML files that contain the conditionally generated YAML, so that even when the wait job is disabled the dummy resource is still rendered and kiali has nothing to complain about. An empty ConfigMap would be sufficient; any valid Kubernetes resource would work.
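As a rough illustration of option 1, here is a minimal Python filter that could sit between `helm template` and the kiali step. It is a sketch under stated assumptions, not anything taken from pipeline-templates. Documents are split only on lines that are exactly `---`. A document counts as empty when every line is blank or a `#` comment, for example when it holds only the `# Source:` comment that helm template prints above each rendered file. The module name `drop_empty_docs` is hypothetical.

```python
import sys


def _is_empty(doc_lines):
    """Return True if a document has only blank lines and '#' comments."""
    return all(
        not line.strip() or line.lstrip().startswith("#") for line in doc_lines
    )


def drop_empty_documents(stream: str) -> str:
    """Remove empty YAML documents from a multi-document stream.

    Assumption: documents are separated by lines that are exactly '---'
    (trailing whitespace ignored). Separators carrying tags or comments,
    '...' end markers, and '---' inside block scalars are not handled.
    """
    documents = []
    current = []
    for line in stream.splitlines():
        if line.rstrip() == "---":
            documents.append(current)
            current = []
        else:
            current.append(line)
    documents.append(current)

    # Keep only documents with real YAML content, re-joined with separators.
    kept = [doc for doc in documents if not _is_empty(doc)]
    return "".join("---\n" + "\n".join(doc) + "\n" for doc in kept)


def main() -> None:
    """Read a YAML stream on stdin and write it back without empty documents."""
    sys.stdout.write(drop_empty_documents(sys.stdin.read()))
```

Assuming the file is saved as `drop_empty_docs.py`, it could be wired in as `helm template ... | python -c 'import drop_empty_docs; drop_empty_docs.main()'`, with its output piped into the kiali command. It only drops documents that carry no YAML content at all, so every non-empty resource passes through unchanged apart from normalized `---` separators.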
The second option is the simplest, but it also looks hacky and results in a wasted resource. The first option requires changes to the pipeline-templates, which affects every project, not just ours.
For simplicity's sake, I would try option 2 first: add a dummy resource at the top of the file that is always generated, whether or not the wait job is enabled, and see if that stops the issue.
It appears the waitJob was also disabled in Loki about a month ago, which points to a bigger underlying issue. I'm not sure whether it's due to the way gluon implements it, but disabling bbtests does not cause the same issue; those required resources are only created when bbtests.enabled is set to true, and leaving it off does not generate empty YAML.
Another part of the problem, I think, comes down to what exactly the minio wait script is doing. I disabled it in Mattermost because it hangs in my local environment, which makes the pipeline hang and fail as well. Looking at the wait script for minio, it appears to be hard-coded to look for a tenant in the minio namespace, which may not make sense for packages that use minio as a dependency. The minio tenant for mattermost is created in the mattermost namespace, and the waitJob hangs because it doesn't have permissions outside the mattermost namespace.
I'm not sure if that's the same reason it was disabled in Loki, but it appears to have been disabled there due to hanging as well:
This may actually be two issues: one to fix gluon, and another to make the wait script check for the tenant in the namespace in which it's being used.
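For the second issue, the wait script would need to look for the tenant in the namespace of the release that uses it rather than in a hard-coded minio namespace. The sketch below covers only the namespace-resolution step and is written in Python purely for illustration, since the real wait script isn't shown here. It prefers an explicitly configured namespace, then falls back to the pod's own namespace. Kubernetes mounts that namespace at `/var/run/secrets/kubernetes.io/serviceaccount/namespace` when a service account token is mounted. The function name, its parameters, and the `minio` last-resort default are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Standard location of the pod's namespace when a service account token is mounted.
SERVICEACCOUNT_NAMESPACE_FILE = Path(
    "/var/run/secrets/kubernetes.io/serviceaccount/namespace"
)


def resolve_tenant_namespace(
    configured: Optional[str] = None,
    namespace_file: Path = SERVICEACCOUNT_NAMESPACE_FILE,
    default: str = "minio",
) -> str:
    """Pick the namespace in which the wait job should look for the tenant.

    Hypothetical helper: an explicit setting wins; otherwise use the
    namespace the job itself runs in (e.g. mattermost); only fall back
    to `default` (the currently hard-coded 'minio') if neither is available.
    """
    if configured:
        return configured
    try:
        own_namespace = namespace_file.read_text().strip()
    except OSError:
        own_namespace = ""
    return own_namespace or default
```

Defaulting to the job's own namespace would also keep the script inside the permissions it already has, because the waitJob can see resources in the namespace it runs in.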
I stand corrected: setting bbtests.enabled to false results in the same issue.
It looks like the only reason it's never been seen before is that this section is only present in test-values.yaml, since it's intended to be used as part of the CI/CD process. I would think moving the waitJob section from values.yaml into test-values.yaml in the minio package would resolve the issue. Since it's also technically part of the CI/CD process, it probably belonged there in the first place.
If another package that uses minio needs to run the minio wait script, that section can simply be added to that package's test-values.yaml under the minio key.
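Helm layers values files in order: nested maps are merged key by key, and later files take precedence. The Python sketch below imitates that merge to show the effect of the proposal. It is a simplification that ignores Helm's other rules, such as a null value deleting a key. With `waitJob` only in test-values.yaml, a normal install never sees it, while the CI run that layers test-values.yaml on top does. The value contents are hypothetical and not taken from the minio package.

```python
import copy
from typing import Any, Dict


def merge_values(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge `override` into a copy of `base` (override wins).

    Simplified imitation of how Helm layers values files: nested maps are
    merged key by key; lists and scalars are replaced wholesale.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical contents after the proposed move: waitJob lives only in test values.
chart_values = {"tenant": {"name": "minio"}}  # values.yaml
test_values = {"waitJob": {"enabled": True}}  # test-values.yaml (CI only)

production = merge_values(chart_values, {})
ci = merge_values(chart_values, test_values)

print("waitJob" in production)  # False: waitJob is absent from a normal install's values
print(ci["waitJob"]["enabled"])  # True: the CI run layers test-values.yaml on top
```

A package that consumes minio as a dependency would nest the same `waitJob` block under its `minio` key in its own test-values.yaml, so only its CI run opts in.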