Sidecar termination prevents job retries
Although the jobs in the monitoring chart are configured with retries, in practice a retry can never succeed because of where sidecar termination happens. The current flow looks like:
- webhook job attempt 1, fails
- sidecar termination happens (runs in parallel with attempt 1)
- webhook job attempt 2, fails with no connectivity because the sidecar is already down
I would recommend moving sidecar termination into the main container rather than running it in a standalone container. This would require some rework of the main container, but I think it is the best option. Alternatively, sidecar termination could be refactored to happen only once job retries are exhausted, but that is probably more logic than is worth implementing.
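
To illustrate the first option, here is a rough sketch of a wrapper entrypoint for the main container. It is not taken from the chart. It assumes the sidecar is Istio and that pilot-agent's `quitquitquit` endpoint is reachable on `localhost:15020`; the names `run_job` and `stop_sidecar` are hypothetical. The point is only the ordering: the sidecar is stopped after the job command succeeds, so a failed attempt leaves connectivity intact for the next retry.

```python
import subprocess
import sys
import urllib.request

# Assumption: Istio sidecar with pilot-agent's quit endpoint on its default port.
QUIT_URL = "http://127.0.0.1:15020/quitquitquit"


def stop_sidecar(url=QUIT_URL, timeout=5.0):
    """POST to the sidecar's quit endpoint; return True on an HTTP 2xx reply."""
    request = urllib.request.Request(url, data=b"", method="POST")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # Covers connection errors, timeouts, and non-2xx HTTP errors.
        return False


def run_job(command, stop=stop_sidecar):
    """Run the job command and stop the sidecar only if it succeeded.

    On failure the sidecar is left running, so a retried attempt in the
    same pod still has network connectivity.
    """
    exit_code = subprocess.run(command).returncode
    if exit_code == 0 and not stop():
        print("warning: sidecar did not acknowledge shutdown", file=sys.stderr)
    return exit_code
```

The container command would then call something like `sys.exit(run_job(sys.argv[1:]))` with the existing webhook command as arguments. What happens to the sidecar once retries are exhausted is deliberately not handled here and would need checking against how the Job's pod gets cleaned up.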
This issue applies to (at least) two jobs:
- https://repo1.dso.mil/big-bang/product/packages/monitoring/-/blob/main/chart/templates/prometheus-operator/admission-webhooks/job-patch/job-createSecret.yaml
- https://repo1.dso.mil/big-bang/product/packages/monitoring/-/blob/main/chart/templates/prometheus-operator/admission-webhooks/job-patch/job-patchWebhook.yaml