UNCLASSIFIED - NO CUI

Modify wait script to better handle `ArtifactFailed`

See: https://repo1.dso.mil/platform-one/big-bang/bigbang/-/jobs/6345892

Currently our wait script checks for failed HRs every 5 seconds and exits as soon as it finds one (see the code here).

One problem with this is that Flux sometimes puts HRs in a "failed" state because of Flux timing issues or gitrepos not being fully ready. An earlier attempt to avoid this added a wait for all gitrepos to be ready, but Flux sometimes still reports the ArtifactFailed status even when the gitrepo is ready.

This task should involve:

  • Investigate what the ArtifactFailed status on a Flux HR means and evaluate whether there are situations where we should consider it a real failure
  • If it is something we should retry or not treat as a failure, evaluate options to work around it in the pipeline (retry/ignore, a different gitrepo wait, etc.)
  • Implement the chosen solution
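
As a starting point for discussion, the retry/ignore option could be sketched roughly as below. This is only an illustration and not the current wait script: the function names (`is_transient_reason`, `should_fail`), the `TRANSIENT_REASONS` list, and the `MAX_TRANSIENT_CHECKS` grace period are all hypothetical, and whether ArtifactFailed can safely be treated as transient is exactly what the investigation above needs to confirm.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: tolerate ArtifactFailed for a grace period instead of
# exiting on the first failed HR. All names and values here are assumptions.

# Reasons assumed to be possibly transient (space-separated list).
TRANSIENT_REASONS="ArtifactFailed"

# How many consecutive 5-second checks a transient reason is tolerated
# (12 checks is roughly one minute; the value is an arbitrary placeholder).
MAX_TRANSIENT_CHECKS="${MAX_TRANSIENT_CHECKS:-12}"

# is_transient_reason REASON
# Returns 0 if REASON is in the transient list, 1 otherwise.
is_transient_reason() {
  local reason="$1" candidate
  for candidate in $TRANSIENT_REASONS; do
    if [[ "$reason" == "$candidate" ]]; then
      return 0
    fi
  done
  return 1
}

# should_fail REASON CONSECUTIVE_CHECKS
# Returns 0 (fail the pipeline) if REASON is not transient, or if a transient
# reason has been seen on more than MAX_TRANSIENT_CHECKS consecutive checks.
# Returns 1 (keep waiting) otherwise.
should_fail() {
  local reason="$1" seen="$2"
  if is_transient_reason "$reason"; then
    if (( seen > MAX_TRANSIENT_CHECKS )); then
      return 0
    fi
    return 1
  fi
  return 0
}
```

In the wait loop, the reason would presumably come from the HR's Ready condition, for example via `kubectl get helmrelease <name> -n <namespace> -o jsonpath='{.status.conditions[?(@.type=="Ready")].reason}'`, with a per-HR counter that resets whenever the reason changes. Keeping the decision in a small function makes it easy to add or remove reasons once we know which ones are real failures.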

AC:

  • Modified wait script that takes into account the ArtifactFailed status and handles it in a "smarter" way
Edited by evan.rush