[P1BIGROCKS-2040] Create automated maintenance for Big Bang dogfood cluster
[P1BIGROCKS-2040](https://jira.il2.dso.mil/browse/P1BIGROCKS-2040)
I'm not sure what has already been done, but here are some ideas:
- For the nodes, introduce a cron job that runs `dnf upgrade` so they stay on the latest packages. This could also be built into the node image.
- Create a nightly job that compares the SHA of our current RHEL 8 UBI image with the latest one in Iron Bank (IB). If the SHA has changed, build a new image and roll it out to our cluster nodes and GitLab runners.
- Automatically identify runners that fail to start pipelines and restart or replace them (see comment below).
- Identify nodes that are running out of disk space and terminate their EC2 instances (the Auto Scaling group will launch replacements).
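A minimal sketch of the comparison step in the nightly image job. It assumes the upstream digest is read from `skopeo inspect` JSON output (its `Digest` field) and that the digest of the image we last built from is recorded somewhere we can read back; `parse_digest` and `needs_rebuild` are hypothetical helper names, and the actual build and rollout are left out.

```python
import json
from typing import Optional


def parse_digest(inspect_json: str) -> str:
    """Pull the manifest digest out of `skopeo inspect`-style JSON (assumed shape)."""
    data = json.loads(inspect_json)
    digest = data.get("Digest")
    if not isinstance(digest, str) or not digest.startswith("sha256:"):
        raise ValueError("no sha256 digest found in inspect output")
    return digest


def needs_rebuild(recorded_digest: Optional[str], upstream_digest: str) -> bool:
    """Rebuild if no digest has been recorded yet or the upstream image has changed."""
    return recorded_digest is None or recorded_digest != upstream_digest
```

Comparing digests rather than tags matters because a tag can be re-pushed to point at a new image while its name stays the same. In the nightly job, the JSON could come from something like `skopeo inspect docker://registry1.dso.mil/ironbank/redhat/ubi/ubi8` (image path assumed), and a `True` result would kick off the image build and the node and runner rollout.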
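One way to spot misbehaving runners, sketched under a few assumptions. It expects job records shaped roughly like GitLab Jobs API responses (`status`, `failure_reason`, and a nested `runner` with an `id`). It also uses a hand-picked set of failure reasons that are treated as runner problems rather than problems in the code under test. The threshold of 3 is arbitrary, `unhealthy_runners` is a hypothetical name, and restarting or replacing the flagged runners would be a separate step.

```python
from collections import Counter
from typing import Any, Iterable, List, Mapping

# Assumed: failure reasons that point at the runner/infrastructure rather than
# at the code under test. Tune this set against what the cluster actually reports.
INFRA_FAILURE_REASONS = frozenset({
    "runner_system_failure",
    "stuck_or_timeout_failure",
    "scheduler_failure",
})


def unhealthy_runners(jobs: Iterable[Mapping[str, Any]], threshold: int = 3) -> List[int]:
    """Return the sorted IDs of runners with at least `threshold` infrastructure failures."""
    counts = Counter()
    for job in jobs:
        runner = job.get("runner") or {}
        runner_id = runner.get("id")
        if runner_id is None:
            continue  # no runner picked this job up, so there is no runner to blame
        if job.get("status") == "failed" and job.get("failure_reason") in INFRA_FAILURE_REASONS:
            counts[runner_id] += 1
    return sorted(rid for rid, n in counts.items() if n >= threshold)
```

Filtering on the failure reason is what keeps ordinary `script_failure` results, which are usually the developer's code failing, from getting a healthy runner recycled.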
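A sketch of the disk-space check and the replacement call. The 10% free-space threshold is an assumption, and `low_on_disk`, `path_low_on_disk`, and `recycle_instance` are hypothetical helpers. Termination goes through the boto3 Auto Scaling client's `terminate_instance_in_auto_scaling_group`. Passing `ShouldDecrementDesiredCapacity=False` keeps the group's desired capacity unchanged, so the group launches a replacement, which is the behavior the idea above relies on. The client is passed in so the logic can be exercised without AWS credentials.

```python
import shutil
from typing import Any


def low_on_disk(total_bytes: int, free_bytes: int, min_free_fraction: float = 0.10) -> bool:
    """True when the free fraction of the filesystem is below the threshold."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes < min_free_fraction


def path_low_on_disk(path: str = "/", min_free_fraction: float = 0.10) -> bool:
    """Check the filesystem that holds `path` on the local node."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return low_on_disk(usage.total, usage.free, min_free_fraction)


def recycle_instance(autoscaling_client: Any, instance_id: str) -> None:
    """Terminate the instance but keep desired capacity, so the ASG launches a replacement.

    `autoscaling_client` is expected to be a boto3 client, e.g. boto3.client("autoscaling").
    """
    autoscaling_client.terminate_instance_in_auto_scaling_group(
        InstanceId=instance_id,
        ShouldDecrementDesiredCapacity=False,
    )
```

On a node this might be wired up as `if path_low_on_disk("/"): recycle_instance(boto3.client("autoscaling"), instance_id)`. How the job learns its own instance ID, and which filesystems it should watch, are left open here.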
What this attempts to fix:
- Big Bang releases that worked on older images but break with a new STIG, package version, etc.
- Security and bug fixes on the cluster that may cause problems.
- Misbehaving runners that report false pipeline failures to developers
- Nodes running out of disk space
epic