[P1BIGROCKS-2040] Create automated maintenance for Big Bang dogfood cluster (#101) · Epics · Big Bang · GitLab

UNCLASSIFIED - NO CUI

[P1BIGROCKS-2040] Create automated maintenance for Big Bang dogfood cluster

[P1BIGROCKS-2040](https://jira.il2.dso.mil/browse/P1BIGROCKS-2040)

I'm not sure what is and isn't done already, but here are some ideas:

- For the nodes, introduce a cron job to `dnf upgrade` so we stay at the latest packages. Possibly just build this into the image.
- Create a nightly job that compares our current RHEL 8 UBI image to IB. If the SHA changes, create a new image and update our cluster nodes and GitLab runners.
- Automatically identify runners that fail to start pipelines and restart/replace them. (see comment below)
- Identify nodes that are running out of disk space and terminate the EC2 instance (auto scale will spin up a new one).

What this attempts to fix:

- Big Bang releases that worked on older images but have a problem with a new STIG, package version, etc.
- Security and bug fixes on the cluster that may cause problems.
- Badly behaving runners that give false failures in pipeline runs to developers.
- Running out of disk space on nodes.
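As a starting point for discussion, the decision logic behind two of the ideas above (the nightly image SHA comparison and the disk-space check) could look roughly like the sketch below. Everything here is an assumption for illustration: the function names, the 90% threshold, and the digest strings are hypothetical, and the sketch deliberately leaves out how the digest is fetched from the registry and how the EC2 instance is actually terminated.

```python
import shutil
from typing import Optional

# Hypothetical threshold: treat a node as unhealthy at 90% disk usage.
DISK_USAGE_THRESHOLD = 0.90


def disk_usage_fraction(path: str = "/") -> float:
    """Return the fraction (0.0 to 1.0) of the filesystem at `path` in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def node_needs_replacement(
    used_fraction: float, threshold: float = DISK_USAGE_THRESHOLD
) -> bool:
    """Decide whether a node should be terminated so auto scaling replaces it."""
    return used_fraction >= threshold


def image_needs_rebuild(last_built_digest: Optional[str], current_digest: str) -> bool:
    """Decide whether the node image should be rebuilt.

    `last_built_digest` is the base image digest recorded at the last build
    (None if no build was ever recorded); `current_digest` is the digest
    currently published upstream. Any difference triggers a rebuild.
    """
    return last_built_digest != current_digest


if __name__ == "__main__":
    used = disk_usage_fraction("/")
    print(f"root filesystem usage: {used:.0%}")
    print("replace node:", node_needs_replacement(used))
    # Placeholder digests, not real image SHAs.
    print("rebuild image:", image_needs_rebuild("sha256:aaa", "sha256:bbb"))
```

Keeping the decisions as small pure functions means the nightly job or cron entry only has to supply the inputs and act on a boolean, which makes the thresholds easy to test and tune separately from the AWS and registry calls.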

