UNCLASSIFIED - NO CUI

High Availability Investigation and Documentation

Investigate whether all pods and services within the package can run with more than one replica.

  • Test and document Alertmanager running with multiple replicas and sending alerts to a MM Webhook (to check whether alerts are duplicated, for example doubled or tripled)
  • Test and document Prometheus running with multiple replicas and the Thanos configuration it requires
  • Test and document Grafana running with multiple replicas, along with any necessary recommendations or caveats
  • In the High Availability section of the charter Architecture document for the monitoring package in BigBang, document any exceptions or extra configuration required.
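
One possible way to check the Alertmanager item for doubled or tripled alerts is to point a stand-in webhook receiver at the replicas, capture the request bodies, and count how often each alert arrives. The sketch below is only an illustration under assumptions: it expects the JSON bodies of Alertmanager's generic webhook receiver (an `alerts` list whose entries carry `fingerprint` and `status`), not the format a MM webhook actually receives, and the helper names `load_payloads`, `count_notifications`, and `duplicated` are hypothetical.

```python
import json
from collections import Counter


def load_payloads(lines):
    """Parse captured webhook bodies, one JSON document per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def count_notifications(payloads):
    """Count deliveries per (fingerprint, status) pair across all payloads."""
    counts = Counter()
    for payload in payloads:
        for alert in payload.get("alerts", []):
            key = (alert.get("fingerprint"), alert.get("status"))
            counts[key] += 1
    return counts


def duplicated(counts):
    """Return only the alerts that were delivered more than once."""
    return {key: n for key, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Hypothetical capture: the same firing alert delivered twice.
    captured = [
        '{"alerts": [{"fingerprint": "a1", "status": "firing"}]}',
        '{"alerts": [{"fingerprint": "a1", "status": "firing"}]}',
    ]
    print(duplicated(count_notifications(load_payloads(captured))))
```

A repeated delivery is not always a fault: Alertmanager resends a still-firing alert after its repeat interval and may resend it when its group changes, so the capture window should be shorter than the configured repeat interval for the counts to be meaningful.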