High Availability Investigation and Documentation
Investigate whether all pods and services within the package can run with more than one replica.
- Test and document Alertmanager running with multiple replicas and sending alerts to a Mattermost (MM) webhook, to check whether alerts are duplicated (for example, doubled or tripled).
- Test and document Prometheus running with multiple replicas, along with the Thanos configuration it requires.
- Test and document Grafana running with multiple replicas, along with any necessary recommendations or caveats.
- Document any exceptions or extra configuration required in the High Availability section of the charter Architecture document for the monitoring package in BigBang.
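For the Alertmanager duplicate-alert test, one option is a small stub receiver that counts how often each notification arrives. The sketch below assumes the replicas are temporarily pointed at it through a generic `webhook_configs` receiver (which posts Alertmanager's own JSON payload with `groupKey` and per-alert `status` and `fingerprint` fields) rather than at the Mattermost webhook itself. The port `9099` and the names `record_payload`, `duplicates`, and `serve` are hypothetical. Alertmanager also re-sends unresolved alerts after `repeat_interval`, so counts should be read over a window shorter than that interval.

```python
import json
from collections import Counter
from http.server import BaseHTTPRequestHandler, HTTPServer

# Running tally of (groupKey, status, fingerprint) -> number of times received.
received = Counter()


def record_payload(payload: dict, counter: Counter) -> None:
    """Count every alert contained in one Alertmanager webhook payload."""
    group_key = payload.get("groupKey", "")
    for alert in payload.get("alerts", []):
        key = (group_key, alert.get("status", ""), alert.get("fingerprint", ""))
        counter[key] += 1


def duplicates(counter: Counter) -> dict:
    """Return only the notifications that arrived more than once."""
    return {key: n for key, n in counter.items() if n > 1}


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read and parse the JSON body posted by Alertmanager.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        record_payload(payload, received)
        self.send_response(200)
        self.end_headers()
        dupes = duplicates(received)
        if dupes:
            print(f"duplicate notifications so far: {dupes}")


def serve(port: int = 9099) -> None:
    """Listen on all interfaces; the port is a hypothetical choice."""
    HTTPServer(("0.0.0.0", port), WebhookHandler).serve_forever()
```

Calling `serve()` and then scaling Alertmanager from one replica to two or three while a test alert fires should show whether the clustered replicas deduplicate notifications among themselves or whether each replica delivers its own copy.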