Default Prometheus Configuration allows WAL to grow indefinitely
Summary
Using the default storage settings for .Values.prometheus.prometheusSpec, IronBank noticed that the Prometheus WAL (write-ahead log) had grown to fill the entire 30Gi volume group that the node host uses for ephemeral pod storage (/var/lib/kubelet/pods/).
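For reference, this is roughly the shape of the values in play. This is a sketch based on the upstream kube-prometheus-stack chart, not our exact overrides, and defaults vary by chart version: with storageSpec left unset, the operator backs /prometheus with an emptyDir volume, so the TSDB and the WAL live on the node's ephemeral pod storage.

monitoring:
  values:
    prometheus:
      prometheusSpec:
        # storageSpec is left unset by default, so the operator falls back to an
        # emptyDir volume for /prometheus; TSDB blocks and the WAL are written to
        # ephemeral storage under /var/lib/kubelet/pods/ on the node.
        # storageSpec: {}
        retention: 10d   # typical chart default; time-based retention alone does not cap WAL size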
Symptoms
The following logs were collected from the prometheus-monitoring-monitoring-kube-prometheus-0 pod in the monitoring namespace:
level=warn ts=2021-10-20T15:02:19.491Z caller=manager.go:619 component="rule manager" group=kubernetes-resources msg="Rule sample appending failed" err="write to WAL: log samples: write /prometheus/wal/00000714: no space left on device"
level=warn ts=2021-10-20T15:02:19.542Z caller=manager.go:619 component="rule manager" group=istio.metricsAggregation-rules msg="Rule sample appending failed" err="write to WAL: log samples: write /prometheus/wal/00000714: no space left on device"
level=warn ts=2021-10-20T15:02:19.549Z caller=manager.go:619 component="rule manager" group=istio.metricsAggregation-rules msg="Rule sample appending failed" err="write to WAL: log samples: write /prometheus/wal/00000714: no space left on device"
level=warn ts=2021-10-20T15:02:19.556Z caller=manager.go:619 component="rule manager" group=istio.metricsAggregation-rules msg="Rule sample appending failed" err="write to WAL: log samples: write /prometheus/wal/00000714: no space left on device"
level=error ts=2021-10-20T15:02:19.619Z caller=scrape.go:1088 component="scrape manager" scrape_pool=monitoring/monitoring-monitoring-kube-istio-envoy/0 target=http://192.168.54.31:15020/stats/prometheus msg="Scrape commit failed" err="write to WAL: log samples: write /prometheus/wal/00000714: no space left on device"
Exec'ing into the Prometheus pod confirms the volume is full:
bash-4.4$ df -Th /prometheus/
Filesystem                     Type  Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-varVol  ext4   30G   29G     0 100% /prometheus
Here /prometheus/ maps to /var/lib/kubelet/pods on the host machine.
These failures resulted in a non-responsive Prometheus pod, which caused HPAs to fail to gather metrics from the v1beta1.metrics.k8s.io APIService.
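That mapping exists because, with no storageSpec configured, the operator generates an emptyDir data volume on the Prometheus StatefulSet rather than a persistentVolumeClaim. The manifest below is an illustrative sketch, not the exact generated pod (the image tag is a placeholder and most fields are omitted); the volume name follows the pod shown above:

apiVersion: v1
kind: Pod
metadata:
  name: prometheus-monitoring-monitoring-kube-prometheus-0
  namespace: monitoring
spec:
  containers:
    - name: prometheus
      image: quay.io/prometheus/prometheus:v2.30.0   # placeholder tag for illustration
      volumeMounts:
        - name: prometheus-monitoring-monitoring-kube-prometheus-db
          mountPath: /prometheus
  volumes:
    # Because no storageSpec was provided, this is an emptyDir instead of a
    # persistentVolumeClaim; its contents live under
    # /var/lib/kubelet/pods/<pod-uid>/volumes/ on the node, i.e. on the host's
    # VolGroup00-varVol filesystem shown in the df output above.
    - name: prometheus-monitoring-monitoring-kube-prometheus-db
      emptyDir: {}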
Temporary Solution
In the short term, restarting (deleting) the pod discards the emptyDir-backed WAL, freeing the disk so HPAs can gather metrics and scale ReplicaSets again.
Medium term, we added the following settings to keep Prometheus from sitting in a persistently bad state:
monitoring:
  values:
    prometheus:
      prometheusSpec:
        nodeSelector:
          ironbank: prometheus
        tolerations:
          - key: ironbank
            effect: NoSchedule
            value: prometheus
        walCompression: true # https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#prometheusspec
        resources:
          limits:
            ephemeral-storage: 20Gi # So we die if we approach our on-host storage limit
            memory: null
            cpu: null
          requests:
            cpu: 300m
            memory: 5Gi
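The intent of the ephemeral-storage limit is that the kubelet evicts the pod once its emptyDir and writable layers exceed 20Gi, so Prometheus comes back with an empty WAL instead of sitting wedged at 100% usage on the shared host volume.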
Long term, we should understand the particular requirements of our Prometheus deployment and set its storage and retention configuration accordingly.
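One possible shape for that configuration is sketched below. This is only a sketch: the storage class, sizes, and retention values are placeholders, not settings we have agreed on. The idea is to give Prometheus a dedicated PersistentVolume and bounded retention instead of node-local ephemeral storage:

monitoring:
  values:
    prometheus:
      prometheusSpec:
        retention: 10d          # keep at most ~10 days of data
        retentionSize: 18GB     # cap TSDB size below the volume size
        walCompression: true
        storageSpec:
          volumeClaimTemplate:
            spec:
              storageClassName: local-path   # placeholder storage class
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: 20Gi

Note that retentionSize applies to TSDB blocks; depending on the Prometheus version, the WAL may not be fully counted against it, so the volume should still have headroom above the retentionSize value.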
Why am I doing this?
To raise general awareness that the default configuration of the monitoring package does not prevent Prometheus from consuming all ephemeral pod storage on its node, i.e. becoming a noisy neighbor.