UNCLASSIFIED - NO CUI

Grafana Poor Performance on m5a.4xlarge

When running on an m5a.4xlarge EC2 instance (AMD-based), Grafana underperforms and the kiwigrid/k8s-sidecar (grafana-sc-dashboard) takes several hours to load all the dashboards from ConfigMaps.

However, when running on a smaller t3.2xlarge EC2 instance (Intel-based), Grafana performs normally and all dashboards are loaded within a few minutes.

# m5a.4xlarge

# First Dashboard Loaded 18:42:01
{"time": "2025-06-13T18:42:01.001005+00:00", "level": "INFO", "msg": "Writing /tmp/dashboards/policy-reporter-details-dashboard.json (ascii)"}                                                                                                            

# Dashboards still not finished loading 30 minutes later
{"time": "2025-06-13T19:10:37.969118+00:00", "level": "INFO", "msg": "Writing /tmp/dashboards/pod-total.json (ascii)"}

requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=3000): Max retries exceeded with url: /api/admin/provisioning/dashboards/reload (Caused by ReadTimeoutError("HTTPConnectionPool(host='localhost', port=3000): Read timed out

# t3.2xlarge

# First Dashboard Loaded 19:19:14
{"time": "2025-06-11T19:19:14.736844+00:00", "level": "INFO", "msg": "Writing /tmp/dashboards/policy-reporter-details-dashboard.json (ascii)"} 

# Last Dashboard Loaded 19:22:02 ~ 3 minutes later
{"time": "2025-06-11T19:22:02.036481+00:00", "level": "INFO", "msg": "Writing /tmp/dashboards/prometheus-remote-write.json (ascii)"}

The underlying cause is currently unknown, but the problem is repeatable. This issue will be used to track any findings and related information.
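
To make the comparison repeatable across instance types, the time between the first and last "Writing ..." entries in the sidecar log can be computed with a small script. The following is only a sketch: the function name `write_span_seconds` and the constants `M5A_LINES` and `T3_LINES` are hypothetical, the sample lines are copied from the excerpts in this issue, and it assumes the sidecar keeps emitting JSON log lines with `time` and `msg` fields as shown there.

```python
import json
from datetime import datetime

# Sample lines copied from the log excerpts in this issue (hypothetical constant names).
M5A_LINES = [
    '{"time": "2025-06-13T18:42:01.001005+00:00", "level": "INFO", "msg": "Writing /tmp/dashboards/policy-reporter-details-dashboard.json (ascii)"}',
    '{"time": "2025-06-13T19:10:37.969118+00:00", "level": "INFO", "msg": "Writing /tmp/dashboards/pod-total.json (ascii)"}',
]
T3_LINES = [
    '{"time": "2025-06-11T19:19:14.736844+00:00", "level": "INFO", "msg": "Writing /tmp/dashboards/policy-reporter-details-dashboard.json (ascii)"}',
    '{"time": "2025-06-11T19:22:02.036481+00:00", "level": "INFO", "msg": "Writing /tmp/dashboards/prometheus-remote-write.json (ascii)"}',
]


def write_span_seconds(log_lines):
    """Return the seconds between the earliest and latest 'Writing ...' entries.

    Returns None when fewer than two such entries are found.
    """
    stamps = []
    for line in log_lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip tracebacks and other non-JSON output
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed JSON lines
        if not isinstance(entry, dict) or "time" not in entry:
            continue
        if str(entry.get("msg", "")).startswith("Writing "):
            stamps.append(datetime.fromisoformat(entry["time"]))
    if len(stamps) < 2:
        return None
    return (max(stamps) - min(stamps)).total_seconds()


if __name__ == "__main__":
    for name, lines in (("m5a.4xlarge", M5A_LINES), ("t3.2xlarge", T3_LINES)):
        span = write_span_seconds(lines)
        print(f"{name}: {span:.0f} s between first and last dashboard write")
```

For the excerpts above this gives roughly 1717 seconds (about 28.6 minutes, with loading still incomplete) on the m5a.4xlarge versus roughly 167 seconds on the t3.2xlarge. In practice the input would be the full sidecar container log rather than two sample lines.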

Edited by Brian Jackson