Grafana Poor Performance on m5a.4xlarge

When running on an m5a.4xlarge EC2 instance (AMD-based), Grafana underperforms, and the kiwigrid/k8s-sidecar container (grafana-sc-dashboard) takes several hours to load all the dashboards from ConfigMaps.

However, when running on a smaller t3.2xlarge EC2 instance (Intel-based), Grafana performs normally and all dashboards are loaded within a few minutes.

# m5a.4xlarge

# First Dashboard Loaded 18:42:01
{"time": "2025-06-13T18:42:01.001005+00:00", "level": "INFO", "msg": "Writing /tmp/dashboards/policy-reporter-details-dashboard.json (ascii)"}                                                                                                            

# Dashboards still not finished loading 30 minutes later 
{"time": "2025-06-13T19:10:37.969118+00:00", "level": "INFO", "msg": "Writing /tmp/dashboards/pod-total.json (ascii)"}

requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=3000): Max retries exceeded with url: /api/admin/provisioning/dashboards/reload (Caused by ReadTimeoutError("HTTPConnectionPool(host='localhost', port=3000): Read timed out
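The timeout above is on the sidecar's call to Grafana's `/api/admin/provisioning/dashboards/reload` endpoint, so one way to check whether the slowness is on the Grafana side is to time that same call directly. The following is a minimal sketch using only the Python standard library; it assumes Grafana is reachable at `http://localhost:3000` (for example through `kubectl port-forward`) and that admin credentials are sent via Basic auth. The helper name `time_reload` and the credentials are hypothetical placeholders, not part of the sidecar.

```python
import base64
import time
import urllib.error
import urllib.request

# Same endpoint the sidecar calls (from the ConnectionError in the sidecar log).
RELOAD_PATH = "/api/admin/provisioning/dashboards/reload"


def time_reload(base_url, user, password, timeout=60.0):
    """Hypothetical helper: POST to the provisioning reload endpoint and time it.

    Returns (status, elapsed_seconds). status is the HTTP status code, or None
    if the connection failed or the read timed out.
    """
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        base_url.rstrip("/") + RELOAD_PATH,
        data=b"",  # empty body; the reload call carries no payload
        method="POST",
        headers={"Authorization": f"Basic {token}"},
    )
    start = time.monotonic()
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # e.g. 401/403 if the credentials lack admin rights
    except OSError:
        status = None  # connection refused, or socket timeout while reading
    return status, time.monotonic() - start
```

For example, `time_reload("http://localhost:3000", "admin", "<password>")` run against each instance type would show whether the reload request itself is what takes minutes on the m5a.4xlarge. Catching `OSError` covers both `URLError` and raw socket timeouts across Python versions.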

# t3.2xlarge

# First Dashboard Loaded 19:19:14
{"time": "2025-06-11T19:19:14.736844+00:00", "level": "INFO", "msg": "Writing /tmp/dashboards/policy-reporter-details-dashboard.json (ascii)"} 

# Last Dashboard Loaded 19:22:02 (~3 minutes later)
{"time": "2025-06-11T19:22:02.036481+00:00", "level": "INFO", "msg": "Writing /tmp/dashboards/prometheus-remote-write.json (ascii)"}
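To compare load times between instance types without reading timestamps by hand, the sidecar's JSON log lines can be parsed directly. This is a minimal sketch assuming one JSON object per line with ISO-8601 `time` and `msg` fields, as in the excerpts above (e.g. captured with `kubectl logs <pod> -c grafana-sc-dashboard > sidecar.log`); the function name `load_window` is hypothetical.

```python
import json
from datetime import datetime


def load_window(log_lines):
    """Return (first, last, elapsed) over sidecar "Writing ..." entries, or None.

    Non-JSON lines (e.g. tracebacks) and non-object JSON values are skipped.
    """
    times = []
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip exceptions and other non-JSON output
        if not isinstance(entry, dict):
            continue
        if str(entry.get("msg", "")).startswith("Writing "):
            # fromisoformat accepts the "+00:00" offset used by the sidecar
            times.append(datetime.fromisoformat(entry["time"]))
    if not times:
        return None
    first, last = min(times), max(times)
    return first, last, last - first
```

Fed the two t3.2xlarge lines above, the elapsed time is `0:02:47.299637`, matching the ~3 minutes noted; for a full log, `load_window(open("sidecar.log"))` works because `json.loads` ignores the trailing newline.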

The underlying cause is currently unknown, but the problem is repeatable. This issue will be used to track any findings and related information.

Edited Jun 13, 2025 by Brian Jackson
