Pod Errors Leading to Non-Working State
Investigate the Pod errors that are putting Mimir into a non-working state:
Compactor error
ts=2024-11-20T22:23:02.901221095Z caller=log.go:245 level=info msg="Suspect monitoring-mimir-query-frontend-5776c868d7-69tvr-f1395ca9 has failed, no acks received"
ts=2024-11-20T22:23:37.900771021Z caller=log.go:245 level=info msg="Suspect monitoring-mimir-query-frontend-5776c868d7-69tvr-f1395ca9 has failed, no acks received"
ts=2024-11-20T22:23:45.954981842Z caller=log.go:245 level=info msg="Marking monitoring-mimir-query-frontend-5776c868d7-69tvr-f1395ca9 as failed, suspect timeout reached (2 peer confirmations)"
ts=2024-11-20T22:24:15.82368603Z caller=log.go:245 level=info msg="Marking monitoring-mimir-query-frontend-5776c868d7-69tvr-f1395ca9 as failed, suspect timeout reached (2 peer confirmations)"
ts=2024-11-20T22:24:47.900643881Z caller=log.go:245 level=info msg="Suspect monitoring-mimir-query-frontend-5776c868d7-69tvr-f1395ca9 has failed, no acks received"
ts=2024-11-20T22:25:42.900948195Z caller=log.go:245 level=info msg="Suspect monitoring-mimir-query-frontend-5776c868d7-69tvr-f1395ca9 has failed, no acks received"
ts=2024-11-20T22:26:42.900856125Z caller=log.go:245 level=info msg="Suspect monitoring-mimir-query-frontend-5776c868d7-69tvr-f1395ca9 has failed, no acks received"
ts=2024-11-20T22:26:46.762044216Z caller=log.go:245 level=info msg="Marking monitoring-mimir-query-frontend-5776c868d7-69tvr-f1395ca9 as failed, suspect timeout reached (2 peer confirmations)"
ts=2024-11-20T22:26:57.66067273Z caller=log.go:245 level=warn msg="Refuting a suspect message (from: monitoring-mimir-query-frontend-5776c868d7-69tvr-f1395ca9)"
ts=2024-11-20T22:27:22.901618126Z caller=log.go:245 level=info msg="Suspect monitoring-mimir-query-frontend-5776c868d7-69tvr-f1395ca9 has failed, no acks received"
Query-frontend
ts=2024-11-20T22:27:51.740265828Z caller=frontend_scheduler_worker.go:326 level=error msg="error sending requests to scheduler" err="rpc error: code = Unavailable desc = upstream connect error or disconnect/reset before headers. reset reason: connection termination" addr=10.42.3.13:9095
ts=2024-11-20T22:27:51.829987842Z caller=frontend_scheduler_worker.go:326 level=error msg="error sending requests to scheduler" err="rpc error: code = Unavailable desc = upstream connect error or disconnect/reset before headers. reset reason: connection termination" addr=10.42.1.8:9095
ts=2024-11-20T22:27:52.141410185Z caller=frontend_scheduler_worker.go:326 level=error msg="error sending requests to scheduler" err="rpc error: code = Unavailable desc = upstream connect error or disconnect/reset before headers. reset reason: connection termination" addr=10.42.1.8:9095
ts=2024-11-20T22:27:52.277888695Z caller=frontend_scheduler_worker.go:326 level=error msg="error sending requests to scheduler" err="rpc error: code = Unavailable desc = upstream connect error or disconnect/reset before headers. reset reason: connection termination" addr=10.42.1.8:9095
ts=2024-11-20T22:27:52.500884865Z caller=frontend_scheduler_worker.go:326 level=error msg="error sending requests to scheduler" err="rpc error: code = Unavailable desc = upstream connect error or disconnect/reset before headers. reset reason: connection termination" addr=10.42.1.8:9095
ts=2024-11-20T22:27:52.676696755Z caller=frontend_scheduler_worker.go:326 level=error msg="error sending requests to scheduler" err="rpc error: code = Unavailable desc = upstream connect error or disconnect/reset before headers. reset reason: connection termination" addr=10.42.3.13:9095
ts=2024-11-20T22:27:52.854623291Z caller=frontend_scheduler_worker.go:326 level=error msg="error sending requests to scheduler" err="rpc error: code = Unavailable desc = upstream connect error or disconnect/reset before headers. reset reason: connection termination" addr=10.42.1.8:9095
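
To see which scheduler addresses the query-frontend fails against most often, the error lines can be tallied by their addr field. The following is a minimal Python sketch, not part of Mimir or its tooling: the helper names parse_line and count_errors_by_addr are hypothetical, and it assumes each log entry is a single key=value (logfmt-style) line like the ones captured here.

```python
import re
from collections import Counter

# Matches key=value and key="quoted value" pairs in a logfmt-style line.
# Assumption: values are either unquoted (no spaces) or double-quoted.
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_line(line):
    """Return a dict of the key/value pairs found in one log line."""
    fields = {}
    for key, value in PAIR.findall(line):
        # Strip the surrounding quotes from quoted values.
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        fields[key] = value
    return fields


def count_errors_by_addr(lines):
    """Count level=error entries per addr field."""
    counts = Counter()
    for line in lines:
        fields = parse_line(line)
        if fields.get("level") == "error" and "addr" in fields:
            counts[fields["addr"]] += 1
    return counts
```

Feeding the captured query-frontend lines to count_errors_by_addr gives a per-address error count, which helps tell whether a single query-scheduler pod is unreachable or all of them are.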