psp gatekeeper breaking rke2 restart
After shutting down our 6-node cluster running on EC2 and restarting them, the 3 masters are not 'ready'. We get the error below. It seems that it's the result of "several potential pitfalls in the OPA Gatekeeper deployment model." We run RKE rke2 v1.20.6+rke2r1; it may be fixed in v1.21.1+rke2r1. Will update this issue after testing further.
Aug 02 16:21:49 ip-172-31-46-213.us-gov-east-1.compute.internal rke2[25226]: time="2021-08-02T16:21:49Z" level=fatal msg="psp: update namespace: kube-system - Internal error occurred: failed calling webhook \"check-ignore-label.gatekeeper.sh\": Post \"https://gatekeeper-webhook-service.gatekeeper-system.svc:443/v1/admitlabel?timeout=3s\": context deadline exceeded"
Aug 02 16:21:49 ip-172-31-46-213.us-gov-east-1.compute.internal systemd[1]: rke2-server.service: main process exited, code=exited, status=1/FAILURE
Aug 02 16:21:49 ip-172-31-46-213.us-gov-east-1.compute.internal systemd[1]: Unit rke2-server.service entered failed state.
Aug 02 16:21:49 ip-172-31-46-213.us-gov-east-1.compute.internal systemd[1]: rke2-server.service failed.
Seems related to https://github.com/rancher/rke2/issues/1054
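Added example, not part of the original issue and not RKE2 or Gatekeeper code: the log above shows a namespace update to kube-system rejected because an in-cluster admission webhook did not answer within its 3s timeout. This Python check flags webhooks that fail closed on namespace updates and are served from inside the cluster; the sample input is constructed, and its configuration name, failurePolicy values and rules are assumptions.

```python
# Constructed sample shaped like `kubectl get validatingwebhookconfigurations -o json`.
# Webhook name, service, path and 3s timeout come from the log line in the issue;
# the configuration name, failurePolicy values and rules are assumptions.
SAMPLE = {"items": [{
    "metadata": {"name": "gatekeeper-validating-webhook-configuration"},
    "webhooks": [
        {"name": "check-ignore-label.gatekeeper.sh",
         "failurePolicy": "Fail", "timeoutSeconds": 3,
         "clientConfig": {"service": {"namespace": "gatekeeper-system",
                                      "name": "gatekeeper-webhook-service",
                                      "path": "/v1/admitlabel"}},
         "rules": [{"apiGroups": [""], "apiVersions": ["*"],
                    "operations": ["CREATE", "UPDATE"],
                    "resources": ["namespaces"]}]},
        {"name": "validation.gatekeeper.sh",
         "failurePolicy": "Ignore", "timeoutSeconds": 3,
         "clientConfig": {"service": {"namespace": "gatekeeper-system",
                                      "name": "gatekeeper-webhook-service",
                                      "path": "/v1/admit"}},
         "rules": [{"apiGroups": ["*"], "apiVersions": ["*"],
                    "operations": ["CREATE", "UPDATE"],
                    "resources": ["*"]}]},
    ]}]}


def intercepts_namespace_update(webhook):
    """True if any rule matches UPDATE on core-group namespaces."""
    for rule in webhook.get("rules", []):
        if (set(rule.get("apiGroups", [])) & {"", "*"}
                and set(rule.get("resources", [])) & {"namespaces", "*"}
                and set(rule.get("operations", [])) & {"UPDATE", "*"}):
            return True
    return False


def bootstrap_risks(webhook_list):
    """Fail-closed, in-cluster webhooks that gate namespace updates."""
    findings = []
    for cfg in webhook_list.get("items", []):
        for wh in cfg.get("webhooks", []):
            svc = wh.get("clientConfig", {}).get("service")
            # admissionregistration.k8s.io/v1 defaults failurePolicy to Fail.
            fail_closed = wh.get("failurePolicy", "Fail") == "Fail"
            if svc and fail_closed and intercepts_namespace_update(wh):
                findings.append((cfg["metadata"]["name"], wh["name"],
                                 f'{svc["namespace"]}/{svc["name"]}',
                                 wh.get("timeoutSeconds", 10)))
    return findings


if __name__ == "__main__":
    for finding in bootstrap_risks(SAMPLE):
        print(*finding)
```

Each finding is a webhook whose backing pods must already be running before the API server can accept namespace updates, which is the dependency a full cluster restart exposes.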