SSO login error for Grafana and Kibana when using custom CA
Bug
Description
When global SSO is enabled and a custom CA certificate is provided in values.yaml under sso.certificateAuthority, I encounter an x509 error from both Grafana and Kibana after the login redirect from Keycloak back to these components. More details below, but I'm creating the issue at the umbrella level since this issue affects two components. From the same BigBang deployment, I can SSO login successfully to ArgoCD and Kiali.
Use case: We are testing with a local CA and signed wildcard certs to make sure our project's scripting has not accidentally hardcoded any DNS values. In addition, we will be deploying BigBang to high-side environments where DoD-issued certs would not be available in the base OS for K8s worker nodes, so we're making sure that we are not implicitly trusting any OS-level CAs during our dev testing.
Steps to Reproduce
Use openssl to generate a custom CA key and crt. Create and sign a csr for a wildcard crt, "*.dev-int.proj.org", used for private ingress for Grafana, Kibana, ArgoCD, Kiali. Create and sign a csr for a wildcard crt, "*.dev.proj.org", used for passthrough ingress for Keycloak.
openssl genrsa -des3 -out ProjCA.key 3072
openssl req -x509 -new -nodes -key ProjCA.key -sha256 -days 365 -out ProjCA.crt
openssl genrsa -out STAR_dev_proj_org.key 3072
openssl req -new -key STAR_dev_proj_org.key -extensions v3_ca -out STAR_dev_proj_org.csr
openssl x509 -req -in STAR_dev_proj_org.csr -CA ProjCA.crt -CAkey ProjCA.key -CAcreateserial -extfile openssl.dev.cnf -out STAR_dev_proj_org.crt -days 365 -sha256
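Before deploying, it may be worth confirming that the signed wildcard cert really chains to the custom CA. The following is a minimal sketch, not part of the original repro; check_chain is a hypothetical helper name, and it only wraps the standard openssl x509 and openssl verify subcommands.

```shell
# Sketch: show who issued a leaf certificate and check that it chains to the given CA.
# check_chain is a hypothetical helper; its exit status is that of "openssl verify".
check_chain() {
  local ca_crt="$1" leaf_crt="$2"
  # Print subject and issuer so a mismatch is easy to spot by eye.
  openssl x509 -in "$leaf_crt" -noout -subject -issuer
  # Non-zero exit if the leaf cannot be validated against the CA file.
  openssl verify -CAfile "$ca_crt" "$leaf_crt"
}
```

With the files generated above, this would be called as `check_chain ProjCA.crt STAR_dev_proj_org.crt`.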
In the BigBang values.yaml, enable SSO for the components and configure the Keycloak ingress gateway, client_ids, client_secrets, etc. The global "sso" section contains:
sso:
  name: Proj
  url: https://keycloak.dev.proj.org/auth/realms/proj
  certificateAuthority:
    cert: |
      ###PROJ_SSO_CERT### <-- multi-line pem content from ProjCA.crt
    secretName: tls-ca-sso
Register a new Keycloak user with the correct group, etc., and initiate an SSO login from the Grafana UI. After the redirect to Keycloak, the user creds, MFA, consent form, and OIDC attributes are all accepted, and then the error appears on the Grafana login screen: "Login failed, Failed to get token from provider". The corresponding error in the pod log is:
logger=authn.service t=2024-11-22T22:38:04.716071363Z level=error msg="Failed to authenticate request" client=auth.client.generic_oauth error="[auth.oauth.token.exchange] failed to exchange code to token: Post "https://keycloak.dev.proj.org/auth/realms/proj/protocol/openid-connect/token": tls: failed to verify certificate: x509: certificate signed by unknown authority"
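To separate a bad CA from a client that is not using the CA, the same TLS verification can be attempted outside Grafana. This is a rough sketch under assumptions: probe_issuer is a hypothetical helper, it assumes curl is available wherever it is run, and it requests the standard OIDC discovery path under the realm URL rather than the token endpoint from the log.

```shell
# Sketch: fetch the realm's OIDC discovery document, verifying the server cert
# against the given CA file. probe_issuer is a hypothetical helper name.
# A non-zero exit means the connection, TLS verification, or HTTP request failed.
probe_issuer() {
  local ca_file="$1" issuer_url="$2"
  curl --silent --show-error --fail --max-time 10 \
       --cacert "$ca_file" \
       --output /dev/null \
       "${issuer_url%/}/.well-known/openid-configuration"
}
```

For example, with the CA content saved locally as ca.pem: `probe_issuer ca.pem https://keycloak.dev.proj.org/auth/realms/proj`. If this succeeds while Grafana still reports "certificate signed by unknown authority", that would suggest Grafana's OAuth client is not loading the CA.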
Digging and additional info
I did see that the "tls-ca-sso" secret was created in all "core" component namespaces and that the "ca.pem" string was correct (and, as noted, it worked as expected for ArgoCD and Kiali logins). In the monitoring namespace there is also an additional secret, "tls-ca-sso-grafana", with the same content as the "tls-ca-sso" secret. I also saw that the secret was volume-mounted into the Grafana pod (using "extraSecretMounts", I think). I did not see any corresponding settings for an SSO CA cert in the deployed grafana.ini under the [auth.generic_oauth] section.
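As far as I can tell from the Grafana docs, the generic OAuth client takes its CA from a tls_client_ca setting in [auth.generic_oauth]; whether the BigBang chart is expected to set it is an assumption on my part. A small sketch for checking a copy of the rendered grafana.ini follows; oauth_tls_keys is a hypothetical helper name.

```shell
# Sketch: print any tls_* settings found inside the [auth.generic_oauth]
# section of a grafana.ini file. oauth_tls_keys is a hypothetical helper.
oauth_tls_keys() {
  awk '
    # On every section header, remember whether we entered the OAuth section.
    /^\[/ { in_section = ($0 == "[auth.generic_oauth]"); next }
    # Inside that section, print only keys that start with tls_.
    in_section && /^[[:space:]]*tls_/ { print }
  ' "$1"
}
```

Run as `oauth_tls_keys grafana.ini` against a copy of the file taken from the pod. Empty output would match what I observed, i.e. no CA setting is rendered for the OAuth client.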
For completeness, we are using AWS EKS clusters and RHEL8 as the worker node AMI. I also tried with the default AL2 worker node AMI, with the same issue. I don't think EKS or RHEL should impact the sso.certificateAuthority functionality. Using a commercial CA and signed certs works as expected when omitting the BigBang sso.certificateAuthority values.
BigBang Version
BigBang version 2.30.0