Confluence Hazelcast Startup Issues
Summary
Confluence is unable to start up with Hazelcast and will not cluster properly when clustering is of type Kubernetes. This leaves the cluster unhealthy and prevents re-indexing from completing successfully. When deployed as Synchrony, the pod forcefully terminates with the errors below. We believe the SSL errors occur on both Synchrony and Confluence workloads, but we have not been able to confirm this on the Confluence pods.
Steps to reproduce
Deploy with clustering and Synchrony enabled using BigBang's latest third-party chart: https://repo1.dso.mil/big-bang/apps/third-party/confluence
What is the current bug behavior?
Confluence nodes come up healthy but are unable to cluster. Synchrony workloads end up in CrashLoopBackOff.
What is the expected correct behavior?
Hazelcast correctly connects on both Synchrony and Confluence node workloads.
Relevant logs and/or screenshots
2023-02-09 17:35:42,580 ERROR [main] [internal.cluster.impl.DiscoveryJoiner] []:5701 [Confluence-Synchrony] [3.12.11] Failure in generating SSLSocketFactory
com.hazelcast.kubernetes.KubernetesClientException: Failure in generating SSLSocketFactory
...
Caused by: java.security.KeyStoreException: sun.security.pkcs11.wrapper.PKCS11Exception: CKR_SESSION_READ_ONLY
at jdk.crypto.cryptoki/sun.security.pkcs11.P11KeyStore.engineSetEntry(P11KeyStore.java:1049)
at jdk.crypto.cryptoki/sun.security.pkcs11.P11KeyStore.engineSetCertificateEntry(P11KeyStore.java:515)
at java.base/java.security.KeyStore.setCertificateEntry(KeyStore.java:1235)
at com.hazelcast.kubernetes.RestClient.buildSslSocketFactory(RestClient.java:183)
... 39 more
Caused by: sun.security.pkcs11.wrapper.PKCS11Exception: CKR_SESSION_READ_ONLY
...
2023-02-09 17:35:42,581 ERROR [main] [com.hazelcast.instance.Node] []:5701 [Confluence-Synchrony] [3.12.11] Could not join cluster. Shutting down now!
Possible fixes
We are unsure how Hazelcast is configured, but the first step is to verify that Hazelcast can connect to the K8s API without any SSL issues.
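One way to narrow this down from inside the pod is to repeat the step that fails in the stack trace: `KeyStore.setCertificateEntry` ends up in `P11KeyStore`, which suggests the JVM is handing out a PKCS11 keystore. The sketch below is a standalone diagnostic, not Hazelcast's actual code. It assumes that Hazelcast builds its keystore from the JVM default type and that the CA certificate sits at the standard service account path. The class name `KeyStoreCheck` and its helper methods are made up for illustration.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.security.cert.CertificateFactory;

// Hypothetical diagnostic class, not part of Hazelcast or Confluence.
class KeyStoreCheck {
    // Standard in-pod location of the service account CA certificate.
    static final String CA_PATH =
            "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt";

    // Reports which keystore type and provider the JVM returns by default.
    static String describeDefaultKeyStore() throws Exception {
        KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
        return ks.getType() + " (provider: " + ks.getProvider().getName() + ")";
    }

    // Adds a CA certificate to a fresh default-type keystore. This mirrors the
    // setCertificateEntry call seen in the stack trace (assumption: Hazelcast
    // uses the default keystore type here).
    static void tryAddCertificate(Path caFile) throws Exception {
        KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
        ks.load(null, null);
        try (InputStream in = Files.newInputStream(caFile)) {
            Certificate ca =
                    CertificateFactory.getInstance("X.509").generateCertificate(in);
            ks.setCertificateEntry("ca", ca);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Default keystore: " + describeDefaultKeyStore());
        Path caFile = Paths.get(args.length > 0 ? args[0] : CA_PATH);
        if (!Files.isReadable(caFile)) {
            System.out.println("CA file not readable: " + caFile);
            return;
        }
        try {
            tryAddCertificate(caFile);
            System.out.println("setCertificateEntry succeeded");
        } catch (Exception e) {
            // A read-only PKCS11 token would be expected to fail here.
            System.out.println("setCertificateEntry failed: " + e);
        }
    }
}
```

Running this with the same JVM and security configuration as the Synchrony container (for example `java KeyStoreCheck.java` on a JDK 11 or later image) should show whether the default keystore is PKCS11-backed and whether the certificate write fails outside of Hazelcast as well.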
Tasks
- Bug has been identified and corrected within the container