Kiali operator failure on dogfood cluster
Bug
Description
Describe the problem, what were you doing when you noticed the bug?
While working on the bigbang dogfood cluster (RKE2), observed that the kiali pod was still running the v1.82.0 image, but the kiali operator had been updated as expected to v1.86.2. The Kiali CRD was also reflecting the expected version.
The Kiali deployment was deleted in the hope that the operator would re-create it with the appropriate image version. Unfortunately, this was not the case.
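(For reference, the delete was presumably something along the lines of kubectl delete deployment kiali -n kiali, assuming the default deployment name and the kiali namespace used in the commands below.)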
Ran the below command to make sure the operator was picking up the update:
kubectl annotate kiali kiali -n kiali --overwrite kiali.io/reconcile="$(date)"
Looking at the kiali-operator pod logs, discovered that Ansible was failing with a 404 on the "Get api version information from the cluster" task in the kiali-deploy playbook. Full logs:
TASK [default/kiali-deploy : Get api version information from the cluster] *****
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: raise ApiException(http_resp=r)
fatal: [localhost]: FAILED! => {"changed": false, "module_stderr": "Traceback (most recent call last):\n File \"/opt/ansible/python-env/lib64/python3.11/site-packages/kubernetes/dynamic/client.py\", line 55, in inner\n resp = func(self, *args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/opt/ansible/python-env/lib64/python3.11/site-packages/kubernetes/dynamic/client.py\", line 270, in request\n api_response = self.client.call_api(\n ^^^^^^^^^^^^^^^^^^^^^\n File \"/opt/ansible/python-env/lib64/python3.11/site-packages/kubernetes/client/api_client.py\", line 348, in call_api\n return self.__call_api(resource_path, method,\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/opt/ansible/python-env/lib64/python3.11/site-packages/kubernetes/client/api_client.py\", line 180, in __call_api\n response_data = self.request(\n ^^^^^^^^^^^^^\n File \"/opt/ansible/python-env/lib64/python3.11/site-packages/kubernetes/client/api_client.py\", line 373, in request\n return self.rest_client.GET(url,\n ^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/opt/ansible/python-env/lib64/python3.11/site-packages/kubernetes/client/rest.py\", line 244, in GET\n return self.request(\"GET\", url,\n ^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/opt/ansible/python-env/lib64/python3.11/site-packages/kubernetes/client/rest.py\", line 238, in request\n raise ApiException(http_resp=r)\nkubernetes.client.exceptions.ApiException: (404)\nReason: Not Found\nHTTP response headers: HTTPHeaderDict({'Audit-Id': '1b455622-c163-441d-9af5-594413c79792', 'Cache-Control': 'no-cache, private', 'Content-Length': '174', 'Content-Type': 'application/json', 'Date': 'Thu, 11 Jul 2024 21:09:41 GMT', 'X-Kubernetes-Pf-Flowschema-Uid': '85c8d1de-4362-4854-b8cf-ca617c996883', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'e0aa77c8-5534-4c18-8975-b09ae30d2c1a'})\nHTTP response body: b'{\"kind\":\"Status\",\"apiVersion\":\"v1\",\"metadata\":{},\"status\":\"Failure\",\"message\":\"the server could not find the requested resource\",\"reason\":\"NotFound\",\"details\":{},\"code\":404}\\n'\n\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File \"/tmp/ansible/tmp/ansible-tmp-1720732176.7921784-499-48214681066386/AnsiballZ_k8s_cluster_info.py\", line 107, in <module>\n _ansiballz_main()\n File \"/tmp/ansible/tmp/ansible-tmp-1720732176.7921784-499-48214681066386/AnsiballZ_k8s_cluster_info.py\", line 99, in _ansiballz_main\n invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)\n File \"/tmp/ansible/tmp/ansible-tmp-1720732176.7921784-499-48214681066386/AnsiballZ_k8s_cluster_info.py\", line 47, in invoke_module\n runpy.run_module(mod_name='ansible_collections.kubernetes.core.plugins.modules.k8s_cluster_info', init_globals=dict(_module_fqn='ansible_collections.kubernetes.core.plugins.modules.k8s_cluster_info', _modlib_path=modlib_path),\n File \"<frozen runpy>\", line 226, in run_module\n File \"<frozen runpy>\", line 98, in _run_module_code\n File \"<frozen runpy>\", line 88, in _run_code\n File \"/tmp/ansible_k8s_cluster_info_payload_fgkezs89/ansible_k8s_cluster_info_payload.zip/ansible_collections/kubernetes/core/plugins/modules/k8s_cluster_info.py\", line 213, in <module>\n File \"/tmp/ansible_k8s_cluster_info_payload_fgkezs89/ansible_k8s_cluster_info_payload.zip/ansible_collections/kubernetes/core/plugins/modules/k8s_cluster_info.py\", line 209, in main\n File \"/tmp/ansible_k8s_cluster_info_payload_fgkezs89/ansible_k8s_cluster_info_payload.zip/ansible_collections/kubernetes/core/plugins/modules/k8s_cluster_info.py\", line 
166, in execute_module\n File \"/opt/ansible/python-env/lib64/python3.11/site-packages/kubernetes/dynamic/discovery.py\", line 310, in __iter__\n rg.resources = self.get_resources_for_api_version(\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/tmp/ansible_k8s_cluster_info_payload_fgkezs89/ansible_k8s_cluster_info_payload.zip/ansible_collections/kubernetes/core/plugins/module_utils/client/discovery.py\", line 97, in get_resources_for_api_version\n File \"/opt/ansible/python-env/lib64/python3.11/site-packages/kubernetes/dynamic/client.py\", line 57, in inner\n raise api_exception(e)\nkubernetes.dynamic.exceptions.NotFoundError: 404\nReason: Not Found\nHTTP response headers: HTTPHeaderDict({'Audit-Id': '1b455622-c163-441d-9af5-594413c79792', 'Cache-Control': 'no-cache, private', 'Content-Length': '174', 'Content-Type': 'application/json', 'Date': 'Thu, 11 Jul 2024 21:09:41 GMT', 'X-Kubernetes-Pf-Flowschema-Uid': '85c8d1de-4362-4854-b8cf-ca617c996883', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'e0aa77c8-5534-4c18-8975-b09ae30d2c1a'})\nHTTP response body: b'{\"kind\":\"Status\",\"apiVersion\":\"v1\",\"metadata\":{},\"status\":\"Failure\",\"message\":\"the server could not find the requested resource\",\"reason\":\"NotFound\",\"details\":{},\"code\":404}\\n'\nOriginal traceback: \n File \"/opt/ansible/python-env/lib64/python3.11/site-packages/kubernetes/dynamic/client.py\", line 55, in inner\n resp = func(self, *args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n File \"/opt/ansible/python-env/lib64/python3.11/site-packages/kubernetes/dynamic/client.py\", line 270, in request\n api_response = self.client.call_api(\n ^^^^^^^^^^^^^^^^^^^^^\n\n File \"/opt/ansible/python-env/lib64/python3.11/site-packages/kubernetes/client/api_client.py\", line 348, in call_api\n return self.__call_api(resource_path, method,\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n File \"/opt/ansible/python-env/lib64/python3.11/site-packages/kubernetes/client/api_client.py\", line 180, in __call_api\n response_data = self.request(\n ^^^^^^^^^^^^^\n\n File \"/opt/ansible/python-env/lib64/python3.11/site-packages/kubernetes/client/api_client.py\", line 373, in request\n return self.rest_client.GET(url,\n ^^^^^^^^^^^^^^^^^^^^^^^^^\n\n File \"/opt/ansible/python-env/lib64/python3.11/site-packages/kubernetes/client/rest.py\", line 244, in GET\n return self.request(\"GET\", url,\n ^^^^^^^^^^^^^^^^^^^^^^^^\n\n File \"/opt/ansible/python-env/lib64/python3.11/site-packages/kubernetes/client/rest.py\", line 238, in request\n raise ApiException(http_resp=r)\n\n", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1}
@michaelmartin was able to narrow the failing call down to:
https://internal-dogfood2-x6u-rke2-cp-1157307456.us-gov-west-1.elb.amazonaws.com:6443/apis/flowcontrol.apiserver.k8s.io/v1beta2
The flowcontrol.apiserver.k8s.io/v1beta2 API version was removed as part of Kubernetes 1.29:
https://kubernetes.io/docs/reference/using-api/deprecation-guide/#flowcontrol-resources-v129
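The failing request can presumably be reproduced directly against the cluster with the raw API path (this exact command was not part of the original triage, just a convenient way to see the same 404):
kubectl get --raw /apis/flowcontrol.apiserver.k8s.io/v1beta2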
That API version does not appear anywhere in the ansible/python code and is not a valid API version on the cluster, so it is not clear where it is coming from:
kubectl api-versions | grep flow
flowcontrol.apiserver.k8s.io/v1
flowcontrol.apiserver.k8s.io/v1beta3
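One diagnostic that may be worth running (an assumption, not something verified in the original triage) is checking whether a stale APIService registration for the removed version still exists on the RKE2 cluster, since discovery is built from the group/versions the apiserver advertises:
kubectl get apiservices | grep flowcontrol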
@michaelmartin also attempted a fresh deploy of 1.86.2-bb.0, along with several other versions going all the way back to 1.78.0-bb.5, all with the same failure. Deploys and upgrades on a k3d dev cluster work without issue.
Michael's note:
This error does not manifest on k3d or vanilla k8s.
This can be reproduced on a local machine by running the following Ansible playbook (with python3-kubernetes also installed) and pointing it at the dogfood cluster.
- hosts: localhost
  gather_facts: false
  connection: local
  tasks:
    - name: Get api version information from the cluster
      kubernetes.core.k8s_cluster_info:
      register: api_status
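Assuming the playbook is saved as repro.yml (a placeholder filename) and KUBECONFIG points at the dogfood cluster, it can be run with:
ansible-playbook repro.yml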
The error comes up, and by hacking the Python code and re-running that Ansible task, I was able to determine which call was failing. The issue appears to be that the RKE2 cluster is somehow advertising the now-removed API version so that it gets queried (I believe the Python code dynamically queries the API versions presented by the cluster).
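To illustrate the discovery behavior described above, below is a minimal sketch (not from the original report, just an assumption about the mechanism) that uses the kubernetes Python client to walk whatever API group versions the cluster advertises and fetch the resource list for each one; a stale flowcontrol.apiserver.k8s.io/v1beta2 entry in the advertised list would produce exactly this kind of 404.

# Minimal sketch: enumerate every group/version the cluster advertises
# and GET its resource list, roughly what dynamic discovery in
# kubernetes.core.k8s_cluster_info does under the hood.
from kubernetes import client, config

config.load_kube_config()  # KUBECONFIG should point at the dogfood cluster
api_client = client.ApiClient()

# Collect the group/versions the apiserver claims to serve.
group_versions = ["v1"]  # the core API group
for group in client.ApisApi(api_client).get_api_versions().groups:
    for version in group.versions:
        group_versions.append(version.group_version)

# Fetch the resource list for each advertised group/version.
for gv in group_versions:
    path = "/api/v1" if gv == "v1" else f"/apis/{gv}"
    try:
        api_client.call_api(
            path, "GET",
            auth_settings=["BearerToken"],
            response_type="object",
            _return_http_data_only=True,
        )
        print(f"OK   {path}")
    except client.exceptions.ApiException as exc:
        # A 404 here means the cluster advertised a group/version it
        # cannot actually serve, e.g. flowcontrol.apiserver.k8s.io/v1beta2.
        print(f"FAIL {path}: HTTP {exc.status}")

Running this against the dogfood cluster should show which advertised group/version the apiserver fails to serve.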
BigBang Version
What version of BigBang were you running?
release-2.31.x