Kiali operator failure on dogfood cluster
# Bug
## Description
Describe the problem, what were you doing when you noticed the bug?
While working on the BigBang dogfood cluster (RKE2), I observed that the Kiali pod was still running the `v1.82.0` image, even though the Kiali operator had been updated to `v1.86.2` as expected. The `Kiali` CRD was also reflecting the expected version.
The Kiali deployment was deleted in the hope that the operator would re-create it with the correct image version. Unfortunately, it did not.
The command below was then run to make sure the operator was picking up the update:
```
kubectl annotate kiali kiali -n kiali --overwrite kiali.io/reconcile="$(date)"
```
The kiali-operator pod logs showed that the Ansible run was failing with a 404 on the `Get api version information from the cluster` task in the `kiali-deploy` playbook. Full logs:
```
TASK [default/kiali-deploy : Get api version information from the cluster] *****
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: raise ApiException(http_resp=r)
fatal: [localhost]: FAILED! => {"changed": false, "module_stderr": "Traceback (most recent call last):\n File \"/opt/ansible/python-env/lib64/python3.11/site-packages/kubernetes/dynamic/client.py\", line 55, in inner\n resp = func(self, *args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/opt/ansible/python-env/lib64/python3.11/site-packages/kubernetes/dynamic/client.py\", line 270, in request\n api_response = self.client.call_api(\n ^^^^^^^^^^^^^^^^^^^^^\n File \"/opt/ansible/python-env/lib64/python3.11/site-packages/kubernetes/client/api_client.py\", line 348, in call_api\n return self.__call_api(resource_path, method,\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/opt/ansible/python-env/lib64/python3.11/site-packages/kubernetes/client/api_client.py\", line 180, in __call_api\n response_data = self.request(\n ^^^^^^^^^^^^^\n File \"/opt/ansible/python-env/lib64/python3.11/site-packages/kubernetes/client/api_client.py\", line 373, in request\n return self.rest_client.GET(url,\n ^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/opt/ansible/python-env/lib64/python3.11/site-packages/kubernetes/client/rest.py\", line 244, in GET\n return self.request(\"GET\", url,\n ^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/opt/ansible/python-env/lib64/python3.11/site-packages/kubernetes/client/rest.py\", line 238, in request\n raise ApiException(http_resp=r)\nkubernetes.client.exceptions.ApiException: (404)\nReason: Not Found\nHTTP response headers: HTTPHeaderDict({'Audit-Id': '1b455622-c163-441d-9af5-594413c79792', 'Cache-Control': 'no-cache, private', 'Content-Length': '174', 'Content-Type': 'application/json', 'Date': 'Thu, 11 Jul 2024 21:09:41 GMT', 'X-Kubernetes-Pf-Flowschema-Uid': '85c8d1de-4362-4854-b8cf-ca617c996883', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'e0aa77c8-5534-4c18-8975-b09ae30d2c1a'})\nHTTP response body: b'{\"kind\":\"Status\",\"apiVersion\":\"v1\",\"metadata\":{},\"status\":\"Failure\",\"message\":\"the server could not find the requested 
resource\",\"reason\":\"NotFound\",\"details\":{},\"code\":404}\\n'\n\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File \"/tmp/ansible/tmp/ansible-tmp-1720732176.7921784-499-48214681066386/AnsiballZ_k8s_cluster_info.py\", line 107, in <module>\n _ansiballz_main()\n File \"/tmp/ansible/tmp/ansible-tmp-1720732176.7921784-499-48214681066386/AnsiballZ_k8s_cluster_info.py\", line 99, in _ansiballz_main\n invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)\n File \"/tmp/ansible/tmp/ansible-tmp-1720732176.7921784-499-48214681066386/AnsiballZ_k8s_cluster_info.py\", line 47, in invoke_module\n runpy.run_module(mod_name='ansible_collections.kubernetes.core.plugins.modules.k8s_cluster_info', init_globals=dict(_module_fqn='ansible_collections.kubernetes.core.plugins.modules.k8s_cluster_info', _modlib_path=modlib_path),\n File \"<frozen runpy>\", line 226, in run_module\n File \"<frozen runpy>\", line 98, in _run_module_code\n File \"<frozen runpy>\", line 88, in _run_code\n File \"/tmp/ansible_k8s_cluster_info_payload_fgkezs89/ansible_k8s_cluster_info_payload.zip/ansible_collections/kubernetes/core/plugins/modules/k8s_cluster_info.py\", line 213, in <module>\n File \"/tmp/ansible_k8s_cluster_info_payload_fgkezs89/ansible_k8s_cluster_info_payload.zip/ansible_collections/kubernetes/core/plugins/modules/k8s_cluster_info.py\", line 209, in main\n File \"/tmp/ansible_k8s_cluster_info_payload_fgkezs89/ansible_k8s_cluster_info_payload.zip/ansible_collections/kubernetes/core/plugins/modules/k8s_cluster_info.py\", line 166, in execute_module\n File \"/opt/ansible/python-env/lib64/python3.11/site-packages/kubernetes/dynamic/discovery.py\", line 310, in __iter__\n rg.resources = self.get_resources_for_api_version(\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File 
\"/tmp/ansible_k8s_cluster_info_payload_fgkezs89/ansible_k8s_cluster_info_payload.zip/ansible_collections/kubernetes/core/plugins/module_utils/client/discovery.py\", line 97, in get_resources_for_api_version\n File \"/opt/ansible/python-env/lib64/python3.11/site-packages/kubernetes/dynamic/client.py\", line 57, in inner\n raise api_exception(e)\nkubernetes.dynamic.exceptions.NotFoundError: 404\nReason: Not Found\nHTTP response headers: HTTPHeaderDict({'Audit-Id': '1b455622-c163-441d-9af5-594413c79792', 'Cache-Control': 'no-cache, private', 'Content-Length': '174', 'Content-Type': 'application/json', 'Date': 'Thu, 11 Jul 2024 21:09:41 GMT', 'X-Kubernetes-Pf-Flowschema-Uid': '85c8d1de-4362-4854-b8cf-ca617c996883', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'e0aa77c8-5534-4c18-8975-b09ae30d2c1a'})\nHTTP response body: b'{\"kind\":\"Status\",\"apiVersion\":\"v1\",\"metadata\":{},\"status\":\"Failure\",\"message\":\"the server could not find the requested resource\",\"reason\":\"NotFound\",\"details\":{},\"code\":404}\\n'\nOriginal traceback: \n File \"/opt/ansible/python-env/lib64/python3.11/site-packages/kubernetes/dynamic/client.py\", line 55, in inner\n resp = func(self, *args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n File \"/opt/ansible/python-env/lib64/python3.11/site-packages/kubernetes/dynamic/client.py\", line 270, in request\n api_response = self.client.call_api(\n ^^^^^^^^^^^^^^^^^^^^^\n\n File \"/opt/ansible/python-env/lib64/python3.11/site-packages/kubernetes/client/api_client.py\", line 348, in call_api\n return self.__call_api(resource_path, method,\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n File \"/opt/ansible/python-env/lib64/python3.11/site-packages/kubernetes/client/api_client.py\", line 180, in __call_api\n response_data = self.request(\n ^^^^^^^^^^^^^\n\n File \"/opt/ansible/python-env/lib64/python3.11/site-packages/kubernetes/client/api_client.py\", line 373, in request\n return self.rest_client.GET(url,\n ^^^^^^^^^^^^^^^^^^^^^^^^^\n\n File 
\"/opt/ansible/python-env/lib64/python3.11/site-packages/kubernetes/client/rest.py\", line 244, in GET\n return self.request(\"GET\", url,\n ^^^^^^^^^^^^^^^^^^^^^^^^\n\n File \"/opt/ansible/python-env/lib64/python3.11/site-packages/kubernetes/client/rest.py\", line 238, in request\n raise ApiException(http_resp=r)\n\n", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1}
```
@michaelmartin was able to narrow the failing call down to:
```
https://internal-dogfood2-x6u-rke2-cp-1157307456.us-gov-west-1.elb.amazonaws.com:6443/apis/flowcontrol.apiserver.k8s.io/v1beta2
```
The `flowcontrol.apiserver.k8s.io/v1beta2` API version was removed in Kubernetes 1.29:
https://kubernetes.io/docs/reference/using-api/deprecation-guide/#flowcontrol-resources-v129
That API version does not appear anywhere in the Ansible/Python code and is not a valid API version on the cluster, so it is unclear where it is coming from:
```
kubectl api-versions | grep flow
flowcontrol.apiserver.k8s.io/v1
flowcontrol.apiserver.k8s.io/v1beta3
```
@michaelmartin also attempted a fresh deploy of `1.86.2-bb.0` along with several other versions going all the way back to `1.78.0-bb.5`, all with the same failure. Deploys and upgrades on a k3d dev cluster work without issue.
Note from Michael:
This error does not manifest on k3d or vanilla k8s.
It can be reproduced on a local machine by running the following Ansible task (with python3-kubernetes installed) while pointing at the dogfood cluster.
```yaml
- hosts: localhost
  gather_facts: false
  connection: local
  tasks:
    - name: Get api version information from the cluster
      kubernetes.core.k8s_cluster_info:
      register: api_status
```
The error comes up, and by hacking the Python code and re-running that Ansible task, I was able to determine which call was failing. The issue appears to be that the RKE2 cluster is somehow advertising the now-removed API version, which the client then tries to query (I believe the Python code dynamically queries the APIs that are presented by the cluster).
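To narrow this down without patching the client, one option is to walk the discovery document yourself and probe each advertised group/version. The following is a minimal sketch, not the operator's or the Kubernetes client's actual code: the function names `group_version_paths` and `find_missing` are hypothetical, and the sample data is made up for illustration rather than captured from the dogfood cluster. It assumes the input has the shape of the `APIGroupList` JSON served at `/apis`.

```python
from typing import Callable, Dict, Iterable, List


def group_version_paths(api_group_list: Dict) -> List[str]:
    """Return '/apis/<group>/<version>' for every version in an APIGroupList dict."""
    paths = []
    for group in api_group_list.get("groups", []):
        for version in group.get("versions", []):
            paths.append("/apis/" + version["groupVersion"])
    return paths


def find_missing(paths: Iterable[str], get_status: Callable[[str], int]) -> List[str]:
    """Return the paths for which get_status reports HTTP 404."""
    return [path for path in paths if get_status(path) == 404]


if __name__ == "__main__":
    # Illustrative data only: a discovery response that still lists v1beta2.
    sample = {
        "kind": "APIGroupList",
        "groups": [
            {
                "name": "flowcontrol.apiserver.k8s.io",
                "versions": [
                    {"groupVersion": "flowcontrol.apiserver.k8s.io/v1", "version": "v1"},
                    {"groupVersion": "flowcontrol.apiserver.k8s.io/v1beta3", "version": "v1beta3"},
                    {"groupVersion": "flowcontrol.apiserver.k8s.io/v1beta2", "version": "v1beta2"},
                ],
            }
        ],
    }

    # Stand-in for a real HTTP probe: pretend only v1beta2 is not served.
    def fake_status(path: str) -> int:
        return 404 if path.endswith("/v1beta2") else 200

    print(find_missing(group_version_paths(sample), fake_status))
```

Against a real cluster, the input could come from the JSON printed by `kubectl get --raw /apis`, with `get_status` replaced by a function that requests each path using the same credentials. Any path reported here would be a group/version that discovery advertises but the API server does not serve, which matches the symptom described above.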
## BigBang Version
What version of BigBang were you running?
release-2.31.x