Bug in Istio's implicit default Helm values results in incorrect service deployment
Bug
Flux CLI Version
0.17.2
BigBang Version
1.17.0 is where I first noticed it, so I downgraded to 1.15.3, but that now shows the same issue.
On Oct 5th I installed 1.15.3 and didn't see this issue. On Oct 11th I installed BigBang 1.15.3 from scratch, and the same code gives a different result.
k get svc -n=istio-system
On Oct 5th, installing 1.15.3 produced a service called public-ingressgateway (expected).
On Oct 11th, running the same code to install 1.15.3 produces a service called istio-ingressgateway instead. I don't know why, and it breaks ingress as you'd expect. (I'm accepting the helm default values for istio at the moment.)
Summarized Description of Bug
The Istio Gateway CR's label selector doesn't match the labels of the provisioned service/pods.
That's because k get service -n=istio-system shows a service named istio-ingressgateway instead of public-ingressgateway.
The workaround I used was to explicitly set the BB helm chart's default values. The bug seems to happen because the BB helm chart's implicit default values (which specify public-ingressgateway) are ignored in favor of the istio helm chart's implicit default values (which specify istio-ingressgateway). I'm not sure why or how that can happen.
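To illustrate the suspected cause, here is a minimal Python sketch of layered value overrides. It is only a simplified model, not Helm's or BigBang's actual merge code, and the key names (ingressGateway, name) are hypothetical, picked just for this example. It shows how the istio chart's default name would surface if the BB layer were skipped, and why passing the value explicitly (the workaround) would restore the expected name.

```python
from copy import deepcopy


def merge_values(base, override):
    """Recursively merge override into a copy of base; override wins on conflicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def effective_gateway_name(*layers):
    """Merge value layers in order (later layers win) and return the gateway name."""
    values = {}
    for layer in layers:
        values = merge_values(values, layer)
    return values["ingressGateway"]["name"]


# Hypothetical key names; only the two service names come from this ticket.
ISTIO_CHART_DEFAULTS = {"ingressGateway": {"name": "istio-ingressgateway"}}
BIGBANG_DEFAULTS = {"ingressGateway": {"name": "public-ingressgateway"}}
EXPLICIT_USER_VALUES = {"ingressGateway": {"name": "public-ingressgateway"}}

# What should happen: the BB default overrides the istio chart default.
expected = effective_gateway_name(ISTIO_CHART_DEFAULTS, BIGBANG_DEFAULTS)
# What seems to happen: the BB layer is skipped, so the istio default shows through.
observed = effective_gateway_name(ISTIO_CHART_DEFAULTS)
# The workaround: supply the value explicitly as user input.
workaround = effective_gateway_name(ISTIO_CHART_DEFAULTS, EXPLICIT_USER_VALUES)

print(expected, observed, workaround)
```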
The notes below go into more detail.
Description
I updated the bb engineering cohort labs from 1.15.3 --> 1.17.0 and everything broke at once; I found 3 issues along the way. (I used the customer template repo methodology, not the quickstart methodology.)
Ingress is badly broken with this update.
1st thing I tried was bumping the version of BigBang w/o changing input values:
I kept the customer template repo input values that worked for 1.15.3 and bumped the version from 1.15.3 --> 1.17.0 (on a fresh install, not an upgrade).
Ingress didn't work due to a "Misdirected Request" error. #802 (closed) might be related?
2nd thing I tried was changing hostname: chrism.bigbang.dev --> domain: chrism.bigbang.dev:
I thought to do this after reading the release notes, then retried a clean install. That led me to bug #856 (closed).
3rd thing I'm trying is using both the old and the new key:
domain: chrism.bigbang.dev
hostname: chrism.bigbang.dev
k get vs -A
NAMESPACE   NAME                                     GATEWAYS               HOSTS                              AGE
jaeger      jaeger                                   [istio-system/public]  [tracing.chrism.bigbang.dev]       11m
kiali       kiali                                    [istio-system/public]  [kiali.chrism.bigbang.dev]         11m
logging     kibana                                   [istio-system/public]  [kibana.chrism.bigbang.dev]        3h18m
monitoring  monitoring-monitoring-kube-alertmanager  [istio-system/public]  [alertmanager.chrism.bigbang.dev]  14m
monitoring  monitoring-monitoring-kube-grafana       [istio-system/public]  [grafana.chrism.bigbang.dev]       14m
monitoring  monitoring-monitoring-kube-prometheus    [istio-system/public]  [prometheus.chrism.bigbang.dev]    14m
^-- This looks right at first, but ingress still doesn't work: when I try to visit the web pages I get ERR_CONNECTION_REFUSED.
(btw the method I'm using to visit webpages is one that works even when dns and lb aren't working)
kubectl port-forward svc/istio-ingressgateway -n=istio-system 4443:443
(edit /etc/hosts with temp entry "127.0.0.1 kibana.chrism.bigbang.dev")
curl https://kibana.chrism.bigbang.dev:4443
curl: (35) LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to kibana.chrism.bigbang.dev:4443
On the port-forward side I see a Connection refused error:
E1010 04:13:56.434606 72212 portforward.go:400] an error occurred forwarding 4443 -> 8443: error forwarding port 8443 to pod 41c9a4eaf0eb478287ea258ba78f02bc84a97e93d494777d474b7363f9bbefd2, uid : failed to execute portforward in network namespace "/var/run/netns/cni-3bb05f41-7348-b802-8fc7-768ff6c2d380": socat command returns error: exit status 1, stderr: "2021/10/10 08:13:56 socat[402748] E connect(5, AF=2 127.0.0.1:8443, 16): Connection refused\n"
k get svc -n=istio-system
NAME                  TYPE          CLUSTER-IP    EXTERNAL-IP  PORT(S)                                     AGE
istio-ingressgateway  LoadBalancer  10.43.41.116  <pending>    15021:30387/TCP,80:31882/TCP,443:31659/TCP  3h19m
istiod                ClusterIP     10.43.58.151  <none>       15010/TCP,15012/TCP,443/TCP,15014/TCP       3h19m
Well, that's odd and annoying, because it seems like a needless, arbitrary change that flips back and forth...
In an earlier version of BigBang
k get svc -n=istio-system
had a service named istio-ingressgateway
then it recently got renamed to public-ingressgateway
but now it's back at istio-ingressgateway
Actually, though, that probably partially explains the error...
k get gateway -n=istio-system
NAME AGE
public 3h26m
k get gateway -n=istio-system public -o yaml
...
spec:
  selector:
    app: public-ingressgateway
...
^-- That selector looks wrong: it says public-ingressgateway, but the deployed service is istio-ingressgateway.
k get service -n=istio-system -L=app
NAME                  TYPE          CLUSTER-IP    EXTERNAL-IP  PORT(S)                                     AGE    APP
istio-ingressgateway  LoadBalancer  10.43.41.116  <pending>    15021:30387/TCP,80:31882/TCP,443:31659/TCP  3h30m  istio-ingressgateway
istiod                ClusterIP     10.43.58.151  <none>       15010/TCP,15012/TCP,443/TCP,15014/TCP       3h31m  istiod
k get pod -n=istio-system -L=app
NAME                                   READY  STATUS   RESTARTS  AGE    APP
istio-ingressgateway-599c6bc5cb-qc6bk  1/1    Running  0         3h31m  istio-ingressgateway
istiod-7489ff594d-q2jb4                1/1    Running  0         3h31m  istiod
Yep, that plumbing doesn't look right: the Gateway selector and the service/pod labels aren't wired together.
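The mismatch can be spelled out with a small Python sketch of equality-based label selection. This is an illustration only, not Istio's implementation: selector_matches is a hypothetical helper, and the data is copied by hand from the Gateway spec and the -L=app output above (only the app label is considered, since that is the only one shown).

```python
def selector_matches(selector, labels):
    """Return True when every key/value pair in the selector is present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


# Selector from: k get gateway -n=istio-system public -o yaml
gateway_selector = {"app": "public-ingressgateway"}

# Pod names and app labels from: k get pod -n=istio-system -L=app
pods = {
    "istio-ingressgateway-599c6bc5cb-qc6bk": {"app": "istio-ingressgateway"},
    "istiod-7489ff594d-q2jb4": {"app": "istiod"},
}

# Pods the Gateway's selector would pick up; with these labels, none match.
matching = [name for name, labels in pods.items()
            if selector_matches(gateway_selector, labels)]
print(matching)  # prints []
```

With the selector's app value changed to istio-ingressgateway, the same check would select the ingress gateway pod, which is the edit attempted next.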
Just as an experiment:
EDITOR="code --wait" kubectl edit gateway public -n=istio-system
I did an in-place replacement of the selector from public-ingressgateway to istio-ingressgateway. Hmm, that didn't work either; I'm back to Misdirected Request.
kubectl port-forward svc/istio-ingressgateway -n=istio-system 4443:443 (hosts file has an entry for "127.0.0.1 kibana.chrism.bigbang.dev")
curl https://kibana.chrism.bigbang.dev:4443
Misdirected Request
Rereading #802 (closed), Andrew mentions this issue occurs when using nonstandard ports, so I'll try standard ports.
sudo kubectl port-forward service/istio-ingressgateway -n=istio-system 443:443
curl https://grafana.chrism.bigbang.dev
upstream connect error or disconnect/reset before headers. reset reason: connection failure
That did change the error, but it only swapped one error for another rather than fixing anything. At this point I'll roll the bb engineering cohort back to 1.15.3, as 1.17.0 has too many oddities.
Also, a clarifying note: on 1.15.3 the service shows up as public-ingressgateway, while on 1.17.0 it shows up as istio-ingressgateway, even though I used the same helm input values for both versions, only bumped the version, and did an environment reset followed by a fresh install (not an upgrade).