Several packages (especially operators) enter crash loop on first and sometimes second container start
Bug
Description
During Flux deployment of Big Bang packages, several of them crash on their first start with a common error: the connection to the Kube API is refused. After the crash loop backoff, a small number crash a second time. Eventually, the containers come up successfully after restarting.
Deploy Big Bang with all packages and observe the pods for restarts. If a pod fails, grab the previous logs of the main container to get the error message.
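The reproduction steps above can be scripted. The following is a minimal sketch, assuming `kubectl` is on the PATH and pointed at the cluster under test. The function names (`restarted_containers`, `previous_logs_command`, `main`) are hypothetical helpers written for this issue and are not part of Big Bang. The sketch lists every container with a non-zero restart count and builds the `kubectl logs --previous` command for it.

```python
import json
import subprocess


def restarted_containers(pod_list):
    """Return (namespace, pod, container, restarts) for containers that restarted.

    `pod_list` is the parsed output of `kubectl get pods --all-namespaces -o json`.
    """
    found = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        # Pods that have not started yet may have no containerStatuses at all.
        statuses = pod.get("status", {}).get("containerStatuses") or []
        for status in statuses:
            restarts = status.get("restartCount", 0)
            if restarts > 0:
                found.append(
                    (meta.get("namespace"), meta.get("name"), status["name"], restarts)
                )
    return found


def previous_logs_command(namespace, pod, container):
    """Build the kubectl command that prints the logs of the previous (crashed) run."""
    return ["kubectl", "logs", "--previous", "-n", namespace, pod, "-c", container]


def main():
    """Print the previous logs of every restarted container in the cluster."""
    raw = subprocess.run(
        ["kubectl", "get", "pods", "--all-namespaces", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    for namespace, pod, container, restarts in restarted_containers(json.loads(raw)):
        print(f"== {namespace}/{pod} [{container}] restarts={restarts}")
        logs = subprocess.run(
            previous_logs_command(namespace, pod, container),
            capture_output=True,
            text=True,
        )
        # Fall back to stderr so a failed log fetch is still visible.
        print(logs.stdout or logs.stderr)
```

Calling `main()` shortly after the deployment settles should surface the same errors shown in the Logs section, provided the crashed container's previous logs are still retained by the kubelet.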
The following are the packages that I have observed with failures connecting to the API:
- Kiali Operator
- Jaeger Operator
- Velero
- Cluster Auditor
ECK Operator and Prometheus have errors that may or may not be related to the inability to connect to the API.
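To help decide whether a given package's failure belongs to this issue, its previous logs can be checked for the connection-refused signature seen in the Logs section. This is a rough sketch: the two patterns are derived only from the Go and Ruby error lines captured in this issue, and `refused_api_connections` is a hypothetical helper name. Other client libraries may word the error differently, so an empty result does not rule out an API connectivity problem.

```python
import re
from typing import List, Tuple

# Patterns taken from the log excerpts in this issue; other clients may differ.
_PATTERNS = [
    # Go clients (Jaeger Operator, Velero, Kiali Operator)
    re.compile(r"dial tcp (?P<host>[\d.]+):(?P<port>\d+): connect: connection refused"),
    # Ruby clients (Cluster Auditor, via fluentd and kubeclient)
    re.compile(r'Connection refused - connect\(2\) for "(?P<host>[\d.]+)" port (?P<port>\d+)'),
]


def refused_api_connections(log_text: str) -> List[Tuple[str, int]]:
    """Return one (host, port) pair per log line reporting a refused connection."""
    hits = []
    for line in log_text.splitlines():
        for pattern in _PATTERNS:
            match = pattern.search(line)
            if match:
                hits.append((match.group("host"), int(match.group("port"))))
                break  # count each log line at most once
    return hits
```

A result containing `("10.43.0.1", 443)` would indicate the same failure reported for the four packages listed above.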
Acceptance Criteria
The solution will either resolve the issue or document it (including the error messages).
BigBang Version
1.13
Logs
Jaeger Operator
time="2021-07-27T02:43:58Z" level=info msg=Versions arch=amd64 identity=jaeger.jaeger-jaeger-jaeger-operator jaeger=1.23.0 jaeger-operator=release/v1.23.0-1-g2241b8a5 operator-sdk=v0.18.2 os=linux version=go1.16.5
time="2021-07-27T02:43:58Z" level=fatal msg="Get \"https://10.43.0.1:443/api?timeout=32s\": dial tcp 10.43.0.1:443: connect: connection refused"
Velero
time="2021-07-27T02:42:22Z" level=info msg="registering plugin" command=/plugins/velero-plugin-for-aws kind=ObjectStore logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/aws
An error occurred: Get "https://10.43.0.1:443/api?timeout=32s": dial tcp 10.43.0.1:443: connect: connection refused
Cluster Auditor
/usr/local/lib/ruby/2.6.0/net/http.rb:947:in `initialize': Connection refused - connect(2) for "10.43.0.1" port 443 (Errno::ECONNREFUSED)
from /usr/local/lib/ruby/2.6.0/net/http.rb:947:in `open'
from /usr/local/lib/ruby/2.6.0/net/http.rb:947:in `block in connect'
from /usr/local/lib/ruby/2.6.0/timeout.rb:93:in `block in timeout'
from /usr/local/lib/ruby/2.6.0/timeout.rb:103:in `timeout'
from /usr/local/lib/ruby/2.6.0/net/http.rb:945:in `connect'
from /usr/local/lib/ruby/2.6.0/net/http.rb:930:in `do_start'
from /usr/local/lib/ruby/2.6.0/net/http.rb:919:in `start'
from /usr/local/bundle/gems/rest-client-2.1.0/lib/restclient/request.rb:727:in `transmit'
from /usr/local/bundle/gems/rest-client-2.1.0/lib/restclient/request.rb:163:in `execute'
from /usr/local/bundle/gems/rest-client-2.1.0/lib/restclient/request.rb:63:in `execute'
from /usr/local/bundle/gems/rest-client-2.1.0/lib/restclient/resource.rb:51:in `get'
from /usr/local/bundle/gems/kubeclient-4.6.0/lib/kubeclient/common.rb:493:in `block in api'
from /usr/local/bundle/gems/kubeclient-4.6.0/lib/kubeclient/common.rb:121:in `handle_exception'
from /usr/local/bundle/gems/kubeclient-4.6.0/lib/kubeclient/common.rb:493:in `api'
from /usr/local/bundle/gems/kubeclient-4.6.0/lib/kubeclient/common.rb:486:in `api_valid?'
from /usr/local/bundle/gems/fluent-plugin-kubernetes-objects-1.1.4/lib/fluent/plugin/in_kubernetes_objects.rb:179:in `initialize_client'
from /usr/local/bundle/gems/fluent-plugin-kubernetes-objects-1.1.4/lib/fluent/plugin/in_kubernetes_objects.rb:95:in `configure'
from /usr/local/bundle/gems/fluentd-1.12.2/lib/fluent/plugin.rb:178:in `configure'
from /usr/local/bundle/gems/fluentd-1.12.2/lib/fluent/root_agent.rb:317:in `add_source'
from /usr/local/bundle/gems/fluentd-1.12.2/lib/fluent/root_agent.rb:158:in `block in configure'
from /usr/local/bundle/gems/fluentd-1.12.2/lib/fluent/root_agent.rb:152:in `each'
from /usr/local/bundle/gems/fluentd-1.12.2/lib/fluent/root_agent.rb:152:in `configure'
from /usr/local/bundle/gems/fluentd-1.12.2/lib/fluent/engine.rb:105:in `configure'
from /usr/local/bundle/gems/fluentd-1.12.2/lib/fluent/engine.rb:80:in `run_configure'
from /usr/local/bundle/gems/fluentd-1.12.2/lib/fluent/supervisor.rb:648:in `run_supervisor'
from /usr/local/bundle/gems/fluentd-1.12.2/lib/fluent/command/fluentd.rb:345:in `<top (required)>'
from /usr/local/lib/ruby/2.6.0/rubygems/core_ext/kernel_require.rb:54:in `require'
from /usr/local/lib/ruby/2.6.0/rubygems/core_ext/kernel_require.rb:54:in `require'
from /usr/local/bundle/gems/fluentd-1.12.2/bin/fluentd:8:in `<top (required)>'
from /usr/local/bundle/bin/fluentd:23:in `load'
from /usr/local/bundle/bin/fluentd:23:in `<main>'
Kiali Operator
{"level":"info","ts":1627353920.5781307,"logger":"cmd","msg":"Version","Go Version":"go1.15.10","GOOS":"linux","GOARCH":"amd64","ansible-operator":"v1.4.0+git","commit":"98f30d59ade2d911a7a8c76f0169a7de0dec37a0"}
{"level":"info","ts":1627353920.578408,"logger":"cmd","msg":"Watching all namespaces.","Namespace":""}
{"level":"error","ts":1627353920.5789082,"logger":"controller-runtime.manager","msg":"Failed to get API Group-Resources","error":"Get \"https://10.43.0.1:443/api?timeout=32s\": dial tcp 10.43.0.1:443: connect: connection refused","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr@v0.2.0/zapr.go:132\nsigs.k8s.io/controller-runtime/pkg/cluster.New\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.2/pkg/cluster/cluster.go:159\nsigs.k8s.io/controller-runtime/pkg/manager.New\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.2/pkg/manager/manager.go:278\ngithub.com/operator-framework/operator-sdk/internal/cmd/ansible-operator/run.run\n\t/workspace/internal/cmd/ansible-operator/run/cmd.go:143\ngithub.com/operator-framework/operator-sdk/internal/cmd/ansible-operator/run.NewCmd.func1\n\t/workspace/internal/cmd/ansible-operator/run/cmd.go:71\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.1.1/command.go:854\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/cobra@v1.1.1/command.go:958\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.1.1/command.go:895\nmain.main\n\t/workspace/cmd/ansible-operator/main.go:40\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:204"}
{"level":"error","ts":1627353920.5789804,"logger":"cmd","msg":"Failed to create a new manager.","Namespace":"","error":"Get \"https://10.43.0.1:443/api?timeout=32s\": dial tcp 10.43.0.1:443: connect: connection refused","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr@v0.2.0/zapr.go:132\ngithub.com/operator-framework/operator-sdk/internal/cmd/ansible-operator/run.run\n\t/workspace/internal/cmd/ansible-operator/run/cmd.go:145\ngithub.com/operator-framework/operator-sdk/internal/cmd/ansible-operator/run.NewCmd.func1\n\t/workspace/internal/cmd/ansible-operator/run/cmd.go:71\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.1.1/command.go:854\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/cobra@v1.1.1/command.go:958\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.1.1/command.go:895\nmain.main\n\t/workspace/cmd/ansible-operator/main.go:40\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:204"}