Manually walk through bigbang release process using the new SIL EKS release cluster
Who
@danilo-patrucco and @dpritchettrm walking through the usual release process, but on the new SIL cluster
Progress thus far
- identify testers
- confirm both testers can access the cluster with sshuttle, etc.
TODO
- replicate the entire manual release checklist from the link below
References
Activity
- Daniel Pritchett added teamTools & Automation label
- Daniel Pritchett set weight to 5
- Daniel Pritchett assigned to @danilo-patrucco and @dpritchettrm
- Daniel Pritchett added statusdoing label
- Daniel Pritchett changed the description
- **** added triage-kind label
- **** added triage-priority label
- Developer
Notes on UI tests
Kibana:
- Need to change the URL in the link (release instead of dogfood)
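Most of the notes in this comment come down to links that still point at dogfood instead of release. A minimal sketch for finding the leftovers, assuming the old hostnames contain the string "dogfood" (e.g. *.dogfood.bigbang.mil) and the checklist is available as a local text file; the function name dogfood_links and the file name checklist.md are hypothetical.

```shell
# Hypothetical helper: print each http(s) URL on stdin that still contains
# "dogfood" (assumed to mark the old cluster's hostnames).
# grep exits non-zero when nothing matches, i.e. when no dogfood links remain.
dogfood_links() {
  grep -o 'https\{0,1\}://[^[:space:])"]*dogfood[^[:space:])"]*'
}
```

Usage: dogfood_links < checklist.md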
Kiali:
- Need to change the URL in the links (all the pages need to be reviewed)
Mattermost in Kiali:
LOGS of failure
################################################################################################################################################
{"timestamp":"2025-02-11 15:45:29.288 Z","level":"debug","msg":"Incoming webhook received","caller":"web/webhook.go:57","webhook_id":"16tqi9ucpbyhfey37zqbgmqcaa","request_id":"k46wdg5yh7ga5g96138d79y7be","payload":"{"text":"","username":"Alertmanager","icon_url":"","channel":"","props":null,"attachments":[{"id":0,"fallback":"[FIRING:1] PrometheusDuplicateTimestamps (logging-loki prometheus http-web 192.168.139.238:9090 monitoring-monitoring-kube-prometheus monitoring prometheus-monitoring-monitoring-kube-prometheus-0 monitoring/monitoring-monitoring-kube-prometheus monitoring-monitoring-kube-prometheus warning) | https://alertmanager.release.bigbang.mil/#/alerts?receiver=mattermost-notifications\",\"color\":\"danger\",\"pretext\":\"\",\"author_name\":\"\",\"author_link\":\"\",\"author_icon\":\"\",\"title\":\"[FIRING:1] PrometheusDuplicateTimestamps (logging-loki prometheus http-web 192.168.139.238:9090 monitoring-monitoring-kube-prometheus monitoring prometheus-monitoring-monitoring-kube-prometheus-0 monitoring/monitoring-monitoring-kube-prometheus monitoring-monitoring-kube-prometheus warning)","title_link":"https://alertmanager.release.bigbang.mil/#/alerts?receiver=mattermost-notifications\",\"text\":\"\",\"fields\":null,\"image_url\":\"\",\"thumb_url\":\"\",\"footer\":\"\",\"footer_icon\":\"\",\"ts\":null}],\"type\":\"\",\"icon_emoji\":\"\",\"priority\":null}"}
{"timestamp":"2025-02-11 15:45:29.288 Z","level":"debug","msg":"Incoming webhook received","caller":"web/webhook.go:57","webhook_id":"16tqi9ucpbyhfey37zqbgmqcaa","request_id":"ohmxjoubp7g5mxtxt54uis3har","payload":"{"text":"","username":"Alertmanager","icon_url":"","channel":"","props":null,"attachments":[{"id":0,"fallback":"[FIRING:1] PrometheusRuleFailures (logging-loki prometheus http-web 192.168.139.238:9090 monitoring-monitoring-kube-prometheus monitoring prometheus-monitoring-monitoring-kube-prometheus-0 monitoring/monitoring-monitoring-kube-prometheus /etc/prometheus/rules/prometheus-monitoring-monitoring-kube-prometheus-rulefiles-0/monitoring-monitoring-monitoring-kube-kube-prometheus-general.rules-8d8d74bd-2d7a-4a98-b3b0-191505559e1b.yaml;kube-prometheus-general.rules monitoring-monitoring-kube-prometheus critical) | https://alertmanager.release.bigbang.mil/#/alerts?receiver=mattermost-notifications\",\"color\":\"danger\",\"pretext\":\"\",\"author_name\":\"\",\"author_link\":\"\",\"author_icon\":\"\",\"title\":\"[FIRING:1] PrometheusRuleFailures (logging-loki prometheus http-web 192.168.139.238:9090 monitoring-monitoring-kube-prometheus monitoring prometheus-monitoring-monitoring-kube-prometheus-0 monitoring/monitoring-monitoring-kube-prometheus /etc/prometheus/rules/prometheus-monitoring-monitoring-kube-prometheus-rulefiles-0/monitoring-monitoring-monitoring-kube-kube-prometheus-general.rules-8d8d74bd-2d7a-4a98-b3b0-191505559e1b.yaml;kube-prometheus-general.rules monitoring-monitoring-kube-prometheus critical)","title_link":"https://alertmanager.release.bigbang.mil/#/alerts?receiver=mattermost-notifications\",\"text\":\"\",\"fields\":null,\"image_url\":\"\",\"thumb_url\":\"\",\"footer\":\"\",\"footer_icon\":\"\",\"ts\":null}],\"type\":\"\",\"icon_emoji\":\"\",\"priority\":null}"}
{"timestamp":"2025-02-11 15:45:29.288 Z","level":"debug","msg":"Failed to handle the payload of media type application/json for incoming webhook 16tqi9ucpbyhfey37zqbgmqcaa.","caller":"web/context.go:120","path":"/hooks/16tqi9ucpbyhfey37zqbgmqcaa","request_id":"k46wdg5yh7ga5g96138d79y7be","ip_addr":"127.0.0.6","user_id":"","method":"POST","err_where":"incomingWebhook","http_code":400,"error":"incomingWebhook: Failed to handle the payload of media type application/json for incoming webhook 16tqi9ucpbyhfey37zqbgmqcaa., HandleIncomingWebhook: Invalid webhook., resource "IncomingWebhook" not found, id: 16tqi9ucpbyhfey37zqbgmqcaa"}
{"timestamp":"2025-02-11 15:45:29.288 Z","level":"debug","msg":"Failed to handle the payload of media type application/json for incoming webhook 16tqi9ucpbyhfey37zqbgmqcaa.","caller":"web/context.go:120","path":"/hooks/16tqi9ucpbyhfey37zqbgmqcaa","request_id":"ohmxjoubp7g5mxtxt54uis3har","ip_addr":"127.0.0.6","user_id":"","method":"POST","err_where":"incomingWebhook","http_code":400,"error":"incomingWebhook: Failed to handle the payload of media type application/json for incoming webhook 16tqi9ucpbyhfey37zqbgmqcaa., HandleIncomingWebhook: Invalid webhook., resource "IncomingWebhook" not found, id: 16tqi9ucpbyhfey37zqbgmqcaa"}
{"timestamp":"2025-02-11 15:45:29.288 Z","level":"debug","msg":"Received HTTP request","caller":"web/handlers.go:185","method":"POST","url":"/hooks/16tqi9ucpbyhfey37zqbgmqcaa","request_id":"k46wdg5yh7ga5g96138d79y7be","user_id":"","status_code":"400"}
{"timestamp":"2025-02-11 15:45:29.288 Z","level":"debug","msg":"Received HTTP request","caller":"web/handlers.go:185","method":"POST","url":"/hooks/16tqi9ucpbyhfey37zqbgmqcaa","request_id":"ohmxjoubp7g5mxtxt54uis3har","user_id":"","status_code":"400"}
{"timestamp":"2025-02-11 15:45:31.046 Z","level":"debug","msg":"Received HTTP request","caller":"web/handlers.go:185","method":"GET","url":"/api/v4/system/ping","request_id":"tukesnwrt384zyz3rpwn667rqh","user_id":"","status_code":"200"}
{"timestamp":"2025-02-11 15:45:36.046 Z","level":"debug","msg":"Received HTTP request","caller":"web/handlers.go:185","method":"GET","url":"/api/v4/system/ping","request_id":"7mkojb63ppn63pyabgw79f6gdc","user_id":"","status_code":"200"}
{"timestamp":"2025-02-11 15:45:36.046 Z","level":"debug","msg":"Received HTTP request","caller":"web/handlers.go:185","method":"GET","url":"/api/v4/system/ping","request_id":"97k6p4ku1pyk7k4xt79g8y59fa","user_id":"","status_code":"200"}
{"timestamp":"2025-02-11 15:45:40.289 Z","level":"debug","msg":"Incoming webhook received","caller":"web/webhook.go:57","webhook_id":"16tqi9ucpbyhfey37zqbgmqcaa","request_id":"148f5qty63drtpabw43sth4okr","payload":"{"text":"","username":"Alertmanager","icon_url":"","channel":"","props":null,"attachments":[{"id":0,"fallback":"[FIRING:1] AlertmanagerFailedToSendAlerts (logging-loki alertmanager http-web 192.168.136.129:9093 slack monitoring-monitoring-kube-alertmanager monitoring alertmanager-monitoring-monitoring-kube-alertmanager-0 monitoring/monitoring-monitoring-kube-prometheus clientError monitoring-monitoring-kube-alertmanager warning) | https://alertmanager.release.bigbang.mil/#/alerts?receiver=mattermost-notifications\",\"color\":\"danger\",\"pretext\":\"\",\"author_name\":\"\",\"author_link\":\"\",\"author_icon\":\"\",\"title\":\"[FIRING:1] AlertmanagerFailedToSendAlerts (logging-loki alertmanager http-web 192.168.136.129:9093 slack monitoring-monitoring-kube-alertmanager monitoring alertmanager-monitoring-monitoring-kube-alertmanager-0 monitoring/monitoring-monitoring-kube-prometheus clientError monitoring-monitoring-kube-alertmanager warning)","title_link":"https://alertmanager.release.bigbang.mil/#/alerts?receiver=mattermost-notifications\",\"text\":\"\",\"fields\":null,\"image_url\":\"\",\"thumb_url\":\"\",\"footer\":\"\",\"footer_icon\":\"\",\"ts\":null}],\"type\":\"\",\"icon_emoji\":\"\",\"priority\":null}"}
{"timestamp":"2025-02-11 15:45:40.289 Z","level":"debug","msg":"Failed to handle the payload of media type application/json for incoming webhook 16tqi9ucpbyhfey37zqbgmqcaa.","caller":"web/context.go:120","path":"/hooks/16tqi9ucpbyhfey37zqbgmqcaa","request_id":"148f5qty63drtpabw43sth4okr","ip_addr":"127.0.0.6","user_id":"","method":"POST","err_where":"incomingWebhook","http_code":400,"error":"incomingWebhook: Failed to handle the payload of media type application/json for incoming webhook 16tqi9ucpbyhfey37zqbgmqcaa., HandleIncomingWebhook: Invalid webhook., resource "IncomingWebhook" not found, id: 16tqi9ucpbyhfey37zqbgmqcaa"}
{"timestamp":"2025-02-11 15:45:40.289 Z","level":"debug","msg":"Received HTTP request","caller":"web/handlers.go:185","method":"POST","url":"/hooks/16tqi9ucpbyhfey37zqbgmqcaa","request_id":"148f5qty63drtpabw43sth4okr","user_id":"","status_code":"400"}
{"timestamp":"2025-02-11 15:45:41.046 Z","level":"debug","msg":"Received HTTP request","caller":"web/handlers.go:185","method":"GET","url":"/api/v4/system/ping","request_id":"tzx3oq6i5fgwzpyjii45zj6tor","user_id":"","status_code":"200"}
{"timestamp":"2025-02-11 15:45:46.046 Z","level":"debug","msg":"Received HTTP request","caller":"web/handlers.go:185","method":"GET","url":"/api/v4/system/ping","request_id":"qgz9hxbujinwucmonhdr4i9bkw","user_id":"","status_code":"200"}
{"timestamp":"2025-02-11 15:45:51.046 Z","level":"debug","msg":"Received HTTP request","caller":"web/handlers.go:185","method":"GET","url":"/api/v4/system/ping","request_id":"3bgjphfzcpdnbjm5ht4u1ngg9w","user_id":"","status_code":"200"}
{"timestamp":"2025-02-11 15:45:56.046 Z","level":"debug","msg":"Received HTTP request","caller":"web/handlers.go:185","method":"GET","url":"/api/v4/system/ping","request_id":"gt3jaw6ecpy7pxwd1b71ogfjfe","user_id":"","status_code":"200"}
{"timestamp":"2025-02-11 15:46:01.046 Z","level":"debug","msg":"Received HTTP request","caller":"web/handlers.go:185","method":"GET","url":"/api/v4/system/ping","request_id":"s65dkzsptjrzm8xyu9qix6mkma","user_id":"","status_code":"200"}
{"timestamp":"2025-02-11 15:46:02.518 Z","level":"debug","msg":"Cleaning up token store.","caller":"app/server.go:1260"}
{"timestamp":"2025-02-11 15:46:02.520 Z","level":"debug","msg":"Checking for security update from Mattermost","caller":"app/security_update_check.go:49"}
{"timestamp":"2025-02-11 15:46:06.046 Z","level":"debug","msg":"Received HTTP request","caller":"web/handlers.go:185","method":"GET","url":"/api/v4/system/ping","request_id":"tz7kmcun7bb9pjwxmewrm3rsiy","user_id":"","status_code":"200"}
{"timestamp":"2025-02-11 15:46:11.046 Z","level":"debug","msg":"Received HTTP request","caller":"web/handlers.go:185","method":"GET","url":"/api/v4/system/ping","request_id":"er7guys9dprkpyqmfo65t45ycw","user_id":"","status_code":"200"}
{"timestamp":"2025-02-11 15:46:16.046 Z","level":"debug","msg":"Received HTTP request","caller":"web/handlers.go:185","method":"GET","url":"/api/v4/system/ping","request_id":"do6mj335q7rtuqfeobcwd3ra8y","user_id":"","status_code":"200"}
{"timestamp":"2025-02-11 15:46:21.046 Z","level":"debug","msg":"Received HTTP request","caller":"web/handlers.go:185","method":"GET","url":"/api/v4/system/ping","request_id":"1z1m15tbqfro7yqeh9kc5trg3a","user_id":"","status_code":"200"}
{"timestamp":"2025-02-11 15:46:26.045 Z","level":"debug","msg":"Received HTTP request","caller":"web/handlers.go:185","method":"GET","url":"/api/v4/system/ping","request_id":"wq1rxhyk3tre7n1u9w34a8erwa","user_id":"","status_code":"200"}
{"timestamp":"2025-02-11 15:46:31.046 Z","level":"debug","msg":"Received HTTP request","caller":"web/handlers.go:185","method":"GET","url":"/api/v4/system/ping","request_id":"i5raii9yrfbc9yco9ak7w9sqiy","user_id":"","status_code":"200"}
{"timestamp":"2025-02-11 15:46:36.046 Z","level":"debug","msg":"Received HTTP request","caller":"web/handlers.go:185","method":"GET","url":"/api/v4/system/ping","request_id":"dccqocxbwtd3bki9t8kyki7wfr","user_id":"","status_code":"200"}
################################################################################################################################################
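In the Mattermost log above, the three rejected webhook requests are buried among routine /api/v4/system/ping health checks. A small filter, assuming Mattermost writes one JSON object per line (as kubectl logs normally shows it), that keeps only the rejected incoming-webhook entries and prints their request IDs. The function name webhook_failures is hypothetical, and the match is on the "err_where":"incomingWebhook" field seen in these entries.

```shell
# Hypothetical filter: read Mattermost JSON log lines on stdin and print the
# request_id of every entry where an incoming webhook was rejected.
webhook_failures() {
  grep '"err_where":"incomingWebhook"' \
    | grep -o '"request_id":"[a-z0-9]*"' \
    | sed 's/^"request_id":"\(.*\)"$/\1/'
}
```

Usage: kubectl logs -n <namespace> <mattermost-pod> | webhook_failures (namespace and pod name are placeholders).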
Twistlock:
- Need to fix URLs
Fortify:
- Need to fix url
- repository is: https://repo1.dso.mil/big-bang/team/deployments/bigbang-ci
- new sops command is
sops -d bigbang/overlays/bb_release/environment-bb-secret.enc.yaml | sed 's/\\n/\'$'\n''/g' | grep "Fortify admin"
- Password is fixed now (the creds were admin:admin)
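The sed step in the sops command above turns the literal \n sequences inside the decrypted values back into real line breaks, so that grep can match the single "Fortify admin" line. A POSIX-portable sketch of the same step that avoids bash's $'...' quoting, assuming the decrypted YAML stores multi-line values with escaped \n; the function name unescape_newlines is hypothetical.

```shell
# Hypothetical helper: replace every literal backslash-n sequence on stdin
# with a real newline. A backslash followed by an actual line break is the
# portable way to put a newline into a sed replacement.
unescape_newlines() {
  sed 's/\\n/\
/g'
}
```

Usage: sops -d bigbang/overlays/bb_release/environment-bb-secret.enc.yaml | unescape_newlines | grep "Fortify admin"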
Gitlab:
- Need to fix url
- SonarQube samples link is broken; working link: https://github.com/SonarSource/sonar-scanning-examples/tree/master/sonar-scanner/src
- docker command links need to be fixed
- SonarQube URL needs to be fixed
Nexus:
- need to fix url
- error about the default secret encryption key; need to verify the values file. Nexus reports: "Default Secret Encryption Key: Nexus was not configured with an encryption key and is using the Default key."
- fix the docker commands urls
Minio:
- fix the command to pull the secret
kubectl -n minio get secrets minio-creds-secret -o jsonpath='{.data.config\.env}' | base64 -d
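The decoded config.env holds the MinIO root credentials. Assuming it uses the MinIO operator's export KEY="value" line format (an assumption; check the actual output first), a small helper can pull out a single value; the function name env_value is hypothetical.

```shell
# Hypothetical helper: print the value of the variable named in $1 from
# config.env-style lines on stdin. Accepts both "export KEY=value" and
# "KEY=value" and strips double quotes from the result.
env_value() {
  sed -n -e "s/^export $1=//p" -e "s/^$1=//p" | tr -d '"'
}
```

Usage: kubectl -n minio get secrets minio-creds-secret -o jsonpath='{.data.config\.env}' | base64 -d | env_value MINIO_ROOT_PASSWORD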
Mattermost:
- current admins are me, Julian hair, Daniel Stocum, and bigbang; should we remove access for the others and keep only bigbang?
ArgoCD:
- need to fix url
- fails on the Kyverno svc-fail admission webhook:
admission webhook "validate.kyverno.svc-fail" denied the request: resource Deployment/podinfo/podinfo was blocked due to the following policies restrict-image-registries: autogen-validate-registries: 'validation failure: validation error: Image registry is not in the approved list. rule autogen-validate-registries failed at path /image/'
- podinfo tries to pull the following image: ghcr.io/stefanprodan/podinfo:6.7.1
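The restrict-image-registries rule rejects any image whose registry is not on its allow list, and ghcr.io is evidently not on it here. A sketch of that kind of check, assuming Docker's usual reference rule (the first path component is a registry only if it contains a dot or colon or is localhost) and an allow list of exact hostnames passed as arguments. The real Kyverno policy may match patterns rather than exact hosts; registry1.dso.mil as the allowed host and the function names are assumptions for illustration.

```shell
# Hypothetical helper: print the registry host of an image reference.
# Follows Docker's rule: the first path component is a registry only if it
# contains "." or ":" or is "localhost"; otherwise the image is on docker.io.
registry_of() {
  case $1 in
    */*) first=${1%%/*} ;;
    *) echo docker.io; return 0 ;;
  esac
  case $first in
    *.*|*:*|localhost) echo "$first" ;;
    *) echo docker.io ;;
  esac
}

# Hypothetical check: succeed if the image's registry exactly matches one of
# the allowed hosts given after the image argument.
is_approved() {
  reg=$(registry_of "$1")
  shift
  for allowed in "$@"; do
    if [ "$reg" = "$allowed" ]; then return 0; fi
  done
  return 1
}
```

Usage: is_approved ghcr.io/stefanprodan/podinfo:6.7.1 registry1.dso.mil || echo "would be blocked"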
Velero:
- need to fix url
- needed to docker login to velero-tests.release.bigbang.mil
- for the test the image needs to be amd64:
docker pull --platform linux/amd64 alpine:latest
- tag the image
docker tag alpine:latest velero-tests.release.bigbang.mil/alpine:latest
- push the image
docker push velero-tests.release.bigbang.mil/alpine:latest
- the file applied by kubectl apply -f ./docs/release/velero_test.yaml does not exist in the new repository (only in the old one)
- running kubectl apply -f ./docs/release/velero_test.yaml fails with the following error:
namespace/velero-test created
Warning: kyverno.io/v2beta1 PolicyException is deprecated; use kyverno.io/v2 PolicyException
Warning: PolicyException resources would not be processed until it is enabled.
policyexception.kyverno.io/velero-test-exception created
persistentvolumeclaim/velero-test created
Error from server: error when creating "./docs/release/velero_test.yaml": admission webhook "validate.kyverno.svc-fail" denied the request: resource Deployment/velero-test/velero-test was blocked due to the following policies restrict-image-registries: autogen-validate-registries: 'validation failure: validation error: Image registry is not in the approved list. rule autogen-validate-registries failed at path /image/'
- NEED TO FIX THE IMAGE URL IN THE TEST! (Replace dogfood with release)
- the storage class ebs does not exist, use gp2 or gp3 instead
- the command
kubectl exec $veleropod -n velero-test -- cat /mnt/velero-test/test.log
needs the following to be run first, otherwise $veleropod still points to the old pod:
veleropod=$(kubectl get pod -n velero-test -o json | jq -r '.items[].metadata.name')
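The manual Velero steps above could be collected into a few shell helpers, which may also be a starting point for the automation suggested later in this issue. This is a sketch only: the DRY_RUN switch, the function names, and the assumption that the manifest contains dogfood hostnames and a "storageClassName: ebs" line are all hypothetical, so compare against the actual velero_test.yaml before using it. newest_pod picks the most recently created pod, which avoids the stale-pod problem when more than one pod is listed.

```shell
# Registry used by the release cluster's Velero test (from the notes above).
REGISTRY=velero-tests.release.bigbang.mil

# Run a command, or only print it when DRY_RUN=1 (hypothetical switch).
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Pull the amd64 variant of an image, retag it for the test registry, push it.
# Assumes "docker login $REGISTRY" has already been done.
push_test_image() {
  dst="$REGISTRY/$1"
  run docker pull --platform linux/amd64 "$1"
  run docker tag "$1" "$dst"
  run docker push "$dst"
}

# Rewrite a test manifest on stdin for the release cluster: dogfood hosts
# become release hosts and the missing "ebs" storage class becomes gp3.
patch_manifest() {
  sed -e 's/dogfood/release/g' \
      -e 's/^\([[:space:]]*storageClassName:[[:space:]]*\)ebs[[:space:]]*$/\1gp3/'
}

# Given "<creationTimestamp> <name>" lines on stdin, print the newest pod name.
# RFC 3339 timestamps in the same format sort correctly as plain strings.
newest_pod() {
  sort | tail -n 1 | cut -d ' ' -f 2
}
```

Usage: veleropod=$(kubectl get pod -n velero-test -o jsonpath='{range .items[*]}{.metadata.creationTimestamp}{" "}{.metadata.name}{"\n"}{end}' | newest_pod)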
Loki:
- need to fix the url
- move the "click on the dropdown" part under the "login to grafana" part
Tempo:
- need to fix the url
Keycloak:
- need to fix the url
Neuvector:
- need to fix the url
- did not reset the password; maybe we should add it to sops?
Grafana & ArgoCD & Anchore & Mattermost SSO:
- need to fix the urls
Monitoring & Cluster Auditor & Tracing (Jaeger) & Alertmanager:
- need to fix the urls
Anchore:
- need to fix the url for the scan image (release registry instead of dogfood)
Edited by Danilo Patrucco
- Developer
Opened https://repo1.dso.mil/big-bang/team/deployments/bigbang-ci/-/merge_requests/223 to see if this fixes the podinfo issue.
Here are the policies from both clusters:
clusterpolicy_restrict_image_registries_release.yaml
clusterpolicy_restrict_image_registries_dogfood.yaml
The issue is either spacing or release; I will test a redeploy, and if that does not work I will push for the MR approval.
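To tell a spacing problem apart from a real difference between the two exported policies, a quick sketch that compares the files with and without whitespace. diff -w is a GNU/BSD option rather than strict POSIX, and the function name policy_diff_kind is hypothetical. Indentation is significant in YAML, so a whitespace-only difference can still change what Kyverno actually enforces.

```shell
# Hypothetical helper: classify how two files differ.
# Prints "identical", "whitespace-only", or "content".
policy_diff_kind() {
  if cmp -s "$1" "$2"; then
    echo identical
  elif diff -w "$1" "$2" >/dev/null; then
    echo whitespace-only
  else
    echo content
  fi
}
```

Usage: policy_diff_kind clusterpolicy_restrict_image_registries_dogfood.yaml clusterpolicy_restrict_image_registries_release.yaml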
- Developer
Suggestions:
- velero test should be encoded in R2D2 or in gogru
Open action items:
- dsop/dccscr#3739 (closed) new repo to move PodInfo into IB
- Work with Storage and Collab team to troubleshoot current MM issue with Istio/Kiali
- Moved release files and docs to bigbang-ci repo https://repo1.dso.mil/big-bang/team/deployments/bigbang-ci/-/merge_requests/225 (initial review)
- Developer
- MM was troubleshot and the problem was found: Alertmanager was posting to an incoming webhook that did not exist. https://repo1.dso.mil/big-bang/team/deployments/bigbang-ci/-/merge_requests/228
- Developer
Open Action items:
- waiting for the IB folks to finish adding PodInfo
- Developer
https://ironbank.dso.mil/repomap?searchText=podinfo&page=1&sort=0&order=1&cardsPerPage=10
podinfo is in Iron Bank; will test it ASAP.
- Danilo Patrucco marked the checklist item replicate the entire manual release checklist from the link below as completed
- Developer
@jfoster @dpritchettrm if it's OK, I will go ahead and close this issue after bumping it up to 6 points.
- Danilo Patrucco changed weight to 6 from 5
- Danilo Patrucco closed
- Danilo Patrucco changed iteration to Big Bang Iterations Feb 18, 2025 - Mar 3, 2025
- **** removed statusdoing label