# Release 1.50.0

- Copy the contents of https://repo1.dso.mil/platform-one/big-bang/customers/bigbang/-/blob/master/docs/release/README.md into this description and check the boxes as you test.
- Also note the comments in this issue. Generally these include release notes to add in.
## Release Process

### Foreword

- Commands prefixed with `$` should not be blindly copied and pasted; they should be read, evaluated, and, if applicable, executed.
- This upgrade process applies only to minor version upgrades on Big Bang major version `1`, and is subject to change in future major versions. Patch version upgrades follow a different process.
- The R2D2 release automation tool can be used to automate certain parts of this release process. Follow its README for cloning and installation. Usage is denoted by the R2-D2 prefix.
- The release branch format is as follows: `release-1.<minor>.x`, where `<minor>` is the minor release number. Example: `release-1.32.x`. The `<minor>` string is used as a placeholder for the minor release number throughout this document.
- The release tag format is as follows: `1.<minor>.0`.
📌 NOTE: As you work through the release, make a list of pain points / unclear steps. Once the release is complete, provide this feedback to the maintainers via MM and/or an MR to update the documentation. This allows us to continuously improve this process for future release engineers.
## 1. Release Prep
- [ ] Check last release SHAs - Verify that the previous release branch commit hash matches the last release tag. Investigate with the previous RE if they do not match.
  - Example: Last release is `1.31.1`; validate that the commit SHA for the `1.31.1` tag matches the commit SHA for the `release-1.31.x` branch.
  - R2-D2: run w/ the `Check last release SHAs` option selected
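
  A manual alternative, if you want to verify the SHAs yourself (a sketch using the previous release's names; substitute the actual tag and branch):

  ```shell
  $ git fetch origin --tags
  # commit the last release tag points at
  $ git rev-list -n 1 1.31.1
  # tip of the last release branch
  $ git rev-parse origin/release-1.31.x
  # the two SHAs printed above should match
  ```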
- [ ] Create release branch. You will need to request permissions from the maintainers if you run into permissions issues.¹
  - R2-D2: run w/ the `Create release branch` option selected

    or

  - In either the GitLab UI or the Git CLI, create a new branch from `master` with the name `release-1.<minor>.x`.

    ❕ Important: The release branch name must end with `x`

    ```shell
    # CLI example, replace `<release-branch>` with the name of the release branch as specified above
    $ cd bigbang
    $ git checkout master
    $ git pull
    $ git checkout -b <release-branch>
    $ git branch --show-current
    $ git push --set-upstream origin <release-branch>
    ```
- [ ] Build release notes: Before running the steps below, verify whether any packages have been added or have graduated from BETA, and adjust your local R2D2 config as needed. If changes are needed, also open an MR for R2D2 updating the default config.
  - R2-D2: run w/ the `Build release notes` option selected

    ```shell
    # clone the dogfood cluster repo
    $ git clone https://repo1.dso.mil/platform-one/big-bang/customers/bigbang/ dogfood-bigbang
    # cd into the dogfood-bigbang repo's release notes dir
    $ cd dogfood-bigbang/docs/release
    # install R2-D2
    $ python3 -m pip install git+https://repo1.dso.mil/platform-one/big-bang/apps/product-tools/R2-D2.git
    # Select the `Build release notes` option
    $ r2d2
    # commit the release notes
    $ cd ../../
    $ git add docs/release/
    $ git commit -m "add initial release notes"
    ```
- [ ] Upgrade Big Bang's version references

  💡 Tip: Make the following changes in a single commit so it can be cherry-picked into master later.

  - [ ] Bump the self-reference version in `base/gitrepository.yaml`
  - [ ] Update the chart release version in `chart/Chart.yaml`
  - [ ] Update `docs/Packages.md`: Add any new packages. Review whether any columns need updates (mTLS STRICT is a common change; follow the other examples where STRICT is noted). Also make sure to update and remove the `BETA` badge from any packages that have moved out of BETA.
  - [ ] Add a new changelog entry for the release, ex:

    ```markdown
    ## [1.<minor>.0] - [!1.<minor>.0](https://repo1.dso.mil/platform-one/big-bang/bigbang/-/merge_requests?scope=all&utf8=%E2%9C%93&state=merged&milestone_title=1.<minor>.0); List of merge requests in this release.
    <!-- Note: milestone_title=1.<minor>.0 version must match the given minor release version -->
    ```

  - [ ] Update `/docs/understanding-bigbang/configuration/base-config.md` using `helm-docs`.

    ```shell
    # example release 1.<minor>.x
    $ cd bigbang
    $ git checkout release-1.<minor>.x
    $ docker run -v "$(pwd):/helm-docs" -u $(id -u) jnorwood/helm-docs:v1.5.0 -s file -t .gitlab/base_config.md.gotmpl --dry-run > ./docs/understanding-bigbang/configuration/base-config.md
    ```
## 2. Upgrade and Debug Cluster

⚠️ WARNING: Upgrade only, do not delete and redeploy.
- [ ] Connect to the dogfood cluster

  📌 NOTE: If you have issues with the AWS CLI commands, adding access via the AWS web console is another option. Reach out to core maintainers for assistance.¹
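
  Once your kubeconfig is in place, a quick sanity check (a sketch; the exact steps for obtaining the dogfood kubeconfig are covered by the maintainer docs referenced above):

  ```shell
  $ kubectl config current-context
  $ kubectl get nodes
  ```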
- [ ] Review Elasticsearch health and trial license status:
  - Check the age of the pods; in general, if they are younger than 30 days you should be good. Also log in to Kibana via SSO - SSO is paywalled, so login will fail if the license is expired. If the license is expired, follow the steps below to renew it.

    📌 NOTE: Only run if the trial is expired. After running this you will need to re-configure role mapping.

    Renew ECK Trial:

    ```shell
    kubectl delete hr ek eck-operator -n bigbang
    kubectl delete ns eck-operator logging
    flux reconcile kustomization environment -n bigbang
    flux suspend hr bigbang -n bigbang
    flux resume hr bigbang -n bigbang
    flux suspend hr loki -n bigbang
    flux resume hr loki -n bigbang
    flux suspend hr promtail -n bigbang
    flux resume hr promtail -n bigbang
    flux suspend hr fluent-bit -n bigbang
    flux resume hr fluent-bit -n bigbang
    flux suspend hr mattermost -n bigbang
    flux resume hr mattermost -n bigbang
    kubectl delete pods -n mattermost --all
    ```
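
  To check the trial license state from the CLI instead of the UI (a sketch; the secret and pod names below are assumptions based on a default ECK install in the `logging` namespace - adjust them to what `kubectl get pods,secrets -n logging` actually shows):

  ```shell
  $ ES_PW=$(kubectl get secret -n logging logging-ek-es-elastic-user -o go-template='{{.data.elastic | base64decode}}')
  $ kubectl exec -n logging logging-ek-es-master-0 -- curl -sk -u "elastic:${ES_PW}" https://localhost:9200/_license
  ```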
- [ ] Review Mattermost Enterprise trial license status:
  - Log in as the "robot admin" (find creds in the encrypted values or with `sops -d bigbang/prod/environment-bb-secret.enc.yaml | grep "Robot admin"`), then navigate to the System Console -> Edition and License tab. If the license is expired, follow the steps below to renew it.

    📌 NOTE: Only run if the trial is expired.

    Renew Mattermost Enterprise Trial:

    To "renew" the Mattermost Enterprise trial license, connect to the RDS Postgres DB using `psql`. Follow the guide (the guide will need to be sops-decrypted) to connect to the DB. Reach out to core maintainers if you need additional assistance.¹

    Then run the commands below, which will cycle the license:

    ```shell
    # from within the psql connection:
    \c mattermost
    select * from "public"."licenses";
    delete from "public"."licenses";
    \q
    # then back in your shell:
    kubectl delete mattermost mattermost -n mattermost
    flux suspend hr -n bigbang mattermost
    flux resume hr -n bigbang mattermost
    ```

    Validate that the new Mattermost pod rolls out successfully (see the watch command below). If it hasn't reconciled, you may need to suspend/resume bigbang again. Log in as a system admin, navigate to the System Console -> Edition and License tab, and click the "Start trial" button.
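
  A quick way to watch the rollout (a sketch):

  ```shell
  $ kubectl get pods -n mattermost -w
  ```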
- [ ] If Flux has been updated in the latest release:
  - Run the Flux install script as in the example below:

    ```shell
    $ cd bigbang
    $ git checkout release-1.<minor>.x
    $ git pull
    # the `-s` option will reuse the existing secret so you don't have to provide credentials
    $ ./scripts/install_flux.sh -s
    # go back to dogfood after the flux upgrade
    $ cd ../dogfood-bigbang
    ```
- [ ] Upgrade the release branch on dogfood cluster `master` by completing the following:
  - WIP: R2-D2: run w/ the `Upgrade dogfood cluster` option selected

    or

  - Upgrade the base kustomization `ref=release-1.<minor>.x` in `bigbang/base/kustomization.yaml` to the release branch.
  - Upgrade the prod kustomization `branch: "release-1.<minor>.x"` in `bigbang/prod/kustomization.yaml` to the release branch.
  - Verify the above changes are correct, then:

    ```shell
    $ git add bigbang/base bigbang/prod
    $ git commit -m "upgrade kustomizations to release-1.<minor>.x"
    $ git push
    ```
- [ ] Verify the cluster has updated to the new release
  - Packages have fetched the new revision and match the new release
  - GitRepos have updated to their new versions
- [ ] Packages have reconciled
  - Watch the release HRs, GitRepos, and Kustomizations to check when all HRs have properly reconciled:

    ```shell
    # check release
    watch kubectl get gitrepositories,kustomizations,hr,po -A
    ```
  - If flux has not updated after ten minutes:

    ```shell
    flux reconcile hr -n bigbang bigbang --with-source
    ```

  - If flux is still not updating, delete the flux source controller:

    ```shell
    kubectl get all -n flux-system
    kubectl delete pod/source-controller-xxxxxxxx-xxxxx -n flux-system
    ```
  - If the helm release shows max retries exhausted, check a describe of the HR. If it shows "another operation (install/upgrade/rollback) is in progress", this is an issue caused by too many Flux reconciles happening at once and typically the Helm controller crashing. You will need to delete helm release secrets and reconcile in flux as follows. Note that `${HR_NAME}` is the same HR you described which is in a bad state (typically a specific package and NOT bigbang itself).

    ```shell
    # example w/ kiali
    $ HR_NAME=kiali
    $ kubectl get secrets -n bigbang | grep ${HR_NAME}
    # example output:
    # some hr names are duplicated w/ a dash in the middle, some are not
    sh.helm.release.v1.kiali-kiali.v1   helm.sh/release.v1   1   18h
    sh.helm.release.v1.kiali-kiali.v2   helm.sh/release.v1   1   17h
    sh.helm.release.v1.kiali-kiali.v3   helm.sh/release.v1   1   17m

    # Delete the latest one:
    $ kubectl delete secret -n bigbang sh.helm.release.v1.${HR_NAME}-${HR_NAME}.v3
    # suspend/resume the hr
    $ flux suspend hr -n bigbang ${HR_NAME}
    $ flux resume hr -n bigbang ${HR_NAME}
    ```
  - If you see errors about no space left when you `kubectl describe` a failed deployment, the logs have probably filled the filesystem of the node. Determine which node the deployment was scheduled on, then ssh to it and delete the logs. You need to have sshuttle running in order to reach the IPs of the nodes.

    ```shell
    ssh -i ~/.ssh/dogfood.pem ec2-user@xx.x.x.xx
    sudo -i
    rm -rf /var/log/containers/*
    rm -rf /var/log/pods/*
    ```

    Then the deployment should recover on its own.

- [ ] All pods are in "Running" or "Completed" state
## 3. UI Tests

❕ Important: When verifying each application UI is loading, also verify the website certificates are valid.
### Logging

### Monitoring
- [ ] Login to Grafana with SSO
- [ ] Contains Kubernetes dashboards and metrics
- [ ] Contains Istio dashboards
- [ ] Login to Prometheus
- [ ] Go to Status > Targets. Validate that no unexpected services are marked as Unhealthy.

  KNOWN UNHEALTHY INCLUDE:
  - serviceMonitor/monitoring/monitoring-monitoring-kube-vault/0 (known issue)
  - serviceMonitor/monitoring/monitoring-monitoring-kube-kube-controller-manager/0
  - serviceMonitor/monitoring/monitoring-monitoring-kube-kube-etcd/0
  - serviceMonitor/monitoring/monitoring-monitoring-kube-kube-scheduler/0
### Loki

- [ ] Login to Grafana as admin
- [ ] `General \ Loki Dashboard quick search` dashboard shows data/logs from the past few minutes and hours.
- [ ] `General \ Loki \ Operational` dashboard has 100% success rates for Ingester/Distributor, and at the bottom under `BoltDB Shipper` the `200-Write` status method has data and is greater than 0.
- [ ] On the left click Gear > Data Sources > Loki > scroll down > click `Save & Test`; the result should be "Data source connected and labels found."
### Tempo

- [ ] Login to Grafana as admin
- [ ] On the left click Gear > Data Sources > Tempo > scroll down > click `Save & Test`; the result should be "Data source is working."
- [ ] In the upper left corner click Explore. Pick Tempo in the top left drop-down menu. Choose Query type = `Loki Search`, then enter `{app="tempo"}` in the Log Browser query text box. Ensure traces from the last several minutes and hours are populating. Click on a Trace ID and ensure that the information panel loads data.
- [ ] Visit tempo tracing & ensure services are populating under the `Service` drop-down.
### Cluster Auditor

- [ ] Login to Grafana with SSO
- [ ] OPA Violations dashboard is present and shows violations in namespaces (a CLI cross-check is sketched below)
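
A CLI cross-check of the dashboard data (a sketch; assumes Cluster Auditor's policies are OPA Gatekeeper constraints, each of which reports violations in its status):

```shell
$ kubectl get constraints
```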
### Kiali

- [ ] Login to Kiali with SSO
- [ ] Validate graphs and traces are visible under applications/workloads
- [ ] Validate no errors appear

  ℹ️ Note: A red notification bell would be visible if there are errors. Errors on individual application listings for labels, etc. are expected and OK.
### GitLab

- [ ] Login to GitLab with SSO
- [ ] Edit profile and change the user avatar
- [ ] Create a new public group with the release name, i.e. `release-1-<minor>-x`
- [ ] Create a new public project (under the group you just made), also with the release name, i.e. `release-1-<minor>-x`
- [ ] `git clone` the project
- [ ] Pick one of the project folders from Sonarqube Samples and copy all the files into your clone from dogfood, then push up (a sketch of these steps is below). To `git push` you will need to create an access token in your profile, then use that access token as the password.
- [ ] `docker push` and `docker pull` an image to/from the registry. Use the access token that you created for the `docker login`.

  ```shell
  docker pull alpine
  docker tag alpine registry.dogfood.bigbang.dev/<GROUPNAMEHERE>/<PROJECTNAMEHERE>/alpine:latest
  docker login registry.dogfood.bigbang.dev
  docker push registry.dogfood.bigbang.dev/<GROUPNAMEHERE>/<PROJECTNAMEHERE>/alpine:latest
  docker image rm registry.dogfood.bigbang.dev/<GROUPNAMEHERE>/<PROJECTNAMEHERE>/alpine:latest
  docker pull registry.dogfood.bigbang.dev/<GROUPNAMEHERE>/<PROJECTNAMEHERE>/alpine:latest
  ```
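
A sketch of the clone-copy-push steps (the GitLab host name and sample path are assumptions; use the project's actual clone URL and the Sonarqube Samples location from the release docs):

```shell
# clone URL is an assumption; copy it from the project page
$ git clone https://gitlab.dogfood.bigbang.dev/<GROUPNAMEHERE>/<PROJECTNAMEHERE>.git
$ cp -r path/to/sonarqube-sample/* <PROJECTNAMEHERE>/
$ cd <PROJECTNAMEHERE>
$ git add . && git commit -m "add sonarqube sample"
# when prompted, use your GitLab username and the access token as the password
$ git push
```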
### Sonarqube

- [ ] Login to Sonarqube with SSO
- [ ] Add a project for your release
- [ ] Generate a token for the project and copy the token somewhere safe for use later
- [ ] Click Other, Linux, and copy the projectKey from `-Dsonar.projectKey=XXXXXXX` for use later
- [ ] After completing the GitLab Runner test, return to Sonar and check that your project now has analysis

  ℹ️ Note: The project token and project key are different values.
### Gitlab Runner

- [ ] Log back into GitLab and navigate to your project
- [ ] Under Settings, CI/CD, Variables add two vars:
  - `SONAR_HOST_URL` set equal to `https://sonarqube.dogfood.bigbang.dev/`
  - `SONAR_TOKEN` set equal to the token you copied from Sonarqube earlier (make this masked)
- [ ] Under Settings, CI/CD, Auto DevOps disable the Auto DevOps pipeline and save.
- [ ] Add a `.gitlab-ci.yml` file to the root of the project, paste in the contents of sample_ci.yaml (roughly sketched below), replacing `-Dsonar.projectKey=XXXXXXX` with what you copied earlier
- [ ] Commit, validate the pipeline runs and succeeds (you may need to retry if there is a connection error), then return to the last step of the Sonar test
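
For orientation only, a rough sketch of what a SonarQube scan job in `.gitlab-ci.yml` can look like; treat the actual sample_ci.yaml from the release docs as the source of truth (the job name and image here are assumptions):

```yaml
sonarqube-check:
  image:
    name: sonarsource/sonar-scanner-cli:latest
    entrypoint: [""]
  script:
    # the scanner reads SONAR_HOST_URL and SONAR_TOKEN from the CI/CD variables set above
    - sonar-scanner -Dsonar.projectKey=XXXXXXX
```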
### Nexus

- [ ] Login to Nexus as admin; the password is in the `nexus-repository-manager-secret` secret:

  ```shell
  # username is admin, password is the output of this command
  kubectl get secret -n nexus-repository-manager nexus-repository-manager-secret -o go-template='{{index .data "admin.password" | base64decode}}'
  ```

- [ ] Validate there are no errors displaying in the UI
- [ ] Push/pull an image to/from the Nexus registry
  - With the credentials from the encrypted values (or the admin user credentials), login to the Nexus registry:

    ```shell
    $ docker login containers.dogfood.bigbang.dev
    ```

  - Tag and push an image to the registry:

    ```shell
    # ex: <release> = `1-32-0`
    $ docker tag alpine:latest containers.dogfood.bigbang.dev/alpine:<release>
    $ docker push containers.dogfood.bigbang.dev/alpine:<release>
    ```

  - Pull down the image for the previous release:

    ```shell
    # ex: <last-release> = `1-31-0`
    $ docker pull containers.dogfood.bigbang.dev/alpine:<last-release>
    ```
### Anchore

- [ ] Login to Anchore with SSO
- [ ] Log out and log back in as the admin user - the password is in the `anchore-anchore-engine-admin-pass` secret (admin will have pull credentials set up for the registries):

  ```shell
  kubectl get secret anchore-anchore-engine-admin-pass -n anchore -o json | jq -r '.data.ANCHORE_ADMIN_PASSWORD' | base64 -d ; echo ' <- password'
  ```

- [ ] Scan an image in the dogfood registry: `registry.dogfood.bigbang.dev/GROUPNAMEHERE/PROJECTNAMEHERE/alpine:latest`
- [ ] Scan an image in the Nexus registry: `containers.dogfood.bigbang.dev/alpine:<release-number>` (use your release number, ex: `1-32-0`)
- [ ] Validate scans complete and Anchore displays data (click the SHA value for each image)
### Argocd

- [ ] Login to ArgoCD with SSO
- [ ] Logout and login with username `admin`. The password is in the `argocd-initial-admin-secret` secret. If that doesn't work, attempt a password reset:

  ```shell
  kubectl -n argocd get secret argocd-initial-admin-secret -o json | jq '.data|to_entries|map({key, value:.value|@base64d})|from_entries'
  ```

- [ ] Create application
  - Click `[Create Application]`, fill in the below:

    | Setting | Value |
    | --- | --- |
    | Application Name | podinfo |
    | Project | default |
    | Sync Policy | Automatic |
    | Sync Policy | check both boxes |
    | Sync Options | check "auto-create namespace" |
    | Repository URL | https://repo1.dso.mil/platform-one/big-bang/apps/sandbox/podinfo.git |
    | Revision | HEAD |
    | Path | chart |
    | Cluster URL | https://kubernetes.default.svc |
    | Namespace | podinfo |

  - Click `[Create]` (top of page)
  - Validate the app syncs/becomes healthy

    WIP: Creating the application with a YAML template:

    ```yaml
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: podinfo
    spec:
      destination:
        name: ''
        namespace: podinfo
        server: 'https://kubernetes.default.svc'
      source:
        path: chart
        repoURL: 'https://repo1.dso.mil/platform-one/big-bang/apps/sandbox/podinfo.git'
        targetRevision: HEAD
      project: default
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
    ```

- [ ] Delete app (a CLI option is sketched below)
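
If you prefer the CLI to the UI for cleanup (a sketch; the first command assumes you are logged in with the `argocd` CLI, the second works because Argo CD applications are CRDs):

```shell
$ argocd app delete podinfo
# or:
$ kubectl delete application podinfo -n argocd
```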
### Minio

- [ ] Log into the Minio UI - the access and secret key are in the `minio-root-creds-secret` secret:

  ```shell
  kubectl -n minio get secret minio-creds-secret -o json | jq -r '.data.accesskey' | base64 -d ; echo ' <- access key'
  kubectl -n minio get secret minio-creds-secret -o json | jq -r '.data.secretkey' | base64 -d ; echo ' <- secret key'
  ```

- [ ] Create bucket
- [ ] Store file to bucket
- [ ] Download file from bucket
- [ ] Delete bucket and files (a CLI alternative for these bucket steps is sketched below)
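
The bucket steps can also be done with the MinIO client (`mc`) if you prefer the CLI (a sketch; the alias name is arbitrary and the endpoint URL is an assumption - use however you normally reach the Minio API, substituting the keys read above):

```shell
$ mc alias set dogfood <minio-endpoint-url> "<access-key>" "<secret-key>"
$ mc mb dogfood/release-test                            # create bucket
$ echo hello > test.txt
$ mc cp test.txt dogfood/release-test/                  # store file to bucket
$ mc cp dogfood/release-test/test.txt ./round-trip.txt  # download file from bucket
$ mc rb --force dogfood/release-test                    # delete bucket and files
```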
### Mattermost

- [ ] Login to Mattermost with SSO.
- [ ] Update/modify profile picture
- [ ] Send chats/validate chats from previous releases are visible.

  💡 Tip: The ability to see chats in other teams requires Mattermost administrator rights.

- [ ] Under System Console -> Elasticsearch, click Test Connection and Index Now, then validate success
### Twistlock

- [ ] Validate that the Twistlock init job pod ran and completed; this should do all setup (license/user) and the required defender updates automatically (the pod is automatically removed after 30 minutes)
  - If the pod is gone already, check the Prometheus target for Twistlock
- [ ] Login to Twistlock/Prisma Cloud with the credentials in the secret:

  ```shell
  kubectl get secret -n twistlock twistlock-console -o go-template='{{.data.TWISTLOCK_USERNAME | base64decode}}' ; echo ' <- username'
  kubectl get secret -n twistlock twistlock-console -o go-template='{{.data.TWISTLOCK_PASSWORD | base64decode}}' ; echo ' <- password'
  ```

- [ ] Under Manage -> Defenders -> Manage, make sure the number of defenders online is equal to the number of nodes in the cluster (see the node count command below). Defenders scale with the number of nodes in the cluster. If a defender is offline, check whether its node still exists in the cluster; the cluster autoscaler will often scale nodes up/down, which can result in defenders spinning up and getting torn down. As long as the number of defenders online equals the number of nodes, everything is working as expected.
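
  To get the node count for comparison (a sketch):

  ```shell
  $ kubectl get nodes --no-headers | wc -l
  ```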
### Neuvector

- [ ] Login to Neuvector with the default login (currently admin:admin - update this to something super secure)
- [ ] Navigate to Assets -> System Components and validate that all components are showing as healthy
- [ ] Under Assets -> Containers, click on a given image and run a scan; validate completion and results
- [ ] Under Network Activity, validate that the graph loads and shows pods + traffic
### Kyverno

📌 NOTE: If using macOS, make sure that you have GNU sed installed and added to your PATH variable (see the GNU sed instructions)

- [ ] Test secret sync in a new namespace

  ```shell
  # create secret in kyverno NS
  kubectl create secret generic \
    -n kyverno kyverno-bbtest-secret \
    --from-literal=username='username' \
    --from-literal=password='password'

  # Create Kyverno Policy
  kubectl apply -f https://repo1.dso.mil/platform-one/big-bang/apps/sandbox/kyverno/-/raw/main/chart/tests/manifests/sync-secrets.yaml

  # Wait until the policy shows as ready before proceeding
  kubectl get clusterpolicy sync-secrets

  # Create a namespace with the correct label (essentially we are dry-running a namespace creation to get the yaml, adding the label, then applying)
  kubectl create namespace kyverno-bbtest --dry-run=client -o yaml | sed '/^metadata:/a\ \ labels: {"kubernetes.io/metadata.name": "kyverno-bbtest"}' | kubectl apply -f -

  # Check for the secret that should be synced - if it exists this test is successful
  kubectl get secrets kyverno-bbtest-secret -n kyverno-bbtest
  ```

- [ ] Delete the test resources

  ```shell
  # If the above is successful, delete the test resources
  kubectl delete -f https://repo1.dso.mil/platform-one/big-bang/apps/sandbox/kyverno/-/raw/main/chart/tests/manifests/sync-secrets.yaml
  kubectl delete secret kyverno-bbtest-secret -n kyverno
  kubectl delete ns kyverno-bbtest
  ```
### Velero

- [ ] Backup PVCs using velero_test.yaml

  ```shell
  $ kubectl apply -f ./docs/release/velero_test.yaml
  # wait 30s for velero to be ready then:
  # exec into velero_test container, check log
  $ veleropod=$(kubectl get pod -n velero-test -o json | jq -r '.items[].metadata.name')
  $ kubectl exec $veleropod -n velero-test -- tail /mnt/velero-test/test.log
  ```

  - Install the velero CLI on your workstation if you don't already have it (for macOS, run `brew install velero`).
  - Then set VERSION to the release you are testing and use the CLI to create a test backup:

    ```shell
    $ VERSION=1-<minor>-0
    $ velero backup create velero-test-backup-${VERSION} -l app=velero-test
    $ velero backup get
    ```

  - Wait a bit, re-run `velero backup get`; when it shows "Completed", delete the app:

    ```shell
    $ kubectl delete -f ./docs/release/velero_test.yaml
    namespace "velero-test" deleted
    persistentvolumeclaim "velero-test" deleted
    deployment.apps "velero-test" deleted
    ```

- [ ] Restore the test resources from the backup

  ```shell
  $ velero restore create velero-test-restore-${VERSION} --from-backup velero-test-backup-${VERSION}
  # exec into velero_test container
  $ kubectl exec $veleropod -n velero-test -- cat /mnt/velero-test/test.log
  # Old log entries and new should be in the log if the backup was done correctly
  ```

- [ ] Cleanup the test

  ```shell
  $ kubectl delete -f ./docs/release/velero_test.yaml
  ```
### Keycloak

- [ ] Login to the Keycloak admin console. The credentials are in the `keycloak-credentials` secret:

  ```shell
  kubectl get secret keycloak-credentials -n keycloak -o json | jq -r '.data.adminuser' | base64 -d ; echo " <- admin user"
  kubectl get secret keycloak-credentials -n keycloak -o json | jq -r '.data.password' | base64 -d ; echo " <- password"
  ```
### Tracing (Jaeger)

- [ ] Load tracing, login with SSO, and ensure there are no errors on the main page and that traces can be found for apps
### Alertmanager

- [ ] Load alertmanager, login with SSO, and validate that, at minimum, the Watchdog alert is firing
## 4. Create Release
- [ ] Re-run helm-docs in case any package tags changed as a result of issues found in testing:

  ```shell
  $ cd bigbang
  # pull any last minute cherry picks, verify nothing has greatly changed
  $ git pull
  $ git checkout release-1.<minor>.x
  $ docker run -v "$(pwd):/helm-docs" -u $(id -u) jnorwood/helm-docs:v1.5.0 -s file -t .gitlab/base_config.md.gotmpl --dry-run > ./docs/understanding-bigbang/configuration/base-config.md
  # commit and push the changes (if any)
  ```
- [ ] Create a release candidate tag based on the release branch, i.e. `1.<minor>.0-rc.0`. Tagging will additionally create release artifacts, and the pipeline runs north of 1 hour. You will need to request tag permissions from the maintainers.¹
  - To do this via the UI (generally preferred): tags page -> new tag, name: `1.<minor>.0-rc.0`, create from: `release-1.<minor>.x`, message: "release candidate", release notes: leave empty
  - To do this via the git CLI:

    ```shell
    $ git tag -a 1.<minor>.0-rc.0 -m "release candidate"
    # list the tags to make sure you made the correct one
    $ git tag -n
    # push
    $ git push --tags
    ```

- [ ] Passed tag pipeline. Review all pipeline output looking for failures or warnings. Reach out to the maintainers for a quick review before proceeding.¹
- [ ] Create the release tag based on the release branch, i.e. `1.<minor>.0`.
  - To do this via the UI (generally preferred): tags page -> new tag, name: `1.<minor>.0`, create from: `release-1.<minor>.x`, message: `release 1.<minor>.0`, release notes: leave empty
  - To do this via the git CLI:

    ```shell
    $ git tag -a 1.<minor>.0 -m "release 1.<minor>.0"
    # list the tags to make sure you made the correct one
    $ git tag -n
    # push
    $ git push --tags
    ```

- [ ] Passed release pipeline. Review all pipeline output looking for failures or warnings. Reach out to the maintainers for a quick review before proceeding.¹
- [ ] Add release notes to the release. Collect `dogfood/docs/release/release-notes-1-<minor>-0.md` that was created earlier and paste it into the release.
- [ ] Modify the release notes:
  - Create Upgrade Notices based on the listed notices in the release issue. Also review with maintainers to see if any other major changes should be noted.
  - Move MRs from 'Big Bang MRs' into their specific sections below. If an MR is for Big Bang itself (docs, CI, etc.) and not a specific package, it can remain in place under 'Big Bang MRs'. General rule of thumb: if the git tag changes for a package, put the MR under that package; if there is no git tag change for a package, put it under BB.
  - Adjust known issues as needed: if an issue has been resolved it can be removed, and if any new issues were discovered they should be added.
- [ ] Cherry-pick release commit(s) as needed with a merge request back to the master branch. We do not ever merge the release branch back to master. Instead, make a separate branch from master and cherry-pick the release commit(s) into it. Use the resulting MR to close the release issue.

  ```shell
  # Navigate to your local clone of the BB repo
  cd path/to/my/clone/of/bigbang
  # Get the most up to date master
  git checkout master
  git pull
  # Check out a new branch for cherry-picking
  git checkout -b 1.<minor>-cherrypick
  # Repeat the following for whichever commits are required from the release
  # Typically this is the initial release commit (bumped GitRepo, Chart.yaml, CHANGELOG, README) and a final commit that re-ran helm-docs (if needed)
  git cherry-pick <commit sha for Nth commit>
  git push --set-upstream origin 1.<minor>-cherrypick
  # Create an MR using GitLab, merging this branch back to master; reach out to maintainers to review
  ```
- [ ] Close the Big Bang milestone in GitLab.
- [ ] Hand off the release to the maintainers¹; they will review, then celebrate and announce the release in the public MM channel.