Release 2.47.0
Release Engineering
Expectations for Release Engineer
- A regularly scheduled release takes place Tuesday-Friday every other week at the end of a sprint.
- Do not sign up for release engineer if you have any planned time off during Tues-Fri during the release week.
- Testing on the Dogfood cluster should be complete by end of day Wednesday.
- The release should be mostly complete by end of day Thursday. (No one wants to be troubleshooting release things on a Friday at 5 pm)
- To expedite the release process, once a package upgrade is verified to have broken either the upgrade or the package UI functionality, the SOP is to immediately back out the change for that package using the Package Rejection Process.
- Join the IL4 Mattermost Big Bang Release Engineering Channel. This group chat allows the Maintainers [^maintainers] to support you and gives them visibility into where we are in the release process.
- The RE should seek anchor’s approval prior to starting step 1. Maintainers tend to push through last minute MRs on Tuesday morning. We want to get any finished work into the release wherever possible.
- If a shadow is assigned, hop in a Zoom breakout room and pair to complete the release process.
Shadowing
- Team members should shadow a release engineer prior to becoming RE. Assigned REs should have a solid understanding of the release process prior to their turn and have shadowed enough to understand and be able to execute the release process.
- Try to clarify any questions about the release process steps prior to becoming the actual release engineer.
- Review the release process steps before it's your turn. As always, you can reach out to the Maintainers [^maintainers] with questions about the release process.
- Sync regularly with the release engineer to understand any issues/blockers that they run into. Likely you'll run into similar issues or blockers.
Prerequisites
Important: Make sure that you have met all prerequisites before the day you're scheduled to start working on a release in case you run into any problems.
- Install or update all the tools listed in this Big Bang Training document
- Install or update R2-D2 according to the instructions in the R2-D2 README
  - In the `.rd2d/config.yaml` file ensure that the full path is specified for the dogfood and the bigbang repository location, e.g. instead of `~/bigbang` use `/Users/Myuser/bigbang`
  - In the `.rd2d/config.yaml` file ensure that the beta_list has `N/A` specified as a value, `beta_list: N/A`
- Install or update `gogru` according to the instructions in the gogru README
- Request access to and/or ensure you can connect to the Dogfood cluster according to the instructions in the Dogfood README
- Have your P1 username, password and MFA device
- If you have a Mac make sure to install and set up gsed
  brew install gnu-sed
  echo $PATH | grep -q '/usr/local/bin'; [ $? -ne 0 ] && export PATH=/usr/local/bin:$PATH
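The full-path requirement for `.rd2d/config.yaml` can be checked before release day. The following is a minimal sketch, not part of R2-D2 or Gogru; the function name `is_full_path` is a hypothetical helper introduced here for illustration.

```shell
# Hypothetical helper (an assumption, not part of R2-D2):
# succeeds only when the argument is a full absolute path.
is_full_path() {
  case "$1" in
    /*) return 0 ;;   # starts at the filesystem root, e.g. /Users/Myuser/bigbang
    *)  return 1 ;;   # "~/bigbang", "./bigbang", "bigbang", or empty
  esac
}
```

For example, `is_full_path '~/bigbang' || echo "use the full path instead"` would print the reminder, because a literal `~` is not expanded inside a config file.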
Release Process
Foreword
- DO NOT blindly copy and paste commands prepended with `$`. Carefully read and evaluate these commands and execute them if applicable.
- This upgrade process is applicable only to minor Big Bang version upgrades of Big Bang major version `2`. It is subject to change in future major versions. Patch version upgrades follow a different process.
- Use the Gogru release automation tool along with `R2D2` to complete the release. Currently `Gogru` only supports the `1. Release Prep` and `2. Upgrade and Debug Cluster` parts of the release process. Follow the README for instructions on how to clone and install `Gogru`.
- Use the R2D2 release automation tool to automate certain parts of this release process. Follow the README for instructions on how to clone and install R2D2. Usage of R2D2 is denoted in the document by the _R2-D2_ prefix.
- The release branch format is as follows: `release-<major>.<minor>.x` where `<minor>` is the minor release number. Example: `release-2.7.x`. The `<minor>` string is a placeholder for the minor release number in this document.
- The release tag format for major version 2 of Big Bang is: `2.<minor>.0`.
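The branch and tag formats above can be derived from the minor number alone. The helpers below are a small sketch under that assumption; `BB_MAJOR`, `release_branch`, and `release_tag` are hypothetical names introduced here and are not part of Gogru or R2D2.

```shell
# Hypothetical helpers for illustration only.
# BB_MAJOR=2 reflects the major version this document covers.
BB_MAJOR=2

# release_branch <minor>  ->  release-2.<minor>.x
release_branch() { printf 'release-%s.%s.x\n' "$BB_MAJOR" "$1"; }

# release_tag <minor>  ->  2.<minor>.0
release_tag() { printf '%s.%s.0\n' "$BB_MAJOR" "$1"; }
```

With these definitions, `release_branch 7` prints `release-2.7.x`, matching the example in the list above.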
NOTE: As you work through the release make a list of pain points / unclear steps. Once the release is complete, provide this feedback to the maintainers via MM and/or an MR to update the documentation. This allows us to continuously improve this process for future release engineers.
1. Release Prep
NOTE: If you need the manual steps and/or R2D2 steps, they can be found here.
- Create the Bigbang (product) Issue
  ./gogru createIssue
- Verify the Dogfood cluster is up and running normally
  ./gogru dogfoodCheck
  - If there are any issues with the Helm Releases or Pods in the output, investigate and fix them before continuing
- Create the release branches for Dogfood and BigBang (product), and verify the hash of the release branch matches the master branch
  ./gogru createRB
- Update BigBang (product) version references with the new release version
  ./gogru update-version-references
2. Upgrade and Debug Cluster
WARNING: Upgrade only, do not delete and redeploy.
NOTE: If you need the manual steps, they can be found here.
- Check and Renew ElasticSearch (ECK) License
  ./gogru eck-license-check
- Check and Renew Mattermost License
  ./gogru mm-license-check
  Before running the command select the AWS_PROFILE or configure AWS access using `aws configure`
- Upgrade Flux
  ./gogru update-flux
- Upgrade Dogfood
  ./gogru upgrade-dogfood
  - After running the `upgrade-dogfood` command, an MR will be opened in your name. Have the Maintainers [^maintainers] merge your branch into `master`. Once merged, `flux` should pick up the changes and begin reconciling the helm releases. This may take up to an hour depending on the size of the release.
  - As an extra check, you can watch the helmreleases, gitrepositories, pods, and kustomizations to check when all HRs have properly reconciled. Gogru's dogfoodCheck only looks at the HRs and Pods for now.
    watch kubectl get gitrepositories,kustomizations,hr,po -A
IF FLUX HAS NOT UPDATED AFTER 10 MINUTES
Shell Instructions to force Flux to reconcile
```shell
flux reconcile hr -n bigbang bigbang --with-source --force
```
- If flux is still not updating, delete the flux source controller:
  kubectl delete pod -n flux-system -l app=source-controller
- If the helm release shows max retries exhausted, check a describe of the HR. If it shows "another operation (install/upgrade/rollback) is in progress", this is an issue caused by too many Flux reconciles happening at once and typically the Helm controller crashing. You will need to delete helm release secrets and reconcile in flux as follows. Note that ${HR_NAME} is the same HR you described which is in a bad state (typically a specific package and NOT bigbang itself).
  # example w/ kiali
  $ HR_NAME=kiali
  $ kubectl get secrets -n bigbang | grep ${HR_NAME}
  # example output:
  # some hr names are duplicated w/ a dash in the middle, some are not
  sh.helm.release.v1.kiali-kiali.v1 helm.sh/release.v1 1 18h
  sh.helm.release.v1.kiali-kiali.v2 helm.sh/release.v1 1 17h
  sh.helm.release.v1.kiali-kiali.v3 helm.sh/release.v1 1 17m
  # Delete the latest one:
  $ kubectl delete secret -n bigbang sh.helm.release.v1.${HR_NAME}-${HR_NAME}.v3
  # suspend/resume the hr
  $ flux suspend hr -n bigbang ${HR_NAME}
  $ flux resume hr -n bigbang ${HR_NAME}
- If you see errors about no space left when you kubectl describe a failed deployment, the logs have probably filled the filesystem of the node. Determine which node the deployment was scheduled on, ssh to it, and delete the logs. You need to have sshuttle running in order to reach the IP of the nodes.
  ssh -i ~/.ssh/dogfood.pem ec2-user@xx.x.x.xx
  sudo -i
  rm -rf /var/log/containers/*
  rm -rf /var/log/pods/*
  Then the deployment should recover on its own
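For the "max retries exhausted" case above, picking the latest Helm release secret by eye gets harder as revisions accumulate. The sketch below is a hypothetical helper (not part of Gogru or R2-D2) that reads `kubectl get secrets` style lines on stdin and prints the secret name with the highest numeric `.vN` suffix. It assumes the secret name is the first column and ends in `.v<number>`, as in the example output above.

```shell
# Hypothetical helper: print the Helm release secret with the highest revision.
# Input (stdin): lines whose first column is a secret name ending in ".v<number>".
latest_helm_secret() {
  awk '{
    name = $1
    rev = name
    sub(/^.*\.v/, "", rev)   # keep only what follows the last ".v"
    rev = rev + 0            # compare numerically, so v10 sorts above v9
    if (rev > max) { max = rev; latest = name }
  } END { if (latest != "") print latest }'
}
```

A possible use is `kubectl get secrets -n bigbang | grep "${HR_NAME}" | latest_helm_secret`, reviewing the printed name before passing it to `kubectl delete secret`.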
- Verify cluster has updated to the new release and verify that there are no outstanding issues with Pods or Helm Releases
  ./gogru dogfoodCheck
NOTE: If `Prometheus` does not start and is logging an error relating to Vault permissions, follow these instructions to reissue the Vault token.
NOTE: If for some reason the upgrade fails catastrophically and the release needs to be rolled back, follow the rollback instructions. This should be used only as a last resort to re-test the upgrade process.
HANDLING PACKAGE UPGRADE FAILURES
Follow the Package Rejection Process to roll back the package upgrade
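When watching the cluster reconcile in this section, it can help to list only the Helm Releases that are not yet ready. The filter below is a sketch, not a replacement for `./gogru dogfoodCheck`. The function name `not_ready_hrs` is hypothetical, and the code assumes the `kubectl get hr -A` output starts with a header row that contains a `READY` column, with namespace and name in the first two columns.

```shell
# Hypothetical filter: print "namespace/name" for every HR whose READY column is not "True".
# Assumes a header row containing READY; the column position is looked up, not hard-coded.
not_ready_hrs() {
  awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "READY") col = i; next }
       col && $col != "True" { print $1 "/" $2 }'
}
```

A possible use is `kubectl get hr -A | not_ready_hrs`; empty output would mean every listed HR reports ready.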
3. UI Tests
WARNING: If you encounter a package that is failing, follow the Package Rejection Process to roll back the package upgrade.
NOTE: If you need the manual steps, they can be found here.
Default Application Credentials
Logging
- Login to kibana with SSO.
  NOTE: If you get "You do not have permission to access the requested page," follow the instructions under "Renew ECK Trial" in the 2. Upgrade and Debug Cluster section above. Don't forget to log in as admin and reconfigure role mapping after you run the kubectl and flux commands.
- Verify that Kibana is actively indexing/logging.
  - To do this, click on the drop-down menu in the upper-left corner, then under "Analytics" click Discover. Click "Create data view." In the "Index pattern" field, enter `jaeger*`. Set the "Timestamp field" to `I don't want to use the time filter`, then click "Use without saving". Log data should populate.
  - If it is not indexing (you see no index for jaeger), you may have renewed the ElasticSearch license and jaeger did not reconcile after the renewal. Force jaeger to pick up the new configuration by doing:
    flux reconcile hr -n bigbang jaeger --force --with-source
Kiali
- Login to kiali with SSO
- Validate graphs and traces are visible under applications/workloads
  - Graphs at the overview
  - Graphs for inbound metrics
  - Graphs for outbound metrics
  - Graphs for Traces
- Validate no errors appear
  NOTE: Red notification bell would be visible if there are errors. Errors on individual application listings for labels, etc are expected and OK.
Twistlock
- Log in here
  - In the Targets drop down choose `serviceMonitor/twistlock/twistlock-console/0`
  - Click the small "show more" button that appears. You should see a State of UP and no errors.
- R2-D2: Twistlock Test
Fortify
- R2-D2: Fortify Test
GitLab Sonarqube Gitlab Runner
NOTE: If GitLab has updated to v18 and pipelines are not running, follow the Runner Registration procedure to register the runner. This process works for now, but v18 removes that bypass capability and we will have to follow a new process like the one here.
- Login to gitlab with SSO
- Edit profile and change user avatar. If your avatar doesn't change right away, check again later. It can take several minutes or more.
- Go to Security > Access Token and create a personal access token. Select all scopes. Record the token and the token name. You'll need it for the next step.
- R2-D2: Gitlab Test. When prompted for the username use the token name and not the name of the user that created the token.
  NOTE: If the pipeline fails, check whether Sonarqube is getting a major version update by going to 'http://sonarqube.dogfood.bigbang.mil/setup' and proceed to upgrade it. After the Sonarqube upgrade is done the pipeline should work again.
GitLab registry image upload Manual Steps
- Log in to the gitlab registry with the token created in the GitLab Sonarqube Gitlab Runner step by using the following command: `docker login registry.dogfood.bigbang.mil`. It is also possible to log in using the root credentials: use `username: root`, and for the password use the following command to retrieve the root password: `kubectl get secret -n gitlab gitlab-gitlab-initial-root-password -o json | jq -r ".data.password" | base64 -d`
- Test a docker image push/pull to/from the registry. GROUPNAMEHERE and PROJECTNAMEHERE are both release-X-XX-X.
export GROUPNAMEHERE=release-X-XX-X; export PROJECTNAMEHERE=release-X-XX-X # where X-XX-X is your release, e.g. 2-45-0
docker pull alpine
docker tag alpine registry.dogfood.bigbang.mil/$GROUPNAMEHERE/$PROJECTNAMEHERE/alpine:latest
docker login registry.dogfood.bigbang.mil # Enter your gitlab token name and personal access token
docker push registry.dogfood.bigbang.mil/$GROUPNAMEHERE/$PROJECTNAMEHERE/alpine:latest
docker image rm registry.dogfood.bigbang.mil/$GROUPNAMEHERE/$PROJECTNAMEHERE/alpine:latest
docker pull registry.dogfood.bigbang.mil/$GROUPNAMEHERE/$PROJECTNAMEHERE/alpine:latest
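The group name, project name, and image reference used in the commands above all follow from the release number. The helpers below sketch that derivation; `release_slug` and `gitlab_test_image` are hypothetical names introduced here for illustration, and the registry host is the one used in the steps above.

```shell
# Hypothetical helpers for illustration only.
REGISTRY=registry.dogfood.bigbang.mil   # registry host from the steps above

# release_slug 2.45.0  ->  2-45-0
release_slug() { printf '%s\n' "$1" | tr '.' '-'; }

# gitlab_test_image 2.45.0  ->  <registry>/release-2-45-0/release-2-45-0/alpine:latest
gitlab_test_image() {
  name="release-$(release_slug "$1")"   # group and project share the same name
  printf '%s/%s/%s/alpine:latest\n' "$REGISTRY" "$name" "$name"
}
```

For example, `docker tag alpine "$(gitlab_test_image 2.45.0)"` would tag the image with the same reference the manual steps build from `$GROUPNAMEHERE` and `$PROJECTNAMEHERE`.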
Sonarqube
NOTE: If Sonarqube is getting a major version update, you will need to go to 'http://sonarqube.dogfood.bigbang.mil/setup' to upgrade before you will be able to log in.
- Login to sonarqube with SSO
- Verify your Project says `PASSED`
Nexus
- Login to Nexus as admin, password is in the nexus-repository-manager-secret secret:
  # username is admin, password is the output of this command
  kubectl get secret -n nexus-repository-manager nexus-repository-manager-secret -o go-template='{{index .data "admin.password" | base64decode}}' ; echo
- With the credentials from the encrypted values (or the admin user credentials) login to the nexus registry.
  docker login containers.dogfood.bigbang.mil
- Tag and push an image to the registry:
  export NEXUS_RELEASE=X-XX-X # where X-XX-X is the release e.g. 2-45-0
  docker pull alpine:latest
  docker tag alpine:latest containers.dogfood.bigbang.mil/alpine:$NEXUS_RELEASE
  docker push containers.dogfood.bigbang.mil/alpine:$NEXUS_RELEASE
- R2-D2: Nexus Test. When asked for the release number, use a fully numeric value with no alphabetical characters, e.g. `2.45.0` and not `2.45.X`
  NOTE: If the previous release container does not exist, the test will fail.
  Example: if the container tagged 2.45.1 is not present when the test is executed, the test will fail. To fix this, push a container with the tag 2.45.1; after that the test will succeed.
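The "fully numeric" requirement for the release number can be checked before typing it into the Nexus test prompt. This is a sketch of one possible check; `is_numeric_release` is a hypothetical helper and is not part of R2-D2.

```shell
# Hypothetical helper: succeeds only for three dot-separated numeric parts, e.g. 2.45.0.
is_numeric_release() {
  case "$1" in
    ''|*[!0-9.]*)        return 1 ;;  # empty, or contains anything besides digits and dots
    *.*.*.*|.*|*.|*..*)  return 1 ;;  # too many parts, or an empty part
    *.*.*)               return 0 ;;  # exactly three numeric parts
    *)                   return 1 ;;  # fewer than three parts
  esac
}
```

For example, `is_numeric_release 2.45.X || echo "use digits only"` would print the reminder.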
Minio
- R2-D2: Minio Test
Mattermost
- R2-D2: Mattermost Test
ArgoCD
- R2-D2: ArgoCD Test
NOTE: This test fails occasionally. You may need to run it multiple times to get it to pass.
Kyverno
- R2-D2: Kyverno Test.
Velero
- R2-D2: Velero Test
NOTE: This test fails occasionally. You may need to run it multiple times to get it to pass.
Loki
- R2-D2: Loki Test
NOTE: the `Loki / Operational` dashboard is currently missing and we have a task here to add it back in
Loki Manual Steps
- Login to grafana as admin. User name "admin". To retrieve the password, cd into `/my/dogfood/folder/` and then run `sops -d bigbang/prod2/environment-bb-secret.enc.yaml | sed 's/\\n/\'$\'\n/g' | grep grafana -A 2`
- Click on the drop-down in the upper-left corner again and choose Data Sources, then choose Loki. Scroll down and click `Save & Test`. A message should appear that reads "Data source successfully connected."
- Click on the drop-down on upper-left, choose Dashboards, then `Loki Dashboard quick search`. Verify that this dashboard shows data/logs from the past few minutes and hours.
- Click on the drop-down on upper-left, choose Dashboards, then `Loki / Operational`. Verify that both the Distributor Success Rate and Ingester Success rate are 100%, and that when you expand the Chunks item (toward the bottom of the page), data exists. NOTE: this dashboard is currently missing and we have a task here to add it back in
Tempo
- R2-D2: Tempo Test
NOTE: this test is currently obsolete because we do not have any data tethered to tempo
Tempo Manual Steps
- Login to grafana as admin. User name "admin". To retrieve the password, cd into `/my/dogfood/folder/` and then run `sops -d bigbang/prod2/environment-bb-secret.enc.yaml | sed 's/\\n/\'$\'\n/g' | grep grafana -A 2`
- Click on the drop-down menu on the upper-left corner, then choose Data Sources, then Tempo. Scroll down and click `Save & Test`. A message should appear that reads "Data source successfully connected."
- Visit tempo tracing & ensure Services are populating under the `Service` drop down. For example, you might see jaeger-query and tempo-grpc-plugin listed as options. NOTE: this test is currently obsolete because we do not have any data tethered to tempo
Keycloak
- R2-D2: Keycloak Test
Neuvector
- R2-D2: Neuvector Test
Grafana & ArgoCD & Anchore & Mattermost SSO
- R2-D2: Native SSO Grafana, ArgoCD, Anchore and Mattermost Test
NOTE: Grafana OPA Violations dashboard showing "No Data" is a known issue. This has been skipped since 2.18.0, current open issue awaiting fix is here.
Monitoring & Cluster Auditor & Tracing (Jaeger) & Alertmanager
- R2-D2: authservice SSO Prometheus, Tracing, AlertManager, and Cluster Auditor Test
NOTE: Grafana OPA Violations dashboard showing "No Data" is a known issue. This has been skipped since 2.18.0, current open issue awaiting fix is here.
NOTE: if vault shows up as UNHEALTHY, that means it was rebuilt and we need to update the permissions on the `vault/secrets/token` file within the prometheus pod. (We have an issue with the permissions on the file.)
- To fix this:
  - exec into the `vault-agent` container on the `prometheus-monitoring-monitoring-kube-prometheus-0` pod and run the following
    chmod 644 /vault/secrets/token
Anchore
- Log back in as the admin user - password is in the `anchore-anchore-enterprise` secret (admin will have pull credentials set up for the registries):
  kubectl get secret anchore-anchore-enterprise -n anchore -o json | jq -r '.data.ANCHORE_ADMIN_PASSWORD' | base64 -d; echo ' <- password'
  NOTE: the GitLab Sonarqube Gitlab Runner image push step and the Nexus image push steps need to be completed before the next step
- Scan image in dogfood registry, `registry.dogfood.bigbang.mil/GROUPNAMEHERE/PROJECTNAMEHERE/alpine:latest`
  - Go to Images Page, Analyze Tag
    - Registry: `registry.dogfood.bigbang.mil`
    - Repository: `GROUPNAME/PROJECTNAME/alpine`
    - Tag: `latest`
- Scan image in nexus registry, `containers.dogfood.bigbang.mil/alpine:<release-number>` (use your release number, ex: `1-XX-0`). If authentication fails that means that the Nexus credentials have changed. Retrieve the Nexus credentials using instructions from Nexus above. Update creds in Anchore by clicking on menu Images > Analyze Repository > link at bottom "clicking here" > click on registry name > then edit the credentials.
- Validate scans complete and Anchore displays data (click the SHA value for each image)
4. Create Release
STOP: Before finalizing the release, confirm that the latest bb-docs-compiler pipeline passed. If not, identify, correct, and cherry-pick a fix into the release branch. This will prevent wasting time re-running pipelines if the docs compiler pipeline fails.
Prep
- Re-run helm docs to ensure the docs reflect the latest version and latest package versions.
  cd bigbang
  git pull # pull any last minute cherry picks, verify nothing has greatly changed
  git checkout release-2.<minor>.x
  docker run -v "$(pwd):/helm-docs" -u $(id -u) jnorwood/helm-docs:v1.5.0 -s file -t .gitlab/base_config.md.gotmpl --dry-run > ./docs/understanding-bigbang/configuration/base-config.md
  # commit and push the changes (if any)
Build The Release Notes
- R2-D2: run with the `Build release notes` option selected
- Clean up release notes
- Check to see if any packages have been added or graduated from BETA. There is no single resource that lists this information, but you should generally be aware of packages that have been added or moved out of BETA from discussions with other team members during your workdays. You can also ask the Maintainers [^maintainers] if unsure. If any packages have been added or moved out of BETA, adjust your local R2D2 config as needed. Also open an MR for R2D2 with the change to the default config.
- Create Upgrade Notices, based off of the listed notices in the release issue. Also review with maintainers to see if any other major changes should be noted.
- Move MRs from 'Big Bang MRs' into their specific sections below. If MR is for Big Bang (docs, CI, etc) and not a specific package, they can remain in place under 'Big Bang MRs'. General rule of thumb: if the git tag changes for a package - put the MR under that package; and if there is no git tag change for a package - put it under BB.
- Adjust known issues as needed: If an issue has been resolved it can be removed and if any new issues were discovered they should be added.
- Verify table contains all chart upgrades mentioned in the MR list below.
- Verify table contains all application versions. Compare with previous release notes. If a package was upgraded there will usually be a bump in application version and chart version in the table.
- Verify all internal comments are removed from Release Notes, such as comments directing the release engineer to copy/move things around
- Any issues encountered during release upgrade testing need to be thoroughly documented in the release notes. If you had to delete a replica set, force restart some pods, delete a PV, please make sure these steps are included in our known issues section of the release notes.
- Scroll through every package listed and verify the following
  - it has a link to at least one MR
  - the CHANGELOG entries match the MR listed
  - if more than one MR or CHANGELOG entry for a package exists then verify it makes sense (i.e. if there are 2 MRs, that there are CHANGELOG entries for both. Conversely, if there is more than one CHANGELOG entry, make sure those entries are tied to the MRs in this release; sometimes changelogs get messed up, causing more entries in the release notes than necessary)
- Publish Release Notes to Dogfood
  - Commit the resulting release notes and push to `<release-branch>` in Big Bang (dogfood)
  - Create an MR for the Maintainers [^maintainers] and have it merged before continuing on
Create Release Candidate Tag
- Create release candidate tag based on release branch, i.e. `2.<minor>.0-rc.0`. Tagging will additionally create release artifacts and the pipeline takes over an hour to run. You will need to request tag permissions from the Maintainers [^maintainers].
  - To do this via the UI (generally preferred): tags page -> new tag, name: `2.<minor>.0-rc.0`, create from: `release-2.<minor>.x`, message: "release candidate", release notes: leave empty
  - To do this via git CLI:
    git tag -a 2.<minor>.0-rc.0 -m "release candidate"
    # list the tags to make sure you made the correct one
    git tag -n
    git push origin 2.<minor>.0-rc.0
- Passed pipeline for Release Candidate tag.
- Passed pipeline in bb-docs-compiler for latest RC tag (Gets created/scheduled towards the end of the bigbang release pipeline).
  Review all pipeline output looking for failures or warnings. Reach out to the Maintainers [^maintainers] for a quick review before proceeding.
- Check the releases for release notes.
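If a release candidate ever has to be re-cut, the next tag name can be computed from the existing tags. The sketch below assumes the next candidate simply increments the trailing number of `2.<minor>.0-rc.N`; this document itself only shows `rc.0`, so treat the increment rule as an assumption. `next_rc_tag` is a hypothetical helper and the major version `2` is hard-coded.

```shell
# Hypothetical helper: read existing tag names on stdin, print the next RC tag for <minor>.
# Assumption: RC tags look like 2.<minor>.0-rc.N and N increases by one each time.
next_rc_tag() {
  minor="$1"
  awk -v prefix="2.${minor}.0-rc." '
    index($0, prefix) == 1 {                 # tag starts with the RC prefix
      n = substr($0, length(prefix) + 1)     # text after the prefix
      if (n ~ /^[0-9]+$/ && n + 0 >= next_n) next_n = n + 1
    }
    END { printf "%s%d\n", prefix, next_n }  # rc.0 when no RC tag exists yet
  '
}
```

A possible use is `git tag | next_rc_tag <minor>`, with `<minor>` replaced by the minor release number, reviewing the printed name before creating the tag.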
Create Release Tag
- Create release tag based on release branch, i.e. `2.<minor>.0`.
  - To do this via the UI (generally preferred): tags page -> new tag, name: `2.<minor>.0`, create from: `release-2.<minor>.x`, message: `release 2.<minor>.0`, release notes: leave empty
  - To do this via git CLI:
    git tag -a 2.<minor>.0 -m "release 2.<minor>.0"
    # list the tags to make sure you made the correct one
    git tag -n
    # push
    git push origin 2.<minor>.0
- Passed release pipeline.
  Review all pipeline output looking for failures or warnings. Reach out to the Maintainers [^maintainers] for a quick review before proceeding.
  - Ensure that the RC tag release created prior to this was deleted as well.
- Ensure Release Notes are present here
- Create a reminder to ensure that release docs have compiled and appear on Big Bang Docs and they have been mirrored to the IL2 Big Bang repository the following day.
  - In Big Bang Docs select Latest version and go in Packages, make sure that the packages version is updated to match the version specified in the release (you can select a couple of packages that were updated in the release and match the version in the value file).
  - Wait until the Monday after the release and make sure that the tag for the release exists under latest; tagged docs are posted only once a week.
  - Click around in the docs and make sure docs are properly formatted, pay special attention to lists and tables.
  NOTE: BB-docs won't actually publish the latest release version until the nightly bb-docs job runs. You can check this on the following day.
  NOTE: BB-docs won't actually publish the tag release version until the Weekly Docs Gen (tag) scheduled pipeline runs. You can check this on Saturday or Monday of the week after the release.
  NOTE: You may require access be granted to the IL2 Big Bang repository to view the status. Contact the Maintainers [^maintainers] for assistance.
RELEASE IT
- Type up a release announcement for the Maintainers [^maintainers] to post in the Big Bang Value Stream channel. Mention any breaking changes and some of the key updates from this release. Use the same format as a prior announcement https://chat.il2.dso.mil/platform-one/channels/team---big-bang
- Cherry-pick release commit(s) as needed with merge request back to master branch. We do not ever merge release branch back to master. Instead, make a separate branch from master and cherry-pick the release commit(s) into it. Use the resulting MR to close the release issue.
  # Navigate to your local clone of the BB repo
  cd path/to/my/clone/of/bigbang
  # Get the most up to date master
  git checkout master
  git pull
  # Check out new branch for cherrypicking
  git checkout -b 2.<minor>-cherrypick
  # Repeat the following for whichever commits are required from release
  # Typically this is the initial release commit (bumped GitRepo, Chart.yaml, CHANGELOG, README) and a final commit that re-ran helm-docs (if needed)
  git cherry-pick <commit sha for Nth commit>
  git push -u origin 2.<minor>-cherrypick
  # Create an MR using Gitlab, merging this branch back to master, reach out to maintainers to review
- Close Big Bang Milestone in GitLab. (make sure that there are n+2 milestones still, the current one and 1 future milestone, if not, contact the Maintainers [^maintainers])
- Hand off the release to the Maintainers [^maintainers], they will review then celebrate and announce the release in the public MM channel
- Reach out to @marknelson to let him know that the release is done and to update the Gitlab Banner