Release 3.6.0
Semi-Automated Release Process
✋ STOP: Is this your first release, or has it been a while since your last release? Before proceeding, review the Getting Started Guide for instructions on configuring your environment for the release process, expectations of a release engineer, and information about the release process.
Foreword
- This process applies to minor releases of Big Bang major version 3. It is subject to change in future major versions. For patch version upgrades, refer to the Patch Release README.
- The release branch format is as follows: release-<major>.<minor>.x, where <minor> is the minor release number. Example: release-2.7.x.
- The release tag format for major version 3 of Big Bang is: 3.<minor>.0.
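The naming rules above can be sketched as two small shell helpers. This is only an illustration: `release_branch` and `release_tag` are hypothetical names and are not part of gogru.

```shell
#!/bin/sh
# Hypothetical helpers (not part of gogru) that apply the naming rules above.

release_branch() {
  # $1 = full version, e.g. 3.6.0 -> release-3.6.x
  major=${1%%.*}      # text before the first dot
  rest=${1#*.}        # text after the first dot
  minor=${rest%%.*}   # text before the next dot
  printf 'release-%s.%s.x\n' "$major" "$minor"
}

release_tag() {
  # $1 = major, $2 = minor -> <major>.<minor>.0
  printf '%s.%s.0\n' "$1" "$2"
}
```

For example, `release_branch 3.6.0` prints `release-3.6.x`, and `release_tag 3 6` prints `3.6.0`.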
Useful Links
- Getting Started Guide
-
Manual Release README
- The semi-automated release process relies on a suite of tools that capture most of the process in code. However, tools occasionally fail and processes may occasionally need to be performed manually.
- Patch Release README
1. Release Prep
-
Pull and build the latest version of gogru and update your config
cd your/local/gogru/repository
git checkout main
git pull
go build
./gogru config [init | update]
-
Create the release issue in the Big Bang Umbrella package
./gogru create-issue
-
Verify the Release cluster is up and running normally, check and renew ElasticSearch (ECK) and Mattermost licenses.
📌 NOTE: Be sure to run aws sso login prior to running these commands.
./gogru release-cluster-check
./gogru eck-license-check
./gogru mm-license-check
- If Mattermost or ElasticSearch indicate a license was updated, it's a good idea to log in with SSO and confirm functionality. (Log into Mattermost, Kibana)
- It is unlikely, but it may be necessary to reconfigure role mapping if you are unable to log into Kibana with SSO.
- If there are any issues with the Helm Releases or Pods in the output, investigate and fix them before continuing
🛑 STOP: Before creating the release branch, reach out to the anchors in the Release Engineering Mattermost channel and confirm all necessary changes have been merged into the Big Bang master branch. This should be complete by 12 PM EST Tuesday, barring extenuating circumstances.
-
Create the release branches for bigbang-sil (deployment) and bigbang (product), and verify the hash of the release branch matches the master branch
./gogru create-rb
-
Update BigBang (product) version references with the new release version
./gogru update-version-references
-
Build the release notes and carefully read all upgrade notices before proceeding
- Before running the generate command, take a look at the recent MRs into Big Bang and make sure the milestone is set properly. This will ensure the generation command doesn't miss anything.
- Leave the resultant MR in draft. It will be updated later.
./gogru generate-release-notes
2. Upgrade and Debug Cluster
⚠️ WARNING: Upgrade only, never delete and redeploy.
-
Check if there was a flux upgrade, and, if so, update flux on the cluster
./gogru update-flux
-
Upgrade Release Cluster
./gogru upgrade-release-cluster
-
The upgrade-release-cluster command will open a merge request into bigbang-sil. Ping the package maintainers in the Release Engineering channel to approve and merge the changes.
-
Once merged, flux should pick up the changes and begin reconciling the helm releases. This may take up to an hour depending on the size of the release.
-
Monitor the helmreleases, gitrepositories, pods, and kustomizations to track the progress of the release.
./gogru release-status
or
watch kubectl get gitrepositories,kustomizations,hr,po -A
-
Verify cluster has updated to the new release and verify that there are no outstanding issues with Pods or Helm Releases
./gogru release-cluster-check
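When watching the upgrade by hand, it can help to narrow the HelmRelease listing to the entries that are not yet ready. The following is a minimal sketch, not part of gogru. It assumes the usual column layout of `kubectl get hr -A` (NAMESPACE, NAME, AGE, READY, STATUS), and `not_ready_helmreleases` is a hypothetical helper name.

```shell
#!/bin/sh
# Print namespace/name for every HelmRelease whose READY column is not "True".
# Assumed input layout (from `kubectl get hr -A`):
#   NAMESPACE NAME AGE READY STATUS
not_ready_helmreleases() {
  # NR > 1 skips the header row; $4 is the READY column under the assumed layout.
  awk 'NR > 1 && $4 != "True" { print $1 "/" $2 }'
}

# Usage against the release cluster (requires cluster access):
#   kubectl get hr -A | not_ready_helmreleases
```

An empty result suggests every HelmRelease reports Ready. If the column layout differs in your Flux version, adjust the field number accordingly.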
Common Upgrade Issues
- If Flux has not begun to reconcile after 10 minutes, follow the flux troubleshooting steps and refer to the workflow diagram (coming soon).
- If Prometheus does not start and is logging an error relating to Vault permissions, follow these instructions to reissue the Vault token.
- If a package fails to upgrade, or fails a test, follow the Package Rejection Process to rollback the package upgrade.
Uncommon Upgrade Issues
- If the upgrade fails catastrophically and the release needs to be rolled back, follow the rollback instructions. This should be used only as a last resort to re-test the upgrade process.
3. UI Tests
SSO Integration
-
R2-D2: Native SSO Grafana, ArgoCD, Anchore and Mattermost Test
-
R2-D2: authservice SSO Prometheus and AlertManager Test
OR
-
Grafana & Prometheus tested below
Monitoring
AlertManager
-
Log in to Alertmanager with SSO
-
Validate that, at minimum, the watchdog alert is firing
Grafana
-
Log in to Grafana with SSO
-
Click on the drop-down menu in the upper-left and choose Dashboards.
-
Verify CoreDNS is listed and contains data
-
Verify that several Kubernetes and Istio dashboards are listed.
-
Click on a few dashboards and verify that metrics are populating.
Prometheus
-
Log in to Prometheus with SSO
-
Go to Status > Targets and confirm that no unexpected services are marked as unhealthy
- Known Unhealthy Targets
- vault (the vault container is missing)
‼️ TROUBLESHOOTING: If Vault shows up as unhealthy, that may indicate it was rebuilt and the permissions on the vault/secrets/token file within the prometheus pod need to be updated.
Update Vault Token Permissions
- exec into the vault-agent container on the prometheus-monitoring-monitoring-kube-prometheus-0 pod
- chmod the vault token
chmod 644 /vault/secrets/token
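The exec and chmod steps for the Vault token can be combined into a single command. The wrapper below is a hedged sketch rather than an official tool: `fix_vault_token_perms` is a hypothetical name, the `monitoring` namespace default is an assumption, and the pod and container names are taken from the steps above.

```shell
#!/bin/sh
# Hypothetical wrapper around the manual exec + chmod steps.
# KUBECTL can be overridden (for example KUBECTL=echo) to preview the command.
# The "monitoring" namespace default is an assumption; pass another as $1 if needed.
fix_vault_token_perms() {
  namespace=${1:-monitoring}
  ${KUBECTL:-kubectl} exec -n "$namespace" \
    prometheus-monitoring-monitoring-kube-prometheus-0 \
    -c vault-agent -- chmod 644 /vault/secrets/token
}

# Preview without touching the cluster:
#   KUBECTL=echo fix_vault_token_perms
```

Running the preview first makes it easy to confirm the namespace and pod name before changing anything on the release cluster.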
Logging
-
Log in to Kibana with SSO.
‼️ TROUBLESHOOTING: For "You do not have permission to access the requested page" errors, run the eck-license-check command as described in Release Prep, above.
-
Verify that Kibana is actively indexing/logging.
- To do this, click on the drop-down menu in the upper-left corner, then under "Analytics" click Discover.
- Expand the drop-down menu to the right of Data View and click + Create data view
- In the Index pattern field, enter logstash*
- Verify at least one logstash source dated within the last week appears beneath All Sources
- Set the Timestamp field to ---I don't want to use the time filter---
- Click Use without saving. Log data should populate.
Kiali
-
Log in to Kiali with SSO
- Validate graphs and traces are visible under applications/workloads
-
Graphs at the overview
-
Graphs for inbound metrics
-
Graphs for outbound metrics
-
Graphs for Traces
-
Validate no errors appear
📌 NOTE: A red notification bell is visible if there are errors. Errors on individual application listings for labels, etc. are expected and okay.
Twistlock
-
Log in to Prometheus
- In the Targets drop-down, choose serviceMonitor/twistlock/twistlock-console/0
- Click the small "show more" button that appears. You should see a State of UP and no errors.
-
R2-D2: Twistlock Test
Fortify
-
R2-D2: Fortify Test
GitLab, Gitlab Runner, & Sonarqube
Gitlab
-
Log in to GitLab with SSO
-
Edit profile and change user avatar. (Avatar may take several minutes to update.)
-
Go to Security > Access Token and create a personal access token. Select all scopes.
- Record the token and the token name to provide during the automated test
Gitlab Runner
-
R2-D2: Gitlab Test
📌 NOTE: When prompted for the username, use the token name and not the name of the user that created the token.
‼️ TROUBLESHOOTING: The pipeline may fail if Sonarqube is getting a major version update. Navigate to 'http://sonarqube.release.bigbang.mil/setup' and upgrade it before re-running the test
Sonarqube
-
Log in to Sonarqube with SSO
‼️ TROUBLESHOOTING: You will be unable to log in if Sonarqube is getting a major version update. Navigate to 'http://sonarqube.release.bigbang.mil/setup' and upgrade it before proceeding
-
Verify your Project says PASSED
Nexus
-
R2-D2: Nexus Test. When asked for the release number, use a fully numeric value with no alphabetical characters, e.g. 2.45.0 and not 2.45.X
‼️ TROUBLESHOOTING: If a container from the previous release does not exist, the test will fail. Push a container with the previous release tag, either manually, or by overriding your release number in the config and running this test.
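As an illustration of the release number rule for the Nexus test, the check below accepts only three dot-separated numeric parts. It is a sketch, not something R2-D2 or gogru is known to run, and `is_numeric_release` is a hypothetical helper name.

```shell
#!/bin/sh
# Succeeds (exit 0) only for fully numeric versions such as 2.45.0;
# fails for values such as 2.45.X or 2.45.
is_numeric_release() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'
}

# Example:
#   is_numeric_release 2.45.0 && echo "ok to use"
#   is_numeric_release 2.45.X || echo "not a full numeric release number"
```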
Minio
-
R2-D2: Minio Test
Mattermost
-
R2-D2: Mattermost Test
ArgoCD
-
R2-D2: ArgoCD Test
Kyverno
-
R2-D2: Kyverno Test
Velero
-
R2-D2: Velero Test
‼️ TROUBLESHOOTING: This test may fail due to incorrect image settings in Nexus. If so, follow the steps below to update.
Nexus Image Settings Instructions for Velero
-
Log in to https://nexus.release.bigbang.mil/#browse/welcome as admin.
-
Confirm the following to allow the test deployment to pull the alpine image from Nexus without credentials
-
Verify that the velero-tests.release.bigbang.mil/alpine:test repository exists on Nexus
-
Verify anonymous docker pull is enabled for the velero-tests repository: Settings -> Repositories -> velero-tests -> Check Allow anonymous docker pull (Docker Bearer Token Realm required) -> Save
-
Go to Settings -> Security -> Realms and verify that Docker Bearer Token is in the Active column. If not, click on it to move it there, then click Save
-
Verify Anonymous Access is enabled. Settings -> Security -> Anonymous Access -> Check Allow anonymous users to access the server -> Save
Loki
-
R2-D2: Loki Test
bbctl
-
Log in to Grafana as admin. User name "admin". Retrieve the password: cd into /my/release/folder/ and then run
sops -d deployments/release/release/secrets/environment-bb-secret.enc.yaml | sed 's/\\n/\'$'\n/g' | grep grafana -A 2
-
Click on the drop-down on upper-left, choose Dashboards, then search for bbctl. Ensure that all 6 dashboards are present in the search results.
-
Click into the bbctl-all-logs dashboard and confirm the tables are populated.
‼️ TROUBLESHOOTING: If any of the tables are missing data, you can run the corresponding command to trigger the job to run immediately instead of waiting for the next CRON schedule trigger. If there is an error in the bbctl-all-logs table for a command or data still isn't populating after manually running the commands, reach out to the bbctl team and create a bug ticket.
‼️ KNOWN ISSUE: The violations table not populating is a known issue when the output is too large. You should see a red triangle in the top corner saying it timed out.
Manually Trigger bbctl CRON Jobs
-
Preflight Check Logs
kubectl create job --from=cronjob/bbctl-bbctl-bigbang-preflight bbctl-bigbang-preflight -n bbctl
-
Versions Logs
kubectl create job --from=cronjob/bbctl-bbctl-bigbang-updater bbctl-bigbang-updater -n bbctl
-
Policy Logs
kubectl create job --from=cronjob/bbctl-bbctl-bigbang-policy bbctl-bigbang-policy -n bbctl
-
Status Logs
kubectl create job --from=cronjob/bbctl-bbctl-bigbang-status bbctl-bigbang-status -n bbctl
-
Violations Logs
kubectl create job --from=cronjob/bbctl-bbctl-bigbang-violations bbctl-bigbang-violations -n bbctl
-
Click into the other 5 bbctl dashboards and ensure they all show data in the graphs.
-
bbctl-preflight-dashboard
-
bbctl-version-dashboard
-
bbctl-policies-dashboard
-
bbctl-status-dashboard
-
bbctl-violations-dashboard
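If several bbctl tables are empty at once, the five manual CRON job triggers listed in this section can be issued in one loop. This is a sketch under the assumption that the cronjob and job names follow the pattern shown in those commands. `trigger_bbctl_jobs` is a hypothetical helper, not part of bbctl or gogru.

```shell
#!/bin/sh
# Create one Job from each bbctl CronJob, mirroring the manual commands.
# KUBECTL can be overridden (for example KUBECTL=echo) to preview the commands.
trigger_bbctl_jobs() {
  for name in preflight updater policy status violations; do
    ${KUBECTL:-kubectl} create job \
      --from="cronjob/bbctl-bbctl-bigbang-$name" \
      "bbctl-bigbang-$name" -n bbctl
  done
}

# Preview without touching the cluster:
#   KUBECTL=echo trigger_bbctl_jobs
```

Because the job names are fixed, `kubectl create job` reports an error for any job that already exists from an earlier manual run. Delete the old job first or choose a different job name in that case.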
Tempo
-
R2-D2: Tempo Test
📌 NOTE: This test is obsolete at the moment as we do not currently have any data tethered to tempo
Keycloak
-
R2-D2: Keycloak Test
Neuvector
-
R2-D2: Neuvector Test
Anchore
🛑 STOP: This test is dependent on artifacts created during the GitLab Runner and Nexus image pushes. Confirm those tests completed successfully before proceeding.
-
Log in to Anchore as the admin user. The password is in the anchore-anchore-enterprise secret (admin will have pull credentials set up for the registries):
kubectl get secret anchore-anchore-enterprise -n anchore -o json | jq -r '.data.ANCHORE_ADMIN_PASSWORD' | base64 -d
-
Scan image in release registry, registry.release.bigbang.mil/GROUPNAMEHERE/PROJECTNAMEHERE/alpine:latest
-
Go to Images Page, Analyze Tag
- Registry: registry.release.bigbang.mil
- Repository: GROUPNAME/PROJECTNAME/alpine
- Tag: latest
-
Scan image in nexus registry, containers.release.bigbang.mil/alpine:<release-number> (use your release number, ex: 1-XX-0). If authentication fails, that means that the Nexus credentials have changed. Retrieve the Nexus credentials using instructions from Nexus above. Update creds in Anchore by clicking on menu Images > Analyze Repository > link at bottom "clicking here" > click on registry name > then edit the credentials.
-
Validate scans complete and Anchore displays data (click the SHA value for each image)
Mimir
-
Log in to Grafana as admin. User name "admin". Retrieve the password: cd into /my/release/folder/ and then run
sops -d deployments/release/release/secrets/environment-bb-secret.enc.yaml | sed 's/\\n/\n/g' | grep grafana -A 2
(assumes gnu-sed)
-
Click on the drop-down in the upper-left corner again and choose Data Sources, then choose Mimir. Scroll down and click Save & Test. A message should appear that reads "Data source successfully connected."
-
Click on the drop-down on upper-left, choose Dashboards, then Prometheus / Remote Write. Verify that this dashboard shows data/logs from the past few minutes and hours.
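The Grafana admin password steps pipe the decrypted secret through sed so that literal `\n` sequences become real line breaks before grep runs. The sketch below shows that one transformation in isolation, written with a backslash-newline replacement so it does not depend on GNU sed. `unescape_newlines` is a hypothetical helper name.

```shell
#!/bin/sh
# Replace each literal two-character sequence \n with a real newline.
# The replacement is a backslash followed by an actual line break,
# which POSIX sed accepts.
unescape_newlines() {
  sed 's/\\n/\
/g'
}

# Example with a made-up value:
#   printf '%s\n' 'user: admin\npassword: example' | unescape_newlines
```

The same function could stand in for the sed stage of the sops pipeline if the shell quoting in the one-liner proves awkward on a given workstation.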
Backstage
-
Log in to Backstage via SSO
- Navigate to the kyverno component page and validate:
-
Dashboards iFrame is populated (ClusterPolicyReport Details, PolicyReport Details, PolicyReports) and links redirect to grafana.release.bigbang.mil when clicked
-
All of the header links work (CI/CD, Kubernetes, API, Dependencies, Docs), though they may not yet be populated with data
- Navigate to the monitoring component page and validate:
-
Dashboards iFrame is populated (Alertmanager / Overview, Prometheus / Overview, Prometheus / Remote Write) and links redirect to grafana.release.bigbang.mil when clicked
-
Links iFrame is populated (Monitoring, AlertManager, Prometheus) and links redirect to their respective service when clicked
-
All of the header links work (CI/CD, Kubernetes, API, Dependencies, Docs), though they may not yet be populated with data
- Not yet implemented - ISSUE
-
Verify components added via component.yaml are present
Headlamp
-
Log in to Headlamp with SSO and confirm the landing page loads.
📌 NOTE: This currently immediately logs you out of Headlamp due to THIS upstream issue.
- Note, data will not populate until this issue is resolved
-
Log out and back in, selecting the Use A Token option
-
Generate a temporary auth token:
kubectl create token headlamp-headlamp -n headlamp
-
Confirm the Cluster page is populated with graphs and events
-
Confirm the Flux plugin is functioning. Select Flux in the left sidebar and verify that Overview, Kustomizations, HelmReleases, Sources, and Flux Runtime are populated with data
📌 NOTE: There is currently an error with the Flux plugin working with Headlamp; most of these pages aren't displaying data.
-
Click through a few of the other menu items in the left sidebar and confirm data is populating (Gateway (beta) errors are okay)
4. Create Release
🛑 STOP: Before finalizing the release, confirm that the latest bb-docs-compiler pipeline passed. If not, identify, correct, and cherry-pick a fix into the release branch. This will prevent wasting time re-running pipelines if the docs compiler pipeline fails.
Prep
-
Re-generate the version references, helm docs, and release notes to capture any changes implemented during testing.
./gogru update-version-references
./gogru generate-release-notes
Clean up release notes
-
Check to see if any packages have been added or graduated from BETA.
- There is no single resource that lists this information, but you should generally be aware of packages that have been added or moved out of BETA from discussions with other team members during your workdays. You can also ask the Maintainers 1 if unsure. If any packages have been added or moved out of BETA, adjust your local R2D2 config as needed. Also open an MR for R2D2 with the change to the default config. [DEPRECATED, gogru instructions coming soon.]
-
Add Upgrade Notices
- Issues encountered during testing often result in upgrade notices
- Package MRs might be missed by the bot if the notice includes ## Header Tags
-
Review MRs improperly sorted under 'Big Bang MRs' and move them into their specific sections below.
- If an MR is for Big Bang (docs, CI, etc.) and not a specific package, it can remain in place under 'Big Bang MRs'.
- General rule of thumb: If the git tag changes for a package, put the MR under that package, and if there is no git tag change for a package, put it under BB.
-
Adjust known issues as needed:
- If an issue has been resolved it can be removed, and if any new issues were discovered they should be added.
-
Verify table contains all chart upgrades mentioned in the MR list below.
-
Verify table contains all application versions.
- Compare with previous release notes. If a package was upgraded there will usually be a bump in application version and chart version in the table.
-
Verify all internal comments are removed from Release Notes, such as comments directing the release engineer to copy/move things around
-
Any issues encountered during release upgrade testing need to be thoroughly documented in the release notes. If you had to delete a replica set, force restart some pods, or delete a PV, please make sure these steps are included in the known issues section of the release notes.
-
Scroll through every package listed and verify the following:
1. It has a link to at least one MR.
2. The CHANGELOG entries match the MR listed.
3. If more than one MR or CHANGELOG entry exists for a package, verify that it makes sense (i.e. if there are 2 MRs, there are CHANGELOG entries for both. Conversely, if there is more than one CHANGELOG entry, make sure those entries are tied to the MRs in this release; sometimes changelogs get messed up, causing more entries in the release notes than necessary).
-
Publish Release Notes to Bigbang-ci
-
Commit the resulting release notes and push to <release-branch> in Big Bang (bigbang-ci)
-
Create an MR for the Maintainers 1 and have it merged before continuing on
-
Create Release Tag
-
Create release tag based on release branch, i.e. 3.<minor>.0.
-
To do this via the UI (generally preferred): tags page -> new tag, name: 3.<minor>.0, create from: release-3.<minor>.x, message: release 3.<minor>.0, release notes: leave empty
-
To do this via git CLI:
git tag -a 3.<minor>.0 -m "release 3.<minor>.0"
# list the tags to make sure you made the correct one
git tag -n
# push
git push origin 3.<minor>.0
-
Confirm All Jobs Pass
-
Passed release pipeline.
- Review all pipeline output looking for failures or warnings. Reach out to the maintainers for a quick review before proceeding. Maintainers 1
-
Release Notes are present
-
BB Docs Compiler pipeline passed.
- Review all pipeline output looking for failures or warnings. Reach out to the maintainers for a quick review before proceeding. Maintainers 1
Update Docs
-
Manually initiate a Weekly Docs Gen (tag) and Nightly Docs Gen (latest) pipeline within bb-docs-compiler
-
Once those pipelines complete, verify that they have been mirrored to the IL2 Big Bang repository.
📌 NOTE: You may need to be granted access to the IL2 Big Bang repository to view the status. Contact the Maintainers 1 for assistance.
-
Verify that the IL2 pipelines complete successfully: IL2 Big Bang Pipelines (this will update the https://docs-bigbang.dso.mil/ website)
- Ensure that release docs have compiled and appear on Big Bang Docs once the pipeline is complete.
-
In Big Bang Docs, select the Latest version and go to Packages. Make sure that each package version is updated to match the version specified in the release
-
Select a couple of packages that were updated in the release and match the version in the values file.
-
Click around in the docs and make sure docs are properly formatted; pay special attention to lists and tables.
-
RELEASE IT
-
Type up a release announcement for the Maintainers 1 to post in the Big Bang Value Stream channel. Mention any breaking changes and some of the key updates from this release. Use the same format as a prior announcement https://chat.il2.dso.mil/platform-one/channels/team---big-bang
-
Cherry-pick release commit(s) as needed with a merge request back to the master branch. We never merge the release branch back to master. Instead, make a separate branch from master and cherry-pick the release commit(s) into it. Use the resulting MR to close the release issue.
# Navigate to your local clone of the BB repo
cd path/to/my/clone/of/bigbang
# Get the most up to date master
git checkout master
git pull
# Check out new branch for cherrypicking
git checkout -b 3.<minor>-cherrypick
# Repeat the following for whichever commits are required from release
# Typically this is the initial release commit (bumped GitRepo, Chart.yaml, CHANGELOG, README) and a final commit that re-ran helm-docs (if needed)
git cherry-pick <commit sha for Nth commit>
git push -u origin 3.<minor>-cherrypick
# Create an MR using Gitlab, merging this branch back to master, reach out to maintainers to review
-
Move all OPEN Issues remaining within the current milestone to the next one.
-
Close Big Bang Milestone in GitLab. (Make sure that there are n+2 milestones still, the current one and 1 future milestone; if not, contact the maintainers 1)
-
Hand off the release to the Maintainers 1; they will review, then celebrate and announce the release in the public MM channel
-
Reach out to @marknelson to let him know that the release is done and to update the Gitlab Banner
-
📌 NOTE: If release notes need to be edited AFTER a release tag & pipeline have completed, make sure that the following steps are taken