UNCLASSIFIED - NO CUI

Release 3.6.0

Semi-Automated Release Process

STOP: Is this your first release, or has it been a while since your last release? Before proceeding, review the Getting Started Guide for instructions on configuring your environment for the release process, expectations of a release engineer, and information about the release process.

Foreword

‼️ This upgrade process is applicable only to minor Big Bang version upgrades of Big Bang major version 3. It is subject to change in future major versions. For patch version upgrades, refer to the Patch Release README.

  • The release branch format is as follows: release-<major>.<minor>.x, where <major> and <minor> are the major and minor release numbers. Example: release-3.6.x.
  • The release tag format for major version 3 of Big Bang is: 3.<minor>.0.

DO NOT blindly copy and paste. Carefully read and evaluate the commands below and execute them, if applicable.

📌 As you progress, document pain points and unclear or out-of-date steps. Once the release is complete, submit an MR to update this documentation or post feedback in the Release Engineering mattermost channel.

Useful Links

1. Release Prep

  • Pull and build the latest version of gogru and update your config

      cd your/local/gogru/repository
      git checkout main
      git pull
      go build
      ./gogru config [init | update]
  • Create the release issue in the Big Bang Umbrella package

      ./gogru create-issue
  • Verify the Release cluster is up and running normally, then check and, if needed, renew the ElasticSearch (ECK) and Mattermost licenses.

    📌 NOTE: Be sure to run aws sso login prior to running these commands

      ./gogru release-cluster-check
      ./gogru eck-license-check
      ./gogru mm-license-check
    • If Mattermost or ElasticSearch indicates a license was updated, it's a good idea to log in with SSO and confirm functionality. (Log into Mattermost, Kibana)
    • If there are any issues with the Helm Releases or Pods in the output, investigate and fix them before continuing
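The three checks above could be chained so that the first failure stops the run. This is only a minimal sketch: it assumes each gogru subcommand exits non-zero when something is wrong, which this document does not confirm, and the function name run_prep_checks and the GOGRU variable are hypothetical.

```shell
# Sketch only: assumes each gogru subcommand exits non-zero on failure.
run_prep_checks() {
  gogru_bin="${GOGRU:-./gogru}"   # GOGRU is a hypothetical override for the binary path
  for check in release-cluster-check eck-license-check mm-license-check; do
    echo "== ${check}"
    if ! "$gogru_bin" "$check"; then
      echo "check failed: ${check}" >&2
      return 1
    fi
  done
}
```

Run it from the gogru repository after aws sso login, for example: run_prep_checks && echo "prep checks passed". The output of each check should still be read by hand, since a zero exit code may not cover every issue.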

🛑 STOP: Before creating the release branch, reach out to the anchors in the Release Engineering mattermost channel and confirm all necessary changes have been merged into the Big Bang master branch. This should be complete by 12 p.m. EST Tuesday barring extenuating circumstances.

  • Create the release branches for bigbang-sil (deployment) and bigbang (product), and verify the hash of the release branch matches the master branch

      ./gogru create-rb
  • Update BigBang (product) version references with the new release version

      ./gogru update-version-references
  • Build the release notes and carefully read all upgrade notices before proceeding

    • Before running the generate command, take a look at the recent MRs into Big Bang and make sure the milestone is set properly. This will ensure the generation command doesn't miss anything.
    • Leave the resultant MR in draft. It will be updated later.
      ./gogru generate-release-notes
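The branch-hash verification mentioned in the create-rb step can also be spot-checked by hand with plain git. The helper below is a hypothetical sketch (same_commit is not part of gogru); it compares the commits that two refs resolve to.

```shell
# Succeeds only when both refs resolve to the same commit.
# Returns 2 if either ref cannot be resolved.
same_commit() {
  a="$(git rev-parse --verify --quiet "$1^{commit}")" || return 2
  b="$(git rev-parse --verify --quiet "$2^{commit}")" || return 2
  [ "$a" = "$b" ]
}
```

For example, inside a clone of the product repository and after git fetch origin: same_commit origin/master origin/release-3.6.x && echo "hashes match". The branch name here is illustrative; substitute the release branch that was just created.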

2. Upgrade and Debug Cluster

⚠️ WARNING: Upgrade only, never delete and redeploy.

  • Check if there was a flux upgrade, and, if so, update flux on the cluster

      ./gogru update-flux
  • Upgrade Release Cluster

      ./gogru upgrade-release-cluster
    • The upgrade-release-cluster command will open a merge request into bigbang-sil. Ping the package maintainers in the Release Engineering channel to approve and merge the changes.

    • Once merged, flux should pick up the changes and begin reconciling the helm releases. This may take up to an hour depending on the size of the release.

    • Monitor the helmreleases, gitrepositories, pods, and kustomizations to track the progress of the release.

        ./gogru release-status

      or

        watch kubectl get gitrepositories,kustomizations,hr,po -A
  • Verify cluster has updated to the new release and verify that there are no outstanding issues with Pods or Helm Releases

      ./gogru release-cluster-check
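While monitoring the upgrade, it can help to filter the HelmRelease listing down to the releases that are not ready yet. The sketch below is a hypothetical helper, not a gogru feature. It assumes the kubectl get hr -A --no-headers column order is NAMESPACE NAME AGE READY STATUS, which should be confirmed against the cluster's actual output.

```shell
# Reads `kubectl get hr -A --no-headers` output on stdin and prints
# namespace/name for every HelmRelease whose READY column is not "True".
# Assumed column order: NAMESPACE NAME AGE READY STATUS.
not_ready_helmreleases() {
  awk '$4 != "True" { print $1 "/" $2 }'
}
```

Usage: kubectl get hr -A --no-headers | not_ready_helmreleases. An empty result suggests every HelmRelease reports Ready, but it does not replace the release-cluster-check above.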

Common Upgrade Issues

  • If Flux has not begun to reconcile after 10 minutes, follow the flux troubleshooting steps and refer to the workflow diagram (coming soon).
  • If Prometheus does not start and is logging an error relating to Vault permissions, follow these instructions to reissue the Vault token.
  • If a package fails to upgrade, or fails a test, follow the Package Rejection Process to roll back the package upgrade.

Uncommon Upgrade Issues

  • If the upgrade fails catastrophically and the release needs to be rolled back, follow the rollback instructions. This should be used only as a last resort to re-test the upgrade process.

3. UI Tests

📌 Utilize the Default Application Credentials when accessing services without SSO

‼️ Refer to the Manual Test Steps to troubleshoot failing tests.

⚠️ Follow the Package Rejection Process to roll back package upgrades for failing packages.

SSO Integration

  • R2-D2: Native SSO Grafana, ArgoCD, Anchore and Mattermost Test

  • R2-D2: authservice SSO Prometheus and AlertManager Test

    OR

  • ArgoCD

  • Anchore

  • Mattermost

  • Grafana & Prometheus tested below

Monitoring

AlertManager

  • Log in to AlertManager with SSO
  • Validate that, at minimum, the Watchdog alert is firing

Grafana

  • Log in to Grafana with SSO
  • Click on the drop-down menu in the upper-left and choose Dashboards.
  • Verify CoreDNS is listed and contains data
  • Verify that several Kubernetes and Istio dashboards are listed.
  • Click on a few dashboards and verify that metrics are populating.

Prometheus

  • Log in to Prometheus with SSO

  • Go to Status > Targets and confirm that no unexpected services are marked as unhealthy

    • Known Unhealthy Targets
      • vault (the vault container is missing)

    ‼️ TROUBLESHOOTING: If Vault shows up as unhealthy, that may indicate it was rebuilt and the permissions on the vault/secrets/token file within the prometheus pod need to be updated

    Update Vault Token Permissions
    • exec into the vault-agent container on the prometheus-monitoring-monitoring-kube-prometheus-0 pod
    • chmod the vault token
      chmod 644 /vault/secrets/token
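The exec and chmod steps above can be combined into a single kubectl exec call. This is a sketch under assumptions: the monitoring namespace is a guess that this document does not state, and the function name and KUBECTL override are hypothetical.

```shell
# Runs chmod on the Vault token inside the vault-agent container.
fix_vault_token_perms() {
  kubectl_bin="${KUBECTL:-kubectl}"   # KUBECTL is a hypothetical override, handy for dry runs
  ns="${1:-monitoring}"               # namespace is an assumption; pass the real one as $1
  "$kubectl_bin" exec -n "$ns" prometheus-monitoring-monitoring-kube-prometheus-0 \
    -c vault-agent -- chmod 644 /vault/secrets/token
}
```

Running KUBECTL=echo fix_vault_token_perms prints the command without executing it, which is a cheap way to review it before touching the cluster.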

Logging

  • Log in to Kibana with SSO.

    ‼️ TROUBLESHOOTING: For "You do not have permission to access the requested page" errors, run the eck-license-check as described in Release Prep, above.

  • Verify that Kibana is actively indexing/logging.
    • To do this, click on the drop-down menu in the upper-left corner, then under "Analytics" click Discover.
    • Expand the drop-down menu to the right of Data View and click + Create data view
    • In the Index pattern field, enter logstash*
    • Verify at least one logstash source dated within the last week appears beneath All Sources
    • Set the Timestamp field to ---I don't want to use the time filter---
    • Click Use without saving. Log data should populate.

Kiali

  • Log in to Kiali with SSO
  • Validate graphs and traces are visible under applications/workloads
  • Validate no errors appear

    📌 NOTE: A red notification bell is visible if there are errors. Errors on individual application listings (for labels, etc.) are expected and okay.

Twistlock

  • Log in to Prometheus
    • In the Targets drop down choose serviceMonitor/twistlock/twistlock-console/0
    • Click the small "show more" button that appears. You should see a State of UP and no errors.
  • R2-D2: Twistlock Test

Fortify

  • R2-D2: Fortify Test

GitLab, Gitlab Runner, & Sonarqube

Gitlab

  • Log in to GitLab with SSO
  • Edit profile and change user avatar. (Avatar may take several minutes to update.)
  • Go to Security > Access Token and create a personal access token. Select all scopes.
    • Record the token and the token name to provide during automated test

Gitlab Runner

  • R2-D2: Gitlab Test

    📌 NOTE: When prompted for the username use the token name and not the name of the user that created the token.

    ‼️ TROUBLESHOOTING: The pipeline may fail if Sonarqube is getting a major version update. Navigate to 'http://sonarqube.release.bigbang.mil/setup' and upgrade it before re-running the test

Sonarqube

Nexus

  • R2-D2: Nexus Test. When asked for the release number, use a fully numeric value with no alphabetical characters, e.g. 2.45.0 and not 2.45.X

    ‼️ TROUBLESHOOTING: If a container from the previous release does not exist, the test will fail. Push a container with the previous release tag, either manually, or by overriding your release number in the config and running this test.
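The release-number rule for the Nexus test can be checked before the value is typed into the prompt. The helper below is a hypothetical sketch, not part of R2-D2 or gogru; it accepts only a fully numeric MAJOR.MINOR.PATCH string.

```shell
# Succeeds only for a fully numeric MAJOR.MINOR.PATCH string such as 2.45.0.
is_full_version() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'
}
```

For example, is_full_version 2.45.0 succeeds, while is_full_version 2.45.X fails because of the trailing letter.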

Minio

  • R2-D2: Minio Test

Mattermost

  • R2-D2: Mattermost Test

ArgoCD

  • R2-D2: ArgoCD Test

Kyverno

  • R2-D2: Kyverno Test.

Velero

  • R2-D2: Velero Test

    ‼️ TROUBLESHOOTING: This test may fail due to incorrect image settings in Nexus. If so, follow the steps below to update.

    Nexus Image Settings Instructions for Velero
    • Log in to https://nexus.release.bigbang.mil/#browse/welcome as admin.
    • Confirm the following to allow the test deployment to pull the alpine image from Nexus without credentials
    • Verify that the velero-tests.release.bigbang.mil/alpine:test repository exists on Nexus
    • Verify anonymous docker pull is enabled for velero-tests repository: Settings -> Repositories -> velero-tests -> Check Allow anonymous docker pull (Docker Bearer Token Realm required) -> Save
    • Go to Settings -> Security -> Realms and verify that Docker Bearer Token is in the Active column. If not, click on it to move it there, then click Save
    • Verify Anonymous Access is enabled. Settings -> Security -> Anonymous Access -> Check Allow anonymous users to access the server -> Save

Loki

  • R2-D2: Loki Test

bbctl

  • Log in to grafana as admin. User name "admin". Retrieve the password by running cd into/my/release/folder/ and then sops -d deployments/release/release/secrets/environment-bb-secret.enc.yaml | sed 's/\\n/\'$'\n''/g' | grep grafana -A 2

  • Click on the drop-down on upper-left, choose Dashboards, then search for bbctl. Ensure that all 6 dashboards are present in the search results.

  • Click into the bbctl-all-logs dashboard and confirm the tables are populated.

    ‼️ TROUBLESHOOTING: If any of the tables are missing data, you can run the corresponding command to trigger the job to run immediately instead of waiting for the next CRON schedule trigger.
    If there is an error in the bbctl-all-logs table for a command or data still isn't populating after manually running the commands, reach out to the bbctl team and create a bug ticket.

    ‼️ KNOWN ISSUE: The violations table not populating is a known issue when the output is too large. You should see a red triangle in the top corner saying it timed out.

    Manually Trigger bbctl CRON Jobs
    • Preflight Check Logs

        kubectl create job --from=cronjob/bbctl-bbctl-bigbang-preflight bbctl-bigbang-preflight -n bbctl
    • Versions Logs

        kubectl create job --from=cronjob/bbctl-bbctl-bigbang-updater bbctl-bigbang-updater -n bbctl
    • Policy Logs

        kubectl create job --from=cronjob/bbctl-bbctl-bigbang-policy bbctl-bigbang-policy -n bbctl
    • Status Logs

        kubectl create job --from=cronjob/bbctl-bbctl-bigbang-status bbctl-bigbang-status -n bbctl
    • Violations Logs

        kubectl create job --from=cronjob/bbctl-bbctl-bigbang-violations bbctl-bigbang-violations -n bbctl
  • Click into the other 5 bbctl dashboards and ensure they all show data in the graphs.

    • bbctl-preflight-dashboard
    • bbctl-version-dashboard
    • bbctl-policies-dashboard
    • bbctl-status-dashboard
    • bbctl-violations-dashboard
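The five manual trigger commands above follow one naming pattern, so they can be sketched as a loop. The function name and the KUBECTL override are hypothetical. Note that kubectl create job fails when a job with the same name already exists, so a job left over from an earlier manual run would need to be deleted first.

```shell
# Creates one job from each bbctl CronJob listed above.
# Keeps going after a failure and reports it through the exit status.
trigger_bbctl_jobs() {
  kubectl_bin="${KUBECTL:-kubectl}"   # KUBECTL is a hypothetical override, handy for dry runs
  rc=0
  for name in preflight updater policy status violations; do
    "$kubectl_bin" create job \
      --from="cronjob/bbctl-bbctl-bigbang-${name}" \
      "bbctl-bigbang-${name}" -n bbctl || rc=1
  done
  return "$rc"
}
```

Running KUBECTL=echo trigger_bbctl_jobs prints the five commands without executing them. When only one table is missing data, running the single matching command from the list above is the lighter option.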

Tempo

  • R2-D2: Tempo Test

    📌 NOTE: This test is obsolete at the moment as we do not currently have any data tethered to tempo

Keycloak

  • R2-D2: Keycloak Test

Neuvector

  • R2-D2: Neuvector Test

Anchore

🛑 STOP: This test is dependent on artifacts created during the GitLab Runner and Nexus image pushes. Confirm those tests completed successfully before proceeding.

  • Log into anchore as the admin user - password is in anchore-anchore-enterprise secret (admin will have pull credentials set up for the registries):

      kubectl get secret anchore-anchore-enterprise -n anchore -o json | jq -r '.data.ANCHORE_ADMIN_PASSWORD' | base64 -d
  • Scan image in release registry, registry.release.bigbang.mil/GROUPNAMEHERE/PROJECTNAMEHERE/alpine:latest

    • Go to Images Page, Analyze Tag
      • Registry: registry.release.bigbang.mil
      • Repository: GROUPNAME/PROJECTNAME/alpine
      • Tag: latest
  • Scan image in nexus registry, containers.release.bigbang.mil/alpine:<release-number> (use your release number, ex: 1-XX-0). If authentication fails, the Nexus credentials have changed. Retrieve the Nexus credentials using instructions from Nexus above, then update the credentials in Anchore: Images > Analyze Repository > "clicking here" link at bottom > click on the registry name > edit the credentials.

  • Validate scans complete and Anchore displays data (click the SHA value for each image)

Mimir

  • Log in to grafana as admin. User name "admin". Retrieve the password by running cd into/my/release/folder/ and then sops -d deployments/release/release/secrets/environment-bb-secret.enc.yaml | sed 's/\\n/\n/g' | grep grafana -A 2 (assumes gnu-sed)
  • Click on the drop-down in the upper-left corner again and choose Data Sources, then choose Mimir. Scroll down and click Save & Test. A message should appear that reads "Data source successfully connected."
  • Click on the drop-down on upper-left, choose Dashboards, then Prometheus / Remote Write. Verify that this dashboard shows data/logs from the past few minutes and hours.

Backstage

  • Log in to backstage via SSO
  • Navigate to kyverno component page and validate:
    • Dashboards iFrame is populated (ClusterPolicyReport Details, PolicyReport Details, PolicyReports) and links redirect to grafana.release.bigbang.mil when clicked
    • All of the header links work (CI/CD, Kubernetes, API, Dependencies, Docs), though they may not yet be populated with data
  • Navigate to monitoring component page and validate:
    • Dashboards iFrame is populated (Alertmanager / Overview, Prometheus / Overview, Prometheus / Remote Write) and links redirect to grafana.release.bigbang.mil when clicked
    • Links iFrame is populated (Monitoring, AlertManager, Prometheus) and links redirect to their respective service when clicked
    • All of the header links work (CI/CD, Kubernetes, API, Dependencies, Docs), though they may not yet be populated with data
  • Not yet implemented - ISSUE
    • Verify components added via component.yaml are present

Headlamp

  • Log in to headlamp with SSO and confirm the landing page loads. 📌 NOTE: This currently immediately logs you out of headlamp due to THIS upstream issue.

    • Note, data will not populate until this issue is resolved
  • Log out and back in, selecting the Use A Token option

    • Generate a temporary auth token:

      kubectl create token headlamp-headlamp -n headlamp
  • Confirm the Cluster page is populated with graphs and events

  • Confirm the Flux plugin is functioning. Select Flux in the left sidebar and verify that Overview, Kustomizations, HelmReleases, Sources, and Flux Runtime are populated with data. 📌 NOTE: There is currently an error with the flux plugin working with headlamp; most of these pages aren't displaying data.

  • Click through a few of the other menu items in the left sidebar and confirm data is populating (Gateway (beta) errors are okay)

4. Create Release

🛑 STOP: Before finalizing the release, confirm that the latest bb-docs-compiler pipeline passed. If not, identify, correct, and cherry-pick a fix into the release branch. This will prevent wasting time re-running pipelines if the docs compiler pipeline fails.

Prep

  • Re-generate the version references, helm docs, and release notes to capture any changes implemented during testing.

      ./gogru update-version-references
      ./gogru generate-release-notes

Clean up release notes

  • Check to see if any packages have been added or graduated from BETA.
    • There is no single resource that lists this information, but you should generally be aware of packages that have been added or moved out of BETA from discussions with other team members during your workdays. You can also ask the Maintainers 1 if unsure. If any packages have been added or moved out of BETA, adjust your local R2D2 config as needed. Also open an MR for R2D2 with the change to the default config. [DEPRECATED, gogru instructions coming soon.]
  • Add Upgrade Notices
    • Issues encountered during testing often result in upgrade notices
    • Package MRs might be missed by the bot if the notice includes ## Header Tags
  • Review MRs improperly sorted under 'Big Bang MRs' into their specific sections below.
    • If MR is for Big Bang (docs, CI, etc) and not a specific package, they can remain in place under 'Big Bang MRs'.
    • General rule of thumb: If the git tag changes for a package, put the MR under that package, and if there is no git tag change for a package, put it under BB.
  • Adjust known issues as needed:
    • If an issue has been resolved it can be removed and if any new issues were discovered they should be added.
  • Verify table contains all chart upgrades mentioned in the MR list below.
  • Verify table contains all application versions.
    • Compare with previous release notes. If a package was upgraded there will usually be a bump in application version and chart version in the table.
  • Verify all internal comments are removed from Release Notes, such as comments directing the release engineer to copy/move things around
  • Any issues encountered during release upgrade testing need to be thoroughly documented in the release notes. If you had to delete a replica set, force restart some pods, delete a PV, please make sure these steps are included in our known issues section of the release notes.
  • Scroll through every package listed and verify the following:
    1. It has a link to at least one MR.
    2. The CHANGELOG entries match the MR listed.
    3. If more than one MR or CHANGELOG entry exists for a package, verify it makes sense (i.e., if there are 2 MRs, there are CHANGELOG entries for both). Conversely, if there is more than one CHANGELOG entry, make sure those entries are tied to the MRs in this release; sometimes changelogs get messed up, causing more entries in the release notes than necessary.
  • Publish Release Notes to Bigbang-ci
    • Commit the resulting release notes and push to <release-branch> in Big Bang (bigbang-ci)
    • Create an MR for the Maintainers 1 and have it merged before continuing on

Create Release Tag

  • Create release tag based on release branch, i.e. 3.<minor>.0.
    • To do this via the UI (generally preferred): tags page -> new tag, name: 3.<minor>.0, create from: release-3.<minor>.x, message: release 3.<minor>.0, release notes: leave empty

    • To do this via git CLI:

        # check out the up-to-date release branch so the tag points at its tip
        git checkout release-3.<minor>.x
        git pull
        git tag -a 3.<minor>.0 -m "release 3.<minor>.0"
        # list the tags to make sure you made the correct one
        git tag -n
        # push
        git push origin 3.<minor>.0
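Because the tag name is derived mechanically from the release branch name, the mapping can be sketched as a small helper that guards against typos before tagging. The function tag_for_branch is hypothetical and only loosely checks the branch name shape.

```shell
# Prints the release tag for a release branch, e.g. release-3.6.x -> 3.6.0.
# Fails for anything that does not look like release-<major>.<minor>.x.
tag_for_branch() {
  case "$1" in
    release-[0-9]*.[0-9]*.x) ;;
    *) echo "not a release branch: $1" >&2; return 1 ;;
  esac
  ver="${1#release-}"           # strip the prefix      -> 3.6.x
  printf '%s.0\n' "${ver%.x}"   # swap trailing .x to .0 -> 3.6.0
}
```

One possible use is tag="$(tag_for_branch "$(git branch --show-current)")" while the release branch is checked out, followed by the git tag and git push commands above with "$tag" in place of the hand-typed name.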

Confirm All Jobs Pass

  • Passed release pipeline.
    • Review all pipeline output looking for failures or warnings. Reach out to the maintainers for a quick review before proceeding. Maintainers 1
  • Release Notes are present
  • BB Docs Compiler pipeline passed.
    • Review all pipeline output looking for failures or warnings. Reach out to the maintainers for a quick review before proceeding. Maintainers 1

Update Docs

  • Manually initiate a Weekly Docs Gen (tag) and Nightly Docs Gen (latest) pipeline within bb-docs-compiler
  • Ensure that release docs have compiled and appear on Big Bang Docs once the pipeline is complete.
    • In Big Bang Docs, select the Latest version and go to Packages. Make sure that each package's version is updated to match the version specified in the release
    • Select a couple of packages that were updated in the release and match the version in the values file.
    • Click around in the docs and make sure docs are properly formatted, pay special attention to lists and tables.

RELEASE IT

  • Type up a release announcement for the Maintainers 1 to post in the Big Bang Value Stream channel. Mention any breaking changes and some of the key updates from this release. Use same format as a prior announcement https://chat.il2.dso.mil/platform-one/channels/team---big-bang

  • Cherry-pick release commit(s) as needed with merge request back to master branch. We do not ever merge release branch back to master. Instead, make a separate branch from master and cherry-pick the release commit(s) into it. Use the resulting MR to close the release issue.

    # Navigate to your local clone of the BB repo
    cd path/to/my/clone/of/bigbang
    # Get the most up to date master
    git checkout master
    git pull
    # Check out new branch for cherrypicking
    git checkout -b 3.<minor>-cherrypick
    
    # Repeat the following for whichever commits are required from release
    # Typically this is the initial release commit (bumped GitRepo, Chart.yaml, CHANGELOG, README) and a final commit that re-ran helm-docs (if needed)
    git cherry-pick <commit sha for Nth commit>
    
    git push -u origin 3.<minor>-cherrypick
    # Create an MR using Gitlab, merging this branch back to master, reach out to maintainers to review
  • Move all OPEN Issues remaining within the current milestone to the next one.

  • Close Big Bang Milestone in GitLab. (Make sure that there are still n+2 milestones, the current one and 1 future milestone. If not, contact the Maintainers 1)

  • Handoff the release to the Maintainers 1, they will review then celebrate and announce the release in the public MM channel

  • Reach out to @marknelson to let him know that the release is done and to update the Gitlab Banner

  • 📌 NOTE: If release notes need to be edited AFTER a release tag & pipeline have completed, make sure that the following steps are taken:

    • edit the release manually, here (you might need to add yourself to the Tag Permissions within the settings to allow the Edit release button to appear)
    • Create an MR Here with the edits to the previous release in order to ensure possible patch releases get all relevant update notices.
  1. @jfoster @chris.oconnell @michaelmartin @andrewshoell

Edited by Matt Goloski