UNCLASSIFIED - NO CUI

Release 1.34.0

Copy contents of https://repo1.dso.mil/platform-one/big-bang/customers/bigbang/-/blob/master/docs/release/README.md into this description and check the boxes as you test

Also note the comments in this issue. Generally these include release notes to add in.

Release Process

Forward

  • Commands prefixed with $ should not be blindly copied and pasted; they should be read, evaluated, and, if applicable, executed.
  • This upgrade process is applicable only to minor version upgrades on Big Bang major version 1, and subject to change in future major versions. Patch version upgrades follow a different process.
  • The R2-D2 release automation tool can be used to automate certain parts of this release process. Follow its README for cloning and installation. Usage is denoted by the R2-D2 prefix.
  • The release branch format is as follows: release-1.<minor>.x where <minor> is the minor release number. Example: release-1.32.x. The <minor> string is used as a placeholder for the minor release number in this document.
  • The release tag format is as follows: 1.<minor>.0.

1. Release Prep

  • Check last release SHAs

    • Verify that the previous release branch commit hash matches the last release tag. Investigate with previous RE if they do not match.
    • Example: Last release is 1.31.1; validate that the commit SHA for the 1.31.1 tag matches the commit SHA for the release-1.31.x branch.
    • R2-D2: run w/ the Check last release SHAs option selected
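      A minimal sketch of the SHA comparison, assuming 1.31.1 was the last release (both rev-parse commands should print the same commit SHA):

      $ git fetch origin --tags
      # dereference the annotated tag to the commit it points at
      $ git rev-parse '1.31.1^{}'
      $ git rev-parse origin/release-1.31.x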
  • Create release branch

    • R2-D2: run w/ the Create release branch option selected

      or

    • In either the Gitlab UI or the Git CLI, create a new branch from master with the name release-1.<minor>.x.

      Important The release branch name must end with x

      # CLI example, replace `<release-branch>` with the name of the release branch as specified above
      $ cd bigbang
      $ git checkout master
      $ git pull
      $ git checkout -b <release-branch>
      $ git branch --show-current
      $ git push --set-upstream origin <release-branch>
  • Build release notes

    • R2-D2: run w/ the Build release notes option selected

      or

      # clone the dogfood cluster repo
      $ git clone https://repo1.dso.mil/platform-one/big-bang/customers/bigbang/ dogfood-bigbang
      # build the release notes
      $ cd R2-D2
      # Select the `Build release notes` option
      $ python3 main.py
      # check the output
      $ cat ./build/*.md
      # copy to dogfood release docs
      $ cp ./build/*.md ../dogfood-bigbang/docs/release/
      # commit the release notes
      $ cd ../dogfood-bigbang
      $ git add docs/release/
      $ git commit -m "add initial release notes"
  • Upgrade Big Bang's version references

    • WIP: R2-D2: run w/ the Upgrade version references option selected

      or

      💡 Tip Make the following changes in a single commit so it can be cherry-picked into master later.

    • Bump self-reference version in base/gitrepository.yaml

    • Update chart release version chart/Chart.yaml
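      A hedged sketch for locating both version fields before editing (key names are assumed; verify against the actual files):

      $ grep -nE 'tag:|version:' base/gitrepository.yaml chart/Chart.yaml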

    • Update /Packages.md with any new Packages. Also review if any packages have enabled mTLS STRICT this release and update the mTLS column with the BB MR link (follow the other examples where STRICT is noted).

    • Add a new changelog entry for the release, ex:

      ## [1.X.Y]
      
      - [!1.X.Y](https://repo1.dso.mil/platform-one/big-bang/bigbang/-/merge_requests?scope=all&utf8=%E2%9C%93&state=merged&milestone_title=1.X.Y); List of merge requests in this release.
      
      <!-- Note: milestone_title=1.X.Y version must match the given minor release version -->
    • Update README.md using helm-docs.

      # example release 1.<minor>.x
      $ cd bigbang
      $ git checkout release-1.<minor>.x
      $ docker run -v "$(pwd):/helm-docs" -u $(id -u) jnorwood/helm-docs:v1.5.0 -s file -t .gitlab/README.md.gotmpl --dry-run > README.md

2. Upgrade and Debug Cluster

WARNING: Upgrade only, do not delete and redeploy.

  • Connect to the dogfood cluster

    📌 NOTE: YMMV using the AWS CLI to add your public IP to the public bastion host; adding it via the AWS web console is another option. Reach out to core maintainers for assistance. 1
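    A hedged sketch using the AWS CLI, assuming bastion access is gated by a security group rule (the group ID is a placeholder):

      $ MY_IP=$(curl -s https://checkip.amazonaws.com)
      $ aws ec2 authorize-security-group-ingress \
          --group-id sg-XXXXXXXX \
          --protocol tcp --port 22 \
          --cidr "${MY_IP}/32"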

  • Review Elasticsearch health and trial license status:

    • Check the age of the pods; in general, if they are younger than 30 days you should be good. Also log in to Kibana via SSO - SSO is paywalled, so it will fail if the license is expired.
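      A quick way to check pod age is the AGE column, e.g. for the ECK stack in the logging namespace:

      kubectl get pods -n logging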

      Renew ECK Trial
      kubectl delete hr ek eck-operator fluentbit cluster-auditor -n bigbang
      kubectl delete ns eck-operator logging
      flux reconcile kustomization environment -n bigbang
      flux suspend hr bigbang -n bigbang
      flux resume hr bigbang -n bigbang
  • Review Mattermost Enterprise trial license status & follow these steps if expired:

    • Login as a system admin, navigate to the System Console -> Edition and License tab. Reach out to core maintainers for assistance. 1

      Renew Mattermost Enterprise Trial

      To "renew" Mattermost Enterprise trial license, connect to RDS postgres DB using psql. Reach out to core maintainers for assistance. 1

      \c mattermost
      select * from "public"."licenses";
      delete from "public"."licenses";
      \q
      kubectl delete mattermost mattermost -n mattermost
  • If Flux has been updated in the latest release:

    • Run the Flux install script as in the example below

      $ cd bigbang
      $ git checkout release-1.<minor>.x
      $ git pull
      $ ./scripts/install_flux.sh -s
      # the `-s` option will reuse the existing secret so you don't have to provide credentials
      $ cd ../dogfood-bigbang
      # go back to dogfood after flux upgrade
  • Upgrade the dogfood cluster's master branch to point at the new release branch by completing the following.

    • WIP: R2-D2: run w/ the Upgrade dogfood cluster option selected

      or

    • Upgrade base kustomization ref=release-1.<minor>.x in bigbang/base/kustomization.yaml to the release branch.

    • Upgrade prod kustomization branch: "release-1.<minor>.x" in bigbang/prod/kustomization.yaml to the release branch.

    • Verify the above changes are correct, then:

      $ git add bigbang/base bigbang/prod
      $ git commit -m "upgrade kustomizations to release-1.<minor>.x"
      $ git push
  • Verify cluster has updated to the new release

    • Packages have fetched the new revision and match the new release

    • Packages have reconciled

      • Watch the Release HRs, Gitrepos, and Kustomizations to check when all HRs have properly reconciled

        # check release
        watch kubectl get gitrepositories,kustomizations,hr,po -A
      • If flux has not updated after ten minutes:

        flux reconcile hr -n bigbang bigbang --with-source
      • If flux is still not updating, delete the flux source controller:

        kubectl get all -n flux-system
        kubectl delete pod/source-controller-xxxxxxxx-xxxxx -n flux-system
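        Alternatively, a label selector avoids copying the generated pod name (this assumes the standard Flux app label):

        kubectl delete pod -n flux-system -l app=source-controller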
      • If the helm release shows max retries exhausted, describe the HR. If it shows "another operation (install/upgrade/rollback) is in progress", this is an issue caused by too many Flux reconciles happening at once, which typically crashes the Helm controller. You will need to delete the helm release secrets and reconcile in Flux as follows. Note that ${HR_NAME} is the HR you described that is in a bad state (typically a specific package and NOT bigbang itself).

        # example w/ kiali
        $ HR_NAME=kiali
        $ kubectl get secrets -n bigbang | grep ${HR_NAME}
        # example output:
        # some hr names are duplicated w/ a dash in the middle, some are not
        sh.helm.release.v1.kiali-kiali.v1                                       helm.sh/release.v1                    1      18h
        sh.helm.release.v1.kiali-kiali.v2                                       helm.sh/release.v1                    1      17h
        sh.helm.release.v1.kiali-kiali.v3                                       helm.sh/release.v1                    1      17m
        # Delete the latest one:
        $ kubectl delete secret -n bigbang sh.helm.release.v1.${HR_NAME}-${HR_NAME}.v3
        
        # suspend/resume the hr
        $ flux suspend hr -n bigbang ${HR_NAME}
        $ flux resume hr -n bigbang ${HR_NAME}

3. UI Tests

Important When verifying each application UI is loading, also verify the website certificates are valid.

Logging

  • Login to kibana with SSO
  • Kibana is actively indexing/logging.

Monitoring

  • Login to grafana with SSO
  • Contains Kubernetes dashboards and metrics
  • Contains Istio dashboards
  • Login to prometheus
  • All apps are being scraped, no errors (expected unhealthy targets are: promtail, kube-controller-manager, kube-etcd, kube-scheduler)

Cluster Auditor

  • Login to grafana with SSO
  • OPA Violations dashboard is present and shows violations in namespaces

Kiali

  • Login to kiali with SSO
  • Validate graphs and traces are visible under applications/workloads
  • Validate no errors appear

    Note A red notification bell would be visible if there are errors. Errors on individual application listings (labels, etc.) are expected and OK.

GitLab

  • Login to gitlab with SSO
  • Edit profile and change user avatar
  • Create new public group named for the release, e.g. release-1-<minor>-x
  • Create new public project (under the group you just made), also named for the release, e.g. release-1-<minor>-x
  • git clone project
  • Pick one of the project folders from Sonarqube Samples, copy all of its files into your clone, then push
  • docker push and docker pull an image to/from the registry (see the pull sketch below)
    docker pull alpine
    docker tag alpine registry.dogfood.bigbang.dev/<GROUPNAMEHERE>/<PROJECTNAMEHERE>/alpine:latest
    docker login registry.dogfood.bigbang.dev
    docker push registry.dogfood.bigbang.dev/<GROUPNAMEHERE>/<PROJECTNAMEHERE>/alpine:latest
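    To exercise the pull path as well, a sketch (remove the local copy, then pull it back):
    docker rmi registry.dogfood.bigbang.dev/<GROUPNAMEHERE>/<PROJECTNAMEHERE>/alpine:latest
    docker pull registry.dogfood.bigbang.dev/<GROUPNAMEHERE>/<PROJECTNAMEHERE>/alpine:latest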

Sonarqube

  • Login to sonarqube with SSO
  • Add a project for your release
  • Generate a token for the project and copy the token somewhere safe for use later
  • Click Other, then Linux, and copy the projectKey value from -Dsonar.projectKey=XXXXXXX for use later
  • After completing the gitlab runner test return to sonar and check that your project now has analysis

    Note The project token and project key are different values.

Gitlab Runner

  • Log back into gitlab and navigate to your project
  • Under Settings -> CI/CD -> Variables, add two vars:
    • SONAR_HOST_URL set equal to https://sonarqube.dogfood.bigbang.dev/
    • SONAR_TOKEN set equal to the token you copied from Sonarqube earlier (make this masked)
  • Add a .gitlab-ci.yml file to the root of the project, paste in the contents of sample_ci.yaml, replacing -Dsonar.projectKey=XXXXXXX with what you copied earlier
  • Commit, validate the pipeline runs and succeeds (may need to retry if there is a connection error), then return to the last step of the sonar test

Nexus

  • Login to Nexus as admin; the password is in the nexus-repository-manager-secret secret:
    # the username (but not the password) may be stored with a trailing newline, hence the ^ marker instead of <-
    kubectl get secret nexus-repository-manager-secret -n nexus-repository-manager -o json | jq -r '.data["admin.username"]' | base64 -d ; echo ' ^ admin username'
    kubectl get secret nexus-repository-manager-secret -n nexus-repository-manager -o json | jq -r '.data["admin.password"]' | base64 -d ; echo ' <- admin password'
  • Validate there are no errors displaying in the UI
  • Push/pull an image to/from the nexus registry
    • With the credentials from the encrypted values (or the admin user credentials), log in to the nexus registry
      $ docker login containers.dogfood.bigbang.dev
    • Tag and push an image to the registry:
      # ex: <release> = `1-32-0`
      $ docker tag alpine:latest containers.dogfood.bigbang.dev/alpine:<release>
      $ docker push containers.dogfood.bigbang.dev/alpine:<release>
    • Pull down the image for the previous release
      # ex: <last-release> = `1-31-0`
      $ docker pull containers.dogfood.bigbang.dev/alpine:<last-release>

Anchore

  • Login to Anchore with SSO
  • Log out and log back in as the admin user - password is in anchore-anchore-engine-admin-pass secret (admin will have pull credentials set up for the registries):
    kubectl get secret anchore-anchore-engine-admin-pass -n anchore -o json | jq -r '.data.ANCHORE_ADMIN_PASSWORD' | base64 -d ; echo ' <- password'
  • Scan image in dogfood registry, registry.dogfood.bigbang.dev/GROUPNAMEHERE/PROJECTNAMEHERE/alpine:latest
  • Scan image in nexus registry, containers.dogfood.bigbang.dev/alpine:<release-number> (use your release number, ex: 1-32-0)
  • Validate scans complete and Anchore displays data (click the SHA value for each image)

Argocd

  • Login to argocd with SSO
  • Log out and log back in with username admin. The password is in the argocd-initial-admin-secret secret. If that doesn't work, attempt a password reset:
    kubectl -n argocd get secret argocd-initial-admin-secret -o json |  jq '.data|to_entries|map({key, value:.value|@base64d})|from_entries'
  • Create application
    • Click [Create Application], fill in the below

      Setting            Value
      -----------------  --------------------------------------------------------------------
      Application Name   podinfo
      Project            default
      Sync Policy        Automatic
      Sync Policy        check both boxes
      Sync Options       check "auto-create namespace"
      Repository URL     https://repo1.dso.mil/platform-one/big-bang/apps/sandbox/podinfo.git
      Revision           HEAD
      Path               chart
      Cluster URL        https://kubernetes.default.svc
      Namespace          podinfo
    • Click [Create] (top of page)

    • Validate app syncs/becomes healthy

      WIP: Creating application with YAML template
      apiVersion: argoproj.io/v1alpha1
      kind: Application
      metadata:
        name: podinfo
      spec:
        destination:
          name: ''
          namespace: podinfo
          server: 'https://kubernetes.default.svc'
        source:
          path: chart
          repoURL: 'https://repo1.dso.mil/platform-one/big-bang/apps/sandbox/podinfo.git'
          targetRevision: HEAD
        project: default
        syncPolicy:
          automated:
            prune: true
            selfHeal: true
          syncOptions:
            - CreateNamespace=true
  • Delete app
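    If the argocd CLI is installed and logged in (via argocd login), a one-liner deletion works as well:
      $ argocd app delete podinfo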

Minio

  • Log into the Minio UI - access and secret key are in the minio-root-creds-secret secret
      kubectl -n minio get secret minio-root-creds-secret -o json | jq -r '.data.accesskey' | base64 -d ; echo ' <- access key'
      kubectl -n minio get secret minio-root-creds-secret -o json | jq -r '.data.secretkey' | base64 -d ; echo ' <- secret key'
  • Create bucket
  • Store file to bucket
  • Download file from bucket
  • Delete bucket and files

Mattermost

  • Login to mattermost with SSO
  • Update/modify profile picture
  • Send chats/validate chats from previous releases are visible.

    💡 Tip The ability to see chats in other teams requires Mattermost administrator rights.

  • Under System Console -> Elastic click Test Connection and Index Now, then validate success

Twistlock

  • Login to twistlock/prisma cloud with the credentials encrypted in bigbang/prod/environment-bb-secret.enc.yaml
    # from <repo>/bigbang/customers/bigbang project root dir
    sops --decrypt environment-bb-secret.enc.yaml | grep -1 twistlock
  • Only complete the sub-steps here if Twistlock was upgraded
    • Navigate to Manage -> Defenders -> Deploy
      • 3: twistlock-console
      • 12: On Toggle on "Monitor Istio"
      • 14: Off Disable official registry
      • 15: registry1.dso.mil/ironbank/twistlock/defender/defender:latest
      • 16: private-registry
      • 17: On Deploy Defenders with SELinux Policy
      • 17: On Nodes use Container Runtime Interface (CRI), not Docker
      • 17: On Nodes runs inside containerized environment
      • 18b: download the yaml files
    • Apply the yaml in the dogfood cluster, validate the pods go to running
  • Under Manage -> Defenders -> Manage, make sure # of defenders online is equal to number of nodes on the cluster
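    One quick way to get the node count for comparison:
      kubectl get nodes --no-headers | wc -l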
  • Under Radars -> Containers, validate pods are shown across all namespaces

Kyverno

  • Test secret sync in new namespace
    # create secret in kyverno NS
    kubectl create secret generic \
      -n kyverno kyverno-bbtest-secret \
      --from-literal=username='username' \
      --from-literal=password='password'
    
    # Create Kyverno Policy
    kubectl apply -f https://repo1.dso.mil/platform-one/big-bang/apps/sandbox/kyverno/-/raw/main/chart/tests/manifests/sync-secrets.yaml
    
    # Check if the secret is created in the NEW namespace
    kubectl create ns kyverno-test   # wait ~5s for the Policy to be ready
    kubectl label ns kyverno-test kubernetes.io/metadata.name=kyverno-bbtest --overwrite=true
    kubectl get secrets kyverno-bbtest-secret -n kyverno-test  # Test passed if found
  • Delete the test resources
    # If above is successful, delete test resources
    kubectl delete -f https://repo1.dso.mil/platform-one/big-bang/apps/sandbox/kyverno/-/raw/main/chart/tests/manifests/sync-secrets.yaml
    kubectl delete secret kyverno-bbtest-secret -n kyverno
    kubectl delete ns kyverno-test

Velero

  • Back up PVCs using velero_test.yaml
    $ kubectl apply -f ./docs/release/velero_test.yaml
    # wait 30s for velero to be ready then:
    # exec into velero_test container, check log
    $ veleropod=$(kubectl get pod -n velero-test -o json | jq -r '.items[].metadata.name')
    $ kubectl exec $veleropod -n velero-test -- tail /mnt/velero-test/test.log
    • Install the velero CLI on your workstation if you don't already have it (for macOS, run brew install velero).
    • Then set VERSION to the release you are testing and use the CLI to create a test backup:
    $ VERSION=1-<minor>-0
    $ velero backup create velero-test-backup-${VERSION} -l app=velero-test
    $ velero backup get
    • Wait a bit and re-run velero backup get; when it shows "Completed", delete the app.
    $ kubectl delete -f ./docs/release/velero_test.yaml
    # expected output:
    namespace "velero-test" deleted
    persistentvolumeclaim "velero-test" deleted
    deployment.apps "velero-test" deleted
  • Restore the test resources from the backup
    $ velero restore create velero-test-restore-${VERSION} --from-backup velero-test-backup-${VERSION}
    # exec into velero_test container
    # note: the restored pod may have a new name, so re-run the lookup first
    $ veleropod=$(kubectl get pod -n velero-test -o json | jq -r '.items[].metadata.name')
    $ kubectl exec $veleropod -n velero-test -- cat /mnt/velero-test/test.log
    # Old log entries and new should be in log if backup was done correctly
  • Cleanup test
    $ kubectl delete -f ./docs/release/velero_test.yaml

Keycloak

  • Login to Keycloak admin console. The credentials are in the keycloak-credentials secret:
    kubectl get secret keycloak-credentials -n keycloak -o json | jq -r '.data.adminuser' | base64 -d ; echo " <- admin user"
    kubectl get secret keycloak-credentials -n keycloak -o json | jq -r '.data.password' | base64 -d ; echo " <- password"

Extra UI checks

4. Create Release

  • Re-run helm docs in case any package tags changed as a result of issues found in testing.
    $ cd bigbang
    $ git checkout release-1.<minor>.x
    # pull any last-minute cherry picks, verify nothing has greatly changed
    $ git pull
    $ docker run -v "$(pwd):/helm-docs" -u $(id -u) jnorwood/helm-docs:v1.5.0 -s file -t .gitlab/README.md.gotmpl --dry-run > README.md
  • Create release candidate tag based on release branch, i.e. 1.<minor>.0-rc.0.
    $ git tag -a 1.<minor>.0-rc.0 -m "release candidate"
    # list the tags to make sure you made the correct one
    $ git tag -n
    # push
    $ git push --tags
  • Passed tag pipeline.
  • Create release tag based on release branch, i.e. 1.<minor>.0.
    $ git tag -a 1.<minor>.0 -m "release 1.<minor>.0"
    # list the tags to make sure you made the correct one
    $ git tag -n
    # push
    $ git push --tags
  • Passed release pipeline.
  • Add release notes to release.
  • Cherry-pick release commit(s) as needed with merge request back to master branch
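    A sketch of the cherry-pick flow (the commit SHA is a placeholder for the release commit on the release branch):
    $ git checkout master && git pull
    $ git checkout -b cherry-pick-1.<minor>.0
    $ git cherry-pick <release-commit-sha>
    $ git push --set-upstream origin cherry-pick-1.<minor>.0
    # open a merge request from this branch back to master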
  • Close Big Bang Milestone in GitLab.
  • Celebrate and announce release
  1. Ryan/Micah/Branden
