Update Velero Test
Primary issues with the current test:
- When run in BB CI, the Velero test works but does not complete successfully because it times out
- The test does not check that the contents of the nginx log pre-backup match the contents post-restore
- Images used by test need to be updated:
  - minio in chart/templates/tests/backup-restore.yaml: RELEASE.2021-09-18T18-09-59Z
  - kubectl in chart/templates/tests/backup-restore.yaml: v1.21.5
  - nginx in chart/templates/tests/_config-yamls.yaml: 1.21.1
  - velero in tests/test-values.yml: v1.6.3
  - velero-plugin-for-aws in tests/test-values.yml: v1.2.1
  - miniooperator in tests/dependencies.yaml
  - minio in tests/dependencies.yaml
For 1
https://repo1.dso.mil/platform-one/big-bang/apps/cluster-utilities/velero/-/blob/main/chart/tests/scripts/backup-restore.sh#L26-27 - Here and in several other places we run a very lazy "all -A", which is slow in BB CI because it grabs all pods across all namespaces. We should be able to narrow this to just Velero's namespace.
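A minimal sketch of that namespace scoping. The `VELERO_NS` variable, its default of `velero`, the `KUBECTL` indirection, and the function name are assumptions for illustration, not names taken from backup-restore.sh.

```shell
#!/usr/bin/env bash
# Assumption: the chart under test is deployed to the "velero" namespace.
VELERO_NS="${VELERO_NS:-velero}"
# Assumption: KUBECTL is an indirection so the binary can be swapped or stubbed.
KUBECTL="${KUBECTL:-kubectl}"

# List resources in the Velero namespace only, instead of "get all -A".
get_velero_resources() {
  "$KUBECTL" get all -n "$VELERO_NS"
}
```

Calling `get_velero_resources` wherever the script currently uses "all -A" would limit each query to one namespace.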
https://repo1.dso.mil/platform-one/big-bang/apps/cluster-utilities/velero/-/blob/main/chart/tests/scripts/backup-restore.sh#L47 - Some of the logs we dump seem to add very little (at least on success). We may be able to trim some of them back or remove them when the test is successful.
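One way to keep the logs on failure but drop them on success is to buffer a command's output and print it only when the command fails. This is a rough sketch; `run_quiet` is a hypothetical helper, not something that exists in the current script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, print its output only if it fails,
# and pass the command's exit status back to the caller.
run_quiet() {
  local log status
  log="$(mktemp)"
  # Capture stdout and stderr; record the exit status without tripping "set -e".
  "$@" >"$log" 2>&1 && status=0 || status=$?
  if [ "$status" -ne 0 ]; then
    cat "$log"
  fi
  rm -f "$log"
  return "$status"
}
```

A log dump would then be wrapped as `run_quiet some-command args`, so a passing run stays short while a failing run still shows everything.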
I0913 21:17:21.409791 506 request.go:655] Throttling request took 1.182350599s, request: GET:https://10.43.0.1:443/apis/constraints.gatekeeper.sh/v1beta1?timeout=32s
I0913 21:17:31.409797 506 request.go:655] Throttling request took 11.18226862s, request: GET:https://10.43.0.1:443/apis/status.gatekeeper.sh/v1beta1?timeout=32s
I0913 21:17:41.609804 506 request.go:655] Throttling request took 9.397593996s, request: GET:https://10.43.0.1:443/apis/telemetry.istio.io/v1alpha1?timeout=32s
I0913 21:17:51.609806 506 request.go:655] Throttling request took 3.598313698s, request: GET:https://10.43.0.1:443/apis/certificates.k8s.io/v1?timeout=32s
I0913 21:18:01.609818 506 request.go:655] Throttling request took 13.598208551s, request: GET:https://10.43.0.1:443/apis/k3s.cattle.io/v1?timeout=32s
This throttling seems to happen a lot in the tests. We should check whether something is wrong with the way these commands are being run or whether it is just expected. It seems to be specific to the Velero test, since we don't see it with our other tests.
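One thing worth ruling out (an unconfirmed guess, not a diagnosis) is that kubectl cannot persist its API discovery cache inside the test container, so every invocation rediscovers every API group and hits the client-side rate limiter. The sketch below points kubectl at a writable cache directory through its `--cache-dir` flag. `KUBE_CACHE_DIR`, its default path, and the wrapper name are assumptions.

```shell
#!/usr/bin/env bash
# Assumption: KUBECTL is an indirection so the binary can be swapped or stubbed.
KUBECTL="${KUBECTL:-kubectl}"
# Hypothetical location; any directory writable by the test container would do.
KUBE_CACHE_DIR="${KUBE_CACHE_DIR:-/tmp/kube-cache}"

# Run kubectl with an explicit, writable discovery cache directory.
kubectl_cached() {
  mkdir -p "$KUBE_CACHE_DIR"
  "$KUBECTL" --cache-dir="$KUBE_CACHE_DIR" "$@"
}
```

If the throttling messages persist on the second and later calls even with a writable cache, the cause is probably elsewhere.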
For 2
This one may be "as good as we can make it" for right now. Velero doesn't seem to work well with k3d's built-in storage (local-path rather than EBS), so the PVCs are not restored properly because Velero doesn't know how to back them up. If we can't solve this in the short term with different Velero config options, we can leave the test as is. If we are able to solve the "PVC backup on k3d" issue, add a check to make sure that the contents of the PVC pre-backup are there post-restore.
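If the PVC backup problem does get solved, the extra check could look roughly like this sketch: take a digest of the content before the backup and compare it with a digest taken after the restore. The function names are hypothetical, and `sha256sum` is assumed to be available in the test image.

```shell
#!/usr/bin/env bash
# Reduce content read from stdin to a single digest that is easy to compare.
content_digest() {
  sha256sum | cut -d' ' -f1
}

# Compare the pre-backup and post-restore digests; fail loudly on mismatch.
assert_same_digest() {
  local before="$1" after="$2"
  if [ "$before" != "$after" ]; then
    echo "FAIL: restored content differs from pre-backup content" >&2
    return 1
  fi
  echo "PASS: restored content matches pre-backup content"
}

# Example wiring (needs a cluster; TEST_NS, NGINX_POD and LOG_PATH are
# hypothetical placeholders, not values from the current test):
#   before="$(kubectl exec -n "$TEST_NS" "$NGINX_POD" -- cat "$LOG_PATH" | content_digest)"
#   # run the backup, delete the workload, run the restore
#   after="$(kubectl exec -n "$TEST_NS" "$NGINX_POD" -- cat "$LOG_PATH" | content_digest)"
#   assert_same_digest "$before" "$after" || exit 1
```

Comparing digests rather than raw log text keeps the test output small and avoids quoting problems with multi-line content.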