Add resiliency to auto unseal job
During testing and deployment I noticed that the Vault auto unseal job didn't always initialize correctly, and didn't always write the secret. So I added some resiliency to the ConfigMap and the Job that could benefit others.
ConfigMap edits:
init.sh: |-
KEYS_FOLDER="/vault/data"
METRICS_POLICY_NAME="prometheus-metrics"
METRICS_ROLE_NAME="prometheus"
MONITORING_SERVICE_ACCOUNT_NAME="monitoring-monitoring-kube-prometheus"
MONITORING_NAMESPACE="monitoring"
INIT_OUT=/export/init.out
export VAULT_ADDR=https://vault.{{ domain }}
until curl -L -s -k -f $VAULT_ADDR/v1/sys/seal-status | grep 'initialized' >& /dev/null; do
echo "---=== Waiting For Vault Server ===---";
sleep 5;
done
echo "---=== Initializing Vault ===---"
until vault operator init -address=$VAULT_ADDR > $INIT_OUT; do
echo "retry initialize"
sleep 5;
done
export VAULT_TOKEN=$(grep Token $INIT_OUT | cut -d' ' -f 4)
echo "---=== VAULT_TOKEN written to /export/key ===---"
echo $VAULT_TOKEN > /export/key
MIN_MASTER_KEYS=$(cat $INIT_OUT | grep -e "2:\|3:\|4:" | awk '{print $4}')
KEY_NUMBER=1
for key in $MIN_MASTER_KEYS
do
echo '{"key": "'"$key"'"}' > "$KEYS_FOLDER/master_keys_$KEY_NUMBER.json"
curl --request PUT --data @"$KEYS_FOLDER/master_keys_$KEY_NUMBER.json" "$VAULT_ADDR/v1/sys/unseal"
KEY_NUMBER=$(( $KEY_NUMBER + 1 ))
done
echo "---=== Logging in ===---"
until vault login -no-store $VAULT_TOKEN >& /dev/null; do
echo "Waiting to login to vault";
sleep 5;
done
echo "---=== Login Success ===---"
echo "---=== Enabling Kubernetes ===---"
until vault auth enable kubernetes >& /dev/null; do
echo "retry kubernetes enable";
sleep 5;
done
echo "---=== Configuring Kubernetes ===---"
until vault write auth/kubernetes/config \
kubernetes_host="https://$KUBERNETES_PORT_443_TCP_ADDR:443" \
token_reviewer_jwt="$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" \
kubernetes_ca_cert=@/var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
issuer="https://kubernetes.default.svc.cluster.local"; do
echo "retry kubernetes config";
sleep 5;
done
echo "---=== Writing $METRICS_POLICY_NAME Policy ===---"
until vault policy write $METRICS_POLICY_NAME - << EOF
path "/sys/metrics" {
capabilities = ["read"]
}
EOF
do
echo "retry policy write";
sleep 5;
done
echo "---=== Reading $METRICS_POLICY_NAME Policy ===---"
until vault policy read $METRICS_POLICY_NAME; do
echo "retry read";
sleep 5;
done
echo "---=== Writing $METRICS_POLICY_NAME Auth ===---"
until vault write auth/kubernetes/role/$METRICS_ROLE_NAME \
bound_service_account_names=$MONITORING_SERVICE_ACCOUNT_NAME \
bound_service_account_namespaces=$MONITORING_NAMESPACE \
policies=$METRICS_POLICY_NAME ttl=15m; do
echo "retry write auth";
sleep 5;
done
exit 0
This basically just wraps each command in an until loop so that it is retried until it succeeds. I also changed the URL used for the readiness check to the seal status endpoint (/v1/sys/seal-status), which is cleaner.
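A possible follow-up, sketched here and not tested against a cluster: the until loops retry forever, so a step that can never succeed would hang the job. For example, vault operator init fails once Vault is already initialized, and vault auth enable kubernetes fails once the mount already exists. A small bounded-retry helper could cap the attempts. The retry function and its two leading arguments (maximum attempts, delay in seconds) are hypothetical names, not part of the current init.sh.

```shell
#!/bin/bash
# Hypothetical helper (not in the current init.sh):
# run a command until it succeeds, but give up after a fixed number of attempts.
#   $1 = maximum attempts, $2 = delay in seconds, remaining args = the command
retry() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "retry $attempt/$max_attempts: $*"
    attempt=$(( attempt + 1 ))
    sleep "$delay"
  done
}

# Example usage inside init.sh (commented out because it needs a reachable Vault):
# retry 60 5 vault auth enable kubernetes || exit 1
```

With something like this the job would fail visibly after a few minutes instead of looping silently, at the cost of choosing a sensible attempt limit for each step.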
Job edits:
containers:
- name: bigbang-base-secret-creation
command:
- /bin/bash
- -c
- |
echo "---=== Writing secret ===---"
until kubectl create secret generic vault-token --from-file=key=/export/key --from-file=init.out=/export/init.out >& /dev/null; do
echo "Retry writing secret"
sleep 5;
done
echo "Killing Istio Sidecar"
curl -X POST http://localhost:15020/quitquitquit
exit 0
This switches from a fixed sleep to an until loop, so the job keeps retrying until the secret has actually been created.
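One caveat with retrying kubectl create: if the secret already exists (for instance, after a re-run of the job), create returns an AlreadyExists error every time and the loop never ends. A minimal sketch of a create-or-update variant is below. It assumes the same secret name and file paths as the job above; the write_vault_secret function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical wrapper (not in the current job):
# render the secret client-side, then apply it, so a second run updates
# the existing secret instead of failing with AlreadyExists.
write_vault_secret() {
  kubectl create secret generic vault-token \
    --from-file=key=/export/key \
    --from-file=init.out=/export/init.out \
    --dry-run=client -o yaml | kubectl apply -f -
}

# Example usage in the job command (commented out because it needs a cluster):
# until write_vault_secret; do echo "Retry writing secret"; sleep 5; done
```

Leaving the output visible rather than redirecting it to /dev/null would also make it easier to see why a retry happened.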
Here is an example of it running and re-running some steps due to connection issues:
$ kubectl logs -n vault pod/vault-vault-job-init-jht4z -c vault-init-job -f
---=== Waiting For Vault Server ===---
---=== Waiting For Vault Server ===---
---=== Initializing Vault ===---
---=== VAULT_TOKEN written to /export/key ===---
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 95 100 40 100 55 189 260 --:--:-- --:--:-- --:--:-- 450
{"errors":["Vault is not initialized"]}
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to vault.{{ domain }}:443
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 351 100 296 100 55 127 23 0:00:02 0:00:02 --:--:-- 151
{"type":"shamir","initialized":true,"sealed":false,"t":3,"n":5,"progress":0,"nonce":"","version":"1.13.1","build_date":"2023-03-23T12:51:35Z","migration":false,"cluster_name":"vault-cluster-cabb745d","cluster_id":"6e16cc5b-cc7a-74c9-4f39-0f3814366d45","recovery_seal":true,"storage_type":"raft"}
---=== Logging in ===---
---=== Login Success ===---
---=== Enabling Kubernetes ===---
retry kubernetes enable
---=== Configuring Kubernetes ===---
Success! Data written to: auth/kubernetes/config
---=== Writing prometheus-metrics Policy ===---
Success! Uploaded policy: prometheus-metrics
---=== Reading prometheus-metrics Policy ===---
Error reading policy named prometheus-metrics: Error making API request.
URL: GET https://vault.{{ domain }}/v1/sys/policies/acl/prometheus-metrics
Code: 503. Errors:
* Vault is sealed
retry read
path "/sys/metrics" {
capabilities = ["read"]
}
---=== Writing prometheus-metrics Auth ===---
Success! Data written to: auth/kubernetes/role/prometheus
We should also add an optional network policy to monitoring, applied when Vault is enabled, that allows egress on port 443 to the Vault URL. Otherwise monitoring won't work.
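As a rough illustration of what such a policy could look like, here is a sketch and not the chart's actual template. The policy name allow-vault-egress and the apply_vault_egress_policy function are hypothetical, and the empty podSelector matches every pod in the namespace, which is probably broader than wanted. A NetworkPolicy cannot match on a hostname, so this allows TCP 443 egress to any destination. Narrowing it to Vault would need an ipBlock or a namespace/pod selector for whatever actually serves the Vault URL.

```shell
#!/bin/bash
# Hypothetical sketch: allow egress on TCP 443 from pods in the monitoring namespace.
#   $1 = namespace (defaults to "monitoring", matching MONITORING_NAMESPACE in init.sh)
apply_vault_egress_policy() {
  local namespace="${1:-monitoring}"
  kubectl apply -n "$namespace" -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-vault-egress
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - ports:
        - protocol: TCP
          port: 443
EOF
}

# Example usage (commented out because it needs a cluster):
# apply_vault_egress_policy monitoring
```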