Duplicate imagePullSecrets prevents ArgoCD from rescheduling after node failure
Problem Description
On Kubernetes 1.26.x (confirmed on 1.26.3 and 1.26.4), when a node fails, ArgoCD is not rescheduled and remains assigned to the node that no longer exists. The cause appears to be duplicate entries in imagePullSecrets combined with the way garbage collection works (likely a bug).
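To check whether a given pod spec actually carries duplicate entries, something like the following could be used. This is only a sketch: it assumes the pod spec has already been loaded into a Python dict (for example from the JSON output of the cluster API), and the function name and the secret name in the example are hypothetical, not taken from the ArgoCD chart.

```python
from collections import Counter
from typing import Any


def duplicate_pull_secrets(pod_spec: dict[str, Any]) -> list[str]:
    """Return the secret names listed more than once under imagePullSecrets."""
    # imagePullSecrets is a list of {"name": "<secret>"} objects; it may be absent or null.
    entries = pod_spec.get("imagePullSecrets") or []
    counts = Counter(entry.get("name") for entry in entries)
    return sorted(
        name for name, seen in counts.items() if name is not None and seen > 1
    )


# Hypothetical pod spec fragment; the secret name is made up for illustration.
example_spec = {
    "imagePullSecrets": [
        {"name": "example-registry-secret"},
        {"name": "example-registry-secret"},
    ]
}

print(duplicate_pull_secrets(example_spec))  # ['example-registry-secret']
```

An empty result means the spec has no repeated names, so the pod would not be expected to hit this particular problem.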
Steps to reproduce:
- Start a cluster on 1.26.3 or 1.26.4
- Deploy ArgoCD using the default values for the pull secrets
- Once fully deployed, terminate the node hosting ArgoCD in the cloud provider to simulate a node failure
- Wait and observe that all pods except ArgoCD are rescheduled
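The last step can be verified mechanically by comparing each pod's assigned node against the nodes that still exist. The sketch below assumes the two inputs are dicts shaped like the output of `kubectl get pods -A -o json` and `kubectl get nodes -o json`; the function name and the sample data are hypothetical.

```python
from typing import Any


def pods_on_missing_nodes(
    pod_list: dict[str, Any], node_list: dict[str, Any]
) -> list[str]:
    """Return "namespace/name" for pods whose spec.nodeName is not a live node."""
    live_nodes = {node["metadata"]["name"] for node in node_list.get("items", [])}
    stuck = []
    for pod in pod_list.get("items", []):
        node_name = pod.get("spec", {}).get("nodeName")
        # Pods without a nodeName are unscheduled, not stuck on a dead node.
        if node_name and node_name not in live_nodes:
            meta = pod["metadata"]
            stuck.append(f'{meta.get("namespace", "default")}/{meta["name"]}')
    return stuck


# Hypothetical data: node-b was terminated, so only node-a remains.
sample_nodes = {"items": [{"metadata": {"name": "node-a"}}]}
sample_pods = {
    "items": [
        {"metadata": {"namespace": "argocd", "name": "example-argocd-pod"},
         "spec": {"nodeName": "node-b"}},
        {"metadata": {"namespace": "default", "name": "example-other-pod"},
         "spec": {"nodeName": "node-a"}},
    ]
}

print(pods_on_missing_nodes(sample_pods, sample_nodes))  # ['argocd/example-argocd-pod']
```

If the problem reproduces, the ArgoCD pods should be the only entries reported once the other workloads have been rescheduled.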
Additional Info
imagePullSecrets from master pod spec:
imagePullSecrets from replica pod spec:
See the related issue in GitLab's chart: gitlab#186 (closed).