UNCLASSIFIED - NO CUI

Duplicate imagePullSecrets prevents ArgoCD from rescheduling after node failure

Problem Description

On Kubernetes 1.26.x (confirmed on 1.26.3 and 1.26.4), when a node fails, ArgoCD is not rescheduled and remains assigned to the non-existent node. The cause appears to be duplicate imagePullSecrets entries in the pod spec, combined with the way garbage collection works in these versions (likely a bug).
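
To check whether a given pod is affected, the duplicate entries can be detected directly from its spec. The following is a minimal sketch, assuming the pod spec has already been loaded into a Python dict (for example from the JSON output of `kubectl get pod <name> -o json`, using the `spec` field). The function name `duplicate_pull_secrets` and the secret name `private-registry` are hypothetical and only for illustration.

```python
from collections import Counter


def duplicate_pull_secrets(pod_spec):
    """Return the secret names listed more than once in imagePullSecrets."""
    # imagePullSecrets is a list of {"name": "<secret>"} objects; it may be absent or null.
    entries = pod_spec.get("imagePullSecrets") or []
    counts = Counter(entry.get("name") for entry in entries)
    return sorted(name for name, n in counts.items() if n > 1 and name is not None)


# Hypothetical pod spec with the same pull secret listed twice.
example_spec = {
    "imagePullSecrets": [
        {"name": "private-registry"},
        {"name": "private-registry"},
    ]
}

print(duplicate_pull_secrets(example_spec))
```

An empty result means the spec has no repeated pull secret names; any name returned is a candidate for deduplication in the chart values.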

Steps to reproduce:

  1. Start a cluster on Kubernetes 1.26.3 or 1.26.4
  2. Deploy ArgoCD using the default values for the pull secrets
  3. Once ArgoCD is fully deployed, terminate the node hosting it in the cloud provider to simulate a node failure
  4. Wait and observe that all pods except the ArgoCD pods are rescheduled
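
The outcome of step 4 can be confirmed by listing pods whose assigned node no longer exists. This is a rough sketch, assuming the two inputs are the parsed JSON from `kubectl get pods -A -o json` and `kubectl get nodes -o json`. The function name `pods_on_missing_nodes` and the sample object names are hypothetical.

```python
def pods_on_missing_nodes(pods, nodes):
    """Return (namespace, pod, node) for pods bound to a node that is not in the node list."""
    node_names = {node["metadata"]["name"] for node in nodes.get("items", [])}
    stranded = []
    for pod in pods.get("items", []):
        # spec.nodeName is unset for pods that have not been scheduled yet.
        node_name = pod.get("spec", {}).get("nodeName")
        if node_name and node_name not in node_names:
            meta = pod["metadata"]
            stranded.append((meta.get("namespace"), meta["name"], node_name))
    return stranded


# Hypothetical data: node-b was terminated, but one pod is still bound to it.
sample_nodes = {"items": [{"metadata": {"name": "node-a"}}]}
sample_pods = {
    "items": [
        {"metadata": {"namespace": "argocd", "name": "argocd-example"}, "spec": {"nodeName": "node-b"}},
        {"metadata": {"namespace": "default", "name": "other-app"}, "spec": {"nodeName": "node-a"}},
    ]
}

print(pods_on_missing_nodes(sample_pods, sample_nodes))
```

If the issue is reproduced, the ArgoCD pods should be the only ones reported once the other workloads have moved to healthy nodes.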

Additional Info

imagePullSecrets from master pod spec:

image

imagePullSecrets from replica pod spec:

image

See the related issue in the GitLab chart: gitlab#186 (closed)