Calico container does not boot
Summary
While attempting to use the Ironbank RKE2 1.26.1 images, I found that many services that depend on Calico node would not start. After some debugging, I found that the Calico node container was failing because /etc/rc.local did not exist ("No such file or directory"). Looking at the Dockerfile for the Calico image, it seems that rc.local is placed under /etc/rc.d instead, but the Calico startup sequence still needs access to /etc/rc.local.
I replaced the Ironbank-sourced Calico image with the vanilla image that SUSE ships and kept the other cluster services on Ironbank-sourced images. With only the Calico image swapped out, all cluster resources came up without issue.
It seems that some customization is expected in order to use the Calico image from Ironbank, but the README in the repository does not describe what that customization is.
Steps to reproduce
- Gather all RKE2 core images from Ironbank.
- Retag them as rancher/imagename:version.
- Save the images to a file named rke2-images.linux-amd64.tar.zst.
- Follow the install instructions for RHEL using the RPM install option here: https://docs.rke2.io/install/airgap (uses the install.sh script).
- Make the system updates required by the CIS profile option, as described in the Ironbank image README:
sudo cp -f /usr/share/rke2/rke2-cis-sysctl.conf /etc/sysctl.d/60-rke2-cis.conf
sudo systemctl restart systemd-sysctl
sudo useradd -r -c "etcd user" -s /sbin/nologin -M etcd
- Add a config file at /etc/rancher/rke2/config.yaml with the CIS profile contents:
profile: cis
disable: rke2-ingress-nginx
- Start the server
systemctl start rke2-server
Then observe that cluster services such as CoreDNS fail endlessly while waiting for Calico node to become healthy.
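Steps 1 through 3 can be scripted roughly as follows. This is a sketch under stated assumptions, not a tested procedure: it uses the docker and zstd CLIs, the helper names rancher_ref and bundle_images are hypothetical, and it assumes the last path component of each Ironbank reference (name and tag) already matches what RKE2 expects under rancher/. If the Ironbank names differ, the mapping needs adjusting per image.

```shell
#!/bin/sh
# Sketch of reproduction steps 1-3 (pull, retag, save). Helper names are
# hypothetical; pass the real Ironbank RKE2 1.26.1 image references.

# Map a registry-qualified reference to rancher/<name>:<tag> by keeping only
# the last path component. Assumes Ironbank's name and tag already match
# what RKE2 expects; adjust per image if they do not.
rancher_ref() {
    printf 'rancher/%s\n' "${1##*/}"
}

# Pull and retag each image, then save all retagged images into the
# zstd-compressed archive used by the airgap install.
bundle_images() {
    for src in "$@"; do
        docker pull "$src" || return 1
        docker tag "$src" "$(rancher_ref "$src")" || return 1
    done
    # Image references contain no whitespace, so unquoted word splitting of
    # the rancher_ref output is intentional here.
    docker save $(for src in "$@"; do rancher_ref "$src"; done) \
        | zstd -T0 -f -o rke2-images.linux-amd64.tar.zst
}
```

Usage would be `bundle_images <ironbank-ref-1> <ironbank-ref-2> ...` with the actual image references, after which the archive goes wherever the airgap instructions place image tarballs.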
What is the current bug behavior?
The Ironbank Calico image produces a failing container that looks for /etc/rc.local, which does not exist in the image.
What is the expected correct behavior?
The Calico container should boot without any customization of the source image or its file structure.
Relevant logs and/or screenshots
Logs from the Calico container are as follows:
/usr/sbin/start_runit: line 5: /etc/rc.local: No such file or directory
Calico node failed to start
Possible fixes
Line 30 of the Dockerfile might be related:
COPY --from=calico_rootfs_overlay /etc/rc.local /etc/rc.d/rc.local
The intent may have been for Calico to read rc.local from the rc.d location, but the startup script within Calico (/usr/sbin/start_runit, per the log above) was apparently not updated or overridden to look in the new location.
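One hedged way to test this hypothesis is to restore the path that start_runit executes, for example in a derived image. The sketch below assumes the moved file is the only problem and that /etc/rc.d/rc.local is already executable; the function name ensure_rc_local and its optional root argument are hypothetical, added only so it can be tried against a scratch directory.

```shell
#!/bin/sh
# Hypothetical workaround sketch, not the upstream fix: make the path that
# start_runit executes (/etc/rc.local) resolve to the copy the Dockerfile
# places at /etc/rc.d/rc.local.
# $1 is an optional root prefix (default: the real root) so the function can
# be exercised against a scratch directory instead of /.
ensure_rc_local() {
    root="${1:-}"
    target="$root/etc/rc.local"
    # Nothing to do if the expected path already exists (or is already a link).
    if [ -e "$target" ] || [ -L "$target" ]; then
        return 0
    fi
    # Only link when the relocated file is actually present.
    if [ ! -f "$root/etc/rc.d/rc.local" ]; then
        echo "ensure_rc_local: $root/etc/rc.d/rc.local not found" >&2
        return 1
    fi
    # Relative target, so the link resolves the same way under a scratch root.
    ln -s rc.d/rc.local "$target"
}
```

Changing the COPY destination in the Dockerfile back to /etc/rc.local may be the cleaner fix, but the maintainers would need to confirm which location the image is meant to use.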
Tasks
- Bug has been identified and corrected within the container
Please read the Iron Bank Documentation for more info