Triton pod doesn't connect to the GPU properly.
Summary
When the Triton pod starts on a GPU node, it attempts to initialize the CUDA runtime and errors out. Specifically, the error it gives is:
Given that the CUDA and graphics drivers installed on the host machine are the latest available, and NVIDIA drivers are backwards compatible, there appears to be something wrong with the driver libraries installed in the image.
Investigation shows that the libcuda.so library in the image is much smaller than expected, roughly 162 KB. Based on the current 12.3 and 12.4 CUDA releases, this library should be around 29 MB.
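As a quick diagnostic, the library sizes can be checked from inside the container. The following is a minimal Python sketch for that; the search paths are assumptions and may need to be adjusted for wherever the Iron Bank image places the driver libraries.

```python
#!/usr/bin/env python3
"""Sketch: report sizes of the CUDA driver libraries inside an image.

Assumption: the libraries live under one of the paths below; the image
may install them elsewhere, so adjust SEARCH_ROOTS as needed.
"""
import glob
import os

SEARCH_ROOTS = ["/usr/lib/x86_64-linux-gnu", "/usr/local/cuda/lib64"]
PATTERNS = ["libcuda.so*", "libnvidia-nvvm.so*"]

for root in SEARCH_ROOTS:
    for pattern in PATTERNS:
        for path in sorted(glob.glob(os.path.join(root, pattern))):
            size_kb = os.path.getsize(path) / 1024
            # A healthy libcuda.so is tens of MB; ~162 KB suggests a stub
            # or truncated library.
            print(f"{path}: {size_kb:,.0f} KB")
```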
Steps to reproduce
Reproducing this requires a GPU node, either in AWS or on bare metal, with the appropriate graphics driver, CUDA, and NVIDIA Container Toolkit installed. Then start a Triton pod. Testing can be done with or without a real model; the pod fails before it finishes startup, so it does not need an existing model, only the --model-repository argument.
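For a quick reproduction outside Kubernetes, the container can also be launched directly with Docker. The sketch below does that with an empty model repository; the image reference is a placeholder for the Iron Bank build, and it assumes Docker plus the NVIDIA Container Toolkit are configured so that --gpus all works.

```python
#!/usr/bin/env python3
"""Sketch: launch the Triton container directly to reproduce the startup
failure. Assumes Docker and the NVIDIA Container Toolkit are installed.
The image reference is a placeholder; substitute the Iron Bank image."""
import subprocess
import tempfile

IMAGE = "nvcr.io/nvidia/tritonserver:23.12-py3"  # placeholder: swap in the Iron Bank image

# An empty model repository is enough; the failure occurs before any model loads.
model_repo = tempfile.mkdtemp(prefix="empty-models-")

subprocess.run(
    [
        "docker", "run", "--rm", "--gpus", "all",
        "-v", f"{model_repo}:/models",
        IMAGE,
        "tritonserver", "--model-repository=/models",
    ],
    check=False,  # expect a non-zero exit when CUDA initialization fails
)
```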
What is the current bug behavior?
Pod logs show the application starts up, attempts to initialize the CUDA driver, and fails because the CUDA driver version is older than the CUDA runtime version of 12.3.
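To confirm what that startup check is seeing, the driver and runtime versions can be queried directly from inside the container. The sketch below uses ctypes; the library names assume libcuda.so.1 and libcudart.so are resolvable by the dynamic loader, which may differ in the image.

```python
#!/usr/bin/env python3
"""Sketch: compare the CUDA driver version against the runtime version
from inside the container. Assumes libcuda.so.1 and libcudart.so are on
the loader path; the runtime soname may differ (e.g. libcudart.so.12)."""
import ctypes

def query_version(lib_name, fn_name):
    lib = ctypes.CDLL(lib_name)
    version = ctypes.c_int(0)
    status = getattr(lib, fn_name)(ctypes.byref(version))
    if status != 0:
        raise RuntimeError(f"{fn_name} failed with error code {status}")
    return version.value

driver = query_version("libcuda.so.1", "cuDriverGetVersion")      # driver API
runtime = query_version("libcudart.so", "cudaRuntimeGetVersion")  # runtime API

# Versions are encoded as major*1000 + minor*10 (e.g. 12030 -> 12.3).
fmt = lambda v: f"{v // 1000}.{(v % 1000) // 10}"
print(f"driver:  {fmt(driver)}")
print(f"runtime: {fmt(runtime)}")
if driver < runtime:
    print("Driver reports an older version than the runtime, matching the startup error.")
```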
What is the expected correct behavior?
Pulling the corresponding triton-inference-server image directly from NVIDIA produces the following results. This uses the image nvcr.io/nvidia/tritonserver:23.12-py3.
Comparing the NVIDIA native image to the Iron Bank-built one, there are some obvious differences, and many of the libraries are not in the same locations on both images.
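One way to make that comparison concrete is to list the CUDA/NVIDIA libraries shipped in each image and diff the results. The sketch below is one approach; the Iron Bank image reference is a placeholder, and searching under /usr is an assumption about where each image installs its libraries.

```python
#!/usr/bin/env python3
"""Sketch: list CUDA/NVIDIA libraries in both images and show the differences.
The Iron Bank image reference below is a placeholder; the /usr search path
is an assumption."""
import subprocess

NVIDIA_IMAGE = "nvcr.io/nvidia/tritonserver:23.12-py3"
IRONBANK_IMAGE = "REGISTRY/ironbank/tritonserver:23.12"  # placeholder

def list_libs(image):
    result = subprocess.run(
        ["docker", "run", "--rm", "--entrypoint", "find", image,
         "/usr", "-name", "libcuda*", "-o", "-name", "libnvidia*"],
        capture_output=True, text=True, check=False,
    )
    return set(result.stdout.splitlines())

nvidia_libs = list_libs(NVIDIA_IMAGE)
ironbank_libs = list_libs(IRONBANK_IMAGE)

print("Only in the NVIDIA image:")
for path in sorted(nvidia_libs - ironbank_libs):
    print(f"  {path}")
print("Only in the Iron Bank image:")
for path in sorted(ironbank_libs - nvidia_libs):
    print(f"  {path}")
```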
Relevant logs and/or screenshots
Possible fixes
I'm not certain of the process for building the tarballs that are downloaded and unpacked into the image, but they need to be reviewed to ensure the correct CUDA libraries are included and that all required libraries are actually installed. It appears that some, such as the libnvidia-nvvm.so library, are not being installed.
Tasks
- Bug has been identified and corrected within the container
Please read the Iron Bank Documentation for more info