Triton pod doesn't connect to the GPU properly.
Summary
When the Triton pod starts on a GPU node, it attempts to initialize the CUDA runtime and errors out. Specifically, the error it gives is:
Given that the CUDA and graphics drivers installed on the host machine are the latest available, and NVIDIA drivers are backwards compatible, there appears to be something wrong with the driver libraries installed in the image.
Investigation shows that the libcuda.so library in the image is much smaller than expected, roughly 162 KB. Based on the current 12.3 and 12.4 CUDA releases, this library should be around 29 MB.
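As a quick diagnostic, the library sizes can be checked from inside the container. The following is a minimal Python sketch for that; the search paths are assumptions and may need to be adjusted for wherever the Iron Bank image places the driver libraries.

```python
#!/usr/bin/env python3
"""Sketch: report sizes of the CUDA driver libraries inside an image.

Assumption: the libraries live under one of the paths below; the image
may install them elsewhere, so adjust SEARCH_ROOTS as needed.
"""
import glob
import os

SEARCH_ROOTS = ["/usr/lib/x86_64-linux-gnu", "/usr/local/cuda/lib64"]
PATTERNS = ["libcuda.so*", "libnvidia-nvvm.so*"]

for root in SEARCH_ROOTS:
    for pattern in PATTERNS:
        for path in sorted(glob.glob(os.path.join(root, pattern))):
            size_kb = os.path.getsize(path) / 1024
            # A healthy libcuda.so is tens of MB; ~162 KB suggests a stub
            # or truncated library.
            print(f"{path}: {size_kb:,.0f} KB")
```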
Steps to reproduce
Reproducing this requires a GPU node, either in AWS or on bare metal, with the appropriate graphics driver, CUDA, and NVIDIA Container Toolkit installed. Then start a Triton pod. Testing can be done with or without a real model; the pod fails before it finishes startup, so it does not need an existing model, only the --model-repository argument.
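For a quick reproduction outside Kubernetes, the container can also be launched directly with Docker. The sketch below does that with an empty model repository; the image reference is a placeholder for the Iron Bank build, and it assumes Docker plus the NVIDIA Container Toolkit are configured so that --gpus all works.

```python
#!/usr/bin/env python3
"""Sketch: launch the Triton container directly to reproduce the startup
failure. Assumes Docker and the NVIDIA Container Toolkit are installed.
The image reference is a placeholder; substitute the Iron Bank image."""
import subprocess
import tempfile

IMAGE = "nvcr.io/nvidia/tritonserver:23.12-py3"  # placeholder: swap in the Iron Bank image

# An empty model repository is enough; the failure occurs before any model loads.
model_repo = tempfile.mkdtemp(prefix="empty-models-")

subprocess.run(
    [
        "docker", "run", "--rm", "--gpus", "all",
        "-v", f"{model_repo}:/models",
        IMAGE,
        "tritonserver", "--model-repository=/models",
    ],
    check=False,  # expect a non-zero exit when CUDA initialization fails
)
```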
What is the current bug behavior?
Pod logs show the application starts up, attempts to initialize the CUDA driver, and fails because the CUDA driver version is older than the CUDA runtime version of 12.3.
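To confirm what that startup check is seeing, the driver and runtime versions can be queried directly from inside the container. The sketch below uses ctypes; the library names assume libcuda.so.1 and libcudart.so are resolvable by the dynamic loader, which may differ in the image.

```python
#!/usr/bin/env python3
"""Sketch: compare the CUDA driver version against the runtime version
from inside the container. Assumes libcuda.so.1 and libcudart.so are on
the loader path; the runtime soname may differ (e.g. libcudart.so.12)."""
import ctypes

def query_version(lib_name, fn_name):
    lib = ctypes.CDLL(lib_name)
    version = ctypes.c_int(0)
    status = getattr(lib, fn_name)(ctypes.byref(version))
    if status != 0:
        raise RuntimeError(f"{fn_name} failed with error code {status}")
    return version.value

driver = query_version("libcuda.so.1", "cuDriverGetVersion")      # driver API
runtime = query_version("libcudart.so", "cudaRuntimeGetVersion")  # runtime API

# Versions are encoded as major*1000 + minor*10 (e.g. 12030 -> 12.3).
fmt = lambda v: f"{v // 1000}.{(v % 1000) // 10}"
print(f"driver:  {fmt(driver)}")
print(f"runtime: {fmt(runtime)}")
if driver < runtime:
    print("Driver reports an older version than the runtime, matching the startup error.")
```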
What is the expected correct behavior?
Pulling the corresponding triton-inference-server image directly from NVIDIA produces the following results. This uses the image nvcr.io/nvidia/tritonserver:23.12-py3.
Comparing the NVIDIA native image to the Iron Bank-built one, there are some obvious differences, and many of the libraries are not in the same locations on both images.
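One way to make that comparison concrete is to list the CUDA/NVIDIA libraries shipped in each image and diff the results. The sketch below is one approach; the Iron Bank image reference is a placeholder, and searching under /usr is an assumption about where each image installs its libraries.

```python
#!/usr/bin/env python3
"""Sketch: list CUDA/NVIDIA libraries in both images and show the differences.
The Iron Bank image reference below is a placeholder; the /usr search path
is an assumption."""
import subprocess

NVIDIA_IMAGE = "nvcr.io/nvidia/tritonserver:23.12-py3"
IRONBANK_IMAGE = "REGISTRY/ironbank/tritonserver:23.12"  # placeholder

def list_libs(image):
    result = subprocess.run(
        ["docker", "run", "--rm", "--entrypoint", "find", image,
         "/usr", "-name", "libcuda*", "-o", "-name", "libnvidia*"],
        capture_output=True, text=True, check=False,
    )
    return set(result.stdout.splitlines())

nvidia_libs = list_libs(NVIDIA_IMAGE)
ironbank_libs = list_libs(IRONBANK_IMAGE)

print("Only in the NVIDIA image:")
for path in sorted(nvidia_libs - ironbank_libs):
    print(f"  {path}")
print("Only in the Iron Bank image:")
for path in sorted(ironbank_libs - nvidia_libs):
    print(f"  {path}")
```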
Relevant logs and/or screenshots
Possible fixes
I'm not certain of the process for building the tarballs that are downloaded and unpacked into the image, but they need to be reviewed to ensure the correct CUDA libraries are included and that all required libraries are actually installed. It appears that some, such as the libnvidia-nvvm.so library, are not being installed.
Tasks
- Bug has been identified and corrected within the container
Please read the Iron Bank Documentation for more info