Cannot use S3 bucket as model repository
Summary
Whether run as a container in Podman or as a pod in Kubernetes, the Triton container fails to start up when the model repository is an S3 bucket.
This issue is not present in the Nvidia Commercial Triton image.
Steps to reproduce
- Create a GPU node with the Nvidia drivers installed for the corresponding CUDA version.
- (Optional, required if running Podman rather than Docker) Run the following command to generate CDI specs:
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
- Run the following podman command to start a Triton container with access to all GPUs, using a model repository stored in an AWS S3 bucket:
sudo podman run --device nvidia.com/gpu=all registry1.dso.mil/ironbank/opensource/triton-inference-server/server:24.03.01 tritonserver --model-repository=s3://gpu-test/model-repository --model-control-mode=EXPLICIT --strict-model-config=false --log-verbose 1
What is the current bug behavior?
The output looks like the following, appearing to fail in an AWS SDK for C++ library call related to OpenSSL.
What is the expected correct behavior?
The output should look like the following, proceeding past the loading of the model repository. The following is from running the native Nvidia Triton container on which the image is based.
Relevant logs and/or screenshots
See above
Possible fixes
This could be due to a missing library or other dependency in the image.
I cannot see the build process for the actual tritonserver code: it is not in the git repo, and only the resulting artifact is available from the tarball. However, build.py needs specific arguments to build the aws-sdk-cpp library used for S3 model repositories. Specifically, it requires the --filesystem=s3 flag when running build.py.
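As a rough sketch of how that flag is passed: only --filesystem=s3 comes from this report. The other flags below are assumptions about a typical GPU build and may not match what the Iron Bank pipeline uses. The script only prints the command for review rather than starting a build.

```shell
#!/bin/sh
# Assemble build.py arguments. Only --filesystem=s3 is taken from this report;
# the remaining flags are assumptions about a typical GPU build.
BUILD_ARGS="--enable-gpu --enable-logging --endpoint=http --endpoint=grpc --filesystem=s3"

# Print the command instead of running it. To run it for real, execute it from
# a checkout of the Triton server source at the matching release tag.
echo "python3 build.py $BUILD_ARGS"
```

If the artifact in the tarball was built without this flag, the fix would belong in the upstream build step rather than in the image's Dockerfile.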
It's possible the issue is with OpenSSL. If the proper OpenSSL development libraries aren't installed at build time, or the OpenSSL runtime libraries aren't available in the image, then calls to the S3 APIs would fail.
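One way to test the runtime half of this theory is to check, inside the image, whether the Triton binaries' shared-library dependencies resolve. The sketch below is a minimal check, not a confirmed diagnosis. The two paths under /opt/tritonserver are assumptions based on the upstream Nvidia image layout, and the helper names missing_libs and tls_libs are made up for illustration.

```shell
#!/bin/sh
# Print dependencies of a binary that the dynamic linker cannot resolve.
# Prints nothing when every dependency resolves.
missing_libs() {
    ldd "$1" 2>/dev/null | grep 'not found' || true
}

# Print the libssl/libcrypto/libcurl entries a binary links against, if any.
tls_libs() {
    ldd "$1" 2>/dev/null | grep -E 'libssl|libcrypto|libcurl' || true
}

# Assumed install paths (upstream image layout); adjust if the image differs.
for f in /opt/tritonserver/bin/tritonserver /opt/tritonserver/lib/libtritonserver.so; do
    if [ -e "$f" ]; then
        echo "== $f"
        missing_libs "$f"
        tls_libs "$f"
    fi
done
```

Running the same script in both the Iron Bank image and the upstream image (for example by mounting it into the container and overriding the entrypoint with sh) and comparing the output would show whether the two images resolve different OpenSSL versions, or whether one fails to resolve them at all. A clean result here would point back toward the build-time explanation.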
Tasks
- Bug has been identified and corrected within the container
Please read the Iron Bank Documentation for more info