Cannot use S3 bucket as model repository

Summary

Whether run as a container in Podman or as a pod in Kubernetes, the Triton container fails to start when the model repository is an S3 bucket.

This issue is not present in the commercial NVIDIA Triton image.

Steps to reproduce

  1. Create a GPU node with the NVIDIA drivers installed for the corresponding CUDA version.
  2. (Required only when running Podman rather than Docker) Run the following command to generate the CDI specs:
     sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
  3. Run the following podman command to start a Triton container with access to all GPUs and a model repository stored in an AWS S3 bucket:
     sudo podman run --device nvidia.com/gpu=all registry1.dso.mil/ironbank/opensource/triton-inference-server/server:24.03.01 tritonserver --model-repository=s3://gpu-test/model-repository --model-control-mode=EXPLICIT --strict-model-config=false --log-verbose 1
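
Since the issue reportedly does not occur with the upstream image, it can help to run both images with identical flags. The following is a minimal sketch that only prints the reproduction command for a given image. The function name triton_repro_cmd is hypothetical, and the upstream tag nvcr.io/nvidia/tritonserver:24.03-py3 is an assumption about which NVIDIA release corresponds to 24.03.01.

```shell
#!/usr/bin/env bash
# Print (do not run) the reproduction command for a given image, so the
# same flags can be used against the Iron Bank and upstream images.

IRONBANK_IMAGE="registry1.dso.mil/ironbank/opensource/triton-inference-server/server:24.03.01"
# ASSUMPTION: presumed upstream counterpart of the Iron Bank 24.03.01 image.
UPSTREAM_IMAGE="nvcr.io/nvidia/tritonserver:24.03-py3"

# Hypothetical helper: $1 is the image reference to test.
triton_repro_cmd() {
  local image="$1"
  # Flags are copied verbatim from step 3 of the reproduction steps.
  printf '%s ' sudo podman run --device nvidia.com/gpu=all "$image" \
    tritonserver \
    --model-repository=s3://gpu-test/model-repository \
    --model-control-mode=EXPLICIT \
    --strict-model-config=false \
    --log-verbose 1
  printf '\n'
}
```

Calling triton_repro_cmd "$IRONBANK_IMAGE" and triton_repro_cmd "$UPSTREAM_IMAGE" gives two commands that differ only in the image reference, which keeps the comparison fair.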

What is the current bug behavior?

The output looks like the following. The failure appears to occur in an AWS SDK for C++ (aws-sdk-cpp) library call related to OpenSSL.

image

What is the expected correct behavior?

The output should look like the following, progressing past the loading of the model repository. This output is from the native NVIDIA Triton container on which the image is based.

image

Relevant logs and/or screenshots

See above

Possible fixes

This could be due to a missing library or other dependency in the image.

I cannot see the build process for the actual tritonserver code: it is not in the git repo, and only the resulting artifact is available from the tarball. However, build.py requires specific arguments to build aws-sdk-cpp for S3 model repositories. Specifically, the --filesystem=s3 flag must be passed when running build.py.
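
As a rough illustration of the flag mentioned above, the sketch below guards a build.py invocation so that it only proceeds when S3 filesystem support is requested. The individual flags are existing build.py options, but this particular combination is an assumption: the arguments actually used for the Iron Bank artifact are not visible. The helper has_s3_filesystem is hypothetical.

```shell
#!/usr/bin/env bash
# ASSUMPTION: an illustrative argument set, not the one used for this image.
BUILD_ARGS=(--enable-gpu --enable-logging --endpoint=http --endpoint=grpc --filesystem=s3)

# Hypothetical helper: succeed only if --filesystem=s3 is among the arguments.
has_s3_filesystem() {
  local arg
  for arg in "$@"; do
    [ "$arg" = "--filesystem=s3" ] && return 0
  done
  return 1
}

# From a checkout of the Triton server repository (not executed here):
#   has_s3_filesystem "${BUILD_ARGS[@]}" && ./build.py "${BUILD_ARGS[@]}"
```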

It's possible the issue is with OpenSSL. If the proper OpenSSL development libraries aren't installed at build time, or the OpenSSL runtime libraries aren't available in the image, then the S3 API calls would fail.
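
One way to test the missing-runtime-library theory is to list the shared libraries that the dynamic loader cannot resolve. The sketch below filters ldd output for unresolved entries. The helper name missing_libs is hypothetical, and the binary path in the usage comment is the upstream NVIDIA location, assumed rather than confirmed for this image. It also assumes ldd is present in the image.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ldd` output on stdin and print the name of
# every shared library reported as "not found".
missing_libs() {
  awk '/=> not found/ { print $1 }'
}

# Usage sketch (IMAGE and the binary path are assumptions, not executed here):
#   sudo podman run --rm --entrypoint ldd IMAGE /opt/tritonserver/bin/tritonserver | missing_libs
```

An empty result would not rule out OpenSSL entirely, because a statically linked or version-mismatched library would not appear as "not found".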

Tasks

  • Bug has been identified and corrected within the container

Please read the Iron Bank Documentation for more info
