Draft
Bind-mount /etc/nvidia-container-runtime/host-files-for-container.d (read-only) into the gateway container when it exists, so the nvidia runtime running inside k3s can apply the same host-file injection config as on the host — required for Jetson/Tegra platforms. Signed-off-by: Evan Lezar <elezar@nvidia.com>
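The conditional mount in this commit could look roughly like the sketch below. It is written in Python purely for illustration. The function name `host_files_mount_args` is hypothetical, and the Docker CLI `--volume src:dst:ro` syntax is an assumption, since the commit message does not say how the gateway container is launched.

```python
import os

# Path taken from the commit message above.
HOST_FILES_DIR = "/etc/nvidia-container-runtime/host-files-for-container.d"


def host_files_mount_args(path=HOST_FILES_DIR):
    """Return Docker-style args for a read-only bind mount, or [] if the directory is absent."""
    if not os.path.isdir(path):
        # Non-Tegra hosts usually lack this directory; skip the mount entirely.
        return []
    # Mount at the same path inside the container so the runtime finds it unchanged.
    return ["--volume", f"{path}:{path}:ro"]
```

Returning an empty list when the directory is missing lets the caller append the result unconditionally, so the launch command stays the same on hosts without the Tegra configuration.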
Use ghcr.io/nvidia/k8s-device-plugin:2ab68c16 which includes support for mounting /etc/nvidia-container-runtime/host-files-for-container.d into the device plugin pod, required for correct CDI spec generation on Tegra-based systems. Also included is an nvcdi API bump that ensures that additional GIDs are included in the generated CDI spec. Signed-off-by: Evan Lezar <elezar@nvidia.com>
initgroups(3) replaces all supplemental groups with the user's entries from /etc/group, discarding GIDs injected by the container runtime via CDI (e.g. GID 44/video needed for /dev/nvmap on Tegra). Snapshot the container-level GIDs before initgroups runs and merge them back afterwards, excluding GID 0 (root) to avoid privilege retention. Signed-off-by: Evan Lezar <elezar@nvidia.com>
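The snapshot-and-merge step described in this commit can be sketched as follows. This is an illustrative Python version, not the code from the PR. The helper names `merge_supplemental_gids` and `init_groups_preserving_injected` are hypothetical, and the ordering of the merged list is an arbitrary choice made here.

```python
import os


def merge_supplemental_gids(container_gids, user_gids):
    """Append runtime-injected GIDs to the user's groups, dropping GID 0 from the snapshot."""
    merged = list(user_gids)
    for gid in container_gids:
        # GID 0 from the container-level snapshot is skipped to avoid retaining root's group.
        if gid != 0 and gid not in merged:
            merged.append(gid)
    return merged


def init_groups_preserving_injected(username, primary_gid):
    """Must run while the process can still call setgroups (before setgid/setuid)."""
    injected = os.getgroups()             # snapshot before initgroups replaces the list
    os.initgroups(username, primary_gid)  # loads the user's entries from /etc/group
    os.setgroups(merge_supplemental_gids(injected, os.getgroups()))
```

For example, with a container-level snapshot of `[0, 44, 1000]` and user groups `[1000, 27]`, the merged list is `[1000, 27, 44]`: GID 44 (video) survives the privilege drop while GID 0 does not.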
On Jetson/Tegra platforms nvidia-smi is installed at /usr/sbin/nvidia-smi rather than /usr/bin/nvidia-smi and may not be on PATH inside the sandbox. Fall back to the full path when the bare command is not found. Signed-off-by: Evan Lezar <elezar@nvidia.com>
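A minimal sketch of this fallback, in Python for illustration: the function name `resolve_nvidia_smi` and its injectable `which` parameter are hypothetical. The check that the fallback file is executable is an extra safeguard added here, not something the commit message states.

```python
import os
import shutil

# Location of nvidia-smi on Jetson/Tegra, per the commit message above.
TEGRA_NVIDIA_SMI = "/usr/sbin/nvidia-smi"


def resolve_nvidia_smi(which=shutil.which, fallback=TEGRA_NVIDIA_SMI):
    """Prefer nvidia-smi from PATH; otherwise return the fallback path if it is executable."""
    found = which("nvidia-smi")
    if found:
        return found
    if os.path.isfile(fallback) and os.access(fallback, os.X_OK):
        return fallback
    # Neither location is usable; the caller decides whether to skip or fail.
    return None
```

A test could call `resolve_nvidia_smi()` once and skip the GPU check when it returns `None`, instead of failing with a "command not found" error.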
cc @johnnynunez
LGTM
Summary
Adds GPU support for NVIDIA Tegra/Jetson platforms by bind-mounting the
host-files configuration directory, updating the device plugin image, and
preserving CDI-injected GIDs across privilege drop.
Related Issue
Part of #398 (CDI injection). Depends on #568 (Tegra system support). Should be merged after #495 and #503.
Upstream PRs:
Changes
- Bind-mount `/etc/nvidia-container-runtime/host-files-for-container.d` (read-only) into the gateway container when present, so the nvidia runtime inside k3s applies the same host-file injection config as the host; required for Jetson/Tegra CDI spec generation
- Update the device plugin image to a build that includes `additionalGids` in the CDI spec (GID 44 / `video`, required for `/dev/nvmap` access on Tegra)
- Preserve CDI-injected supplemental GIDs across `initgroups()` during privilege drop, so exec'd processes retain access to GPU devices
- Fall back to `/usr/sbin/nvidia-smi` in the GPU e2e test for Tegra systems where `nvidia-smi` is not on the default `PATH`
Testing
- `mise run pre-commit` passes
Checklist