nvidia-driver-container's Introduction

NVIDIA GPU Operator driver container for Rocky Linux (and maybe AlmaLinux and Oracle Linux) 8

NVIDIA currently does not support Rocky Linux (or AlmaLinux, or other modern Enterprise Linux clones) in the GPU operator for Kubernetes, which makes it a challenge to stand up the NVIDIA operator on a cluster running one of these EL clones. This container image aims to help with that (though it has only been tested with Rocky).

A few changes are necessary to make the RHEL 8 container image build on non-RHEL systems:

  • The OS-sniffing logic here needs to be updated to include identifiers for rocky, almalinux, and ol
  • The target kernel and the supporting dependencies needed to build the NVIDIA kernel modules must be installed into the container, and the dependency bootstrapping logic here needs to be modified to match
  • The container image must be published to a container registry with a tag specific to the target OS + release version for the cluster
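As a rough illustration of the first change, the sketch below shows one way OS detection could accept the extra identifiers. It is not the logic used in this repository or in NVIDIA's image; the function names `is_el8_family` and `os_id_from_file` are hypothetical. It only assumes the standard `ID` field of an os-release file.

```shell
# Hypothetical helper: succeeds when the os-release ID is RHEL or one of the
# clones named above (rocky, almalinux, ol).
is_el8_family() {
    case "$1" in
        rhel|rocky|almalinux|ol) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: print the ID field of an os-release style file
# (defaults to /etc/os-release). The file is sourced in a subshell so its
# variables do not leak into the caller.
os_id_from_file() {
    ( . "${1:-/etc/os-release}" && printf '%s\n' "$ID" )
}

# Example use on the current host, skipped if the file is missing.
if [ -r /etc/os-release ]; then
    if is_el8_family "$(os_id_from_file)"; then
        echo "detected an EL-family host"
    else
        echo "not an EL-family host"
    fi
fi
```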

Prerequisites

  • A container registry you are already authenticated with that you can publish to
  • A host running the same OS and kernel version as your GPU Kubernetes hosts, to run this build on
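One way to check the second prerequisite is to compare the build host's running kernel with the kernel reported by the GPU nodes. This is only a sketch: the helper name `kernel_matches` and the sample kernel string are hypothetical and not part of this repository.

```shell
# Hypothetical helper: succeeds when the build host's running kernel
# equals the kernel version passed as the first argument.
kernel_matches() {
    [ "$(uname -r)" = "$1" ]
}

# The node kernel can be read from the cluster, for example with:
#   kubectl get nodes -o jsonpath='{.items[*].status.nodeInfo.kernelVersion}'
# The value below is a made-up placeholder; substitute your nodes' kernel.
NODE_KERNEL="4.18.0-477.10.1.el8_8.x86_64"

if kernel_matches "$NODE_KERNEL"; then
    echo "build host kernel matches GPU nodes"
else
    echo "kernel mismatch: host is $(uname -r), nodes run $NODE_KERNEL"
fi
```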

Building

To build on Rocky 8, you only need to override the CONTAINER_REGISTRY env var to point to the registry of your choice.

On AlmaLinux or Oracle Linux, you will also need to update the RPM_BASE_URL env var to point to the BaseOS RPM repo for your OS + architecture.

If you wish to build a different driver version than 535.104.12, override the NVIDIA_DRIVER_VERSION env var as well.

After exporting any env vars, running ./build.sh should build the container image and publish it to $CONTAINER_REGISTRY/nvidia/driver:$NVIDIA_DRIVER_VERSION-${OS_NAME}${OS_RELEASE}.
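To make the tag format concrete, here is a small sketch that assembles the image reference from its four parts. The function `image_ref` is hypothetical, and the registry host and the OS_NAME / OS_RELEASE values are placeholders chosen for illustration; the values build.sh actually derives on your host may differ.

```shell
# Hypothetical helper: compose the image reference in the format described
# above from registry, driver version, OS name, and OS release.
image_ref() {
    printf '%s/nvidia/driver:%s-%s%s\n' "$1" "$2" "$3" "$4"
}

# Placeholder values; override them in your environment before building.
CONTAINER_REGISTRY="${CONTAINER_REGISTRY:-registry.example.com}"
NVIDIA_DRIVER_VERSION="${NVIDIA_DRIVER_VERSION:-535.104.12}"
OS_NAME="${OS_NAME:-rocky}"       # assumed value, for illustration only
OS_RELEASE="${OS_RELEASE:-8.8}"   # assumed value, for illustration only

echo "expected image: $(image_ref "$CONTAINER_REGISTRY" "$NVIDIA_DRIVER_VERSION" "$OS_NAME" "$OS_RELEASE")"
```

Printing the expected reference before a build makes it easier to confirm afterwards that the registry holds a tag matching the OS and release of the cluster nodes.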

Deploying

Deploy the GPU operator Helm chart, setting driver.repository and driver.version from CONTAINER_REGISTRY and NVIDIA_DRIVER_VERSION. For example:

export CONTAINER_REGISTRY=container-registry.siomporas.com
export NVIDIA_DRIVER_VERSION=535.104.12
helm install --generate-name \
     -n gpu-operator --create-namespace \
     nvidia/gpu-operator \
     --set driver.repository=$CONTAINER_REGISTRY/nvidia \
     --set driver.version=$NVIDIA_DRIVER_VERSION

Inspired by this (which no longer works).

Contributors

hotspoons
