NVIDIA Terraform Kubernetes Modules

Infrastructure as code for GPU-accelerated managed Kubernetes clusters. These modules automate the deployment of GPU-enabled Kubernetes clusters on several cloud service platforms.

Getting Started With Terraform

Terraform is an open-source infrastructure-as-code tool that we use to automate the deployment of Kubernetes clusters with the add-ons required to enable NVIDIA GPUs. This repository contains Terraform modules, which are sets of Terraform configuration files ready for deployment. The modules can be incorporated into existing Terraform-managed infrastructure or used to set up new infrastructure from scratch. You can learn more about Terraform here.

You can download Terraform (CLI) here.

Support Matrix

NVIDIA offers support for Kubernetes through NVIDIA AI Enterprise. Refer to the product support matrix for supported managed Kubernetes platforms.

The Kubernetes clusters provisioned by the modules in this repository use tested and certified versions of Kubernetes, the NVIDIA GPU Operator, and the NVIDIA Data Center Driver.

If your application does not require a specific version of Kubernetes, we recommend using the latest available version. We also recommend you plan to upgrade your version of Kubernetes at least every 6 months.

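Each cloud service provider (CSP) sets its own end-of-life date for the Kubernetes versions it supports. For more information, see the Kubernetes release schedule documentation for Amazon EKS, Google GKE, and Azure AKS.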

| Version | Release Date | Kubernetes Versions | NVIDIA GPU Operator | NVIDIA Data Center Driver* | End of Life |
|---|---|---|---|---|---|
| 0.3.0 | September 2023 | EKS - 1.26, GKE - 1.26, AKS - 1.26 | 23.6.1 (Default); 23.3.2 (NV AI E) | 535.54.03 (EKS & GKE Default); 525.125.06 (NV AI E version for GKE & EKS) | EKS - June 2024, GKE - June 2024, AKS - March 2024 |
| 0.2.0 | August 2023 | EKS - 1.26, GKE - 1.26, AKS - 1.26 | 23.3.2 | 535.54.03 (EKS & GKE) | EKS - June 2024, GKE - June 2024, AKS - March 2024 |
| 0.1.0 | June 2023 | EKS - 1.26, GKE - 1.26, AKS - 1.26 | 23.3.2 | 525.105.17 | EKS - June 2024, GKE - June 2024, AKS - March 2024 |

* On AKS, the driver comes pre-installed on the host and the version is not known in advance.

Usage

Provision a GPU-Enabled Kubernetes Cluster

Creating an EKS Cluster

Call the EKS module by adding this to an existing Terraform file:

module "nvidia-eks" {
  source       = "git::github.com/NVIDIA/nvidia-terraform-modules/eks"
  cluster_name = "nvidia-eks"
}

See the EKS README for all available configuration options.

Creating an AKS Cluster

Call the AKS module by adding this to an existing Terraform file:

module "nvidia-aks" {
  source                 = "git::github.com/NVIDIA/nvidia-terraform-modules/aks"
  cluster_name           = "nvidia-aks-cluster"
  admin_group_object_ids = [] # See the description of this value in the AKS README
  location               = "westus2" # Azure region name
}

See the AKS README for all available configuration options.

Creating a GKE Cluster

Call the GKE module by adding this to an existing Terraform file:

module "nvidia-gke" {
  source       = "git::github.com/NVIDIA/nvidia-terraform-modules/gke"
  cluster_name = "nvidia-gke-cluster"
  project_id   = "your-gcp-project-id"
  region       = "us-west1"
  node_zones   = ["us-west1-a"]
}

See the GKE README for all available configuration options.
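
The module calls above do not name a revision, so they track the repository's default branch. A minimal sketch of pinning a module to a fixed revision with Terraform's ref query argument is shown here for GKE. It assumes the repository publishes a Git tag named v0.3.0 corresponding to the 0.3.0 release in the support matrix; that tag name is hypothetical, so check the repository's tags before using it.

```hcl
module "nvidia-gke" {
  # Explicit Git URL form: "//gke" selects the module subdirectory and
  # "?ref=" selects the revision. The tag name v0.3.0 is an assumption.
  source       = "git::https://github.com/NVIDIA/nvidia-terraform-modules.git//gke?ref=v0.3.0"
  cluster_name = "nvidia-gke-cluster"
  project_id   = "your-gcp-project-id"
  region       = "us-west1"
  node_zones   = ["us-west1-a"]
}
```

The same source pattern should apply to the eks and aks subdirectories. Pinning keeps the Kubernetes, GPU Operator, and driver versions aligned with a single row of the support matrix until you choose to upgrade.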

Cloud Native Service Add On Pack (CNPack)

In each subdirectory, there is a Terraform module to provision the Kubernetes cluster and any additional prerequisite cloud infrastructure to launch CNPack. See CNPack on EKS, CNPack on GKE, and CNPack on AKS for more information and the sample CNPack configuration file.

More information on CNPack can be found in the NVIDIA AI Enterprise documentation.

State Management

These modules do not set up state management for the generated Terraform state file. Deleting the state file (terraform.tfstate) generated by Terraform could leave cloud resources that must be deleted manually. We strongly encourage you to configure remote state. Please see the Terraform documentation for more information.
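
As one illustration of remote state, the sketch below stores the state file in an S3 bucket using Terraform's built-in s3 backend. The bucket name, key, and region are placeholders, and the bucket is assumed to exist already. Other backends, such as azurerm or gcs, follow the same pattern with their own arguments.

```hcl
terraform {
  backend "s3" {
    bucket  = "example-terraform-state"      # hypothetical, pre-existing bucket
    key     = "nvidia-eks/terraform.tfstate" # path of the state object in the bucket
    region  = "us-west-2"                    # region of the bucket (placeholder)
    encrypt = true                           # encrypt the state object at rest
  }
}
```

After adding or changing a backend block, run terraform init again so Terraform can configure the backend and offer to migrate any existing local state.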

Contributing

Pull requests are welcome! Please see our contribution guidelines.

Getting Help or Providing Feedback

Please open an issue on the GitHub project for any questions. Your feedback is appreciated.
