Kubernetes is an open source system for managing containerized applications across multiple hosts, providing basic mechanisms for deployment, maintenance, and scaling of applications.
Kubernetes builds upon a decade and a half of experience at Google running production workloads at scale using a system called Borg, combined with best-of-breed ideas and practices from the community.
Kubernetes on NVIDIA GPUs includes support for GPUs and enhancements to Kubernetes, so users can easily configure and use GPU resources for accelerating deep learning workloads.
Get started with Kubernetes on NVIDIA GPUs by reviewing the installation guide.
The general Kubernetes documentation is available at kubernetes.io.
General troubleshooting guidelines are available in the documentation. You can also open an issue on GitHub or post questions on the NVIDIA Developer Forums.
For general Kubernetes issues, start with the troubleshooting guide.
This release of Kubernetes is supported on the following platforms.
- DGX-1 with OS Server v3.1.6
- DGX-Station with OS Desktop v3.1.6
- NVIDIA GPU Cloud virtual machine images available on Amazon EC2 and Google Cloud Platform
This release includes the following features.
- Support for NVIDIA GPUs in Kubernetes using the NVIDIA device plugin
- Support for GPU attributes such as GPU type and memory requirements via the Kubernetes PodSpec
- Visualization and monitoring of GPU metrics and health with an integrated GPU monitoring stack of NVIDIA DCGM, Prometheus, and Grafana
- Support for Docker and CRI-O using the NVIDIA Container Runtime
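To make the device plugin and PodSpec items above more concrete, here is a minimal sketch of a pod manifest that requests GPUs. It assumes the NVIDIA device plugin advertises GPUs under the extended resource name nvidia.com/gpu, which is the name used by the upstream plugin. The helper gpu_pod_manifest, the image name, and the node label in the usage example are hypothetical and chosen only for illustration. This documentation does not spell out how this release expresses GPU type and memory requirements in the PodSpec, so the sketch uses a plain nodeSelector as a stand-in rather than guessing at that syntax.

```python
import json


def gpu_pod_manifest(name, image, gpu_count=1, node_selector=None):
    """Build a Pod manifest (as a dict) that requests NVIDIA GPUs.

    Hypothetical helper for illustration; not part of any NVIDIA or
    Kubernetes library.
    """
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")

    spec = {
        "restartPolicy": "Never",
        "containers": [
            {
                "name": name,
                "image": image,
                "resources": {
                    # Extended resources such as GPUs are requested under
                    # "limits". The resource name is an assumption based on
                    # the upstream NVIDIA device plugin.
                    "limits": {"nvidia.com/gpu": gpu_count},
                },
            }
        ],
    }

    # Optional stand-in for steering the pod to a particular kind of GPU node.
    # The labels passed in here are whatever the cluster admin has defined.
    if node_selector:
        spec["nodeSelector"] = dict(node_selector)

    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": spec,
    }


if __name__ == "__main__":
    manifest = gpu_pod_manifest(
        "cuda-test",
        "example.com/cuda-sample:latest",        # hypothetical image
        gpu_count=1,
        node_selector={"example.com/gpu-type": "v100"},  # hypothetical label
    )
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(manifest, indent=2))
```

Saving the printed JSON to a file and passing it to kubectl create -f is one way to try it on a cluster where the device plugin is running.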