
nvidia-prometheus-stats's Introduction

nvidia-prometheus-stats

Scrapes Memory and GPU utilization metrics using NVML and exposes them to Prometheus through a simple HTTP server and/or a push gateway.

Binaries are built with PyInstaller.
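As a rough illustration of the approach described above, the sketch below reads per-GPU utilization and memory figures through pynvml and publishes them as Prometheus gauges over HTTP. It is not the project's actual code: the metric names, the `read_gpu_stats` and `main` helpers, and the default port and interval are all assumptions made for this example.

```python
"""Minimal sketch: read GPU stats via NVML and expose them as Prometheus gauges."""
import time

# Hypothetical metric names (suffix -> help text); the real tool may differ.
METRICS = {
    "gpu_utilization_percent": "GPU utilization in percent",
    "memory_utilization_percent": "Memory controller utilization in percent",
    "memory_used_bytes": "Used GPU memory in bytes",
    "memory_total_bytes": "Total GPU memory in bytes",
}


def read_gpu_stats(nvml):
    """Return one dict per GPU, using the pynvml-style module passed as `nvml`."""
    stats = []
    for index in range(nvml.nvmlDeviceGetCount()):
        handle = nvml.nvmlDeviceGetHandleByIndex(index)
        util = nvml.nvmlDeviceGetUtilizationRates(handle)  # .gpu/.memory, percent
        mem = nvml.nvmlDeviceGetMemoryInfo(handle)  # .used/.total, bytes
        stats.append({
            "gpu": str(index),
            "gpu_utilization_percent": util.gpu,
            "memory_utilization_percent": util.memory,
            "memory_used_bytes": mem.used,
            "memory_total_bytes": mem.total,
        })
    return stats


def main(port=8080, interval_seconds=5.0):
    """Serve the gauges on http://<host>:<port>/metrics until interrupted."""
    # Imported lazily so read_gpu_stats() can be exercised without a GPU.
    import pynvml
    from prometheus_client import Gauge, start_http_server

    gauges = {
        name: Gauge("nvidia_" + name, help_text, ["gpu"])
        for name, help_text in METRICS.items()
    }
    pynvml.nvmlInit()
    try:
        start_http_server(port)
        while True:
            for sample in read_gpu_stats(pynvml):
                for name, gauge in gauges.items():
                    gauge.labels(gpu=sample["gpu"]).set(sample[name])
            time.sleep(interval_seconds)
    finally:
        pynvml.nvmlShutdown()
```

Calling `main()` starts the HTTP endpoint that Prometheus can scrape. For the push-gateway path, prometheus_client also provides `push_to_gateway`, which expects the gauges to be registered on a dedicated `CollectorRegistry`.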


nvidia-prometheus-stats's Issues

Unable to connect NVIDIA stats container metrics to Prometheus

I am running two Docker containers: Prometheus, which listens on port 9090, and a container built from nvidia-prometheus-stats.

$ nvidia-docker run --name prometheus -d -p `hostname -i`:9090:9090 \
    quay.io/prometheus/prometheus
f0ef97418f92f54c645678a7837270657f4b6577ce2dde3bf9cac6335b542665

Dockerfile.CentOS

FROM centos:7
RUN yum update -y && \
    yum install -y epel-release && \
    yum install -y wget git python python-pip python3-dev python-dev
RUN git clone https://github.com/ajeetraina/nvidia-prometheus-stats
WORKDIR nvidia-prometheus-stats
RUN pip install nvidia-ml-py pyinstaller prometheus_client
RUN wget https://developer.nvidia.com/compute/cuda/8.0/Prod2/local_installers/cuda-repo-rhel7-8-0-local-ga2-8.0.61-1.x86_64-rpm
RUN rpm -ivh cuda-repo-rhel7-8-0-local-ga2-8.0.61-1.x86_64-rpm
RUN yum clean all
RUN yum install -y cuda
RUN pyinstaller nvidia-prometheus-stats.py

When I try to connect the stats container to Prometheus, I get this error:

$ nvidia-docker run --rm -p 8080:8080 ajeetraina/prometheus-nvidia \
    /nvidia-prometheus-stats/dist/nvidia-prometheus-stats/nvidia-prometheus-stats \
    -g <IP of prometheus container>:9090 -p 8080
ERROR:nvidia-tool:Exception thrown - Unknown Error
Traceback (most recent call last):
  File "nvidia-prometheus-stats.py", line 86, in main
    nvmlInit()
  File "pynvml.py", line 754, in nvmlInit
  File "pynvml.py", line 405, in _nvmlCheckReturn
NVMLError_Unknown: Unknown Error

Any ideas?
I have added a detailed explanation at https://github.com/ajeetraina/nvidia-prometheus-stats
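The traceback shows the failure inside nvmlInit(), before any connection to Prometheus is attempted. One way to narrow it down might be to call NVML initialisation on its own inside the container. The helper below is only a sketch under that assumption; `probe_nvml` is a hypothetical name and is not part of nvidia-prometheus-stats or pynvml.

```python
def probe_nvml(nvml):
    """Try to initialise NVML via the pynvml-style module `nvml`.

    Returns None when NVML starts and reports at least one GPU,
    otherwise a short human-readable description of the problem.
    """
    try:
        nvml.nvmlInit()
    except nvml.NVMLError as err:
        # pynvml raises NVMLError subclasses such as NVMLError_Unknown.
        return "nvmlInit failed: {} ({})".format(err, type(err).__name__)
    try:
        if nvml.nvmlDeviceGetCount() == 0:
            return "NVML initialised but reports 0 GPUs"
        return None
    finally:
        # Always release NVML once initialisation has succeeded.
        nvml.nvmlShutdown()
```

Inside the container this could be run as `import pynvml; print(probe_nvml(pynvml) or "NVML OK")`. If the probe fails in the same way, the problem lies with NVML access from the container rather than with the Prometheus side.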
