This project monitors the GPU memory usage of an HPC cluster. Log into the MBZUAI VPN, then connect to index for available GPUs and to user for each user's detailed usage.
The examples are shown as follows.
- First, clone the repo
git clone git@github.com:sanshuiii/Huaibi-Dector.git
cd Huaibi-Dector
- Prepare the environment (conda is recommended)
conda create -n huaibi python=3.8
conda activate huaibi
pip install tqdm
pip install gpustat
pip install flask
Note: Make sure that gpustat is available on all the machines. (This is easier if the environment lives on a shared disk.)
- Rename config_sample.py to config.py
mv config_sample.py config.py
- Update the config as instructed in config.py
- Run the data collector (catch_huaibi.py) and the web server (web_server.py). Running both in a tmux session is recommended.
- See the result at
localhost:port
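To make the two views concrete (available GPUs and per-user usage), here is a minimal sketch of the kind of aggregation involved. It is not the code in catch_huaibi.py. The sample dictionary is hypothetical and only assumes a layout similar to what `gpustat --json` reports, and the function names `usage_by_user` and `free_gpus` and the 100 MiB threshold are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample for one node, shaped like `gpustat --json` output
# (assumed layout; memory values are in MiB).
SAMPLE_NODE = {
    "hostname": "node-01",
    "gpus": [
        {
            "index": 0,
            "memory.used": 20000,
            "memory.total": 24576,
            "processes": [
                {"username": "alice", "gpu_memory_usage": 12000},
                {"username": "bob", "gpu_memory_usage": 8000},
            ],
        },
        {"index": 1, "memory.used": 3, "memory.total": 24576, "processes": []},
    ],
}


def usage_by_user(nodes):
    """Sum GPU memory (MiB) per username across all nodes."""
    totals = defaultdict(int)
    for node in nodes:
        for gpu in node.get("gpus", []):
            # "processes" may be missing or None; treat both as empty.
            for proc in gpu.get("processes") or []:
                totals[proc["username"]] += proc.get("gpu_memory_usage") or 0
    return dict(totals)


def free_gpus(nodes, max_used_mib=100):
    """Return (hostname, index) pairs for GPUs using at most max_used_mib MiB."""
    return [
        (node["hostname"], gpu["index"])
        for node in nodes
        for gpu in node.get("gpus", [])
        if gpu["memory.used"] <= max_used_mib
    ]


if __name__ == "__main__":
    print(usage_by_user([SAMPLE_NODE]))
    print(free_gpus([SAMPLE_NODE]))
```

In a real deployment the list of node dictionaries would come from running gpustat on each machine. The threshold for calling a GPU "free" is a design choice, since an idle GPU can still report a few MiB in use.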
This code is designed for NVIDIA® Quadro RTX™ 6000 clusters only; nodes with other GPUs may produce errors. For further customization, refer to catch_huaibi.py and modify the code before gpu.update().
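One possible customization is to skip GPUs whose model is not supported instead of letting them cause errors. The sketch below is only an illustration under assumptions: the helper names `is_supported_gpu` and `split_supported` are hypothetical, the model string is assumed to match what the driver reports, and how such a check would hook into catch_huaibi.py depends on that file's actual code.

```python
# Model strings the collector is assumed to handle (hypothetical list).
SUPPORTED_MODELS = ("Quadro RTX 6000",)


def is_supported_gpu(name, supported=SUPPORTED_MODELS):
    """Return True if the reported GPU name contains a supported model string."""
    lowered = name.lower()
    return any(model.lower() in lowered for model in supported)


def split_supported(gpu_names):
    """Partition reported GPU names into (supported, skipped) lists."""
    supported = [n for n in gpu_names if is_supported_gpu(n)]
    skipped = [n for n in gpu_names if not is_supported_gpu(n)]
    return supported, skipped


if __name__ == "__main__":
    names = ["Quadro RTX 6000", "Tesla V100-SXM2-32GB"]
    print(split_supported(names))
```

Extending `SUPPORTED_MODELS` is then the single place to change once another GPU model has been verified to work.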