anderskm / gputil
A Python module for programmatically getting the status of NVIDIA GPUs via nvidia-smi
License: MIT License
A common error is that nvidia-smi outputs an error instead of the expected data.
Example:
# nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
NVML library version: 535.161
# echo $?
18
This needs to be handled in the code.
(Pull request coming up)
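In the meantime, a minimal sketch of the kind of guard this needs, assuming GPUtil keeps shelling out to nvidia-smi (the function name and query fields here are illustrative, not GPUtil's actual internals):

#!/usr/bin/python
import subprocess

def safe_gpu_query():
    # Run nvidia-smi and bail out cleanly when it is missing or unhealthy,
    # e.g. on a driver/library version mismatch (non-zero exit code).
    try:
        result = subprocess.run(
            ["nvidia-smi",
             "--query-gpu=index,utilization.gpu",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True)
    except FileNotFoundError:
        return None  # nvidia-smi not installed or not on PATH
    if result.returncode != 0:
        return None  # e.g. "Failed to initialize NVML: Driver/library version mismatch"
    return result.stdout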
Hi, I notice that when using GPUtil the CPU usage is much higher than with pynvml. Can anyone explain why, or assist me?
Using GPUtil
#!/usr/bin/python
import GPUtil
gpu = GPUtil.getGPUs()[0]
gpu_util = int(gpu.load * 100)
gpu_temp = int(gpu.temperature)
$ /usr/bin/time -v ./GPUtil-test.py
Command being timed: "./GPUtil-test.py"
User time (seconds): 0.21
System time (seconds): 0.43
Percent of CPU this job got: 481%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.13
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 26088
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 7978
Voluntary context switches: 32
Involuntary context switches: 769
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
Using pynvml
#!/usr/bin/python
import pynvml as nv
nv.nvmlInit()
handle = nv.nvmlDeviceGetHandleByIndex(0)
gpu_util = nv.nvmlDeviceGetUtilizationRates(handle).gpu
gpu_temp = nv.nvmlDeviceGetTemperature(handle, nv.NVML_TEMPERATURE_GPU)
nv.nvmlShutdown()
$ /usr/bin/time -v ./pynvml-test.py
Command being timed: "./pynvml-test.py "
User time (seconds): 0.02
System time (seconds): 0.01
Percent of CPU this job got: 84%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.03
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 15732
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 2454
Voluntary context switches: 2
Involuntary context switches: 2
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
I often don't care about the memory usage of my GPUs, but I care a lot if someone else is using the GPUs.
Is there any way, as in the gpustat command, to call GPUtil.getFirstAvailable(order='memory', maxLoad=1, maxMemory=1) with something like isUsed=False?
I have processes running on thousands of GPUs, and I don't want to run on GPUs already used by myself or other people.
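GPUtil does not currently expose an isUsed flag; one hedged workaround, assuming pynvml is available, is to treat a GPU as occupied whenever any compute process is attached to it:

import pynvml as nv

nv.nvmlInit()
free_gpus = []
for i in range(nv.nvmlDeviceGetCount()):
    handle = nv.nvmlDeviceGetHandleByIndex(i)
    # A GPU counts as "in use" if any compute process is attached to it.
    if not nv.nvmlDeviceGetComputeRunningProcesses(handle):
        free_gpus.append(i)
nv.nvmlShutdown()
print(free_gpus)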
Easiest way to replicate would be to time this:
import nvidia_smi
import numpy as np

nvidia_smi.nvmlInit()
for _ in range(50):
    gpus = [nvidia_smi.nvmlDeviceGetHandleByIndex(i) for i in range(nvidia_smi.nvmlDeviceGetCount())]
    res_arr = [nvidia_smi.nvmlDeviceGetUtilizationRates(handle) for handle in gpus]
    print('Usage with nvidia-smi: ', np.sum([res.gpu for res in res_arr]), '%')
Then time:
import GPUtil
import numpy as np

for _ in range(50):
    res_arr = GPUtil.getGPUs()
    print('Usage with GPUtil: ', np.sum([res.load for res in res_arr]) * 100, '%')
YMMV here, but for the first one I get constant reports of 1% GPU utilization, and the runtime is:
real 0m0,179s
user 0m0,688s
sys 0m0,818s
For the second one, GPU utilization climbs to a whopping 93% by the 6th call, and the runtime is:
real 0m11,267s
user 0m0,605s
sys 0m11,449s
getGPUs() seems to be fairly close in purpose to what nvidia-smi does with nvmlDeviceGetUtilizationRates, and quite frankly, it being 63x slower and consuming ~100% of my GPU (RTX 2080) to run, as opposed to 1%, seems a bit unreasonable.
Since many people use this library to figure out GPU utilization, it might be reasonable to provide a more efficient version of getGPUs, or, if it provides some "extra" features (e.g. it samples 100 calls and averages them out), a way to control those settings would be welcome.
Or maybe I'm doing something completely wrong here, in which case, let me know.
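For reference, a hedged sketch of what a lower-overhead sampler could look like, assuming pynvml is acceptable as a backend (the class and method names are hypothetical, not part of GPUtil):

import pynvml as nv

class GPUSampler:
    """Initialize NVML once and reuse device handles, instead of paying
    the cost of spawning an nvidia-smi subprocess on every query."""

    def __init__(self):
        nv.nvmlInit()
        self.handles = [nv.nvmlDeviceGetHandleByIndex(i)
                        for i in range(nv.nvmlDeviceGetCount())]

    def loads(self):
        # Fractional utilization per GPU, matching GPUtil's load scale.
        return [nv.nvmlDeviceGetUtilizationRates(h).gpu / 100
                for h in self.handles]

    def close(self):
        nv.nvmlShutdown()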
In Line 90:
lines = output.split(os.linesep)
returns [''] instead of [] when nvidia-smi finds no GPU, which then causes a ValueError in the parser.
Suggested update:
lines = list(filter(None, output.split(os.linesep)))
When running GPUtil.getGPUs() with 0 available GPUs, I get an error on line 102 in GPUtil.py. Line 92 assumes the number of available devices is returned, but it doesn't account for the fact that you can get the string "No devices were found" as output, and instead reports the number of devices as 1. This errors out on line 102, as we can't cast that string to an int.
Should be an easy enough fix; we would just need a check after line 92, when numDevices == 1, to make sure it's an actual number and not the string (see the sketch below).
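A hedged sketch of that check, combined with the empty-line filter from the previous issue (the function name is illustrative, not GPUtil's actual code):

import os

def parse_device_lines(output):
    # Drop empty strings and any line that does not start with a GPU index,
    # e.g. the "No devices were found" message.
    lines = list(filter(None, output.split(os.linesep)))
    return [ln for ln in lines if ln.split(',')[0].strip().isdigit()]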
This is a very useful module! Thank you!
I have a small suggestion regarding README.md, which is somewhat misleading, as it says:
CUDA GPU with latest CUDA driver installed. GPUtil uses the program nvidia-smi to get the GPU status of all available CUDA GPUs. nvidia-smi should be installed automatically, when you install your CUDA driver.
But according to: https://developer.nvidia.com/nvidia-system-management-interface
NVIDIA-smi ships with NVIDIA GPU display drivers on Linux, and with 64bit Windows Server 2008 R2 and Windows 7.
so I think the description should say NVIDIA drivers instead of CUDA.
The reason I find this important is that, for example, PyTorch started shipping its own CUDA libraries, so you no longer need to install CUDA system-wide. The user will still have nvidia-smi if they installed the NVIDIA driver, but not via PyTorch. Currently your doc implies that a user must have CUDA installed to have nvidia-smi, which is not so.
I hope my communication was clear.
Thank you.
Address warning:
GPUtil\GPUtil.py:73: DeprecationWarning: Use shutil.which instead of find_executable
nvidia_smi = spawn.find_executable('nvidia-smi')
distutils has been removed from Python 3.12.
distutils can now be imported from setuptools, but setuptools would then have to be made a dependency of the module, or this will cause a missing dependency when installing the package.
Reference:
https://peps.python.org/pep-0632/
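The standard-library replacement is a one-line change and needs no new dependency (shutil.which has been available since Python 3.3); a sketch:

import shutil

# Replaces distutils.spawn.find_executable('nvidia-smi')
nvidia_smi = shutil.which('nvidia-smi')
if nvidia_smi is None:
    raise FileNotFoundError('nvidia-smi not found on PATH')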
Hi,
can I get the memory usage of a process given its PID?
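GPUtil itself does not expose per-process memory, but a hedged sketch with pynvml could look like this (the function name is mine, not an existing API):

import pynvml as nv

def gpu_memory_of_pid(pid):
    """Return used GPU memory in bytes for `pid`, or None if not found."""
    nv.nvmlInit()
    try:
        for i in range(nv.nvmlDeviceGetCount()):
            handle = nv.nvmlDeviceGetHandleByIndex(i)
            for proc in nv.nvmlDeviceGetComputeRunningProcesses(handle):
                if proc.pid == pid:
                    return proc.usedGpuMemory  # bytes; may be None under WDDM
    finally:
        nv.nvmlShutdown()
    return None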
This is a really handy module. It would be even better if you could access more of the information available in nvidia-smi -q. For example:
nvidia-smi.exe -q -i 0
==============NVSMI LOG==============
Timestamp : Wed Aug 12 20:36:37 2020
Driver Version : 442.92
CUDA Version : 10.2
Attached GPUs : 4
GPU 00000000:18:00.0
Product Name : GeForce GTX 1080 Ti
Product Brand : GeForce
Display Mode : Enabled
Display Active : Enabled
Persistence Mode : N/A
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : WDDM
Pending : WDDM
Serial Number : N/A
GPU UUID : GPU-95ef7c5d-fc11-835b-cd38-2020193cf8e0
Minor Number : N/A
VBIOS Version : 86.02.39.00.22
MultiGPU Board : No
Board ID : 0x1800
GPU Part Number : N/A
Inforom Version
Image Version : G001.0000.01.04
OEM Object : 1.1
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x18
Device : 0x00
Domain : 0x0000
Device Id : 0x1B0610DE
Bus Id : 00000000:18:00.0
Sub System Id : 0x85E51043
GPU Link Info
PCIe Generation
Max : 3
Current : 3
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 172000 KB/s
Rx Throughput : 9000 KB/s
Fan Speed : 36 %
Performance State : P2
Clocks Throttle Reasons
Idle : Not Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 11264 MiB
Used : 458 MiB
Free : 10806 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 229 MiB
Free : 27 MiB
Compute Mode : Default
Utilization
Gpu : 2 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Aggregate
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Temperature
GPU Current Temp : 68 C
GPU Shutdown Temp : 96 C
GPU Slowdown Temp : 93 C
GPU Max Operating Temp : N/A
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : Supported
Power Draw : 65.79 W
Power Limit : 250.00 W
Default Power Limit : 250.00 W
Enforced Power Limit : 250.00 W
Min Power Limit : 125.00 W
Max Power Limit : 300.00 W
Clocks
Graphics : 1480 MHz
SM : 1480 MHz
Memory : 5005 MHz
Video : 1265 MHz
Applications Clocks
Graphics : N/A
Memory : N/A
Default Applications Clocks
Graphics : N/A
Memory : N/A
Max Clocks
Graphics : 1911 MHz
SM : 1911 MHz
Memory : 5505 MHz
Video : 1620 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes
Process ID : 1628
Type : C+G
Name : Insufficient Permissions
Used GPU Memory : Not available in WDDM driver model
Process ID : 10220
Type : C+G
Name : C:\Windows\explorer.exe
Used GPU Memory : Not available in WDDM driver model
Process ID : 10416
Type : C+G
Name : C:\Program Files\WindowsApps\Microsoft.Windows.Photos_2020.20070.10002.0_x64__8wekyb3d8bbwe\Microsoft.Photos.exe
Used GPU Memory : Not available in WDDM driver model
Process ID : 10988
Type : C+G
Name : C:\Windows\SystemApps\Microsoft.Windows.StartMenuExperienceHost_cw5n1h2txyewy\StartMenuExperienceHost.exe
Used GPU Memory : Not available in WDDM driver model
Process ID : 11376
Type : C+G
Name : C:\Windows\SystemApps\Microsoft.Windows.Cortana_cw5n1h2txyewy\SearchUI.exe
Used GPU Memory : Not available in WDDM driver model
Process ID : 11896
Type : C+G
Name : C:\Program Files\WindowsApps\Microsoft.YourPhone_1.20071.95.0_x64__8wekyb3d8bbwe\YourPhone.exe
Used GPU Memory : Not available in WDDM driver model
Process ID : 13212
Type : C+G
Name : C:\Program Files\WindowsApps\Microsoft.SkypeApp_15.63.76.0_x86__kzf8qxf38zg5c\Skype\Skype.exe
Used GPU Memory : Not available in WDDM driver model
Process ID : 14404
Type : C+G
Name : Insufficient Permissions
Used GPU Memory : Not available in WDDM driver model
Process ID : 14516
Type : C+G
Name : C:\Windows\SystemApps\Microsoft.MicrosoftEdge_8wekyb3d8bbwe\MicrosoftEdge.exe
Used GPU Memory : Not available in WDDM driver model
Process ID : 14568
Type : C+G
Name : C:\Windows\ImmersiveControlPanel\SystemSettings.exe
Used GPU Memory : Not available in WDDM driver model
Process ID : 15168
Type : C+G
Name : C:\Windows\SystemApps\InputApp_cw5n1h2txyewy\WindowsInternal.ComposableShell.Experiences.TextInput.InputApp.exe
Used GPU Memory : Not available in WDDM driver model
Process ID : 15536
Type : C+G
Name : C:\Windows\System32\MicrosoftEdgeCP.exe
Used GPU Memory : Not available in WDDM driver model
Process ID : 16048
Type : C+G
Name : C:\Windows\SystemApps\Microsoft.LockApp_cw5n1h2txyewy\LockApp.exe
Used GPU Memory : Not available in WDDM driver model
Process ID : 16564
Type : C+G
Name : C:\Windows\SystemApps\ShellExperienceHost_cw5n1h2txyewy\ShellExperienceHost.exe
Used GPU Memory : Not available in WDDM driver model
Values of '[Not Supported]'
are not handled properly.
In [1]: import GPUtil
In [2]: g = GPUtil.getGPUs()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-2-871afb3451f3> in <module>()
----> 1 g = GPUtil.getGPUs()
~\AppData\Local\Continuum\Anaconda3\envs\tensorflow\lib\site-packages\GPUtil\__init__.py in getGPUs()
80 deviceIds[g] = int(vals[i])
81 elif (i == 1):
---> 82 gpuUtil[g] = float(vals[i])/100
83 elif (i == 2):
84 memTotal[g] = int(vals[i])
ValueError: could not convert string to float: '[Not Supported]'
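A hedged sketch of a tolerant parser for such fields (the helper name is mine; newer GPUtil versions address this with a safe float cast, so this mainly matters on older releases):

def safe_float(value):
    # nvidia-smi reports '[Not Supported]' (or '[N/A]') for fields the
    # driver cannot query; map those to NaN instead of raising ValueError.
    try:
        return float(value)
    except ValueError:
        return float('nan')

# e.g.: gpuUtil[g] = safe_float(vals[i]) / 100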
I am having an issue with this module.
It doesn't find my GPU, but when I go to the command line and run "nvidia-smi", everything seems to work.
I already reinstalled my NVIDIA drivers and the module, but nothing works.
After installing via pip, I keep getting this error
Hi there, I am running Automatic1111 and it seems very slow on my new laptop.
I am not usually a coder (at all) but I have been tinkering with this on my older laptop and am now creating on my new one.
The issue is this: every time I create an image, I get this error:
[Error GPU temperature protection] nvidia-smi: [WinError 2] The system cannot find the file specified
More specifically I get this error:
Loading VAE weights specified in settings: C:\Users\xxxxxxx\stable-diffusion-webui-directml\models\VAE\klF8Anime2VAE_klF8Anime2VAE.safetensors
Applying attention optimization: InvokeAI... done.
Weights loaded in 3.6s (calculate hash: 2.3s, load weights from disk: 0.6s, apply weights to model: 0.4s, load VAE: 0.2s).
Calculating sha256 for C:\Users\xxxxxxx\stable-diffusion-webui-directml\models\Lora\add_detail.safetensors: 7c6bad76eb54e80ebe40f5a455b1cf7a743e09fe2fc1289cf333544e3aa071ce
0%| | 0/40 [00:00<?, ?it/s]
[Error GPU temperature protection] nvidia-smi: [WinError 2] The system cannot find the file specified
2%|██ | 1/40 [00:28<18:13, 28.03s/it]
[Error GPU temperature protection] nvidia-smi: [WinError 2] The system cannot find the file specified<48:17, 32.56s/it]
5%|████▏ | 2/40 [00:51<16:05, 25.40s/it]
[Error GPU temperature protection] nvidia-smi: [WinError 2] The system cannot find the file specified<39:59, 27.27s/it]
8%|██████▏ | 3/40 [01:13<14:34, 23.63s/it]
[Error GPU temperature protection] nvidia-smi: [WinError 2] The system cannot find the file specified<35:44, 24.64s/it]
10%|████████▎ | 4/40 [01:37<14:17, 23.82s/it]
[Error GPU temperature protection] nvidia-smi: [WinError 2] The system cannot find the file specified<35:01, 24.44s/it]
12%|██████████▍ | 5/40 [02:01<13:58, 23.97s/it]
[Error GPU temperature protection] nvidia-smi: [WinError 2] The system cannot find the file specified<34:30, 24.36s/it]
15%|████████████▍ | 6/40 [02:25<13:35, 23.98s/it]
[Error GPU temperature protection] nvidia-smi: [WinError 2] The system cannot find the file specified<33:55, 24.23s/it]
18%|██████████████▌ | 7/40 [02:48<13:00, 23.66s/it]
[Error GPU temperature protection] nvidia-smi: [WinError 2] The system cannot find the file specified<32:58, 23.83s/it]
20%|████████████████▌ | 8/40 [03:12<12:36, 23.65s/it]
[Error GPU temperature protection] nvidia-smi: [WinError 2] The system cannot find the file specified<32:28, 23.77s/it]
22%|██████████████████▋ | 9/40 [03:36<12:15, 23.74s/it]
[Error GPU temperature protection] nvidia-smi: [WinError 2] The system cannot find the file specified<32:09, 23.82s/it]
25%|████████████████████▌ | 10/40 [03:59<11:53, 23.80s/it]
[Error GPU temperature protection] nvidia-smi: [WinError 2] The system cannot find the file specified<31:48, 23.85s/it]
28%|██████████████████████▌ | 11/40 [04:24<11:33, 23.90s/it]
[Error GPU temperature protection] nvidia-smi: [WinError 2] The system cannot find the file specified<31:31, 23.94s/it]
30%|████████████████████████▌ | 12/40 [04:48<11:10, 23.96s/it]
[Error GPU temperature protection] nvidia-smi: [WinError 2] The system cannot find the file specified<31:11, 23.99s/it]
32%|██████████████████████████▋ | 13/40 [05:12<10:50, 24.08s/it]
[Error GPU temperature protection] nvidia-smi: [WinError 2] The system cannot find the file specified<30:55, 24.10s/it]
35%|████████████████████████████▋ | 14/40 [05:36<10:27, 24.15s/it]
Total progress: 16%|██████████▎ | 14/90 [05:41<30:36, 24.16s/it]
Does anyone know a solution to this?
This is the error it gives me whenever I try to run it:
GPUtil 1.3.0
Traceback (most recent call last):
File "demo_GPUtil.py", line 10, in
GPU.showUtilization()
File "C:\Users\dylan\AppData\Local\Programs\Python\Python36\Lib\site-packages\GPUtil\GPUtil.py", line 193, in showUtilization
GPUs = getGPUs()
File "C:\Users\dylan\AppData\Local\Programs\Python\Python36\Lib\site-packages\GPUtil\GPUtil.py", line 64, in getGPUs
p = Popen(["nvidia-smi","--query-gpu=index,uuid,utilization.gpu,memory.total,memory.used,memory.free,driver_version,name,gpu_serial,display_active,display_mode", "--format=csv,noheader,nounits"], stdout=PIPE)
File "C:\Users\dylan\AppData\Local\Programs\Python\Python36\lib\subprocess.py", line 709, in init
restore_signals, start_new_session)
File "C:\Users\dylan\AppData\Local\Programs\Python\Python36\lib\subprocess.py", line 997, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified
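A quick diagnostic, since GPUtil shells out to nvidia-smi and WinError 2 usually means the binary is not on the PATH of the Python process; a hedged sketch (the driver directories listed are typical locations, not guaranteed):

import shutil

print(shutil.which("nvidia-smi"))
# If this prints None, add the driver directory to PATH, e.g. on Windows:
#   C:\Program Files\NVIDIA Corporation\NVSMI  (older drivers)
#   C:\Windows\System32                        (newer drivers)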
Hi,
Is it possible to get the installed CUDA version with GPUtil?
Hi,
I'd like to thank and commend you on putting this together!
I am running Windows and this is my output of nvidia-smi:
(base) PS C:\Users\sarth> nvidia-smi.exe
Tue Jul 28 16:16:35 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 451.77 Driver Version: 451.77 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 208... WDDM | 00000000:01:00.0 On | N/A |
| N/A 46C P8 7W / N/A | 4402MiB / 8192MiB | 18% Default |
+-------------------------------+----------------------+----------------------+
But I am not able to detect the GPU with GPUtil:
>>> os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
>>> GPUtil.getAvailable()
[]
>>> GPUtil.__version__
'1.4.0'
Is there something extra I need to add in the python code to get this working?
Thanks!
I understand that GPUtil infers the GPUs' attributes so that they match the nvidia-smi output.
The thing is, GPUtil is commonly used with TensorFlow or other GPU-utilizing frameworks, and these frameworks usually order device IDs by GPU capability (fastest first).
For example, in TensorFlow, if you set CUDA_VISIBLE_DEVICES = '0' in your environment variables, only the fastest GPU will be exposed to the library.
In my setup, I have two different GPUs on the same machine. During runtime I use GPUtil to figure out which GPU has the most memory available, and using that GPU ID I designate a GPU to use. But since my slowest GPU is installed on the first bus, it shows up in GPUtil as 0 and the faster one as 1.
I would suggest adding a parameter to GPUtil.getGPUs() that helps sort this out, so that any downstream frameworks relying on CUDA_VISIBLE_DEVICES can get the IDs right.
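Until such a parameter exists, one hedged workaround is to make the CUDA runtime use the same PCI-bus ordering that nvidia-smi (and hence GPUtil) reports; note both variables must be set before the framework initializes CUDA:

import os
import GPUtil

# Force CUDA to enumerate devices in PCI bus order, matching nvidia-smi,
# instead of the default fastest-first ordering.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"

# Pick the GPU with the most free memory and expose only that one.
best = max(GPUtil.getGPUs(), key=lambda g: g.memoryFree)
os.environ["CUDA_VISIBLE_DEVICES"] = str(best.id)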
Code as below:
g0 = GPUtil.getGPUs()[0]
g0.memoryUsed # output 11768.0
but, the nvidia-smi
shows
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.30 Driver Version: 390.30 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 TITAN Xp Off | 00000000:02:00.0 Off | N/A |
| 23% 33C P8 16W / 250W | 171MiB / 12196MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 TITAN Xp Off | 00000000:82:00.0 Off | N/A |
| 23% 31C P8 10W / 250W | 10MiB / 12196MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
memoryUsed is not real-time; however, after I re-run g0 = GPUtil.getGPUs()[0], the output changes.
Hi,
I pip-installed the package and tried running GPUtil.getAvailable(), but got the message listed below. Any thoughts?
Thank you very much for this package.
GPUtil.getAvailable()
Traceback (most recent call last):
File "", line 1, in
File "C:\Users\dkarl\AppData\Local\conda\conda\envs\dudy_test\lib\site-packages\GPUtil\GPUtil.py", line 123, in getAvailable
GPUs = getGPUs()
File "C:\Users\dkarl\AppData\Local\conda\conda\envs\dudy_test\lib\site-packages\GPUtil\GPUtil.py", line 64, in getGPUs
p = Popen(["nvidia-smi","--query-gpu=index,uuid,utilization.gpu,memory.total,memory.used,memory.free,driver_version,name,gpu_serial,display_active,display_mode", "--format=csv,noheader,nounits"], stdout=PIPE)
File "C:\Users\dkarl\AppData\Local\conda\conda\envs\dudy_test\lib\subprocess.py", line 709, in init
restore_signals, start_new_session)
File "C:\Users\dkarl\AppData\Local\conda\conda\envs\dudy_test\lib\subprocess.py", line 997, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified
I have pip-installed GPUtil for the first time; upon running it I get an error previously described here:
#2
gpuUtil[g] = float(vals[i])/100 causes ValueError: could not convert string to float: '[Not Supported]'
I see from the issue thread that this should be fixed. Has it not made its way into the version I get via pip?
Hello,
Packaging gputil with Pyinstaller (console=False [pythonw.exe]) causes a pop-up window to open every time I want to access GPU stats.
I could suppress it by adding creationflags = subprocess.CREATE_NO_WINDOW
in the Popen command.
Line 81 in 42ef071
Maybe it's of interest to future users, so I will leave it here :)
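A hedged sketch of the change described above, applied to a GPUtil-style Popen call (the query string is abbreviated; CREATE_NO_WINDOW is a Windows-only flag, available since Python 3.7):

import subprocess
from subprocess import Popen, PIPE

p = Popen(
    ["nvidia-smi",
     "--query-gpu=index,utilization.gpu,memory.total,memory.used",
     "--format=csv,noheader,nounits"],
    stdout=PIPE,
    creationflags=subprocess.CREATE_NO_WINDOW,  # suppress the console pop-up
)
stdout, _ = p.communicate()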
Hi,
On Windows 10 (64 bit), I'm getting the following error:
Python 3.6.5 | packaged by conda-forge | (default, Apr 6 2018, 16:13:55) [MSC v.1900 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 6.1.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import GPUtil
In [2]: GPUtil.showUtilization()
NameError Traceback (most recent call last)
in ()
----> 1 GPUtil.showUtilization()
~\Anaconda3\lib\site-packages\GPUtil\GPUtil.py in showUtilization(all, attrList, useOldCode)
248 elif (isinstance(attr,str)):
249 attrStr = attr;
--> 250 elif (isinstance(attr,unicode)):
251 attrStr = attr.encode('ascii','ignore')
252 else:
NameError: name 'unicode' is not defined
Any idea how to fix this?
Thanks a lot!
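The failing branch is leftover Python 2 code (unicode does not exist on Python 3). A hedged sketch of a version-safe variant of the check:

import sys

# On Python 3 every string is str; only Python 2 distinguishes unicode.
if sys.version_info[0] >= 3:
    string_types = (str,)
else:
    string_types = (str, unicode)  # noqa: F821 -- evaluated on Python 2 only

# ...inside showUtilization, the two isinstance branches then collapse to:
# elif isinstance(attr, string_types):
#     attrStr = attr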
Hey,
I've recently been using Kubernetes on Azure through AKS, and I have a couple of Python packages that use this project as a dependency. In order to support a wide range of devices, Kubernetes developed a standard device-plugin interface for getting information about devices on a machine; see https://github.com/kubernetes/community/blob/master/contributors/design-proposals/resource-management/device-plugin.md and https://github.com/NVIDIA/k8s-device-plugin
Unfortunately, this interface replaces nvidia-smi when Kubernetes is running, and as such this project will report 0 GPUs found even when several are attached to the machine.
Would it be possible to add support for finding GPUs through this interface? I'm happy to give it a go and try to add support.
I'm having a strange issue on various machines where every call to showUtilization() shows 0% GPU util, even though nvidia-smi at the same time returns 100%. It does, however, correctly show memory usage. Any idea why this might occur?
Thanks for writing this utility!
I can clearly see that there is free GPU memory with both nvidia-smi and the MEM column in GPUtil.showUtilization(), but getFirstAvailable won't return successfully for me.
How do I debug this?
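One way to narrow it down: getFirstAvailable defaults to maxLoad=0.5 and maxMemory=0.5 and excludes GPUs whose load reads as NaN, so loosening every threshold shows whether the filters are the culprit. A hedged sketch:

import GPUtil

GPUtil.showUtilization()

# Accept any load/memory and keep NaN readings; if this still returns [],
# the problem is in parsing nvidia-smi output rather than in the filters.
available = GPUtil.getAvailable(order='memory', limit=100,
                                maxLoad=1.0, maxMemory=1.0,
                                includeNan=True)
print(available)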
Under "Main functions", README.md
gives the following example:
deviceIDs = GPUtil.getAvailable(order = 'first', limit = 1, maxLoad = 0.5, maxMemory = 0.5, ignoreNan=False, excludeID=[], excludeUUID=[])
I believe ignoreNan is actually meant to be includeNan.
It should not throw an exception if nvidia-smi fails. Instead, it should return None or something similar to indicate that no GPUs were found. Maybe a printed warning would be enough to show that nvidia-smi is failing.
The showUtilization function offers the possibility to restrict the output via an attrList parameter.
However, if such an attrList is passed, it never makes it to the processing step. The function first decides whether "all" is set or not; in both cases, either the output ("oldCode") is printed directly or the attrList parameter is overwritten, regardless of whether it has been set.
It's just a small thing, but it would be convenient to be able to restrict the output to only the few fields one needs for debugging (a sketch of the fix follows below).
Thanks,
Andre
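A hedged sketch of the fix, as a minimal reimplementation to illustrate the intent (not GPUtil's actual code; it assumes a GPUtil version whose GPU objects expose the named attributes): only build a default attribute list when the caller did not pass one.

import GPUtil

def show_utilization(attr_list=None):
    # Respect a caller-supplied attribute list instead of overwriting it.
    if attr_list is None:
        attr_list = ['id', 'load', 'memoryUtil']  # default short list
    for gpu in GPUtil.getGPUs():
        print(' | '.join(str(getattr(gpu, a)) for a in attr_list))

show_utilization(['id', 'temperature'])  # only the fields needed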
It would be nice if we were able to see the temperature of the GPU as well.
Thank you for creating this module. If possible, please add __version__.
>>> GPUtil.__version__
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'GPUtil' has no attribute '__version__'
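Until the attribute is added, the installed distribution version can be read from package metadata; a sketch (Python 3.8+):

from importlib.metadata import version

print(version("GPUtil"))  # e.g. '1.4.0'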
Running a simple looped call to this function (showUtilization) causes stuttering in games (recordable in frametimes), as shown in third-party testing below:
(The GIF is taken from another project, but the script below gives the same issue.)
To Reproduce
Steps to reproduce the behavior:
Test Script
import time
import GPUtil

while True:
    GPUtil.showUtilization()
    time.sleep(1)
As of Python 3.12, distutils has been removed (it was deprecated in 3.10), so the lib won't work on 3.12+.