Comments (10)
You can try substituting https://github.com/soumith/imagenet-multiGPU.torch/blob/master/models/vgg_cudnn.lua#L39 and https://github.com/soumith/imagenet-multiGPU.torch/blob/master/models/vgg_cudnn.lua#L42 with the in-place ReLU, nn.ReLU(true). Those are large matrices, so it might help. Also update to the latest cudnn.torch bindings.
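For reference, a minimal sketch of the suggested substitution (the layer sizes are the standard VGG classifier ones, not copied from the linked file):

require 'nn'

local classifier = nn.Sequential()
classifier:add(nn.Linear(512 * 7 * 7, 4096))
classifier:add(nn.ReLU(true))   -- in-place: overwrites its input instead of allocating a second 4096-wide activation
classifier:add(nn.Dropout(0.5))
classifier:add(nn.Linear(4096, 4096))
classifier:add(nn.ReLU(true))   -- same substitution for the second fully-connected block

Passing true makes nn.ReLU reuse the input buffer, saving one activation-sized tensor per layer, at the cost of losing the pre-activation values.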
Hey, thank you for your quick reply! I replaced those two lines and updated cudnn, but this time I get the following result:
+------------------------------------------------------+
| NVIDIA-SMI 340.65 Driver Version: 340.65 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K20m Off | 0000:02:00.0 Off | 0 |
| N/A 47C P0 149W / 225W | 4438MiB / 4799MiB | 99% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K20m Off | 0000:03:00.0 Off | 0 |
| N/A 47C P0 152W / 225W | 2742MiB / 4799MiB | 99% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K20m Off | 0000:83:00.0 Off | 0 |
| N/A 50C P0 151W / 225W | 2742MiB / 4799MiB | 99% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K20m Off | 0000:84:00.0 Off | 0 |
| N/A 46C P0 147W / 225W | 2742MiB / 4799MiB | 86% Default |
+-------------------------------+----------------------+----------------------+
And I ran into an out-of-memory problem at the end of the first epoch, with the following log:
Epoch: [1][9995/10000] Time 1.319 Err 6.9073 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.002
Epoch: [1][9996/10000] Time 1.327 Err 6.9092 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.002
Epoch: [1][9997/10000] Time 1.328 Err 6.9095 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.002
Epoch: [1][9998/10000] Time 1.322 Err 6.9062 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.002
Epoch: [1][9999/10000] Time 1.329 Err 6.9077 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.002
Epoch: [1][10000/10000] Time 1.328 Err 6.9071 Top1-%: 0.78 LR 1e-02 DataLoadingTime 0.002
Epoch: [1][TRAINING SUMMARY] Total Time(s): 13325.76 average loss (per batch): 6.91 accuracy(%): top-1 0.10
==> doing epoch on validation data:
==> online epoch # 1
/home/archy/torch/bin/luajit: ...e/archy/torch/share/lua/5.1/threads/threads.lua:255:
[thread 19 endcallback] /home/archy/torch/share/lua/5.1/cutorch/init.lua:21: ...n/archy/torch/share/lua/5.1/cudnn/SpatialConvolution.lua:97: /tmp/luarocks_cutorch-scm-1-7449/cutorch/lib/THC/THCStorage.cu(30) : cuda runtime error
(2) : out of memory at /tmp/luarocks_cutorch-scm-1-7449/cutorch/lib/THC/THCGeneral.c:241
[thread 15 endcallback] /home/archy/torch/share/lua/5.1/cutorch/init.lua:21: /home/archy/torch/share/lua/5.1/cutorch/init.lua:21: ...magenet-multiGPU.torch/fbcunn_files/AbstractParallel.lua:97: out of memory at /tmp/lua
rocks_cutorch-scm-1-7449/cutorch/lib/THC/THCTensorCopy.cu:93
[thread 17 endcallback] /home/archy/torch/share/lua/5.1/cutorch/init.lua:21: /tmp/luarocks_cutorch-scm-1-7449/cutorch/lib/THC/THCStorage.cu(30) : cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-7449/cutorch
/lib/THC/THCGeneral.c:241
[thread 4 endcallback] /home/archy/torch/share/lua/5.1/cutorch/init.lua:21: /home/archy/torch/share/lua/5.1/cutorch/init.lua:21: ...magenet-multiGPU.torch/fbcunn_files/AbstractParallel.lua:97: out of memory at /tmp/luar
ocks_cutorch-scm-1-7449/cutorch/lib/THC/THCTensorCopy.cu:93
[thread 16 endcallback] /home/archy/torch/share/lua/5.1/cutorch/init.lua:21: /tmp/luarocks_cutorch-scm-1-7449/cutorch/lib/THC/THCStorage.cu(30) : cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-7449/cutorch
/lib/THC/THCGeneral.c:241
[thread 8 endcallback] /home/archy/torch/share/lua/5.1/cutorch/init.lua:21: /home/archy/torch/share/lua/5.1/cutorch/init.lua:21: ...magenet-multiGPU.torch/fbcunn_files/AbstractParallel.lua:97: out of memory at /tmp/luar
ocks_cutorch-scm-1-7449/cutorch/lib/THC/THCTensorCopy.cu:93
[thread 18 endcallback] /home/archy/torch/share/lua/5.1/cutorch/init.lua:21: /tmp/luarocks_cutorch-scm-1-7449/cutorch/lib/THC/THCStorage.cu(30) : cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-7449/cutorch
/lib/THC/THCGeneral.c:241
Thank you very much!
Did you read the error at all? What does it say?
VGG-D is probably not going to fit on a K20.
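For scale: VGG-D (the 16-layer configuration) has roughly 138M parameters, i.e. about 550MB of FP32 weights before gradients and activations are counted, against a K20's roughly 4.8GB of usable memory; the per-batch activations are what push it over.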
Hi, sorry for the confusion. Let me explain more clearly.
I trained VGG-A with in-place ReLU and the latest cudnn.torch bindings. Here is the nvidia-smi result:
Sun Jun 21 22:42:52 2015
+------------------------------------------------------+
| NVIDIA-SMI 340.65 Driver Version: 340.65 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K20m Off | 0000:02:00.0 Off | 0 |
| N/A 46C P0 153W / 225W | 4438MiB / 4799MiB | 98% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K20m Off | 0000:03:00.0 Off | 0 |
| N/A 46C P0 147W / 225W | 2742MiB / 4799MiB | 51% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K20m Off | 0000:83:00.0 Off | 0 |
| N/A 49C P0 155W / 225W | 2742MiB / 4799MiB | 54% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K20m Off | 0000:84:00.0 Off | 0 |
| N/A 45C P0 146W / 225W | 2742MiB / 4799MiB | 56% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Compute processes: GPU Memory |
| GPU PID Process name Usage |
|=============================================================================|
| 0 13440 /home/archy/torch/bin/luajit 4422MiB |
| 1 13440 /home/archy/torch/bin/luajit 2726MiB |
| 2 13440 /home/archy/torch/bin/luajit 2726MiB |
| 3 13440 /home/archy/torch/bin/luajit 2726MiB |
+-----------------------------------------------------------------------------+
As you can see, the memory usage of GPU0 is much higher than that of the other GPUs (4438MiB vs 2742MiB). Then I tried to train AlexNet on 2 GPUs, with the following log:
Sun Jun 21 22:54:24 2015
+------------------------------------------------------+
| NVIDIA-SMI 340.65 Driver Version: 340.65 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K20m Off | 0000:02:00.0 Off | 0 |
| N/A 41C P0 121W / 225W | 1800MiB / 4799MiB | 91% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K20m Off | 0000:03:00.0 Off | 0 |
| N/A 41C P0 127W / 225W | 932MiB / 4799MiB | 89% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K20m Off | 0000:83:00.0 Off | 0 |
| N/A 40C P0 48W / 225W | 394MiB / 4799MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K20m Off | 0000:84:00.0 Off | 0 |
| N/A 36C P0 46W / 225W | 394MiB / 4799MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Compute processes: GPU Memory |
| GPU PID Process name Usage |
|=============================================================================|
| 0 3345 /home/archy/torch/bin/luajit 1784MiB |
| 1 3345 /home/archy/torch/bin/luajit 916MiB |
| 2 3345 /home/archy/torch/bin/luajit 378MiB |
| 3 3345 /home/archy/torch/bin/luajit 378MiB |
+-----------------------------------------------------------------------------+
GPU0 (1784MiB) uses double the memory of GPU1 (916MiB).
I suspect GPU0 might be the cause of the out-of-memory problem.
Thank you very much!
Hi, could you try reducing the batch size using the command-line option:
-batchSize 32
or even reduce it to:
-batchSize 16
See if that helps.
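For example, something like the following (the -data and -netType flags are my assumption of the repo's opts.lua options; -nGPU and -batchSize appear elsewhere in this thread):

th main.lua -data /path/to/imagenet -netType vgg -nGPU 4 -batchSize 32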
Hi @soumith, I work together with @ffmpbgrnn. I think @ffmpbgrnn's question is simply:
why does GPU 0 use twice as much GPU memory as GPU 1 (in the two-GPU case)?
We can set aside the out-of-memory problem for now. The current batch size fits the GPUs well; the point is the difference between GPU 0 and the other GPUs.
Another odd thing is that with -nGPU 2 set, GPU 2 and GPU 3 still show a memory occupation of about 400 MB.
hey @zhongwen, this is because when we synchronize the weights across GPUs, we allocate dedicated buffers on GPU-1 for all the weights to accumulate into (so that the weight transfers from GPU{2,3,4} over to GPU-1 happen in parallel and non-blocking). That's why you see GPU-1 using more memory.
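A conceptual sketch of that scheme (the names and sizes are illustrative, not the repo's actual code):

require 'cutorch'

local nGPU = 4
cutorch.setDevice(1)
local flatParams = torch.CudaTensor(9000000)   -- stand-in for the flattened replicated weights

-- one dedicated landing buffer per peer GPU, all resident on GPU 1:
local landing = {}
for peer = 2, nGPU do
   landing[peer] = flatParams.new():resizeAs(flatParams)
end

-- each peer copies its gradients into its own buffer concurrently, so no
-- peer blocks on another; GPU 1 then reduces the buffers into its tensor:
for peer = 2, nGPU do flatParams:add(landing[peer]) end

With nGPU GPUs that is nGPU-1 extra parameter-sized tensors on the first GPU, which is one reason its footprint is larger.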
To my understanding, the total size of the model is about 500MB, so why do the buffers occupy a lot more than that?
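One way to sanity-check that figure from a torch prompt (a sketch; model stands for whatever network was constructed):

local params, gradParams = model:getParameters()
print(('weights: %.0f MiB'):format(params:nElement() * 4 / 2^20))

Note that gradParams is the same size again, so weights plus gradients already come to roughly double the model size.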
hey @zhongwen, you also have to take into account the Linear layers on GPU0. They run only on GPU0, not on GPU1; the DataParallel covers only the convolution layers.
Does that make sense?
Essentially, you have roughly 400-500MB of overhead coming from the multi-GPU buffers (probably less than that), plus the overhead from the fully-connected layers and SoftMax.
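That split is easy to see in a schematic model definition (a sketch assuming cunn's nn.DataParallelTable, which has the same shape as the repo's fbcunn-era DataParallel; layer sizes assume small 32x32 inputs, purely for illustration):

require 'cunn'

local nGPU = 4

-- convolutional trunk: replicated, one clone per GPU
local function convTrunk()
   local m = nn.Sequential()
   m:add(nn.SpatialConvolution(3, 64, 3, 3, 1, 1, 1, 1))
   m:add(nn.ReLU(true))
   m:add(nn.SpatialMaxPooling(2, 2, 2, 2))
   return m
end

local features = nn.DataParallelTable(1)   -- scatter along dim 1, the batch
for gpu = 1, nGPU do
   cutorch.setDevice(gpu)
   features:add(convTrunk():cuda(), gpu)   -- one replica per GPU
end
cutorch.setDevice(1)

-- classifier: lives only on GPU 1, so its weights, gradients and
-- activations all land there on top of GPU 1's share of the trunk
local classifier = nn.Sequential()
classifier:add(nn.View(64 * 16 * 16):setNumInputDims(3))
classifier:add(nn.Linear(64 * 16 * 16, 1000))
classifier:add(nn.LogSoftMax())

local model = nn.Sequential():add(features):add(classifier):cuda()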