Comments (10)
You can try substituting https://github.com/soumith/imagenet-multiGPU.torch/blob/master/models/vgg_cudnn.lua#L39 and https://github.com/soumith/imagenet-multiGPU.torch/blob/master/models/vgg_cudnn.lua#L42 with the in-place ReLU, nn.ReLU(true). Those are large matrices, so it might help. Also update to the latest cudnn.torch bindings.
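For reference, a minimal sketch of the suggested substitution (the layer sizes are the standard VGG classifier ones, not copied from the linked file):

require 'nn'

local classifier = nn.Sequential()
classifier:add(nn.Linear(512 * 7 * 7, 4096))
classifier:add(nn.ReLU(true))   -- in-place: overwrites its input instead of allocating a second 4096-wide activation
classifier:add(nn.Dropout(0.5))
classifier:add(nn.Linear(4096, 4096))
classifier:add(nn.ReLU(true))   -- same substitution for the second fully-connected block

Passing true makes nn.ReLU reuse the input buffer, saving one activation-sized tensor per layer, at the cost of losing the pre-activation values.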
Hey, thank you for your quick reply! I replaced those two lines and updated cudnn, but this time I get the following result:
+------------------------------------------------------+
| NVIDIA-SMI 340.65 Driver Version: 340.65 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K20m Off | 0000:02:00.0 Off | 0 |
| N/A 47C P0 149W / 225W | 4438MiB / 4799MiB | 99% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K20m Off | 0000:03:00.0 Off | 0 |
| N/A 47C P0 152W / 225W | 2742MiB / 4799MiB | 99% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K20m Off | 0000:83:00.0 Off | 0 |
| N/A 50C P0 151W / 225W | 2742MiB / 4799MiB | 99% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K20m Off | 0000:84:00.0 Off | 0 |
| N/A 46C P0 147W / 225W | 2742MiB / 4799MiB | 86% Default |
+-------------------------------+----------------------+----------------------+
And I ran into an out-of-memory problem at the end of the first epoch, with the following log:
Epoch: [1][9995/10000] Time 1.319 Err 6.9073 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.002
Epoch: [1][9996/10000] Time 1.327 Err 6.9092 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.002
Epoch: [1][9997/10000] Time 1.328 Err 6.9095 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.002
Epoch: [1][9998/10000] Time 1.322 Err 6.9062 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.002
Epoch: [1][9999/10000] Time 1.329 Err 6.9077 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.002
Epoch: [1][10000/10000] Time 1.328 Err 6.9071 Top1-%: 0.78 LR 1e-02 DataLoadingTime 0.002
Epoch: [1][TRAINING SUMMARY] Total Time(s): 13325.76 average loss (per batch): 6.91 accuracy(%): top-1 0.10
==> doing epoch on validation data:
==> online epoch # 1
/home/archy/torch/bin/luajit: ...e/archy/torch/share/lua/5.1/threads/threads.lua:255:
[thread 19 endcallback] /home/archy/torch/share/lua/5.1/cutorch/init.lua:21: ...n/archy/torch/share/lua/5.1/cudnn/SpatialConvolution.lua:97: /tmp/luarocks_cutorch-scm-1-7449/cutorch/lib/THC/THCStorage.cu(30) : cuda runtime error
(2) : out of memory at /tmp/luarocks_cutorch-scm-1-7449/cutorch/lib/THC/THCGeneral.c:241
[thread 15 endcallback] /home/archy/torch/share/lua/5.1/cutorch/init.lua:21: /home/archy/torch/share/lua/5.1/cutorch/init.lua:21: ...magenet-multiGPU.torch/fbcunn_files/AbstractParallel.lua:97: out of memory at /tmp/lua
rocks_cutorch-scm-1-7449/cutorch/lib/THC/THCTensorCopy.cu:93
[thread 17 endcallback] /home/archy/torch/share/lua/5.1/cutorch/init.lua:21: /tmp/luarocks_cutorch-scm-1-7449/cutorch/lib/THC/THCStorage.cu(30) : cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-7449/cutorch
/lib/THC/THCGeneral.c:241
[thread 4 endcallback] /home/archy/torch/share/lua/5.1/cutorch/init.lua:21: /home/archy/torch/share/lua/5.1/cutorch/init.lua:21: ...magenet-multiGPU.torch/fbcunn_files/AbstractParallel.lua:97: out of memory at /tmp/luar
ocks_cutorch-scm-1-7449/cutorch/lib/THC/THCTensorCopy.cu:93
[thread 16 endcallback] /home/archy/torch/share/lua/5.1/cutorch/init.lua:21: /tmp/luarocks_cutorch-scm-1-7449/cutorch/lib/THC/THCStorage.cu(30) : cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-7449/cutorch
/lib/THC/THCGeneral.c:241
[thread 8 endcallback] /home/archy/torch/share/lua/5.1/cutorch/init.lua:21: /home/archy/torch/share/lua/5.1/cutorch/init.lua:21: ...magenet-multiGPU.torch/fbcunn_files/AbstractParallel.lua:97: out of memory at /tmp/luar
ocks_cutorch-scm-1-7449/cutorch/lib/THC/THCTensorCopy.cu:93
[thread 18 endcallback] /home/archy/torch/share/lua/5.1/cutorch/init.lua:21: /tmp/luarocks_cutorch-scm-1-7449/cutorch/lib/THC/THCStorage.cu(30) : cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-7449/cutorch
/lib/THC/THCGeneral.c:241
Thank you very much!
Did you read the error at all? What does it say?
VGG-D is probably not going to fit on a K20.
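For scale: VGG-D (the 16-layer configuration) has roughly 138M parameters, i.e. about 550MB of FP32 weights before gradients and activations are counted, against a K20's roughly 4.8GB of usable memory; the per-batch activations are what push it over.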
Hi, sorry for the confusion. Let me explain more clearly.
I trained VGG-A with in-place ReLU and the latest cudnn.torch bindings. Here is the nvidia-smi result:
Sun Jun 21 22:42:52 2015
+------------------------------------------------------+
| NVIDIA-SMI 340.65 Driver Version: 340.65 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K20m Off | 0000:02:00.0 Off | 0 |
| N/A 46C P0 153W / 225W | 4438MiB / 4799MiB | 98% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K20m Off | 0000:03:00.0 Off | 0 |
| N/A 46C P0 147W / 225W | 2742MiB / 4799MiB | 51% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K20m Off | 0000:83:00.0 Off | 0 |
| N/A 49C P0 155W / 225W | 2742MiB / 4799MiB | 54% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K20m Off | 0000:84:00.0 Off | 0 |
| N/A 45C P0 146W / 225W | 2742MiB / 4799MiB | 56% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Compute processes: GPU Memory |
| GPU PID Process name Usage |
|=============================================================================|
| 0 13440 /home/archy/torch/bin/luajit 4422MiB |
| 1 13440 /home/archy/torch/bin/luajit 2726MiB |
| 2 13440 /home/archy/torch/bin/luajit 2726MiB |
| 3 13440 /home/archy/torch/bin/luajit 2726MiB |
+-----------------------------------------------------------------------------+
As you can see, the memory usage of GPU0 is much higher than that of the other GPUs (4438MiB vs 2742MiB). Then I tried to train AlexNet on 2 GPUs, with the following log:
Sun Jun 21 22:54:24 2015
+------------------------------------------------------+
| NVIDIA-SMI 340.65 Driver Version: 340.65 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K20m Off | 0000:02:00.0 Off | 0 |
| N/A 41C P0 121W / 225W | 1800MiB / 4799MiB | 91% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K20m Off | 0000:03:00.0 Off | 0 |
| N/A 41C P0 127W / 225W | 932MiB / 4799MiB | 89% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K20m Off | 0000:83:00.0 Off | 0 |
| N/A 40C P0 48W / 225W | 394MiB / 4799MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K20m Off | 0000:84:00.0 Off | 0 |
| N/A 36C P0 46W / 225W | 394MiB / 4799MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Compute processes: GPU Memory |
| GPU PID Process name Usage |
|=============================================================================|
| 0 3345 /home/archy/torch/bin/luajit 1784MiB |
| 1 3345 /home/archy/torch/bin/luajit 916MiB |
| 2 3345 /home/archy/torch/bin/luajit 378MiB |
| 3 3345 /home/archy/torch/bin/luajit 378MiB |
+-----------------------------------------------------------------------------+
GPU0 (1784MiB) uses double the memory of GPU1 (916MiB).
I suspect GPU0 might be the cause of the out-of-memory problem.
Thank you very much!
Hi, could you try reducing the batch size using the command-line option:
-batchSize 32
or even reduce it to:
-batchSize 16
See if that helps.
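For example, something like the following (the -data and -netType flags are my assumption of the repo's opts.lua options; -nGPU and -batchSize appear elsewhere in this thread):

th main.lua -data /path/to/imagenet -netType vgg -nGPU 4 -batchSize 32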
Hi @soumith, I work together with @ffmpbgrnn. I think @ffmpbgrnn's question is simply:
why does GPU 0 use twice as much GPU memory as GPU 1 (in the two-GPU case)?
We can set aside the out-of-memory problem for now. The current batch size fits the GPUs well; the point is the difference between GPU 0 and the other GPUs.
Another odd thing is that with -nGPU 2 set, GPU 2 and GPU 3 still show a memory occupation of about 400 MB.
hey @zhongwen, this is because when we synchronize the weights across GPUs, we allocate dedicated buffers on GPU-1 for all the weights to accumulate into (so that the weight transfers from GPU{2,3,4} over to GPU-1 happen in parallel and non-blocking). That's why you see GPU-1 using more memory.
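A conceptual sketch of that scheme (the names and sizes are illustrative, not the repo's actual code):

require 'cutorch'

local nGPU = 4
cutorch.setDevice(1)
local flatParams = torch.CudaTensor(9000000)   -- stand-in for the flattened replicated weights

-- one dedicated landing buffer per peer GPU, all resident on GPU 1:
local landing = {}
for peer = 2, nGPU do
   landing[peer] = flatParams.new():resizeAs(flatParams)
end

-- each peer copies its gradients into its own buffer concurrently, so no
-- peer blocks on another; GPU 1 then reduces the buffers into its tensor:
for peer = 2, nGPU do flatParams:add(landing[peer]) end

With nGPU GPUs that is nGPU-1 extra parameter-sized tensors on the first GPU, which is one reason its footprint is larger.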
To my understanding, the total size of the model is about 500MB, so why do the buffers occupy a lot more than that?
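One way to sanity-check that figure from a torch prompt (a sketch; model stands for whatever network was constructed):

local params, gradParams = model:getParameters()
print(('weights: %.0f MiB'):format(params:nElement() * 4 / 2^20))

Note that gradParams is the same size again, so weights plus gradients already come to roughly double the model size.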
hey @zhongwen, you also have to take into account the Linear layers on GPU0. They run only on GPU0, not on GPU1; the DataParallel covers only the convolution layers.
Does that make sense?
Essentially, you have roughly 400-500MB of overhead coming from the multi-GPU buffers (probably less than that), plus the overhead from the fully-connected layers and SoftMax.
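That split is easy to see in a schematic model definition (a sketch assuming cunn's nn.DataParallelTable, which has the same shape as the repo's fbcunn-era DataParallel; layer sizes assume small 32x32 inputs, purely for illustration):

require 'cunn'

local nGPU = 4

-- convolutional trunk: replicated, one clone per GPU
local function convTrunk()
   local m = nn.Sequential()
   m:add(nn.SpatialConvolution(3, 64, 3, 3, 1, 1, 1, 1))
   m:add(nn.ReLU(true))
   m:add(nn.SpatialMaxPooling(2, 2, 2, 2))
   return m
end

local features = nn.DataParallelTable(1)   -- scatter along dim 1, the batch
for gpu = 1, nGPU do
   cutorch.setDevice(gpu)
   features:add(convTrunk():cuda(), gpu)   -- one replica per GPU
end
cutorch.setDevice(1)

-- classifier: lives only on GPU 1, so its weights, gradients and
-- activations all land there on top of GPU 1's share of the trunk
local classifier = nn.Sequential()
classifier:add(nn.View(64 * 16 * 16):setNumInputDims(3))
classifier:add(nn.Linear(64 * 16 * 16, 1000))
classifier:add(nn.LogSoftMax())

local model = nn.Sequential():add(features):add(classifier):cuda()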