Comments (14)

buttomnutstoast commented on August 11, 2024

We have run into a similar problem: using multiple GPUs is much slower than a single GPU.

buttomnutstoast commented on August 11, 2024

OK, we found that the package contributors have already fixed the problem mentioned in this post.

soumith commented on August 11, 2024

Hi, sorry about that. Yes, it should be fixed in the latest commit.

chienlinhuang1116 commented on August 11, 2024

Hi, I tested it and the results still show that nGPU=4 is slower than nGPU=1. Do you have any comments? Thank you.

th main.lua -data ~/imagenet -nGPU 1 -batchSize 128
Epoch: [1][2/10000] Time 0.844 Err 6.9071 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.002
Epoch: [1][3/10000] Time 0.845 Err 6.9084 Top1-%: 0.00 LR 1e-02 DataLoadingTime 3.041
Epoch: [1][4/10000] Time 0.845 Err 6.9095 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.051
Epoch: [1][5/10000] Time 0.845 Err 6.9092 Top1-%: 0.00 LR 1e-02 DataLoadingTime 2.557
Epoch: [1][6/10000] Time 0.843 Err 6.9095 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.003

th main.lua -data ~/imagenet -nGPU 4 -batchSize 128
Epoch: [1][2/10000] Time 1.781 Err 6.9064 Top1-%: 0.78 LR 1e-02 DataLoadingTime 0.002
Epoch: [1][3/10000] Time 1.765 Err 6.9066 Top1-%: 0.00 LR 1e-02 DataLoadingTime 2.181
Epoch: [1][4/10000] Time 1.761 Err 6.9080 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.004
Epoch: [1][5/10000] Time 1.760 Err 6.9089 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.699
Epoch: [1][6/10000] Time 1.763 Err 6.9058 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.004

th main.lua -data ~/imagenet -nGPU 4 -batchSize 256
Epoch: [1][2/10000] Time 2.479 Err 6.9081 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.004
Epoch: [1][3/10000] Time 2.421 Err 6.9074 Top1-%: 0.00 LR 1e-02 DataLoadingTime 3.012
Epoch: [1][4/10000] Time 2.369 Err 6.9066 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.107
Epoch: [1][5/10000] Time 2.368 Err 6.9078 Top1-%: 0.00 LR 1e-02 DataLoadingTime 3.725
Epoch: [1][6/10000] Time 2.368 Err 6.9079 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.005

soumith commented on August 11, 2024

@chienlinhuang1116 what commit hash are you on?

chienlinhuang1116 commented on August 11, 2024

Thank you, soumith. I think it is the latest commit hash.

(1) You are right, the problem has already been fixed. When I work on AWS Ubuntu instances with the fbcunn libs installed, it behaves correctly and multi-GPU is much faster than a single GPU.

(2) Because the latest imagenet example does not seem to work with the fbcunn libs, I tested it on CentOS GPU machines with only the Torch7 and nn packages installed. In this case, multi-GPU is still slower than a single GPU.

soumith commented on August 11, 2024

In the case of (2), did you install Torch freshly? The latest nn/cunn plus the latest commit of this repo no longer shows the slowness.

chienlinhuang1116 commented on August 11, 2024

Hi, I installed Torch freshly using the following steps, but multi-GPU is still slower than a single GPU on the CentOS GPU machines. Do you have any ideas?

Thank you.

curl -s https://raw.githubusercontent.com/torch/ezinstall/master/clean-old.sh | bash
git clone https://github.com/torch/distro.git ~/torch --recursive
cd ~/torch; ./install.sh
curl -s https://raw.githubusercontent.com/torch/ezinstall/master/install-luajit+torch | PREFIX=~/torch bash

arashno commented on August 11, 2024

Hi,
I have the exact same problem.
When I use nGPU=2, it is slower than just one GPU.
I installed the latest versions of Torch, cunn, and this repository.
Everything is up to date, but it is still slower.
Any ideas? @soumith @chienlinhuang1116 @buttomnutstoast
Here are my outputs:

For one GPU:
==> doing epoch on training data:
==> online epoch # 1
Epoch: [1][1/9000] Time 0.826 Err 3.8455 Top1-%: 3.12 Topn-%: 7.81 LR 1e-02 DataLoadingTime 5.466
Epoch: [1][2/9000] Time 0.709 Err 3.6204 Top1-%: 17.97 Topn-%: 36.72 LR 1e-02 DataLoadingTime 0.072
Epoch: [1][3/9000] Time 0.701 Err 3.1000 Top1-%: 30.47 Topn-%: 65.62 LR 1e-02 DataLoadingTime 0.009
Epoch: [1][4/9000] Time 0.719 Err 2.7807 Top1-%: 27.34 Topn-%: 61.72 LR 1e-02 DataLoadingTime 0.005
Epoch: [1][5/9000] Time 0.687 Err 2.8056 Top1-%: 25.78 Topn-%: 63.28 LR 1e-02 DataLoadingTime 1.749
Epoch: [1][6/9000] Time 0.715 Err 3.1090 Top1-%: 19.53 Topn-%: 55.47 LR 1e-02 DataLoadingTime 0.005
Epoch: [1][7/9000] Time 0.719 Err 2.7177 Top1-%: 25.78 Topn-%: 65.62 LR 1e-02 DataLoadingTime 0.011
Epoch: [1][8/9000] Time 0.676 Err 2.9563 Top1-%: 17.97 Topn-%: 60.16 LR 1e-02 DataLoadingTime 0.010

For two GPUs:

==> doing epoch on training data:
==> online epoch # 1
Epoch: [1][1/9000] Time 4.474 Err 3.8503 Top1-%: 1.56 Topn-%: 9.38 LR 1e-02 DataLoadingTime 6.425
Epoch: [1][2/9000] Time 2.692 Err 3.5693 Top1-%: 23.44 Topn-%: 42.97 LR 1e-02 DataLoadingTime 0.005
Epoch: [1][3/9000] Time 2.539 Err 3.2223 Top1-%: 28.12 Topn-%: 57.81 LR 1e-02 DataLoadingTime 0.022
Epoch: [1][4/9000] Time 2.511 Err 3.0643 Top1-%: 25.78 Topn-%: 57.81 LR 1e-02 DataLoadingTime 0.019
Epoch: [1][5/9000] Time 2.500 Err 2.8987 Top1-%: 31.25 Topn-%: 60.16 LR 1e-02 DataLoadingTime 0.024
Epoch: [1][6/9000] Time 2.497 Err 3.2392 Top1-%: 23.44 Topn-%: 55.47 LR 1e-02 DataLoadingTime 0.020
Epoch: [1][7/9000] Time 2.494 Err 2.8436 Top1-%: 21.88 Topn-%: 63.28 LR 1e-02 DataLoadingTime 0.023
Epoch: [1][8/9000] Time 2.499 Err 2.7006 Top1-%: 22.66 Topn-%: 65.62 LR 1e-02 DataLoadingTime 0.015
Epoch: [1][9/9000] Time 2.493 Err 2.9153 Top1-%: 17.19 Topn-%: 60.16 LR 1e-02 DataLoadingTime 0.021
Epoch: [1][10/9000] Time 2.503 Err 2.7242 Top1-%: 21.09 Topn-%: 62.50 LR 1e-02 DataLoadingTime 0.019

buttomnutstoast commented on August 11, 2024

In my experience, you can try the following (see the sketch after this list):

  1. use multiple threads in DataParallelTable
  2. call model:getParameters() before model:forward()
  3. check that NCCL is properly installed
  4. maybe hardware issues...
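
To make suggestions 1 and 2 concrete, here is a minimal sketch; the model variable, the four-GPU setup, and the GPU ids are illustrative assumptions, not details from this thread:

 -- sketch of suggestions 1 and 2 (assumes a cuda-resident `model`
 -- and four visible GPUs)
 require 'cunn'
 require 'cudnn'

 local dpt = nn.DataParallelTable(1)   -- scatter the batch along dim 1
 dpt:add(model, {1, 2, 3, 4})          -- replicate the model onto GPUs 1-4
 dpt:threads(function()                -- suggestion 1: one thread per GPU
   require 'cudnn'
 end)

 -- suggestion 2: flatten parameters once, before the first forward pass,
 -- so the flattening cost is not paid inside the training loop
 local params, gradParams = dpt:getParameters()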

arashno commented on August 11, 2024

@buttomnutstoast
Thanks for your answer.
I didn't have NCCL, so I installed it.
The installation seems fine.
Then I replaced util.lua:10 with this:

 model = nn.DataParallelTable(1):threads(function()
           require 'cudnn'
         end)

Nothing changed in this case; it is still very slow.

Then I tried to enable NCCL with this:

 model = nn.DataParallelTable(1,true,true):threads(function()
           require 'cudnn'
         end)

But this time the program just stops and does nothing!
Any ideas?
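
For reference, the positional arguments being toggled above map onto the cunn DataParallelTable constructor roughly as follows; this gloss assumes the current cunn API:

 -- nn.DataParallelTable(dimension, flattenParams, usenccl)
 --   dimension     = 1     scatter the input batch along dimension 1
 --   flattenParams = true  keep each replica's weights in one flat buffer
 --   usenccl       = true  route inter-GPU reductions through NCCL, which
 --                         needs both libnccl and its Lua bindings
 model = nn.DataParallelTable(1, true, true)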

buttomnutstoast commented on August 11, 2024

Did you install the NCCL C++ library from source? The module installed from luarocks is simply an interface to it.
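
A quick sanity check from the th REPL; this assumes the luarocks bindings expose the module as 'nccl', which is what cunn itself looks for:

 -- if this fails, libnccl.so is missing or not on the loader path,
 -- even though the luarocks package installed fine
 local ok, err = pcall(require, 'nccl')
 print(ok and 'nccl loaded' or ('nccl failed to load: ' .. tostring(err)))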

arashno commented on August 11, 2024

Yes, I installed the C++ source.
I also tried splitting the NCCL setup into two steps, but there was no difference.

 model = nn.DataParallelTable(1, true, true)
 model:threads(function()
   require 'cudnn'
 end)

It just does nothing; it seems to be stuck in a deadlock.

arashno commented on August 11, 2024

I had an old installation of Torch.
It seems that it was causing the problem.
After I removed it, everything works fine.
Thanks!
