Comments (14)

buttomnutstoast commented on August 11, 2024

We have run into a similar problem: using multiple GPUs is much slower than a single GPU.

buttomnutstoast commented on August 11, 2024

OK, we found that the package contributors have already fixed the problem mentioned in this post.

soumith commented on August 11, 2024

Hi, sorry about that. Yes, it should be fixed in the latest commit.

chienlinhuang1116 commented on August 11, 2024

Hi, I tested it and the results still show that nGPU=4 is slower than nGPU=1. Do you have any comments? Thank you.

th main.lua -data ~/imagenet -nGPU 1 -batchSize 128
Epoch: [1][2/10000] Time 0.844 Err 6.9071 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.002
Epoch: [1][3/10000] Time 0.845 Err 6.9084 Top1-%: 0.00 LR 1e-02 DataLoadingTime 3.041
Epoch: [1][4/10000] Time 0.845 Err 6.9095 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.051
Epoch: [1][5/10000] Time 0.845 Err 6.9092 Top1-%: 0.00 LR 1e-02 DataLoadingTime 2.557
Epoch: [1][6/10000] Time 0.843 Err 6.9095 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.003

th main.lua -data ~/imagenet -nGPU 4 -batchSize 128
Epoch: [1][2/10000] Time 1.781 Err 6.9064 Top1-%: 0.78 LR 1e-02 DataLoadingTime 0.002
Epoch: [1][3/10000] Time 1.765 Err 6.9066 Top1-%: 0.00 LR 1e-02 DataLoadingTime 2.181
Epoch: [1][4/10000] Time 1.761 Err 6.9080 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.004
Epoch: [1][5/10000] Time 1.760 Err 6.9089 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.699
Epoch: [1][6/10000] Time 1.763 Err 6.9058 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.004

th main.lua -data ~/imagenet -nGPU 4 -batchSize 256
Epoch: [1][2/10000] Time 2.479 Err 6.9081 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.004
Epoch: [1][3/10000] Time 2.421 Err 6.9074 Top1-%: 0.00 LR 1e-02 DataLoadingTime 3.012
Epoch: [1][4/10000] Time 2.369 Err 6.9066 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.107
Epoch: [1][5/10000] Time 2.368 Err 6.9078 Top1-%: 0.00 LR 1e-02 DataLoadingTime 3.725
Epoch: [1][6/10000] Time 2.368 Err 6.9079 Top1-%: 0.00 LR 1e-02 DataLoadingTime 0.005

soumith commented on August 11, 2024

@chienlinhuang1116 what commit hash are you on?

chienlinhuang1116 commented on August 11, 2024

Thank you, soumith. I think it is the latest commit hash.

(1) You are right, the problem has already been fixed. When I work on AWS Ubuntu instances with the fbcunn libs installed, it behaves correctly and multi-GPU is much faster than a single GPU.

(2) Because the latest imagenet example does not seem to work with the fbcunn libs, I tested it on CentOS GPU machines with only the Torch7 and nn packages installed. In this case, multi-GPU is still slower than a single GPU.

soumith commented on August 11, 2024

In the case of (2), did you install Torch freshly? The latest nn/cunn plus the latest commit of this repo no longer shows the slowness.

chienlinhuang1116 commented on August 11, 2024

Hi, I installed Torch freshly using the following steps, but multi-GPU is still slower than a single GPU on the CentOS GPU machines. Do you have any ideas?

Thank you.

curl -s https://raw.githubusercontent.com/torch/ezinstall/master/clean-old.sh | bash
git clone https://github.com/torch/distro.git ~/torch --recursive
cd ~/torch; ./install.sh
curl -s https://raw.githubusercontent.com/torch/ezinstall/master/install-luajit+torch | PREFIX=~/torch bash

arashno commented on August 11, 2024

Hi,
I have the exact same problem.
When I use nGPU=2, it is slower than just one GPU.
I installed the latest versions of Torch, cunn, and this repository.
Everything is up to date, but it is still slower.
Any ideas? @soumith @chienlinhuang1116 @buttomnutstoast
Here are my outputs:

For one GPU:
==> doing epoch on training data:
==> online epoch # 1
Epoch: [1][1/9000] Time 0.826 Err 3.8455 Top1-%: 3.12 Topn-%: 7.81 LR 1e-02 DataLoadingTime 5.466
Epoch: [1][2/9000] Time 0.709 Err 3.6204 Top1-%: 17.97 Topn-%: 36.72 LR 1e-02 DataLoadingTime 0.072
Epoch: [1][3/9000] Time 0.701 Err 3.1000 Top1-%: 30.47 Topn-%: 65.62 LR 1e-02 DataLoadingTime 0.009
Epoch: [1][4/9000] Time 0.719 Err 2.7807 Top1-%: 27.34 Topn-%: 61.72 LR 1e-02 DataLoadingTime 0.005
Epoch: [1][5/9000] Time 0.687 Err 2.8056 Top1-%: 25.78 Topn-%: 63.28 LR 1e-02 DataLoadingTime 1.749
Epoch: [1][6/9000] Time 0.715 Err 3.1090 Top1-%: 19.53 Topn-%: 55.47 LR 1e-02 DataLoadingTime 0.005
Epoch: [1][7/9000] Time 0.719 Err 2.7177 Top1-%: 25.78 Topn-%: 65.62 LR 1e-02 DataLoadingTime 0.011
Epoch: [1][8/9000] Time 0.676 Err 2.9563 Top1-%: 17.97 Topn-%: 60.16 LR 1e-02 DataLoadingTime 0.010

For two GPUs:

==> doing epoch on training data:
==> online epoch # 1
Epoch: [1][1/9000] Time 4.474 Err 3.8503 Top1-%: 1.56 Topn-%: 9.38 LR 1e-02 DataLoadingTime 6.425
Epoch: [1][2/9000] Time 2.692 Err 3.5693 Top1-%: 23.44 Topn-%: 42.97 LR 1e-02 DataLoadingTime 0.005
Epoch: [1][3/9000] Time 2.539 Err 3.2223 Top1-%: 28.12 Topn-%: 57.81 LR 1e-02 DataLoadingTime 0.022
Epoch: [1][4/9000] Time 2.511 Err 3.0643 Top1-%: 25.78 Topn-%: 57.81 LR 1e-02 DataLoadingTime 0.019
Epoch: [1][5/9000] Time 2.500 Err 2.8987 Top1-%: 31.25 Topn-%: 60.16 LR 1e-02 DataLoadingTime 0.024
Epoch: [1][6/9000] Time 2.497 Err 3.2392 Top1-%: 23.44 Topn-%: 55.47 LR 1e-02 DataLoadingTime 0.020
Epoch: [1][7/9000] Time 2.494 Err 2.8436 Top1-%: 21.88 Topn-%: 63.28 LR 1e-02 DataLoadingTime 0.023
Epoch: [1][8/9000] Time 2.499 Err 2.7006 Top1-%: 22.66 Topn-%: 65.62 LR 1e-02 DataLoadingTime 0.015
Epoch: [1][9/9000] Time 2.493 Err 2.9153 Top1-%: 17.19 Topn-%: 60.16 LR 1e-02 DataLoadingTime 0.021
Epoch: [1][10/9000] Time 2.503 Err 2.7242 Top1-%: 21.09 Topn-%: 62.50 LR 1e-02 DataLoadingTime 0.019

buttomnutstoast commented on August 11, 2024

In my experience, you can try the following (see the sketch after this list):

  1. use multiple threads in DataParallelTable
  2. call model:getParameters() before model:forward()
  3. check that NCCL is properly installed
  4. maybe hardware issues...
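
To make suggestions 1 and 2 concrete, here is a minimal sketch; the model variable, the four-GPU setup, and the GPU ids are illustrative assumptions, not details from this thread:

 -- sketch of suggestions 1 and 2 (assumes a cuda-resident `model`
 -- and four visible GPUs)
 require 'cunn'
 require 'cudnn'

 local dpt = nn.DataParallelTable(1)   -- scatter the batch along dim 1
 dpt:add(model, {1, 2, 3, 4})          -- replicate the model onto GPUs 1-4
 dpt:threads(function()                -- suggestion 1: one thread per GPU
   require 'cudnn'
 end)

 -- suggestion 2: flatten parameters once, before the first forward pass,
 -- so the flattening cost is not paid inside the training loop
 local params, gradParams = dpt:getParameters()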

arashno commented on August 11, 2024

@buttomnutstoast
Thanks for your answer.
I didn't have NCCL, so I installed it.
The installation seems fine.
Then I replaced util.lua:10 with this:

 model = nn.DataParallelTable(1):threads(function()
           require 'cudnn'
         end)

Nothing changed in this case; it is still very slow.

Then I tried to enable NCCL with this:

 model = nn.DataParallelTable(1,true,true):threads(function()
           require 'cudnn'
         end)

But this time the program just stops and does nothing!
Any ideas?
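
For reference, the positional arguments being toggled above map onto the cunn DataParallelTable constructor roughly as follows; this gloss assumes the current cunn API:

 -- nn.DataParallelTable(dimension, flattenParams, usenccl)
 --   dimension     = 1     scatter the input batch along dimension 1
 --   flattenParams = true  keep each replica's weights in one flat buffer
 --   usenccl       = true  route inter-GPU reductions through NCCL, which
 --                         needs both libnccl and its Lua bindings
 model = nn.DataParallelTable(1, true, true)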

buttomnutstoast commented on August 11, 2024

Did you install the NCCL C++ library from source? The module installed from luarocks is simply an interface to it.
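
A quick sanity check from the th REPL; this assumes the luarocks bindings expose the module as 'nccl', which is what cunn itself looks for:

 -- if this fails, libnccl.so is missing or not on the loader path,
 -- even though the luarocks package installed fine
 local ok, err = pcall(require, 'nccl')
 print(ok and 'nccl loaded' or ('nccl failed to load: ' .. tostring(err)))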

arashno commented on August 11, 2024

Yes, I installed the C++ source.
I also tried splitting the NCCL setup into two steps, but there was no difference.

 model = nn.DataParallelTable(1, true, true)
 model:threads(function()
   require 'cudnn'
 end)

It just does nothing; it seems to be stuck in a deadlock.

arashno commented on August 11, 2024

I had an old installation of Torch.
It seems that it was causing the problem.
After I removed it, everything works fine.
Thanks!
