
Comments (5)

tfboyd avatar tfboyd commented on August 27, 2024

I am testing a slightly older script from this tree and getting ready to test this recent version in more detail. A couple of questions:

  • Which GPUs are you using?
  • Are you on a cloud service? If so, which one, just out of curiosity?
  • Are you using the TF 1.4 RC binary builds, or did you build from source? (not a big deal, just nice to know)

I am a little concerned about your number of threads; I would not change those when using GPUs. It always helps to include the full log, which would show your GPUs as well as some other information to go on. I would expect two GPUs to scale VGG16 almost linearly, though there are some caveats. I work with this script and this situation all the time. If you give me some details I can help you figure it out pretty fast... I hope.
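One way to capture the full log being asked for here (a sketch, not from the thread: the log filename is an assumption, and the flags mirror the ones discussed in this issue):

```shell
# Redirect both stdout and stderr through tee so the full run log,
# including the GPU/device placement lines printed at startup, is
# saved to a file while still being visible on screen.
python tf_cnn_benchmarks.py --num_gpus=2 --model=vgg16 --batch_size=64 \
    2>&1 | tee vgg16_2gpu.log
```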

Apologies for typos; I am about to go to bed, but I thought if you see this I can try to sort it out tomorrow. Also, thank you for the exact command line in your post; that is really helpful and I appreciate it.

from benchmarks.

adeagle avatar adeagle commented on August 27, 2024

I have written down the details in the attachment, thanks.
multi-gpu-test.txt


adeagle avatar adeagle commented on August 27, 2024

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 79
model name : Intel(R) Core(TM) i7-6800K CPU @ 3.40GHz
stepping : 1
microcode : 0xb00001c
cpu MHz : 1388.156
cache size : 15360 KB
physical id : 0
siblings : 12
core id : 0
cpu cores : 6
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 20
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts
bugs :
bogomips : 6796.54
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:


tfboyd avatar tfboyd commented on August 27, 2024


tfboyd avatar tfboyd commented on August 27, 2024

@adeagle

OK, I was testing on K80s and I am getting good scaling with the commands below. I strongly suggest not changing the intra or inter threads; when training CNNs on GPUs, the way we set them is recommended. I am using the latest script (from head, where the args changed).

My scaling was 36.258 (1 GPU) to 271 (8 GPUs) with synthetic data, and 35 (1 GPU) to 265 (8 GPUs) with real data, ~7.5x.
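The ~7.5x figure falls straight out of the quoted throughput numbers; a quick check (images/sec values taken from the comment above):

```python
# Throughput (images/sec) quoted above for VGG16 on K80s.
synthetic_1gpu, synthetic_8gpu = 36.258, 271.0
real_1gpu, real_8gpu = 35.0, 265.0

synthetic_speedup = synthetic_8gpu / synthetic_1gpu
real_speedup = real_8gpu / real_1gpu

# Efficiency relative to perfect linear (8x) scaling.
print(f"synthetic: {synthetic_speedup:.2f}x ({synthetic_speedup / 8:.0%} of linear)")
print(f"real:      {real_speedup:.2f}x ({real_speedup / 8:.0%} of linear)")
```

Both work out to roughly 93-95% of perfect linear scaling across 8 GPUs.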

tf_cnn_benchmarks.py --local_parameter_device=gpu --num_gpus=2 --batch_size=64 --model=vgg16 --variable_update=replicated --all_reduce_spec=nccl

# OR

# VGG16 has a lot of parameters which should still be OK
tf_cnn_benchmarks.py --local_parameter_device=gpu --num_gpus=2 --batch_size=64 --model=vgg16 --variable_update=parameter_server --all_reduce_spec=''

Your combination is a bit odd. replicated with local_parameter_device=cpu while trying to use nccl is not really a valid combination, and I am not sure what would actually happen. My guess is replicated, with the variables then aggregated by the CPU. With your intra and inter settings, that might have created issues as well. The script would not know that; it is designed for playing around.

In your most consistent tests you had:
2 GPUs, total batch size 64 = 146.38
1 GPU, total batch size 64 = 82.04

I do not usually measure scaling that way, but it is a legitimate way to look at it, and in this case works out to a 1.78x speedup. I suspect you will get better numbers with one of the command lines I posted above. And if running 1 GPU, almost always use variable_update=parameter_server and local_parameter_device=gpu; there is no reason to do any work off the GPU unless necessary.
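The 1.78x speedup comes from dividing the two throughput numbers reported in the tests above (a quick sanity check, nothing more):

```python
# Speedup = 2-GPU throughput over 1-GPU throughput at the
# same total batch size of 64 (images/sec from this thread).
two_gpu = 146.38
one_gpu = 82.04
speedup = two_gpu / one_gpu
print(f"{speedup:.2f}x speedup")  # i.e. roughly 89% of a perfect 2x
```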

Closing, but feel free to add more comments; I will see them.

