
Comments (5)

tfboyd avatar tfboyd commented on August 27, 2024

I am testing a slightly older script from this tree and getting ready to test this recent version in more detail. A couple of questions:

  • Which GPUs are you using?
  • Are you on a cloud service? If so, which one, just out of curiosity?
  • Are you using the TF 1.4 RC binary builds, or did you build from source? (not a big deal, just nice to know)

I am a little concerned about your number of threads; I would not change those when using GPUs. It always helps to include the full log, which would show your GPUs as well as some other information to go on. I would expect two GPUs to scale VGG16 almost linearly, though there are some caveats. I work with this script and this situation all the time. If you give me some details I can help you figure it out pretty fast... I hope.
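One way to capture the full log being asked for here (a sketch, not from the thread: the log filename is an assumption, and the flags mirror the ones discussed in this issue):

```shell
# Redirect both stdout and stderr through tee so the full run log,
# including the GPU/device placement lines printed at startup, is
# saved to a file while still being visible on screen.
python tf_cnn_benchmarks.py --num_gpus=2 --model=vgg16 --batch_size=64 \
    2>&1 | tee vgg16_2gpu.log
```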

Apologies for typos; I am about to go to bed, but I thought if you see this I can try to sort it out tomorrow. Also, thank you for the exact command line in your post; that is really helpful and I appreciate it.

from benchmarks.

adeagle avatar adeagle commented on August 27, 2024

I have written down the details in the attachment, thanks.
multi-gpu-test.txt


adeagle avatar adeagle commented on August 27, 2024

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 79
model name : Intel(R) Core(TM) i7-6800K CPU @ 3.40GHz
stepping : 1
microcode : 0xb00001c
cpu MHz : 1388.156
cache size : 15360 KB
physical id : 0
siblings : 12
core id : 0
cpu cores : 6
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 20
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts
bugs :
bogomips : 6796.54
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:


tfboyd avatar tfboyd commented on August 27, 2024


tfboyd avatar tfboyd commented on August 27, 2024

@adeagle

OK, I was testing on K80s and I am getting good scaling with the commands below. I strongly suggest not changing the intra or inter threads; when training CNNs on GPUs, the way we set them is recommended. I am using the latest script (from head, where the args changed).

My scaling was 36.258 (1 GPU) to 271 (8 GPUs) with synthetic data, and 35 (1 GPU) to 265 (8 GPUs) with real data, ~7.5x.
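The ~7.5x figure falls straight out of the quoted throughput numbers; a quick check (images/sec values taken from the comment above):

```python
# Throughput (images/sec) quoted above for VGG16 on K80s.
synthetic_1gpu, synthetic_8gpu = 36.258, 271.0
real_1gpu, real_8gpu = 35.0, 265.0

synthetic_speedup = synthetic_8gpu / synthetic_1gpu
real_speedup = real_8gpu / real_1gpu

# Efficiency relative to perfect linear (8x) scaling.
print(f"synthetic: {synthetic_speedup:.2f}x ({synthetic_speedup / 8:.0%} of linear)")
print(f"real:      {real_speedup:.2f}x ({real_speedup / 8:.0%} of linear)")
```

Both work out to roughly 93-95% of perfect linear scaling across 8 GPUs.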

tf_cnn_benchmarks.py --local_parameter_device=gpu --num_gpus=2 --batch_size=64 --model=vgg16 --variable_update=replicated --all_reduce_spec=nccl

# OR

# VGG16 has a lot of parameters which should still be OK
tf_cnn_benchmarks.py --local_parameter_device=gpu --num_gpus=2 --batch_size=64 --model=vgg16 --variable_update=parameter_server --all_reduce_spec=''

Your combination is a bit odd. replicated with local_parameter_device=cpu while trying to use nccl is not really a valid combination, and I am not sure what would actually happen. My guess is replicated, with the variables then aggregated by the CPU. With your intra and inter settings, that might have created issues as well. The script would not know that; it is designed for playing around.

In your most consistent tests you had:
2 GPUs, total batch size 64 = 146.38
1 GPU, total batch size 64 = 82.04

I do not usually measure scaling that way, but it is a legitimate way to look at it, and in this case works out to a 1.78x speedup. I suspect you will get better numbers with one of the command lines I posted above. And if running 1 GPU, almost always use variable_update=parameter_server and local_parameter_device=gpu; there is no reason to do any work off the GPU unless necessary.
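The 1.78x speedup comes from dividing the two throughput numbers reported in the tests above (a quick sanity check, nothing more):

```python
# Speedup = 2-GPU throughput over 1-GPU throughput at the
# same total batch size of 64 (images/sec from this thread).
two_gpu = 146.38
one_gpu = 82.04
speedup = two_gpu / one_gpu
print(f"{speedup:.2f}x speedup")  # i.e. roughly 89% of a perfect 2x
```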

Closing, but feel free to add more comments; I will see them.

