I ran the following: python tf_


No performance improved on batch 128? (benchmarks issue, closed)

tensorflow commented on July 24, 2024

No performance improved on batch 128 ?

from benchmarks.

Comments (5)

ekelsen commented on July 24, 2024

The underlying convolution routines won't get any faster when the batch_size goes from 64 -> 128, so it isn't surprising that the overall training doesn't either.


ilovechai commented on July 24, 2024

@zhaoerchao @ekelsen These benchmark programs report, for example, 570 images/sec, whereas running the same model normally gives about half of what the benchmark programs report. Why is that?


zhaoerchao commented on July 24, 2024

@cryptox31 Are you running the program on the same GPU with the same version of TF?


ilovechai commented on July 24, 2024

@zhaoerchao I am currently using 4 Tesla P100 GPUs and running the Inception v3 model on TensorFlow 1.01, and I am not getting optimum results.


tfboyd commented on July 24, 2024

Increasing the batch size will not always increase performance. I am far from an expert, but from my testing, each model and hardware combination has a point where, even if there is more memory, increasing the batch size does not help. Increasing the batch size normally helps when the step time is very fast: a larger batch slows the step down enough to hide the transfer times and other calculations that are impossible to "hide" with a very fast step time. I know that is not a very technical explanation. One good example: "everyone" runs AlexNet at a batch size of 512 or more now but used to run much smaller batches. I have not been working with ML very long, but if you test AlexNet with 32, 128, 256, and then 512 on most ML platforms you will see a significant speedup as the batch size increases. If I remember correctly, even more so on multi-GPU.
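The diminishing return described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not a measurement: it assumes each step costs a fixed overhead (transfers and other work that cannot be hidden) plus compute time that grows linearly with batch size. The function name `images_per_sec` and the constants `PER_IMAGE_S` and `OVERHEAD_S` are hypothetical and chosen only for illustration.

```python
# Toy throughput model (assumption): step_time = fixed overhead + batch_size * per-image compute.
# The numbers below are hypothetical and are not taken from any real benchmark run.

PER_IMAGE_S = 0.002  # assumed compute time per image, in seconds
OVERHEAD_S = 0.010   # assumed fixed per-step overhead, in seconds


def images_per_sec(batch_size, per_image_s=PER_IMAGE_S, overhead_s=OVERHEAD_S):
    """Return modeled throughput (images/sec) for one training step."""
    step_time = overhead_s + batch_size * per_image_s
    return batch_size / step_time


def relative_gain(small_batch, large_batch):
    """Fractional throughput gain when moving from small_batch to large_batch."""
    base = images_per_sec(small_batch)
    return (images_per_sec(large_batch) - base) / base


if __name__ == "__main__":
    for batch in (32, 64, 128, 256, 512):
        print(f"batch {batch:>3}: {images_per_sec(batch):7.1f} images/sec")
    print(f"gain 32 -> 64:  {relative_gain(32, 64):.1%}")
    print(f"gain 64 -> 128: {relative_gain(64, 128):.1%}")
```

In this model throughput can never exceed `1 / PER_IMAGE_S`, so once the per-image compute dominates the step time, doubling the batch size adds very little. A model with a much smaller per-image cost relative to the overhead would keep benefiting from larger batches for longer, which is consistent with the AlexNet observation above.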

Finally, the goal is normally to converge at the best possible top_1. I know people are training with large total batches for ResNet but I have not seen anyone training with 128 per GPU. Of course there is so much happening it likely has happened and I did not see it.

Closing as this is kind of expected. If you are having unexpected results with batch-size 64 or 32 please let me know and I will see if I can figure it out.

