I modified the ende Reformer config to train my own re…

Training speed of NMT models (trax issue, closed)

google commented on May 21, 2024

Training speed of NMT models

from trax.

Comments (5)

lukaszkaiser commented on May 21, 2024

This does feel slow. But to establish a baseline: could you provide the speed of a comparable PyTorch/TF model on the same hardware with the same batch size?

prajdabre commented on May 21, 2024

Hello,

On a V100 with similar settings using tensor2tensor, it takes less than 30 seconds per 100 batches, i.e. under 0.3 seconds per batch. My guess is that Reformer has to recompute activations during backpropagation, which adds overhead. Another possibility is that Reformer runs much faster on TPUs (more compute cores, or better addition-subtraction optimizations, which Reformer relies on). In any case, I would like to know whether there is any way to speed things up.

Regards.

lukaszkaiser commented on May 21, 2024

I think the batches for Reformer may be much larger, so it may take longer per batch while the difference in speed per token isn't that big. Have you checked the exact batch sizes in both cases? Do they match?

prajdabre commented on May 21, 2024

Hi,

Now that you mention it, I noticed that the batch sizes are 256 SENTENCES per batch instead of the default 2048 TOKENS in transformer big.

Assuming an average sentence length of 24 tokens, that's 6144 tokens per batch on average, so the batches are about 3x larger. I would expect a V100 to be able to parallelize over larger batches, but I may be wrong. If you think this is the issue, then I am satisfied. :)
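
For anyone comparing the two setups, here is a minimal sketch of the arithmetic above. It only uses the figures quoted in this thread (256 sentences per batch, an assumed average of 24 tokens per sentence, 2048 tokens per batch and at most 0.3 seconds per batch for tensor2tensor). The function and variable names are made up for illustration and are not part of trax or tensor2tensor.

```python
def tokens_per_batch(sentences_per_batch, avg_tokens_per_sentence):
    """Approximate token count of a batch that is sized in sentences."""
    return sentences_per_batch * avg_tokens_per_sentence


def tokens_per_second(batch_tokens, seconds_per_batch):
    """Throughput in tokens per second for a given batch size and step time."""
    return batch_tokens / seconds_per_batch


# Figures quoted in the thread; the 24-token average is an assumption.
reformer_batch_tokens = tokens_per_batch(256, 24)
t2t_batch_tokens = 2048
t2t_seconds_per_batch = 0.3  # upper bound reported for tensor2tensor on a V100

# How much larger the sentence-based batch is in tokens.
batch_ratio = reformer_batch_tokens / t2t_batch_tokens

# Per-token throughput of the tensor2tensor baseline (a lower bound,
# since 0.3 s per batch was reported as an upper bound).
t2t_tokens_per_second = tokens_per_second(t2t_batch_tokens, t2t_seconds_per_batch)

# Step time at which the larger batch would match that per-token throughput.
break_even_seconds = reformer_batch_tokens / t2t_tokens_per_second

print(f"tokens per batch: {reformer_batch_tokens} vs {t2t_batch_tokens}")
print(f"batch size ratio: {batch_ratio:.1f}x")
print(f"break-even step time: {break_even_seconds:.2f} s per batch")
```

Under these assumptions, a step on the larger batch could take roughly three times as long as a tensor2tensor step and still process tokens at the same rate, so per-batch timings alone are not directly comparable.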

lukaszkaiser commented on May 21, 2024

I think it is. Good to have clarified that :).
