Comments (5)

lukaszkaiser commented on May 21, 2024

This does feel slow. But to establish a baseline: could you provide the speed of a comparable PyTorch/TF model on the same hardware with the same batch size?

from trax.

prajdabre commented on May 21, 2024

Hello,

On a V100 with similar settings using tensor2tensor, it takes less than 30 seconds per 100 batches, or under 0.3 seconds per batch. My guess is that Reformer has to recompute activations during backpropagation, and this adds overhead. Another possibility is that Reformer runs much faster on TPUs (more compute cores, or better optimization of the addition and subtraction operations that Reformer relies on). In any case, I would like to know if there is any way to speed things up.

Regards.

lukaszkaiser commented on May 21, 2024

I think the batches for Reformer may be much larger, so it may take longer per batch, but the difference in speed per token isn't that big? Have you checked the exact batch sizes in both cases? Do they match?

prajdabre commented on May 21, 2024

Hi,

Now that you mention it, I noticed that the batch sizes are 256 SENTENCES per batch instead of the default 2048 TOKENS in transformer big.

Assuming an average sentence length of 24 tokens, that's 6144 tokens per batch on average, so the batches are about 3x larger. I would expect a V100 to be able to parallelize over larger batches, but I may be wrong. If you think this is the issue, then I am satisfied. :)
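To make the comparison concrete, here is a rough sketch of the arithmetic in Python. The function names are made up for illustration, the 24-token average sentence length is the assumption stated above, and the Reformer time per batch is left out because it is not given in this thread.

```python
# Back-of-the-envelope comparison of batch sizes measured in tokens.
# All names here are illustrative; none come from trax or tensor2tensor.

def tokens_per_batch(sentences_per_batch: int, avg_tokens_per_sentence: float) -> float:
    """Approximate number of tokens in a batch defined by sentence count."""
    return sentences_per_batch * avg_tokens_per_sentence


def tokens_per_second(tokens_in_batch: float, seconds_per_batch: float) -> float:
    """Throughput in tokens per second for a given time per batch."""
    return tokens_in_batch / seconds_per_batch


# Reformer run: 256 sentences per batch, assumed 24 tokens per sentence.
reformer_tokens = tokens_per_batch(256, 24)

# tensor2tensor transformer big default: 2048 tokens per batch.
t2t_tokens = 2048

# How many times larger the Reformer batch is, in tokens.
batch_ratio = reformer_tokens / t2t_tokens

# tensor2tensor baseline from this thread: about 30 seconds per 100 batches.
t2t_seconds_per_batch = 30 / 100
t2t_throughput = tokens_per_second(t2t_tokens, t2t_seconds_per_batch)

# Time per batch at which the Reformer run would match the baseline per token.
break_even_seconds = t2t_seconds_per_batch * batch_ratio

print(f"Reformer tokens per batch: {reformer_tokens}")
print(f"Batch size ratio: {batch_ratio:.1f}x")
print(f"tensor2tensor throughput: {t2t_throughput:.0f} tokens/s")
print(f"Break-even Reformer time per batch: {break_even_seconds:.2f} s")
```

Plugging the measured Reformer time per batch into `tokens_per_second(reformer_tokens, ...)` gives a per-token number that can be compared directly with the tensor2tensor one.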

lukaszkaiser commented on May 21, 2024

I think it is. Good to have clarified that :).
