Comments (3)

lchu-ibm commented on August 29, 2024

Hi, if you get the same throughput, then the MFU should be the same or very similar.

Here is an easier way to calculate MFU:

Using the formula, the model compute comes to roughly 192 TFLOP per sample (batch size 1) for the 7b model. We run a batch size of 2, so each step performs 192 * 2 TFLOP. Our step time is 2.2s (I assume yours is the same or similar, given that you observed the same throughput), so Model TFLOPS = 192 * 2 / 2.2 ≈ 175.

MFU is the ratio of this number to the theoretical peak (for A100, this is 312 TFLOPS), so 175 / 312 = 0.56.
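The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the calculation described in this comment, not code from fms-fsdp: the function name `model_flops_utilization` and its parameter names are hypothetical, and the sketch assumes that the batch size, step time, and peak figure all refer to a single GPU.

```python
def model_flops_utilization(tflop_per_sample, batch_size, step_time_s, peak_tflops):
    """Return (achieved model TFLOPS, MFU) for one training step.

    Hypothetical helper: all inputs are assumed to be per-GPU values.
    """
    # Work done in one step, in TFLOP.
    tflop_per_step = tflop_per_sample * batch_size
    # Achieved throughput, in TFLOP per second.
    achieved_tflops = tflop_per_step / step_time_s
    # MFU is the fraction of the hardware's theoretical peak that was achieved.
    return achieved_tflops, achieved_tflops / peak_tflops


# Numbers quoted above: 192 TFLOP per sample, batch size 2, 2.2s step, A100 peak 312.
achieved, mfu = model_flops_utilization(192, 2, 2.2, 312)
print(f"Model TFLOPS = {achieved:.0f}, MFU = {mfu:.2f}")
```

Plugging in your own step time and batch size is a quick way to check whether a differing MFU comes from throughput or from the FLOP estimate itself.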

from fms-fsdp.

jasonkrone commented on August 29, 2024

Thank you, Linsong! Makes sense - must be a measurement error on my part. Appreciate the quick response :)

jasonkrone commented on August 29, 2024

Noting why my measurement was wrong on the off chance it saves others time.

When you do sum([p.numel() for p in model.parameters()]) for an FSDP model, it only counts the parameters that are in the local FSDP shard. To count the full number of parameters in the FSDP model you can use:

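from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Temporarily gather the full (unsharded) parameters before counting them.
with FSDP.summon_full_params(model):
    n_params = sum(p.numel() for p in model.parameters())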
