
Comments (4)

sunpengsdu commented on May 11, 2024

Hi @crazyofapple, can you provide more details about your platform? On our platform, we use up to 128 GPU nodes connected by 4*100Gbps RoCE, and each node has 8 GPUs connected by NVLINK.

from internlm.

crazyofapple commented on May 11, 2024

Inter-node: 2 HDR100 IB, 200G. Intra-node: 8 GPUs with PCIe.


sunpengsdu commented on May 11, 2024

The main performance bottleneck is the intra-node communication over PCIe. We ran two experiments:

  1. On a single GPU node with NVLINK. The training log is as follows:

2023-07-10 14:26:28,977 INFO train.py:317 in record_current_batch_training_metrics -- tflops=188.02533140299252,step=36,loss=5.459033012390137,tgs (tokens/gpu/second)=4233.73,lr=7.6e-06,loss_scale=65536.0,grad_norm=12.540833573326264,micro_num=4,num_consumed_tokens=4849664,inf_nan_skip_batches=0,num_samples_in_batch=15,largest_length=2048,largest_batch=5,smallest_batch=3,adam_beta2=0.95,fwd_bwd_time=3.72

  2. On a single GPU node without NVLINK. The training log is as follows:

2023-07-10 14:34:49,024 INFO train.py:317 in record_current_batch_training_metrics -- tflops=99.1021732624673,step=18,loss=6.766777038574219,tgs (tokens/gpu/second)=2231.46,lr=4.000000000000001e-06,loss_scale=65536.0,grad_norm=12.957902089555239,micro_num=4,num_consumed_tokens=2490368,inf_nan_skip_batches=0,num_samples_in_batch=15,largest_length=2048,largest_batch=5,smallest_batch=3,adam_beta2=0.95,fwd_bwd_time=5.76

Since the optimizer needs a lot of allreduce/broadcast communication, it is quite important to ensure high communication bandwidth between GPUs in a node.
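To make the gap between the two runs easier to read, here is a minimal sketch that parses the `key=value` payload of the two log lines above and computes the ratios. The helper names `parse_metrics` and `compare_runs` are hypothetical and not part of InternLM, and the log strings are abbreviated to three fields for brevity.

```python
# Hypothetical helpers for comparing two training log lines; not part of InternLM.

def parse_metrics(line):
    """Parse the comma-separated key=value payload after ' -- ' into floats."""
    payload = line.split(" -- ", 1)[1]
    metrics = {}
    for field in payload.split(","):
        key, _, value = field.partition("=")
        metrics[key.strip()] = float(value)
    return metrics


def compare_runs(fast, slow):
    """Return how many times better the fast run is on each metric."""
    return {
        # Higher is better for throughput metrics: fast / slow.
        "tflops": fast["tflops"] / slow["tflops"],
        "tgs": fast["tgs (tokens/gpu/second)"] / slow["tgs (tokens/gpu/second)"],
        # Lower is better for step time: slow / fast.
        "fwd_bwd_time": slow["fwd_bwd_time"] / fast["fwd_bwd_time"],
    }


# Abbreviated copies of the two log lines quoted above.
with_nvlink = (
    "2023-07-10 14:26:28,977 INFO train.py:317 in "
    "record_current_batch_training_metrics -- tflops=188.02533140299252,"
    "tgs (tokens/gpu/second)=4233.73,fwd_bwd_time=3.72"
)
without_nvlink = (
    "2023-07-10 14:34:49,024 INFO train.py:317 in "
    "record_current_batch_training_metrics -- tflops=99.1021732624673,"
    "tgs (tokens/gpu/second)=2231.46,fwd_bwd_time=5.76"
)

ratios = compare_runs(parse_metrics(with_nvlink), parse_metrics(without_nvlink))
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

On these two lines, the NVLINK run comes out at roughly 1.9x the throughput of the PCIe run for both tflops and tgs. Note that the two lines are from different steps (36 and 18), so this is a rough comparison only.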


crazyofapple commented on May 11, 2024

thx

