bigscience's Introduction

bigscience

Research workshop on large language models - The Summer of Language Models 21

At the moment we have 2 code repos:

  1. https://github.com/bigscience-workshop/Megatron-DeepSpeed - this is our flagship code base
  2. https://github.com/bigscience-workshop/bigscience - (this repo) for everything else - docs, experiments, etc.

Currently, the most active segments of this repo are:

  • JZ - lots of information about our work environment (the Jean Zay cluster), which helps us evaluate, plan and get things done
  • Experiments - many experiments are being done. Documentation, result tables, scripts and logs are all there
  • Datasets info
  • Train - all the information about the current trainings (see below for the most important ones)

We have READMEs for specific aspects, such as:

Trainings

While we keep detailed chronicles of experiments and findings for some of the main trainings, here is a doc that contains a summary of the most important findings: Lessons learned

Train 1 - 13B - unmodified Megatron gpt2 - baseline

You can watch the training logs live by running this tail -f style script over the remote log file, which gets synced to the hub once an hour:

perl -e '$u=shift; $b=0; while(1){($e)=qx[curl -sI $u]=~/content-length: (\d+)/; \
print qx[curl -sr $b-$e -L $u] if $e>$b; $b=$e; sleep 300}' \
https://huggingface.co/bigscience/tr1-13B-logs/resolve/main/main_log.txt
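
For reference, here is a rough plain-bash equivalent of the one-liner above, with comments (a sketch that assumes the hub keeps serving a content-length header for this file; the official script is the perl version):

URL=https://huggingface.co/bigscience/tr1-13B-logs/resolve/main/main_log.txt
start=0
while true; do
    # current size of the remote file (take the last content-length after redirects)
    end=$(curl -sIL "$URL" | grep -i '^content-length:' | tail -1 | tr -dc '0-9')
    if [ -n "$end" ] && [ "$end" -gt "$start" ]; then
        curl -sL -r "$start-$end" "$URL"   # fetch only the newly appended byte range
        start=$end
    fi
    sleep 300   # poll every 5 minutes, like the perl version
done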

Train 3

Architecture and scaling baseline runs: no fancy tricks, just GPT2. Here are links to the respective tensorboards:

Size                | 1B3                  | 760M | 350M | 125M
C4 + low warmup     | a                    | b    | c    |
OSCAR + low warmup  | f                    |      |      |
C4 + high warmup    | e                    |      |      |
OSCAR + high warmup | d (current baseline) | g    | h    | i
Pile + high warmup  | m                    | j    | k    | l

Train 8

104B - unmodified Megatron gpt2 - with extra-wide hidden size to learn how to deal with training instabilities

You can watch the training logs live by running this tail -f style script over the remote log file, which gets synced to the hub once an hour:

perl -e '$u=shift; $b=0; while(1){($e)=qx[curl -sI $u]=~/content-length: (\d+)/; \
print qx[curl -sr $b-$e -L $u] if $e>$b; $b=$e; sleep 300}' \
https://cdn-lfs.huggingface.co/bigscience/tr8-104B-logs/b2cc478d5ae7c9ec937ea2db1d2fe09de593fa2ec38c171d6cc5dca094cd79f9

Train 11

This is the current main training

tr11-176B-ml

You can watch the training logs live by running this tail -f style script over the remote log file, which gets synced to the hub once an hour:

perl -e '$u=shift; $b=0; while(1){($e)=qx[curl -LsI $u]=~/2 200.*?content-length: (\d+)/s; \
print qx[curl -Lsr $b-$e $u] if $e>$b; $b=$e; sleep 300}' \
https://huggingface.co/bigscience/tr11-176B-ml-logs/resolve/main/logs/main/main_log.txt

bigscience's People

Contributors

cakiki, hugolaurencon, huu4ontocord, ibeltagy, mryab, muennighoff, nouamanetazi, pierrecolombo, saullu, stas00, tevenlescao, thomasw21, thomwolf, victorsanh, younesbelkada

bigscience's Issues

Wrong tokenizer path in big model config

Hello,

The final model config seems to be pointing at the wrong tokenizer:

--tokenizer-name-or-path bigscience-catalogue-data-dev/byte-level-bpe-tokenizer-nfkc-250k \

@thomasw21 told me this one was used for testing purposes only, since there is already an existing dataset tokenized with it.

This issue tracks the fact that at a later stage this should be changed to:

--tokenizer-name-or-path bigscience-catalogue-data-dev/byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles \

@stas00

Files for bias evaluation

Can you please provide the files for the bias evaluation on the CrowS-Pairs dataset? The results are given in section 4.9 of the paper, but I do not see the files in the evaluation folder here. Thank you.

Why 384 (12*2*16) will be the first time all pipeline stages are filled

I am reading https://github.com/bigscience-workshop/bigscience/blob/master/train/tr11-176B-ml/README.md#global-batch-size, but I don't quite understand some of the numbers. Can someone help me understand them?

Such as: "So it'll take several days of very inefficient run. We know we get 113 TFLOPs at iteration 512, and since PP=12 and MBS=2, only at 384 (12*2*16) it'll be the first time all pipeline stages will be filled and that's when the performance should be much better, probably around 90 TFLOPs."

I can't understand why a global batch size of 384 (12*2*16) is needed before all pipeline stages are filled.
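
For background, the general pipeline-parallel rule of thumb behind statements like this is sketched below; how the three factors in 12*2*16 map onto the actual run configuration is exactly what this issue asks, so the DP value used here is only an assumption chosen to reproduce the quoted product:

# Each data-parallel replica processes GBS / (DP * MBS) micro-batches per step,
# and all PP pipeline stages have work at the same time only once that count
# reaches PP, i.e. once GBS >= PP * MBS * DP.
PP=12    # pipeline-parallel stages (from the quote)
MBS=2    # micro-batch size (from the quote)
DP=16    # assumed data-parallel degree, chosen so that PP*MBS*DP reproduces the quoted 384
echo $(( PP * MBS * DP ))   # prints 384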

Why is DeepSpeed enabled in the BLOOM training script?

Why is the ZeRO stage set to 0 when DeepSpeed is enabled in the BLOOM training script? Can the BLOOM model be trained with DeepSpeed disabled, and would the loss curve still match? Thanks very much.

DEEPSPEED_ARGS=" \
    --deepspeed \
    --deepspeed_config ${config_json} \
    --zero-stage ${ZERO_STAGE} \
    --deepspeed-activation-checkpointing \
    "

can you share the slurm.conf you are using?

Hey,
pinging @stas00
I'm a researcher at Tel Aviv University, and we're thinking about implementing QOS similar to what you have on the Jean Zay cluster.
It would be really helpful to see the slurm.conf you are using for your QOS setting.
Thanks!
Ohad

Where can I download the training script for bloom-7b1?

Hello, I found the evaluation script for bloom-7b1 in the repo at evaluation/results/tr11/scripts/run_trevalharness_7b1.slurm, but I could not find the training script for bloom-7b1. Can you share it?

Thank you very much.

mC4 sampling & pre-processing

Hi @TevenLeScao,

I think there are some confusing and broken links in the mC4 data preprocessing section. Can you take a look?

Both of these links are broken here:

  1. mc4_preprocessing
  2. mc4_sampled_raw

The original links should be:

  1. mc4_preprocessing
  2. mc4_sampled_raw

In addition to that, the multinomial data processing code to create the different language splits is in this pull request: bigscience-workshop/Megatron-DeepSpeed#9

Here are a few things:

  1. Did you use this data for any of your experiments?
  2. If not, then I think you can update the doc: https://github.com/bigscience-workshop/bigscience/tree/master/data/mc4

If you want to keep the code for reference purposes, I'm happy to open a pull request here. If not, I'll close the pull request in the bigscience-workshop/Megatron-DeepSpeed repo.

Let me know what you think.

ZERO_STAGE=1 results in higher TFLOPs?

Description

I am reading the chronicles_prequel, and the last table in the chapter "Trying with ZeRO_STAGE=0/1" indicates that higher TFLOPs are achieved with ZERO_STAGE=1.
ZeRO stage 1 should reduce the memory cost, but how come it also increases performance with all other parameters being the same?

Nodes | Size | ZS | DP | TP | PP | MBS | GBS  | Mem  | Sec/it | TFLOPs | Notes
48    | 181B | 1  | 4  | 8  | 12 | 2   | 2048 | 37GB | 120.29 | 134.02 | 02-21
48    | 181B | 0  | 4  | 8  | 12 | 2   | 2048 | 72GB | 137.34 | 113.02 | 02-21
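
For context, the TFLOPs column in tables like this appears to be derived from a simple throughput formula along the lines sketched below (a sketch, not the exact script: the factor 4 assumes activation checkpointing, the 2048 sequence length is assumed, and the 181B parameter count is rounded, so the result only lands in the ballpark of the reported 134):

# rough reproduction of the TFLOPs figure for the first row (ZS=1)
MODEL_B=181          # parameters, in billions (rounded)
SEQLEN=2048          # assumed sequence length
GBS=2048
SEC_PER_IT=120.29
GPUS=$(( 48 * 8 ))   # 48 nodes x 8 A100s
# model_size_in_B * 4 * 2 * seqlen * GBS / (sec_per_iteration * total_gpus * 1e3)
echo "$MODEL_B * 4 * 2 * $SEQLEN * $GBS / ($SEC_PER_IT * $GPUS * 1000)" | bc -l   # ~131 TFLOPs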

eval opt-175B

I noticed you evaluated the OPT-175B model. How did you convert it to a Megatron-DeepSpeed checkpoint? I cannot find a 175B Hugging Face Transformers checkpoint. Also, I cannot successfully convert the OPT-66B checkpoint. @thomasw21 Thanks for any reply!
