
Comments (5)

raulpuric commented on July 21, 2024

Hmmm that doesn't seem right.

I just ran this and got the correct result:
python3 evaluate_gpt2.py --model-parallel-size 1 --num-layers 12 --hidden-size 768 --log-interval 100 --load anything --eval-batch-size 16 --num-attention-heads 12 --seq-length 1024 --max-position-embeddings 1024 --tokenizer-type GPT2BPETokenizer --text-key text --distributed-backend nccl --hidden-dropout 0.1 --attention-dropout 0.1 --fp16 --overlapping-eval 32 --cache-dir cache --load-openai --valid-data ../gpt2_staging/eval_datasets/wikitext-103/wiki.test.tokens

This gives me

 validation results on wiki | avg loss: 3.1057E+00 | ppl: 2.2326E+01 | adjusted ppl: 3.0537E+01 | token ratio: 1.1008449901248143 |
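
(For reference, the adjusted perplexity is just the per-BPE-token loss rescaled by the token ratio, i.e. tokenized tokens over original whitespace tokens, so you can recompute both numbers from the avg loss. A minimal sketch, assuming the loss is in nats:)

import math

avg_loss = 3.1057                 # avg cross-entropy per BPE token, from the log line above
token_ratio = 1.1008449901248143  # num_tokenized_tokens / num_original_tokens

ppl = math.exp(avg_loss)                         # ~22.33, matches 2.2326E+01
adjusted_ppl = math.exp(avg_loss * token_ratio)  # ~30.54, matches 3.0537E+01
print(ppl, adjusted_ppl)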

Also, for the future, I think you meant to run scripts/run_gpt2_eval.py. We just realized there's a line missing from it, but it should work after I patch it in a sec.


raulpuric commented on July 21, 2024

I think the arguments you're missing are --num-layers 12 --num-attention-heads 12
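
(Without those two flags the script presumably falls back to its default model shape, which doesn't describe the 124M GPT-2 checkpoint. As a back-of-the-envelope check, not code from the repo: the layer count and hidden size pin down the parameter count that evaluate_gpt2.py prints at build time, 124439808 for GPT-2 small as in the log further down; the head count only changes how the hidden size is split across heads, not the total.)

# Back-of-the-envelope sketch: parameter count of a GPT-2-style model.
def gpt2_param_count(n_layer=12, d_model=768, n_vocab=50257, n_ctx=1024):
    embed = n_vocab * d_model + n_ctx * d_model   # token + position embeddings
    attn = (d_model * 3 * d_model + 3 * d_model   # fused QKV weight + bias
            + d_model * d_model + d_model)        # attention output projection
    mlp = (d_model * 4 * d_model + 4 * d_model    # fc weight + bias
           + 4 * d_model * d_model + d_model)     # projection weight + bias
    layer_norms = 2 * 2 * d_model                 # two LayerNorms per block
    final_ln = 2 * d_model
    return embed + n_layer * (attn + mlp + layer_norms) + final_ln

print(gpt2_param_count())  # 124439808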


raulpuric commented on July 21, 2024

Fixed the script, in case you're interested.


mschrimpf commented on July 21, 2024

Indeed, thank you for clarifying. I was copying the command that run_gpt2_eval.py was executing, which was missing those arguments, as you said.


CatFootPrint commented on July 21, 2024

Whew! I evaluated GPT-2 with:

python evaluate_gpt2.py --model-parallel-size=1 --num-layers=12 --hidden-size=768 --vocab-size=50257 --log-interval=1000 --load=anything --eval-batch-size=16 --num-attention-heads=12 --seq-length=1024 --max-position-embeddings=1024 --tokenizer-type=GPT2BPETokenizer --text-key=text --distributed-backend=nccl --hidden-dropout=0.1 --attention-dropout=0.1 --fp16 --overlapping-eval=32 --cache-dir=cache --load-openai --valid-data=/data2/z00487393/Documents/Datasets/Wikipedia/wikitext-2-v1/wikitext-2/wiki.test.tokens

The result is as follows:

Evaluate GPT2 model
WARNING: No training data specified
using world size: 1 and model-parallel size: 1 
 > using dynamic loss scaling
> initializing model parallel with size 1
> initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
wikitext
Original Tokens: 528983, Detokenized tokens: 245566
> padded vocab (size: 50257) with 0 dummy tokens (new size: 50257)
global rank: 0 | vocab size: 50257 | eod token: 50256 | num_examples: 16531 | num_original_tokens: 245566 | num_tokenized_tokens: 528983
building GPT2 model ...
 > number of parameters: 124439808
loading openai weights
model.cpu()
Traceback (most recent call last):
  File "/data2/z00487393/Applications/Anaconda/envs/ML_pytorch/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3326, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-6f4706f7ea66>", line 4, in <module>
    runfile('/data2/z00487393/Documents/Scripts/PyTorch/Megatron/Megatron-LM-master/evaluate_gpt2.py', args=['--model-parallel-size=1', '--num-layers=12', '--hidden-size=768', '--vocab-size', '50257', '--log-interval=1000', '--load=anything', '--eval-batch-size=16', '--num-attention-heads=12', '--seq-length=1024', '--max-position-embeddings=1024', '--tokenizer-type=GPT2BPETokenizer', '--text-key=text', '--distributed-backend=nccl', '--hidden-dropout=0.1', '--attention-dropout=0.1', '--fp16', '--overlapping-eval=32', '--cache-dir=cache', '--load-openai', '--valid-data=/data2/z00487393/Documents/Datasets/Wikipedia/wikitext-2-v1/wikitext-2/wiki.test.tokens'], wdir='/data2/z00487393/Documents/Scripts/PyTorch/Megatron/Megatron-LM-master')
  File "/data2/z00487393/.pycharm_helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/data2/z00487393/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/data2/z00487393/Documents/Scripts/PyTorch/Megatron/Megatron-LM-master/evaluate_gpt2.py", line 574, in <module>
    main()
  File "/data2/z00487393/Documents/Scripts/PyTorch/Megatron/Megatron-LM-master/evaluate_gpt2.py", line 557, in main
    gpt2model = GPT2LMHeadModel.from_pretrained(model_path, cache_dir='gpt2_weights')
  File "/data2/z00487393/Applications/Anaconda/envs/ML_pytorch/lib/python3.6/site-packages/pytorch_pretrained_bert/modeling_gpt2.py", line 423, in from_pretrained
    state_dict = torch.load(resolved_archive_file, map_location='cpu')
  File "/data2/z00487393/Applications/Anaconda/envs/ML_pytorch/lib/python3.6/site-packages/torch/serialization.py", line 386, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "/data2/z00487393/Applications/Anaconda/envs/ML_pytorch/lib/python3.6/site-packages/torch/serialization.py", line 573, in _load
    result = unpickler.load()
_pickle.UnpicklingError: invalid load key, '5'.

Thank you very much for your guidance.

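(A note on the UnpicklingError above: "invalid load key" from torch.load generally means the file reaching the unpickler isn't a valid PyTorch checkpoint, most often a truncated or failed download sitting in the weights cache, here the gpt2_weights cache dir. A minimal diagnostic sketch follows; the path is a placeholder, since the real cached filename is hash-derived by the library's download cache:)

# Diagnostic sketch with a placeholder path. A legacy torch.save checkpoint
# starts with a pickle protocol-2 header (b"\x80\x02"), and the newer zip
# format starts with b"PK"; an HTML error page or a truncated download starts
# with something else, which is what produces errors like
# "invalid load key, '5'." above.
cached_file = "gpt2_weights/cached_model_file.bin"  # placeholder path

with open(cached_file, "rb") as f:
    head = f.read(4)

if head.startswith(b"PK") or head[:1] == b"\x80":
    print("Header looks like a checkpoint; the file may still be truncated.")
else:
    print(f"Unexpected header {head!r}: delete the cache and re-download.")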
