
No module named 'apex' (megatron-lm, closed)

nvidia commented on July 21, 2024

I get this when running generate_text.sh:

No module named 'apex'


raulpuric commented on July 21, 2024

It's included in our NVIDIA Docker containers, but you can also get it from here.
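Before rerunning the script, it can help to confirm that the interpreter that will run it can actually see apex. The following is a generic sketch using only the standard library; `module_available` is a hypothetical helper, not part of Megatron-LM or apex.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module `name` can be found for import."""
    # find_spec returns None when no finder can locate the module.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # 'apex' is the module the original error said was missing.
    if module_available("apex"):
        print("apex is importable")
    else:
        print("apex is missing: use the NVIDIA container or install apex")
```

This only checks that the package can be found; it does not verify that any compiled CUDA extensions inside apex were built.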


GrahamboJangles commented on July 21, 2024

@raulpuric

Okay, I have installed apex and all of the other dependencies, but now I get this:

Generate Samples
WARNING: No training data specified
using world size: 1 and model-parallel size: 1 
 > using dynamic loss scaling
> initializing model parallel with size 1
> initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
100% 1042301/1042301 [00:00<00:00, 5937749.24B/s]
100% 456318/456318 [00:00<00:00, 3423447.30B/s]
prepare tokenizer done
building GPT2 model ...
 > number of parameters on model parallel rank 0: 124475904
WARNING: could not find the metadata file checkpoints/gpt2_345m/latest_checkpointed_iteration.txt 
    will not load any checkpoints and will start from random
Traceback (most recent call last):
  File "generate_samples.py", line 496, in <module>
    main()
  File "generate_samples.py", line 492, in main
    write_and_generate_samples_unconditional(model, tokenizer, args)
  File "generate_samples.py", line 319, in write_and_generate_samples_unconditional
    for datum in generate_samples_unconditional(model, tokenizer, args):
  File "generate_samples.py", line 295, in generate_samples_unconditional
    for token_stream in get_token_stream(model, copy.deepcopy(context_tokens), tokenizer, args):
  File "generate_samples.py", line 347, in get_token_stream
    tokens, attention_mask, position_ids=get_batch(context_tokens_tensor, args)
  File "generate_samples.py", line 99, in get_batch
    args.reset_attention_mask)
TypeError: get_masks_and_position_ids() missing 1 required positional argument: 'eod_mask_loss'
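The TypeError means the call inside get_batch (whose last argument is args.reset_attention_mask) passes one positional argument fewer than get_masks_and_position_ids declares, and the missing one is eod_mask_loss. The sketch below reproduces that pattern with a hypothetical stand-in. Its parameter list is an assumption modeled on the traceback, not copied from Megatron-LM. Passing eod_mask_loss=False is one assumed fix. It seems reasonable here because, per the traceback, the caller only unpacks tokens, attention_mask and position_ids.

```python
# Hypothetical stand-in: the real get_masks_and_position_ids lives in the
# Megatron-LM source and its body and parameters may differ. Only the
# function name and the 'eod_mask_loss' parameter come from the traceback.
def get_masks_and_position_ids(data, eod_token, reset_position_ids,
                               reset_attention_mask, eod_mask_loss):
    # Echo the flags back so the call pattern can be inspected.
    return {
        "reset_position_ids": reset_position_ids,
        "reset_attention_mask": reset_attention_mask,
        "eod_mask_loss": eod_mask_loss,
    }


def call_like_old_get_batch():
    # Mirrors the failing call: four arguments for a five-parameter function.
    return get_masks_and_position_ids([0, 1, 2], 0, False, False)


def call_with_all_arguments():
    # Assumed fix: supply the missing flag explicitly.
    return get_masks_and_position_ids([0, 1, 2], 0, False, False,
                                      eod_mask_loss=False)


if __name__ == "__main__":
    try:
        call_like_old_get_batch()
    except TypeError as err:
        print("reproduced:", err)
    print("fixed:", call_with_all_arguments())
```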

Do I need to train first? I'm just playing around with this, but I'm not sure exactly what it's doing, even after reading the documentation. Is this for training GPT-2 and BERT, or is it style control? Does it somehow blend the two?

I'm running this in a Colab notebook I made, by the way.
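On the "train first?" question, the log itself gives two hints. First, checkpoints/gpt2_345m/latest_checkpointed_iteration.txt was not found, so the model "will start from random", meaning samples would come from untrained weights unless a checkpoint is placed there. Second, the reported 124,475,904 parameters matches a 12-layer, 768-hidden GPT-2 rather than a 345M model. The sketch below checks both. It rests on several assumptions not confirmed in this thread: standard GPT-2 blocks with biased linear layers, tied input/output embeddings, 1024 positions, and a 50,257-token vocabulary padded up to a multiple of 128 (50,304). `checkpoint_tracker_exists`, `padded_vocab` and `gpt2_param_count` are hypothetical helpers.

```python
import os


def checkpoint_tracker_exists(load_dir: str) -> bool:
    # File name taken from the warning in the log above.
    return os.path.isfile(os.path.join(load_dir, "latest_checkpointed_iteration.txt"))


def padded_vocab(vocab_size: int, multiple: int = 128) -> int:
    # Assumption: the vocabulary is rounded up to a multiple of `multiple`.
    return ((vocab_size + multiple - 1) // multiple) * multiple


def gpt2_param_count(num_layers: int, hidden: int, vocab_size: int = 50257,
                     max_positions: int = 1024) -> int:
    h = hidden
    per_layer = (
        2 * h                 # first LayerNorm (weight + bias)
        + 3 * h * h + 3 * h   # fused QKV projection
        + h * h + h           # attention output projection
        + 2 * h               # second LayerNorm
        + 4 * h * h + 4 * h   # MLP up-projection
        + 4 * h * h + h       # MLP down-projection
    )
    # Token and position embeddings; the tied output layer adds no weights.
    embeddings = padded_vocab(vocab_size) * h + max_positions * h
    final_layernorm = 2 * h
    return num_layers * per_layer + embeddings + final_layernorm


if __name__ == "__main__":
    print(gpt2_param_count(12, 768))   # 124475904, the count in the log
    print(gpt2_param_count(24, 1024))  # 354871296, a 24-layer, 1024-hidden model
    print(checkpoint_tracker_exists("checkpoints/gpt2_345m"))
```

Under these assumptions, the exact match for 12 layers and 768 hidden units suggests the script built the small configuration even though it pointed at a gpt2_345m checkpoint directory. The model arguments would need to match whatever checkpoint is eventually loaded.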
