Comments (7)
Adding --from-quantized-checkpoint to your scripts will solve the error.
from glm-130b.
Thank you for the reply, but I still have the same problem. Due to memory issues, the run command is: bash scripts/generate.sh --input-source input.txt --from-quantized-checkpoint --sequential-initialization
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
WARNING: No training data specified
WARNING: No training data specified
using world size: 2 and model-parallel size: 2
> padded vocab (size: 150528) with 0 dummy tokens (new size: 150528)
> initializing model parallel with size 2
> Set tokenizer as a icetk-glm-130B tokenizer! Now you can get_tokenizer() everywhere.
Namespace(num_layers=70, hidden_size=12288, num_attention_heads=96, vocab_size=150528, max_sequence_length=2048, layernorm_order='post', inner_hidden_size=32768, hidden_size_per_attention_head=None, model_parallel_size=2, skip_init=True, use_gpu_initialization=False, layernorm_epsilon=1e-05, hidden_dropout=0.1, attention_dropout=0.1, make_vocab_size_divisible_by=128, sandwich_ln=False, experiment_name='MyModel', train_iters=1000000, batch_size=4, lr=0.0001, mode='inference', seed=1234, zero_stage=0, checkpoint_activations=False, checkpoint_num_layers=1, fp16=True, bf16=False, gradient_accumulation_steps=1, epochs=None, log_interval=50, summary_dir='', save_args=False, lr_decay_iters=None, lr_decay_style='linear', lr_decay_ratio=0.1, warmup=0.01, weight_decay=0.01, save=None, load='./glm-130b-sat', save_interval=5000, no_save_rng=False, no_load_rng=False, resume_dataloader=False, distributed_backend='nccl', local_rank=0, exit_interval=None, eval_batch_size=None, eval_iters=100, eval_interval=None, strict_eval=False, train_data=None, train_data_weights=None, iterable_dataset=False, valid_data=None, test_data=None, split='1000,1,1', num_workers=1, block_size=10000, tokenizer_type='icetk-glm-130B', temperature=0.9, top_p=0.0, top_k=1, num_beams=4, length_penalty=1.0, no_repeat_ngram_size=3, min_tgt_length=0, out_seq_length=256, input_source='input.txt', output_path='samples', with_id=False, max_inference_batch_size=12, device=0, deepspeed=False, deepspeed_config=None, deepscale=False, deepscale_config=None, deepspeed_mpi=False, cuda=True, rank=0, world_size=2, master_ip='127.0.0.1', master_port='29500', bminf=False, bminf_memory_limit=44, quantization_bit_width=4, from_quantized_checkpoint=True, sequential_initialization=True, sampling_strategy='BeamSearchStrategy', min_gen_length=0, print_all_beams=False, do_train=False)
> Quantizing model weight to 4 bits
global rank 0 is loading checkpoint ./glm-130b-sat/49300/mp_rank_00_model_states.pt
Traceback (most recent call last):
File "/ssd1/xingyum/GLM-130B/generate.py", line 210, in <module>
main(args)
File "/ssd1/xingyum/GLM-130B/generate.py", line 156, in main
model, tokenizer = initialize_model_and_tokenizer(args)
File "/ssd1/xingyum/GLM-130B/initialize.py", line 72, in initialize_model_and_tokenizer
load_checkpoint(model, args)
File "/home/xingyum/anaconda3/envs/vis/lib/python3.10/site-packages/SwissArmyTransformer/training/model_io.py", line 181, in load_checkpoint
missing_keys, unexpected_keys = module.load_state_dict(sd['module'], strict=False)
File "/home/xingyum/anaconda3/envs/vis/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1497, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for GLM130B:
size mismatch for transformer.word_embeddings.weight: copying a param with shape torch.Size([18816, 12288]) from checkpoint, the shape in current model is torch.Size([75264, 12288]).
size mismatch for transformer.layers.0.attention.query_key_value.bias: copying a param with shape torch.Size([4608]) from checkpoint, the shape in current model is torch.Size([18432]).
size mismatch for transformer.layers.0.attention.query_key_value.weight: copying a param with shape torch.Size([4608, 12288]) from checkpoint, the shape in current model is torch.Size([18432, 6144]).
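The mismatched shapes are consistent with a checkpoint saved at a different tensor-parallel degree than the model being built. A minimal sketch of the arithmetic, assuming parameters are split evenly across tensor-parallel (TP) ranks and that 4-bit quantization packs two values per int8 byte:

```python
# Shard-size arithmetic for the shapes in the traceback above.
# Assumption: each parameter's partitioned dimension is divided evenly by the TP degree.

VOCAB = 150528   # padded vocab size from the log
HIDDEN = 12288   # hidden size from the log

def qkv_rows(hidden, tp):
    """Rows of the fused query_key_value projection held by one TP rank."""
    return 3 * hidden // tp

# The checkpoint shards match an 8-way tensor-parallel save...
assert VOCAB // 8 == 18816            # word_embeddings rows in the checkpoint
assert qkv_rows(HIDDEN, 8) == 4608    # query_key_value.bias in the checkpoint

# ...while the running model (world size 2, model-parallel size 2) expects 2-way shards.
assert VOCAB // 2 == 75264            # word_embeddings rows the model expects
assert qkv_rows(HIDDEN, 2) == 18432   # query_key_value.bias the model expects

# The model's expected weight input dim of 6144 is HIDDEN // 2, because
# 4-bit quantization stores two weights per int8 element.
assert HIDDEN // 2 == 6144
```

So the checkpoint on disk is sharded for a different parallel degree than MP_SIZE=2 and has to be re-sharded before it can load.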
from glm-130b.
You should first use the checkpoint conversion script to convert the checkpoint to a 2-way-tensor-parallel style.
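For reference, the conversion step looks roughly like the following (a sketch based on the GLM-130B repository docs; the placeholder paths and the exact flags of tools/convert_tp.py should be checked against your checkout):

```bash
# Re-shard the checkpoint to 2-way tensor parallelism before loading it
# with MP_SIZE=2 (paths are placeholders, flags assumed from the repo docs).
python tools/convert_tp.py \
    --input-folder <SRC_CKPT_PATH> \
    --output-folder <DST_CKPT_PATH> \
    --target-tp 2
```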
Thank you. Is there a memory-usage limit for running the convert command to completion? My machine only has a little over 160 GB of memory available.
There is an argument to allow sequential conversion with small memory budget in the conversion script. If you have more questions, please feel free to reopen the issue.
I have the same problem, and this is the error I am getting:
size mismatch for transformer.layers.69.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([16384, 12288]) from checkpoint, the shape in current model is torch.Size([32768, 6144]).
I am using MB_SIZE=2 because I only have 2 A6000s available, and I added --from-quantized-checkpoint to the args of the generate command (I also converted the model to 4 bit using the convert_tp.py file).
Related Issues (20)
- 6 cards inference HOT 1
- [Question] Does the GLM-130B model have a vocab file? HOT 1
- Question about the GLM-130B model-structure hyperparameters
- Question about the images in docs/quantization.md
- Training objective
- Question about the FT inference benchmark numbers
- Per-token latency fluctuates in a pulse-like pattern
- The GLM-130B docs state the model weights need 260 GB of GPU memory, but the test demo actually uses about 240 GB in total; what is the reason?
- How to set up a model-parallel cluster
- Can GLM cite the sources of the content it outputs while generating?
- The model application page cannot submit applications HOT 1
- Are there plans to open-source a chat version based on the 130B model?
- The model download links received by application email have all expired HOT 5
- Can FasterTransformer support GLM-6B?
- glm2-130B will it be made? HOT 1
- Where is the course link? HOT 1
- RuntimeError: probability tensor contains either `inf`, `nan` or element < 0, raised from fill_blanks(raw_text, model, tokenizer, strategy)
- 8-GPU FasterTransformer inference error: RuntimeError: [FT][ERROR] Assertion fail: /home/young.ruan/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.h:539
- Error when running bash scripts/generate.sh --input-source interactive. Please help! HOT 1
- Clarification Request on GLM-130B Model Architecture and Licensing for Commercial Use