
pyllama's Introduction

🦙 LLaMA - Run LLM in a Single 4GB GPU

📢 pyllama is a hacked version of LLaMA based on the original Facebook implementation, but more convenient to run on a single consumer-grade GPU.

Hugging Face's LLaMA implementation is available at pyllama.hf.

📥 Installation

In a conda environment with PyTorch/CUDA available, run:

pip install pyllama -U

๐Ÿ If you have installed llama library from other sources, please uninstall the previous llama library and use pip install pyllama -U to install the latest version.

📦 Download Model Files

🧘‍♀️ Official Way

To download the checkpoints and tokenizer, fill out this Google form.

Once your request is approved, you will receive links to download the tokenizer and model files. Edit the download.sh script with the signed URL provided in the email to download the model weights and tokenizer.

๐Ÿ’ Community Way

    1. pyllama

There is another, high-speed way to download the checkpoints and tokenizer. Four models (7B, 13B, 30B, 65B) are available. To download all of them, run:

python -m llama.download

To download only the 7B model files to your current directory, run:

python -m llama.download --model_size 7B

To download only the 7B and 30B model files to the folder /tmp/pyllama_data, run:

python -m llama.download --model_size 7B,30B --folder /tmp/pyllama_data

The help doc is:

$ python -m llama.download --help
usage: download.py [-h] [--model_size MODEL_SIZE] [--folder FOLDER]

optional arguments:
  -h, --help            show this help message and exit
  --model_size MODEL_SIZE
                        The size of the models that you want to download. A comma separated
                        string of any of "7B", "13B", "30B", "65B". Totally 219G disk space
                        is needed to download them all. If you only want to download the 7B
                        model, just put "7B" here.
  --folder FOLDER       The target folder for the download files
  • Sample Screenshot
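If you prefer to script the download, the sketch below simply shells out to the documented python -m llama.download command using the --model_size and --folder flags shown above; it assumes nothing about pyllama's internals:

# Sketch: download selected model sizes via the documented CLI (python -m llama.download).
import subprocess
import sys

def download(model_sizes=("7B",), folder="/tmp/pyllama_data"):
    cmd = [
        sys.executable, "-m", "llama.download",
        "--model_size", ",".join(model_sizes),
        "--folder", folder,
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    download(("7B", "13B"), folder="/tmp/pyllama_data")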

    2. Bittorrent

🔥 In order to download the checkpoints and tokenizer, use this BitTorrent link: "magnet:?xt=urn:btih:ZXXDAUWYLRUXXBHUYEMS6Q5CE5WA3LVA&dn=LLaMA".

💎 Quantize LLaMA to run on a 4GB GPU

pyllama supports 2-, 3-, 4- and 8-bit quantization so that you can run the model on a GPU with 4GB of memory.

You need to run export HUGGING_FACE_HUB_TOKEN=XXX to be able to access Hugging Face's data, and you need to install gptq with pip install gptq.

python -m llama.llama_quant --help
usage: llama_quant.py [-h] [--ckpt_dir CKPT_DIR] [--tokenizer_path TOKENIZER_PATH] 
                      [--seed SEED] [--nsamples NSAMPLES] [--percdamp PERCDAMP]
                      [--nearest] [--wbits {2,3,4,8,16}] [--groupsize GROUPSIZE]
                      [--save SAVE] [--load LOAD] [--benchmark BENCHMARK] [--check]
                      [--cuda CUDA] [--eval]
                      {wikitext2,ptb,c4}

positional arguments:
  {wikitext2,ptb,c4}    Where to extract calibration data from.

optional arguments:
  -h, --help            show this help message and exit
  --ckpt_dir CKPT_DIR
  --tokenizer_path TOKENIZER_PATH
  --seed SEED           Seed for sampling the calibration data.
  --nsamples NSAMPLES   Number of calibration data samples.
  --percdamp PERCDAMP   Percent of the average Hessian diagonal to use for dampening.
  --nearest             Whether to run the RTN baseline.
  --wbits {2,3,4,8,16}  Bits for quantization.
  --groupsize GROUPSIZE
                        Groupsize to use for quantization; default uses full row.
  --save SAVE           Save quantized checkpoint under this name, eg pyllama-7B4b.pt.
  --load LOAD           Load quantized model.
  --benchmark BENCHMARK
                        Number of tokens to use for benchmarking.
  --check               Whether to compute perplexity during benchmarking for verification.
  --cuda CUDA           GPU device string, 'cuda:0' by default.
  --eval                Evaluate the model with dataset wikitext2, ptb and c4
  • Quantize 7B model to 8-bit
python -m llama.llama_quant decapoda-research/llama-7b-hf c4 --wbits 8 --save pyllama-7B8b.pt
  • Quantize 7B model to 4-bit with groupsize 128 (the recommended setup 🔥)
python -m llama.llama_quant decapoda-research/llama-7b-hf c4 --wbits 4 --groupsize 128 --save pyllama-7B4b.pt
  • Quantize 7B model to 2-bit
python -m llama.llama_quant decapoda-research/llama-7b-hf c4 --wbits 2 --save pyllama-7B2b.pt

The download links for quantized LLaMA files are below:

  • 7B
Quant Type | Size (bytes) | Link | MD5                              | Loss | Password
2-bit      | 2160484475   | 🔗   | 4c7215d28c1f650218c43fc46402cec5 | -    | 8g9d
3-bit      | -            | -    | -                                | -    | -
4-bit      | 3779485819   | -    | cce9a3b522ddf5c011ee0174b2ff3dfb | -    | -
8-bit      | 7017493231   | -    | 2648b09597cf8f9e0d1a04cb70b71cab | -    | -

It took me 2 hours and 40 minutes to quantize the 65B model to 4-bit. The file size is reduced from 122GB to 32GB.
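As a rough sanity check on these sizes, the weight payload of a quantized model is approximately parameter count × bits / 8. The sketch below prints such estimates; they ignore scales, zero points, and any layers left unquantized, so real checkpoints are somewhat larger:

# Rough estimate of quantized weight size: params * wbits / 8 bytes.
def approx_size_gb(n_params: float, wbits: int) -> float:
    # weight payload only; real checkpoints add scales, zero points, and unquantized layers
    return n_params * wbits / 8 / 1e9

for wbits in (2, 4, 8, 16):
    print(f"7B  @ {wbits:2d}-bit: ~{approx_size_gb(7e9, wbits):5.1f} GB")
    print(f"65B @ {wbits:2d}-bit: ~{approx_size_gb(65e9, wbits):5.1f} GB")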

The following suggestions are recommended for LLM quantization:

  1. By default, use 4-bit quantization for LLM inference as it offers the best trade-off between total model bits and zero-shot accuracy.
  2. Use a block size of 128 or lower to stabilize 4-bit quantization and improve zero-shot performance.
  3. Use a floating point or quantile quantization data type. In some cases, integer data types might be preferable to improve inference latency depending on the implementation and hardware support.

🔮 Single GPU Inference

🥥 Without Quantization

Set the environment variable CKPT_DIR to your LLaMA model folder, for example /llama_data/7B, and TOKENIZER_PATH to your tokenizer's path, such as /llama_data/tokenizer.model.

And then run the following command:

python inference.py --ckpt_dir $CKPT_DIR --tokenizer_path $TOKENIZER_PATH

The following is an example of LLaMA running on a single 8GB GPU.

LLaMA Inference

๐Ÿฅ With Quantization

With quantization, you can run LLaMA with a 4GB memory GPU.

  • pyllama can run the 7B model with 6GB of GPU memory. Example: python quant_infer.py --wbits 4 --load pyllama-7B4b.pt --text "..." --max_length 24 --cuda cuda:0

4bit-quant-6GB

  • pyllama can run the 7B model with 3.2GB of GPU memory. Example: python quant_infer.py --wbits 2 --load pyllama-7B2b.pt --text "..." --max_length 32

2bit-quant-6GB

💡 Tips

  • To keep the KV cache in CPU memory, run export KV_CAHCHE_IN_GPU=0 in the shell.

  • To profile CPU/GPU/Latency, run:

python inference_driver.py --ckpt_dir $CKPT_DIR --tokenizer_path $TOKENIZER_PATH

A sample result looks like:

LLaMA Inference

  • Tune max_seq_len and max_batch_size to reduce memory consumption so the model fits on your GPU (see the rough estimate below). Refer to this post!
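To see why max_seq_len and max_batch_size matter, here is a rough estimate of the key/value cache size for a 7B-scale model. The layer count, hidden dimension, and fp16 assumption below are typical values, not numbers read from pyllama's config:

# Rough KV-cache size: 2 tensors (K and V) per layer, each of shape [batch, seq_len, dim], fp16 by default.
def kv_cache_gb(n_layers=32, dim=4096, max_batch_size=1, max_seq_len=1024, bytes_per_value=2):
    return 2 * n_layers * max_batch_size * max_seq_len * dim * bytes_per_value / 1e9

print(f"batch=1,  seq=1024: {kv_cache_gb():.2f} GB")
print(f"batch=32, seq=2048: {kv_cache_gb(max_batch_size=32, max_seq_len=2048):.2f} GB")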

๐Ÿ‰ Start a gradio webui

$ cd apps/gradio
$ python webapp_single.py  --ckpt_dir $CKPT_DIR --tokenizer_path $TOKENIZER_PATH

You should see something like this in your browser:

LLaMA Inference
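If you want to adapt the web UI, a stripped-down app looks roughly like the sketch below. Here generate_text is a hypothetical stand-in for the generator that apps/gradio/webapp_single.py builds; it is not pyllama's actual API:

# Minimal Gradio sketch; generate_text is a hypothetical stub, not pyllama's generator API.
import gradio as gr

def generate_text(prompt: str, max_length: float) -> str:
    # Replace this stub with a call into your loaded LLaMA generator.
    return f"[echo] {prompt[: int(max_length)]}"

demo = gr.Interface(
    fn=generate_text,
    inputs=[gr.Textbox(label="Prompt"), gr.Slider(8, 512, value=128, label="Max length")],
    outputs=gr.Textbox(label="Completion"),
    title="LLaMA demo",
)

if __name__ == "__main__":
    demo.launch()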

๐Ÿ“ Start a web server

The following commands will start a Flask web server:

$ cd apps/flask
$ python web_server_single.py  --ckpt_dir $CKPT_DIR --tokenizer_path $TOKENIZER_PATH
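A bare-bones version of such a server could look like the sketch below; again, generate_text is a hypothetical placeholder for the generator that web_server_single.py constructs, not pyllama's actual code:

# Minimal Flask sketch; generate_text is a hypothetical stub, not pyllama's server code.
from flask import Flask, jsonify, request

app = Flask(__name__)

def generate_text(prompt: str, max_length: int = 128) -> str:
    # Replace this stub with a call into your loaded LLaMA generator.
    return f"[echo] {prompt[:max_length]}"

@app.route("/generate", methods=["POST"])
def generate():
    payload = request.get_json(force=True)
    completion = generate_text(payload["prompt"], int(payload.get("max_length", 128)))
    return jsonify({"completion": completion})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)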

๐Ÿ’ Multiple GPU Inference

🧘‍♀️ Official Way

To use Meta's original model parallelism, set the environment variable PYLLAMA_META_MP like this:

export PYLLAMA_META_MP=1

With this environment variable set, import llama will import the original Meta version of llama.

The provided example.py can be run on a single- or multi-GPU node with torchrun and will output completions for two predefined prompts. Using TARGET_FOLDER as defined in download.sh:

torchrun --nproc_per_node MP example.py --ckpt_dir $TARGET_FOLDER/model_size \
  --tokenizer_path $TARGET_FOLDER/tokenizer.model

Different models require different MP values:

Model | MP
7B    | 1
13B   | 2
30B   | 4
65B   | 8
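Because the --nproc_per_node value must follow this table, a small launcher can look up MP from the model size. The sketch below only assembles the torchrun command shown above; the target folder is a placeholder:

# Sketch: pick MP from the table above and launch example.py with torchrun.
import subprocess

MP = {"7B": 1, "13B": 2, "30B": 4, "65B": 8}

def launch(model_size: str, target_folder: str):
    cmd = [
        "torchrun", "--nproc_per_node", str(MP[model_size]), "example.py",
        "--ckpt_dir", f"{target_folder}/{model_size}",
        "--tokenizer_path", f"{target_folder}/tokenizer.model",
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    launch("13B", "/llama_data")  # placeholder path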

๐Ÿ’ Community Way

There are two steps to run LLaMA in a multi-GPU environment.

  • Convert the original LLaMA model
$ python -m llama.convert_llama --help
usage: convert_llama.py [-h] [--ckpt_dir CKPT_DIR] [--tokenizer_path TOKENIZER_PATH]
                        [--model_size {7B,13B,30B,65B}] [--output_dir OUTPUT_DIR]
                        [--max_batch_size MAX_BATCH_SIZE] [--to {hf,fb}]

optional arguments:
  -h, --help            show this help message and exit
  --ckpt_dir CKPT_DIR
  --tokenizer_path TOKENIZER_PATH
  --model_size {7B,13B,30B,65B}
  --output_dir OUTPUT_DIR
                        Location to write HF model and tokenizer
  --max_batch_size MAX_BATCH_SIZE
  --to {hf,fb}
  • Run with HF's accelerate on multiple GPUs
$ python -m llama.llama_multigpu --help
usage: llama_multigpu.py [-h] [--state_dict_dir STATE_DICT_DIR] [--model_size {7B,13B,30B,65B}]

optional arguments:
  -h, --help            show this help message and exit
  --state_dict_dir STATE_DICT_DIR
  --model_size {7B,13B,30B,65B}
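Put together, the two steps amount to running the conversion once and then launching the multi-GPU script on its output. The sketch below simply chains the two documented commands; the paths and the choice of --to are assumptions, so check which layout llama.llama_multigpu expects:

# Sketch: chain the two documented commands (llama.convert_llama, then llama.llama_multigpu).
import subprocess
import sys

def run_module(args):
    subprocess.run([sys.executable, "-m", *args], check=True)

ckpt_dir = "/llama_data/7B"                      # placeholder
tokenizer_path = "/llama_data/tokenizer.model"   # placeholder
output_dir = "/llama_data/7B_converted"          # placeholder

# Step 1: convert the original checkpoint (whether --to hf or fb is correct here is an assumption).
run_module(["llama.convert_llama",
            "--ckpt_dir", ckpt_dir,
            "--tokenizer_path", tokenizer_path,
            "--model_size", "7B",
            "--output_dir", output_dir,
            "--to", "hf"])

# Step 2: run inference across GPUs with HF accelerate.
run_module(["llama.llama_multigpu",
            "--state_dict_dir", output_dir,
            "--model_size", "7B"])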

🔬 Model Fine Tuning

With the Stanford Alpaca Instruction-Following Dataset

  • Tokenization
  • Finetuning
  • Efficient FT

🧬 LLaMA model structure

  • Meta
  • Hugging Face
https://github.com/facebookresearch/llama/blob/main/llama/model.py#LL127C27-L127C27
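For orientation, the feed-forward block that appears in the tracebacks elsewhere on this page (the w1/w2/w3 linear layers with a SwiGLU-style activation) looks roughly like the simplified sketch below. It restates the well-known LLaMA structure and is not the repository's exact code:

# Simplified sketch of LLaMA's SwiGLU feed-forward block (w1, w2, w3); not the repo's exact code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeedForward(nn.Module):
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)  # down projection
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)  # up projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.w3(x))

block = FeedForward(dim=4096, hidden_dim=11008)  # 7B-scale dimensions
print(block(torch.randn(1, 8, 4096)).shape)  # torch.Size([1, 8, 4096])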

Model Card

See MODEL_CARD.md

License

See the LICENSE file.

pyllama's People

Contributors

a1ex90, daniel-kukiela, george-adams1, gmlove, guspan-tanadi, jack-moo, juncongmoo, llimllib, mldevorg, wanweilove


pyllama's Issues

M1 inference

I can run the original llama (with minimal changes) and llama.cpp on my MacBook M1 Max. I think it would be great if I could use pyllama with the same hardware too.

torch.cuda.OutOfMemoryError: CUDA out of memory

Thanks for making this repo! I was looking to run this on my own hardware and this is helping me do just that.

I first tried to run inference with Facebook's own instructions, but I was getting a memory error. I tried a few other modifications but they did not work either.

Finally, I came to this repository to try and fix my problem. I'm still getting the same error, however.

Error:

Traceback (most recent call last):
  File "/mnt/FILEZ/Files/Downloads/Media/llama/inference.py", line 67, in <module>
    run(
  File "/mnt/FILEZ/Files/Downloads/Media/llama/inference.py", line 48, in run
    generator = load(ckpt_dir, tokenizer_path, local_rank, world_size, max_seq_len, max_batch_size)
  File "/mnt/FILEZ/Files/Downloads/Media/llama/inference.py", line 32, in load
    model = Transformer(model_args)
  File "/mnt/FILEZ/Files/Downloads/Media/llama/llama/model_single.py", line 196, in __init__
    self.layers.append(TransformerBlock(layer_id, params))
  File "/mnt/FILEZ/Files/Downloads/Media/llama/llama/model_single.py", line 170, in __init__
    self.feed_forward = FeedForward(
  File "/mnt/FILEZ/Files/Downloads/Media/llama/llama/model_single.py", line 152, in __init__
    self.w2 = nn.Linear(
  File "/home/musa/.local/share/anaconda3/envs/llama/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 96, in __init__
    self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB (GPU 0; 11.75 GiB total capacity; 11.50 GiB already allocated; 11.12 MiB free; 11.50 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I have been consistently seeing this error every time I've tried to run inference and I'm not sure how to fix it.

I run the inference with this command: python inference.py --ckpt_dir ./llama-dl/7B --tokenizer_path ./llama-dl/tokenizer.model

My Specs:

CPU: Intel 5 11500
GPU: 12GB Nvidia 3060
RAM: 16 GB

With these specs it seems I should be able to run this version of inference but it still does not work.

Before running the program I ran the free command:

               total        used        free      shared  buff/cache   available
Mem:        15173760      668404    12962088         584     1543268    14165152
Swap:       15605752      550436    15055316

So I definitely have more than the 8GB of RAM shown in the README.

I would really appreciate your help, thanks!
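For reference, the allocator option that the error message itself suggests can be set before PyTorch touches the GPU. Whether it resolves this particular case is not guaranteed; a minimal sketch of trying it looks like this:

# Set the allocator option mentioned in the error message before any CUDA allocation happens.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # import torch only after the environment variable is set
print(torch.cuda.is_available())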

Error Downloading Models from Community on Windows

I have cloned the repo and installed all requirements, including CMake and itree, based on one of the reported issues, but I still run into the following traceback error when trying to download the model via:

python -m llama.download
or
python -m llama.download --folder .\models\

Exception has occurred: FileNotFoundError
[Errno 2] No such file or directory: '/tmp/error.njkfo9xztnqw.log'
  File "C:\Users\majmo\Git\pyllama\llama\download.py", line 17, in download
    retcode = hiq.execute_cmd(cmd, verbose=True, shell=True, runtime_output=True)
  File "C:\Users\majmo\Git\pyllama\llama\download.py", line 87, in <module>
    download(args)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/error.njkfo9xztnqw.log'

I have tried to see what I can change, but it was not clear what hiq actually does when executing line 17!

Vanilla pytorch LLaMA implementation

Hey, great work with pyllama.
I may be wrong, but I noticed that your code checks whether the system has the same number of GPUs as checkpoint shards (like here).
If that is the case, it means you can only run the 65B version if you have 8 GPUs, but this is not necessary.

Here you can find a vanilla pytorch implementation of LLaMA and a weights conversion script that you can use to run LLaMA using as many (or as few) GPUs as you want https://github.com/galatolofederico/vanilla-llama

Quantize 7B model to 8-bit --> "Killed"

Getting this issue:

python -m llama.llama_quant decapoda-research/llama-7b-hf c4 --wbits 8 --save pyllama-7B8b.pt
Loading checkpoint shards:  64%|██████████████████████████              | 21/33 [00:11<00:05,  2.32it/s]
Killed

Any ideas? It seems to consistently fail at 64% on that.

Downloading gets stuck in an infinite loop

When trying to download the models, I get stuck in some infinite loop.

Screenshot 2023-03-27 at 12 17 59

This repeats once every second until I terminate the program.

Environment:

  • pyllama version: commit 321d475f01c88e179c8a30d68b5281e2caca5b07 (HEAD -> main, tag: v0.0.9, origin/main, origin/HEAD)
  • OS: macOS 13.2.1
  • Hardware: Apple M1 Max
  • After installation, the following packages were installed manually:
    • Transformers
      • This was missing from requirements.txt
      • Version 4.27.3
      • Command used: pip install transformers
    • Itree
      • Was incompatible with M1 architecture
      • Work-around instructions: https://pypi.org/project/py-itree/
      • Command used: pip uninstall py-itree ; pip install https://github.com/juncongmoo/itree/archive/refs/tags/tag-bf9f3aada064acf3ce4db6fc58ed2e744caee0a3.tar.gz

Already quantized to 4-bit and got the model pyllama-7B4b.pt, but cannot run on an RTX 3080. Reports torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB (GPU 0; 10.00 GiB total capacity; 9.24 GiB already allocated;

The error is as follows:
python webapp_single.py --ckpt_dir $CKPT_DIR --tokenizer_path $TOKENIZER_PATH
Traceback (most recent call last):
File "/home/xxxx/chatllama/pyllama/apps/gradio/webapp_single.py", line 80, in
generator = load(
File "/home/u/chatllama/pyllama/apps/gradio/webapp_single.py", line 42, in load
model = Transformer(model_args)
File "/home/xxxx/miniconda3/envs/chatllama/lib/python3.10/site-packages/llama/model_single.py", line 199, in init
self.layers.append(TransformerBlock(layer_id, params))
File "/home/xxxx/miniconda3/envs/chatllama/lib/python3.10/site-packages/llama/model_single.py", line 167, in init
self.feed_forward = FeedForward(
File "/home/xxxx/miniconda3/envs/chatllama/lib/python3.10/site-packages/llama/model_single.py", line 154, in init
self.w3 = nn.Linear(dim, hidden_dim, bias=False)
File "/home/xxxx/miniconda3/envs/chatllama/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 96, in init
self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB (GPU 0; 10.00 GiB total capacity; 9.24 GiB already allocated; 0 bytes free; 9.25 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

du -sh pyllama-7B4b.pt
3.6G pyllama-7B4b.pt

"torch.cuda.OutOfMemoryError: CUDA out of memory" when I'm *not* out of memory

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 12.00 GiB total capacity; 2.60 GiB already allocated; 8.36 GiB free; 2.62 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

woaah. I'm out of memory already? 8.36 GiB free and I can't allocate 64.00 MiB?

example.py FAILED

RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 261346) of binary: /usr/bin/python3

error when running model for inference: ModuleNotFoundError: No module named 'transformers.models.llama'

Hi,

I am trying to run inference with pyllama using the quantized 4-bit model on Google Colab; however, I get the error below after the model is successfully loaded:

(The command to run inference is:
!python pyllama/quant_infer.py --wbits 4 --load drive/MyDrive/pyllama/llama-7b-4bit.pt --text "the general theory of relativity states that" --max_length 24 --cuda cuda:0)

mod,126,transformers.models.llama.tokenization_llama: ModuleNotFoundError: No module named 'transformers.models.llama'

At:
(973): _find_and_load_unlocked
(991): _find_and_load
(1014): _gcd_import
(219): _call_with_frames_removed
(961): _find_and_load_unlocked
(991): _find_and_load
(1014): _gcd_import
/usr/lib/python3.8/importlib/__init__.py(127): import_module
(2):
/usr/local/lib/python3.8/dist-packages/hiq/base.py(491): _h
/usr/local/lib/python3.8/dist-packages/hiq/base.py(426): enable_hiq
/usr/local/lib/python3.8/dist-packages/hiq/base.py(160): __init__
/usr/local/lib/python3.8/dist-packages/hiq/base.py(722): __init__
pyllama/quant_infer.py(6): main
pyllama/quant_infer.py(25):

🦉 transformers.models.llama.tokenization_llama.LLaMATokenizer.encode is not traced('NoneType' object has no attribute 'LLaMATokenizer')
⌛️ Loading model from drive/MyDrive/pyllama/llama-7b-4bit.pt...
✅ Model from drive/MyDrive/pyllama/llama-7b-4bit.pt is loaded successfully.
Traceback (most recent call last):
File "pyllama/quant_infer.py", line 25, in
main()
File "pyllama/quant_infer.py", line 19, in main
hiq.mod("llama.llama_infer").run(args)
File "/usr/local/lib/python3.8/dist-packages/hiq/base.py", line 375, in __x
s.handle_exception(f_name, e)
File "/usr/local/lib/python3.8/dist-packages/hiq/utils.py", line 493, in __y
r = f(s, *args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/hiq/base.py", line 353, in handle_exception
raise e
File "/usr/local/lib/python3.8/dist-packages/hiq/base.py", line 368, in __x
result = call_decorated(
File "/usr/local/lib/python3.8/dist-packages/hiq/hiq_utils.py", line 326, in call_decorated
return f(*args, **kwargs)
File "", line 27, in __run_quant
File "", line 11, in __run_quant
File "/content/pyllama/llama/llama_infer.py", line 75, in run
tokenizer = AutoTokenizer.from_pretrained(args.model)
File "/usr/local/lib/python3.8/dist-packages/hiq/base.py", line 375, in __x
s.handle_exception(f_name, e)
File "/usr/local/lib/python3.8/dist-packages/hiq/utils.py", line 493, in __y
r = f(s, *args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/hiq/base.py", line 353, in handle_exception
raise e
File "/usr/local/lib/python3.8/dist-packages/hiq/base.py", line 368, in __x
result = call_decorated(
File "/usr/local/lib/python3.8/dist-packages/hiq/hiq_utils.py", line 326, in call_decorated
return f(*args, **kwargs)
File "", line 27, in __from_pretrained
File "", line 11, in __from_pretrained
File "/usr/local/lib/python3.8/dist-packages/transformers/models/auto/tokenization_auto.py", line 676, in from_pretrained
raise ValueError(
ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported.

No module named "transformers" error

When I try to run "python -m llama.download --model_size 7B", it says the python command doesn't exist, so I have to use the "python3" command, but once I run "python3 -m llama.download --model_size 7B", all these errors appear:
image
Can someone help me figure out what is wrong?

Unknown CUDA error

Traceback (most recent call last):
  File "/home/orion/AI-Horde-Worker/llama.cpp/pyllama/inference.py", line 82, in <module>
    run(
  File "/home/orion/AI-Horde-Worker/llama.cpp/pyllama/inference.py", line 50, in run
    generator = load(
  File "/home/orion/AI-Horde-Worker/llama.cpp/pyllama/inference.py", line 33, in load
    model = Transformer(model_args)
  File "/home/orion/AI-Horde-Worker/llama.cpp/pyllama/llama/model_single.py", line 195, in __init__
    self.tok_embeddings = nn.Embedding(params.vocab_size, params.dim)
  File "/home/orion/.local/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 142, in __init__
    self.weight = Parameter(torch.empty((num_embeddings, embedding_dim), **factory_kwargs),
  File "/home/orion/.local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 247, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW
python3 pyllama/inference.py --ckpt_dir models/7B/ --tokenizer_path models/tokenizer.model

Environment: Ubuntu 22, CUDA 12.1, RTX 3060 Ti

ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported.

python -m llama.llama_quant decapoda-research/llama-7b-hf c4 --wbits 2 --save pyllama-7B2b.pt
Loading checkpoint shards: 100%|█████████████████████████████████████████| 33/33 [00:12<00:00, 2.68it/s]
Found cached dataset json (/home/jjjj/.cache/huggingface/datasets/allenai___json/allenai--c4-6fbe877195f42de5/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
Found cached dataset json (/home/jjjj/.cache/huggingface/datasets/allenai___json/allenai--c4-efc3d4f4606f44bd/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
Traceback (most recent call last):
File "/miniconda3/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/jjjj/miniconda3/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/jjjj/Project/00.TextGen/pyllama/llama/llama_quant.py", line 474, in
run()
File "/home/jjjj/Project/00.TextGen/pyllama/llama/llama_quant.py", line 437, in run
dataloader, testloader = get_loaders(
File "/home/jjjj/miniconda3/lib/python3.10/site-packages/gptq/datautils.py", line 112, in get_loaders
return get_c4(nsamples, seed, seqlen, model, tokenizer)
File "/home/jjjj/miniconda3/lib/python3.10/site-packages/gptq/datautils.py", line 67, in get_c4
tokenizer = tokenizer or AutoTokenizer.from_pretrained(model, use_fast=False)
File "/home/jjjj/miniconda3/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 655, in from_pretrained
raise ValueError(
ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported.
This error might be caused by the fact that LLaMATokenizer was changed to LlamaTokenizer. Where should I make the modification?
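One place the old class name typically lives is the tokenizer_config.json shipped with the Hugging Face checkpoint. A hedged sketch of renaming it in a local copy is shown below; the path is a placeholder, and the exact spelling your transformers version expects may differ:

# Sketch: rewrite "LLaMATokenizer" to "LlamaTokenizer" in a locally downloaded tokenizer_config.json.
import json
from pathlib import Path

config_path = Path("/path/to/llama-7b-hf/tokenizer_config.json")  # placeholder path
config = json.loads(config_path.read_text())
if config.get("tokenizer_class") == "LLaMATokenizer":
    config["tokenizer_class"] = "LlamaTokenizer"
    config_path.write_text(json.dumps(config, indent=2))
    print("tokenizer_class updated to LlamaTokenizer")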

Share your evaluation results

We evaluate LLaMA using 100 examples of the SQuAD dataset with the Open-evals framework, which extends OpenAI's Evals for different language models. We consider the sentence immediately following the prompt as the output of LLaMA and use include accuracy as a metric to measure its performance.

For a model completion a and a reference list of correct answers B
include: any([(a in b) for b in B])
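In code, that include metric over a set of examples is simply the following (a minimal sketch; the example data is made up for illustration):

# Sketch of the "include" accuracy described above: a completion counts as correct
# if it appears as a substring of any reference answer.
def include(completion, references):
    return any(completion in ref for ref in references)

examples = [  # made-up illustration data, not the SQuAD examples used above
    ("Paris", ["Paris", "the city of Paris"]),
    ("Berlin", ["Munich"]),
]
accuracy = sum(include(a, B) for a, B in examples) / len(examples)
print(accuracy)  # 0.5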

Model            | SQuAD (100)
alpaca-lora-7b   | 0.88
llama-7b         | 0.63
gpt-3.5-turbo    | 0.9
text-davinci-003 | 0.87
text-davinci-002 | 0.66
text-davinci-001 | 0.58
ada              | 0.35

Quantization with "groupsize" makes the results completely wrong.

Hi,

I'm quantizing the models following the README, but there is one common thing when using the groupsize parameter: in each case the perplexity goes through the roof and the results are completely wrong.
For example, quantizing the 7B model with 4 bits, perplexity:

wikitext2: 7.462815284729004
ptb:       11.122198104858398
c4:        8.211784362792969

And the same model with 4 bits and --groupsize 128:

wikitext2: 243848.546875
ptb:       309488.53125
c4:        240030.015625

And the results for input What's the Earth?:

  • 4b:
🦙: What's the Earth?
So what's the earth? It's a planet.
Which one? Well, the one that revolves around the sun.
Now that's true, but what does that mean?
  • 4b, group size of 128:
🦙: What's the Earth?örtfitolly Alburd Tob fitpaunity Tobżyurd girlsurd fitattanattan�ört SE�ży girlsolly Podpois Siegunityunityollyź�éliollyört Nationpois Pod girls finalepoisazineattan

Any idea what's going on?

If this matters, I'm using Python 3.8 on Ubuntu 22.04 running in WSL.

pyllama/downloads returns empty folders

Hello, when running:

python3 -m llama.download

the command runs almost instantly but only creates empty folders named 7B, 13B, etc.
I also tried specifying --model-size and --folder, with the same result.

Download takes forever

Stuck at "downloading file to llama_7B/7B/consolidated.00.pth" for several hours. I checked the size of the model folder, it's around 6.6GB. The size stays constant.

The following are script outputs.

โค๏ธ Resume download is supported. You can ctrl-c and rerun the program to resume the downloading
Downloading tokenizer...
โœ… llama_7B/tokenizer.model
โœ… llama_7B/tokenizer_checklist.chk
tokenizer.model: OK
Downloading 7B
downloading file to llama_7B/7B/consolidated.00.pth ...please wait for a few minutes ...

Running web_server.py on a multi-GPU instance

Hello. I started an 8x A100 80GB instance in Google Cloud and can't start the 65B model:

root@llama:/pyllama/apps/flask# python3 web_server.py --ckpt_dir /var/llama/65B --tokenizer_path /var/llama/tokenizer.model
Traceback (most recent call last):
  File "/pyllama/apps/flask/web_server.py", line 101, in <module>
    generator = init_generator(
  File "/pyllama/apps/flask/web_server.py", line 88, in init_generator
    local_rank, world_size = setup_model_parallel()
  File "/pyllama/apps/flask/web_server.py", line 39, in setup_model_parallel
    dist.init_process_group("nccl")
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/distributed_c10d.py", line 754, in init_process_group
    store, rank, world_size = next(rendezvous_iterator)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/rendezvous.py", line 236, in _env_rendezvous_handler
    rank = int(_get_env_or_raise("RANK"))
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/rendezvous.py", line 221, in _get_env_or_raise
    raise _env_error(env_var)
ValueError: Error initializing torch.distributed using env:// rendezvous: environment variable RANK expected, but not set

ModuleNotFoundError: No module named 'quant_cuda'

I got this error when running " !python3 -m llama.llama_quant --help " on Google Colab

Traceback (most recent call last):
File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.9/dist-packages/llama/llama_quant.py", line 6, in
from gptq import (
File "/usr/local/lib/python3.9/dist-packages/gptq/init.py", line 9, in
from .gptq import GPTQ
File "/usr/local/lib/python3.9/dist-packages/gptq/gptq.py", line 5, in
from .quant import quantize
File "/usr/local/lib/python3.9/dist-packages/gptq/quant.py", line 4, in
from quant_cuda import matvmul2, matvmul3, matvmul4, matvmul8, matvmul16
ModuleNotFoundError: No module named 'quant_cuda'

Sorry, I can't run

(llama) -bash-4.2$ python inference.py --ckpt_dir ./models/7B --tokenizer_path ./models/tokenizer.model
Traceback (most recent call last):
File "/home/ycshu_wlxy/kingingwang/pyllama-main/inference.py", line 67, in
run(
File "/home/ycshu_wlxy/kingingwang/pyllama-main/inference.py", line 47, in run
generator = load(ckpt_dir, tokenizer_path, local_rank, world_size, max_seq_len, max_batch_size)
File "/home/ycshu_wlxy/kingingwang/pyllama-main/inference.py", line 22, in load
checkpoint = torch.load(ckpt_path, map_location="cpu")
File "/home/ycshu_wlxy/.conda/envs/llama/lib/python3.10/site-packages/torch/serialization.py", line 789, in load
return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
File "/home/ycshu_wlxy/.conda/envs/llama/lib/python3.10/site-packages/torch/serialization.py", line 1131, in _load
result = unpickler.load()
File "/home/ycshu_wlxy/.conda/envs/llama/lib/python3.10/site-packages/torch/serialization.py", line 1101, in persistent_load
load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
File "/home/ycshu_wlxy/.conda/envs/llama/lib/python3.10/site-packages/torch/serialization.py", line 1079, in load_tensor
storage = zip_file.get_storage_from_record(name, numel, torch.UntypedStorage).storage().untyped()
RuntimeError: PytorchStreamReader failed reading file data/22: invalid header or archive is corrupted

Struggle with training LLaMA with a single GPU using both PT v1 and v2

Hi,
I love your code base and want to try to train LLaMA with a single GPU. The code I use is here: https://github.com/juncongmoo/pyllama/blob/main/llama/model_single.py.
However, I am struggling with an error. The message shows:
self.tok_embeddings = nn.Embedding(params.vocab_size, params.dim)
File "/home/linh/anaconda3/envs/a/lib/python3.9/site-packages/torch/nn/modules/sparse.py", line 139, in init
self.weight = Parameter(torch.empty((num_embeddings, embedding_dim), **factory_kwargs))
RuntimeError: Trying to create tensor with negative dimension -1: [-1, 512]
"
Can you help me fix/test this code?

Thanks in advance.
Linh

Model mismatch for 13B

CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node 2 webapp.py --ckpt_dir ../../../llama/ckpt/13B/ --tokenizer_path ../../../llama/ckpt/tokenizer.model
WARNING:torch.distributed.run:


Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


initializing model parallel with size 2
initializing ddp with size 1
initializing pipeline with size 1
Traceback (most recent call last):
File "webapp.py", line 95, in
generator = load(
File "webapp.py", line 56, in load
model.load_state_dict(checkpoint, strict=False)
File "/data/anaconda3/envs/pyllama/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1671, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Transformer:
size mismatch for tok_embeddings.weight: copying a param with shape torch.Size([32000, 2560]) from checkpoint, the shape in current model is torch.Size([32000, 5120]).
size mismatch for layers.0.attention.wq.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.0.attention.wk.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.0.attention.wv.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.0.attention.wo.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.0.feed_forward.w1.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.0.feed_forward.w2.weight: copying a param with shape torch.Size([5120, 6912]) from checkpoint, the shape in current model is torch.Size([5120, 13824]).
size mismatch for layers.0.feed_forward.w3.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.1.attention.wq.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.1.attention.wk.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.1.attention.wv.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.1.attention.wo.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.1.feed_forward.w1.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.1.feed_forward.w2.weight: copying a param with shape torch.Size([5120, 6912]) from checkpoint, the shape in current model is torch.Size([5120, 13824]).
size mismatch for layers.1.feed_forward.w3.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.2.attention.wq.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.2.attention.wk.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.2.attention.wv.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.2.attention.wo.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.2.feed_forward.w1.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.2.feed_forward.w2.weight: copying a param with shape torch.Size([5120, 6912]) from checkpoint, the shape in current model is torch.Size([5120, 13824]).
size mismatch for layers.2.feed_forward.w3.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.3.attention.wq.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.3.attention.wk.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.3.attention.wv.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.3.attention.wo.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.3.feed_forward.w1.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.3.feed_forward.w2.weight: copying a param with shape torch.Size([5120, 6912]) from checkpoint, the shape in current model is torch.Size([5120, 13824]).
size mismatch for layers.3.feed_forward.w3.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.4.attention.wq.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.4.attention.wk.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.4.attention.wv.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.4.attention.wo.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.4.feed_forward.w1.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.4.feed_forward.w2.weight: copying a param with shape torch.Size([5120, 6912]) from checkpoint, the shape in current model is torch.Size([5120, 13824]).
size mismatch for layers.4.feed_forward.w3.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.5.attention.wq.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.5.attention.wk.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.5.attention.wv.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.5.attention.wo.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.5.feed_forward.w1.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.5.feed_forward.w2.weight: copying a param with shape torch.Size([5120, 6912]) from checkpoint, the shape in current model is torch.Size([5120, 13824]).
size mismatch for layers.5.feed_forward.w3.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.6.attention.wq.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.6.attention.wk.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.6.attention.wv.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.6.attention.wo.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.6.feed_forward.w1.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.6.feed_forward.w2.weight: copying a param with shape torch.Size([5120, 6912]) from checkpoint, the shape in current model is torch.Size([5120, 13824]).
size mismatch for layers.6.feed_forward.w3.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.7.attention.wq.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.7.attention.wk.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.7.attention.wv.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.7.attention.wo.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.7.feed_forward.w1.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.7.feed_forward.w2.weight: copying a param with shape torch.Size([5120, 6912]) from checkpoint, the shape in current model is torch.Size([5120, 13824]).
size mismatch for layers.7.feed_forward.w3.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.8.attention.wq.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.8.attention.wk.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.8.attention.wv.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.8.attention.wo.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.8.feed_forward.w1.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.8.feed_forward.w2.weight: copying a param with shape torch.Size([5120, 6912]) from checkpoint, the shape in current model is torch.Size([5120, 13824]).
size mismatch for layers.8.feed_forward.w3.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.9.attention.wq.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.9.attention.wk.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.9.attention.wv.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.9.attention.wo.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.9.feed_forward.w1.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.9.feed_forward.w2.weight: copying a param with shape torch.Size([5120, 6912]) from checkpoint, the shape in current model is torch.Size([5120, 13824]).
size mismatch for layers.9.feed_forward.w3.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.10.attention.wq.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.10.attention.wk.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.10.attention.wv.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.10.attention.wo.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.10.feed_forward.w1.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.10.feed_forward.w2.weight: copying a param with shape torch.Size([5120, 6912]) from checkpoint, the shape in current model is torch.Size([5120, 13824]).
size mismatch for layers.10.feed_forward.w3.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.11.attention.wq.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.11.attention.wk.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.11.attention.wv.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.11.attention.wo.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.11.feed_forward.w1.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.11.feed_forward.w2.weight: copying a param with shape torch.Size([5120, 6912]) from checkpoint, the shape in current model is torch.Size([5120, 13824]).
size mismatch for layers.11.feed_forward.w3.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.12.attention.wq.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.12.attention.wk.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.12.attention.wv.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.12.attention.wo.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.12.feed_forward.w1.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.12.feed_forward.w2.weight: copying a param with shape torch.Size([5120, 6912]) from checkpoint, the shape in current model is torch.Size([5120, 13824]).
size mismatch for layers.12.feed_forward.w3.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.13.attention.wq.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.13.attention.wk.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.13.attention.wv.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.13.attention.wo.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.13.feed_forward.w1.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.13.feed_forward.w2.weight: copying a param with shape torch.Size([5120, 6912]) from checkpoint, the shape in current model is torch.Size([5120, 13824]).
size mismatch for layers.13.feed_forward.w3.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.14.attention.wq.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.14.attention.wk.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.14.attention.wv.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.14.attention.wo.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.14.feed_forward.w1.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.14.feed_forward.w2.weight: copying a param with shape torch.Size([5120, 6912]) from checkpoint, the shape in current model is torch.Size([5120, 13824]).
size mismatch for layers.14.feed_forward.w3.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.15.attention.wq.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.15.attention.wk.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.15.attention.wv.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.15.attention.wo.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.15.feed_forward.w1.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.15.feed_forward.w2.weight: copying a param with shape torch.Size([5120, 6912]) from checkpoint, the shape in current model is torch.Size([5120, 13824]).
size mismatch for layers.15.feed_forward.w3.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.16.attention.wq.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.16.attention.wk.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.16.attention.wv.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.16.attention.wo.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.16.feed_forward.w1.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.16.feed_forward.w2.weight: copying a param with shape torch.Size([5120, 6912]) from checkpoint, the shape in current model is torch.Size([5120, 13824]).
size mismatch for layers.16.feed_forward.w3.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
[... identical size mismatches repeat for layers 17 through 39: in each layer the checkpoint's attention wq/wk/wv/wo weights have a 2560 dimension where the current model expects 5120, and its feed_forward w1/w2/w3 weights have 6912 where the current model expects 13824 ...]
size mismatch for output.weight: copying a param with shape torch.Size([16000, 5120]) from checkpoint, the shape in current model is torch.Size([32000, 5120]).
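
For reference, every mismatched dimension above is exactly half of what the model expects (2560 vs 5120, 6912 vs 13824, 16000 vs 32000), which is consistent with a single model-parallel shard of the two-file 13B checkpoint being loaded into an unsharded, single-GPU model. A minimal sketch (the checkpoint path is hypothetical) for confirming that each shard holds the half-width tensors:

    import torch

    # Hypothetical path to the downloaded 13B weights (two model-parallel shards).
    ckpt_dir = "/path/to/13B"
    for i in range(2):
        sd = torch.load(f"{ckpt_dir}/consolidated.{i:02d}.pth", map_location="cpu")
        # Each shard should report the halved shapes seen in the error, e.g. [2560, 5120].
        print(i, sd["layers.0.attention.wq.weight"].shape)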

hiq-python installation problem

My process

  • anaconda powershell

  • python 3.10

  • pip install pyllama

  • git clone https://github.com/juncongmoo/pyllama.git

  • cd pyllama

  • run inference => failure

  • pip install -r requirements.txt

  • get the following error
    (screenshot)

    Using cached hiq_python-1.0.0-py3-none-any.whl (49 kB)
    ERROR: Cannot install hiq-python==1.0.0, hiq-python==1.0.1, hiq-python==1.0.2, hiq-python==1.0.3, hiq-python==1.0.4, hiq-python==1.0.5, hiq-python==1.1.0, hiq-python==1.1.1, hiq-python==1.1.2, hiq-python==1.1.3, hiq-python==1.1.4, hiq-python==1.1.5, hiq-python==1.1.6, hiq-python==1.1.7 and hiq-python==1.1.8 because these package versions have conflicting dependencies.

The conflict is caused by:
hiq-python 1.1.8 depends on py-itree
hiq-python 1.1.7 depends on py-itree
hiq-python 1.1.6 depends on py-itree
hiq-python 1.1.5 depends on py-itree
hiq-python 1.1.4 depends on py-itree
hiq-python 1.1.3 depends on py-itree
hiq-python 1.1.2 depends on py-itree
hiq-python 1.1.1 depends on py-itree~=0.0.15
hiq-python 1.1.0 depends on py-itree~=0.0.15
hiq-python 1.0.5 depends on py-itree~=0.0.15
hiq-python 1.0.4 depends on py-itree~=0.0.15
hiq-python 1.0.3 depends on py-itree~=0.0.15
hiq-python 1.0.2 depends on py-itree~=0.0.14
hiq-python 1.0.1 depends on py-itree~=0.0.14
hiq-python 1.0.0 depends on py-itree~=0.0.14

To fix this you could try to:

  1. loosen the range of package versions you've specified
  2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

Things I've tried

  • there are no version requirements listed in requirements.txt to loosen
  • uninstall, reinstall pyllama, hiq
  • uninstall, reinstall torch
  • pip install py-itree => "ERROR: No matching distribution found for py-itree"

My current thought is that Python 3.10 may simply be too new for these packages.

Next step: recreate the environment with Python 3.8, which I've seen referenced elsewhere, and try again.

Error trying to quantize the 7B model to 8-bit

When I run:
python -m llama.llama_quant decapoda-research/llama-7b-hf c4 --wbits 8 --save pyllama-7B8b.pt
I get this error:
OSError: Unable to load weights from pytorch checkpoint file for '/home/jima/.cache/huggingface/hub/models--decapoda-research--llama-7b-hf/snapshots/5f98eefcc80e437ef68d457ad7bf167c2c6a1348/pytorch_model-00002-of-00033.bin' at '/home/jima/.cache/huggingface/hub/models--decapoda-research--llama-7b-hf/snapshots/5f98eefcc80e437ef68d457ad7bf167c2c6a1348/pytorch_model-00002-of-00033.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
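
One common cause of this OSError is a truncated or corrupted shard in the Hugging Face cache rather than an actual TF checkpoint. A hedged diagnostic sketch (the cache path is copied from the error message) that tries to deserialize each shard to find the bad one:

    import glob
    import torch

    # Snapshot directory taken from the error message above.
    snapshot = ("/home/jima/.cache/huggingface/hub/models--decapoda-research--llama-7b-hf/"
                "snapshots/5f98eefcc80e437ef68d457ad7bf167c2c6a1348")
    for f in sorted(glob.glob(f"{snapshot}/pytorch_model-*.bin")):
        try:
            torch.load(f, map_location="cpu")
            print("ok ", f)
        except Exception as e:
            print("bad", f, e)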

Model does not split for 65B

I have eight 80GB A100 GPUs. I can't run this project correctly, though I can run the official example.py.

 torchrun --nproc_per_node 8 webapp.py --ckpt_dir /nvme/syx/llama/model/65B/65B/ --tokenizer_path /nvme/syx/llama/model/tokenizer.model

Output:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 344.00 MiB (GPU 0; 79.20 GiB total capacity; 77.97 GiB already allocated; 297.25 MiB free; 77.97 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
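
Not a fix for the sharding itself, but the allocator hint in the error message can be tried by setting PYTORCH_CUDA_ALLOC_CONF before any CUDA allocation happens, e.g. at the very top of webapp.py. A minimal sketch (the 128 MiB value is an arbitrary starting point, not a recommendation):

    import os

    # Must be set before the first CUDA allocation (i.e. before torch touches the GPU).
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

    import torch  # noqa: E402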

AttributeError: module 'numpy' has no attribute 'array'

(tf) C:\Users\James>python -m llama.download
Traceback (most recent call last):
  File "C:\Users\James\anaconda3\envs\tf\lib\runpy.py", line 188, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "C:\Users\James\anaconda3\envs\tf\lib\runpy.py", line 111, in _get_module_details
    __import__(pkg_name)
  File "C:\Users\James\anaconda3\envs\tf\lib\site-packages\llama\__init__.py", line 1, in <module>
    from .generation import LLaMA
  File "C:\Users\James\anaconda3\envs\tf\lib\site-packages\llama\generation.py", line 6, in <module>
    import torch
  File "C:\Users\James\anaconda3\envs\tf\lib\site-packages\torch\__init__.py", line 831, in <module>
    from .functional import *  # noqa: F403
  File "C:\Users\James\anaconda3\envs\tf\lib\site-packages\torch\functional.py", line 7, in <module>
    import torch.backends.opt_einsum as opt_einsum
  File "C:\Users\James\anaconda3\envs\tf\lib\site-packages\torch\backends\opt_einsum\__init__.py", line 9, in <module>
    import opt_einsum as _opt_einsum  # type: ignore[import]
  File "C:\Users\James\anaconda3\envs\tf\lib\site-packages\opt_einsum\__init__.py", line 5, in <module>
    from . import blas
  File "C:\Users\James\anaconda3\envs\tf\lib\site-packages\opt_einsum\blas.py", line 7, in <module>
    from . import helpers
  File "C:\Users\James\anaconda3\envs\tf\lib\site-packages\opt_einsum\helpers.py", line 14, in <module>
    _sizes = np.array([2, 3, 4, 5, 4, 3, 2, 6, 5, 4, 3, 2, 5, 7, 4, 3, 2, 3, 4])
AttributeError: module 'numpy' has no attribute 'array'
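
This error usually means the numpy installation itself is broken, or is being shadowed by a stray numpy.py on the path, rather than anything pyllama-specific. A quick hedged check of which numpy the interpreter actually imports:

    import numpy

    print(numpy.__version__)        # which release is installed
    print(numpy.__file__)           # where it is imported from
    print(hasattr(numpy, "array"))  # False confirms the broken or shadowed install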

Meaningless Prediction in 13B 2bit

I have quantized the 13B model to 2bit by executing:

python -m llama.llama_quant decapoda-research/llama-13b-hf c4 --wbits 2 --save pyllama-13B2b.pt

After quantization, when I run the test inference, the output seems completely random:

python quant_infer.py --model decapoda-research/llama-13b-hf --wbits 2 --load ../pyllama-13B2b.pt --text "the meaning of life is" --max_length 24 --cuda cuda:0

Screenshot from 2023-03-24 22-44-32

"KeyError: 'llama'"

When debugging the code for the quantization section, I received an error message saying "KeyError: 'llama'", and upgrading transformers did not help.

    dataloader, testloader = get_loaders(
        args.dataset, # C4
        nsamples=args.nsamples,
        seed=args.seed,
        # model=args.model,
        model='D:\\SPACE_Research_AI\\QutaModel_TransformerBased\\modelCk\\models--decapoda-research--llama-7b-hf\\'
              'snapshots\\5f98eefcc80e437ef68d457ad7bf167c2c6a1348',
        seqlen=model.seqlen,
    )
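
For what it's worth, "KeyError: 'llama'" is typically raised when the installed transformers build does not recognize the "llama" model_type declared in the checkpoint's config.json. A hedged check (the snapshot path is copied from the snippet above) that prints both sides of the mismatch:

    import json
    import transformers

    print(transformers.__version__)
    cfg_path = (
        "D:\\SPACE_Research_AI\\QutaModel_TransformerBased\\modelCk\\"
        "models--decapoda-research--llama-7b-hf\\snapshots\\"
        "5f98eefcc80e437ef68d457ad7bf167c2c6a1348\\config.json"
    )
    with open(cfg_path) as f:
        print(json.load(f).get("model_type"))  # "llama" must be a type transformers knows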

Killed

Hello all, I installed the project's requirements, but when I try to execute the following command:

python -m llama.llama_quant decapoda-research/llama-7b-hf c4 --wbits 2 --save pyllama-7B2b.pt

I just got the message "Killed". Could you help me determine and fix the issue? Thanks.
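
A bare "Killed" with no traceback usually means the Linux out-of-memory killer terminated the process while the full-precision model was being loaded into system RAM (the kernel log can confirm this). A small hedged check of available memory before re-running (Linux only):

    # Read available system memory from /proc/meminfo (Linux only).
    with open("/proc/meminfo") as f:
        info = dict(line.split(":", 1) for line in f)
    print("MemAvailable:", info["MemAvailable"].strip())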

Quantize Original LLaMA Model Files

A bit confused here. In README.md, users are asked to download the LLaMA model files first, yet the quantization examples use decapoda-research/llama-7b-hf. How do I quantize the downloaded LLaMA model files (for example, consolidated.00.pth for 7B)?

python -m llama.llama_quant decapoda-research/llama-7b-hf c4 --wbits 4 --groupsize 128 --save pyllama-7B4b.pt

Running inference.py reports a 'model parallel group is not initialized' error

I have called torch.distributed.init_process_group and it still gives this error:

Traceback (most recent call last):
  File "inference.py", line 67, in <module>
    run(
  File "inference.py", line 48, in run
    generator = load(ckpt_dir, tokenizer_path, local_rank, world_size, max_seq_len, max_batch_size)
  File "inference.py", line 32, in load
    model = Transformer(model_args)
  File "/usr/local/lib/python3.8/dist-packages/llama/model.py", line 205, in __init__
    self.tok_embeddings = ParallelEmbedding(
  File "/usr/local/lib/python3.8/dist-packages/fairscale/nn/model_parallel/layers.py", line 186, in __init__
    world_size = get_model_parallel_world_size()
  File "/usr/local/lib/python3.8/dist-packages/fairscale/nn/model_parallel/initialize.py", line 152, in get_model_parallel_world_size
    return torch.distributed.get_world_size(group=get_model_parallel_group())
  File "/usr/local/lib/python3.8/dist-packages/fairscale/nn/model_parallel/initialize.py", line 128, in get_model_parallel_group
    assert _MODEL_PARALLEL_GROUP is not None, "model parallel group is not initialized"
AssertionError: model parallel group is not initialized

How can I solve this? Many thanks.
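
The assertion comes from fairscale: besides torch.distributed.init_process_group, the model-parallel group itself has to be created with initialize_model_parallel before Transformer(...) is constructed. A minimal sketch for a single-GPU run (the address and port values are placeholders):

    import os
    import torch
    from fairscale.nn.model_parallel.initialize import initialize_model_parallel

    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")  # placeholder rendezvous address
    os.environ.setdefault("MASTER_PORT", "29500")      # placeholder port
    torch.distributed.init_process_group("nccl", rank=0, world_size=1)
    initialize_model_parallel(1)  # model-parallel size 1 for a single GPU
    # ... then build ModelArgs / Transformer as inference.py does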

ModuleNotFoundError: No module named 'quant_cuda'

Traceback (most recent call last):
  File "/home/orion/AI-Horde-Worker/llama.cpp/pyllama/llama/llama_quant.py", line 6, in <module>
    from gptq import (
  File "/home/orion/.local/lib/python3.10/site-packages/gptq/__init__.py", line 9, in <module>
    from .gptq import GPTQ
  File "/home/orion/.local/lib/python3.10/site-packages/gptq/gptq.py", line 5, in <module>
    from .quant import quantize
  File "/home/orion/.local/lib/python3.10/site-packages/gptq/quant.py", line 4, in <module>
    from quant_cuda import matvmul2, matvmul3, matvmul4, matvmul8, matvmul16
ModuleNotFoundError: No module named 'quant_cuda'

I can't find anything about it online; no idea what's going on:

$ nvidia-smi
Sat Mar 18 19:49:11 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060 Ti      On | 00000000:09:00.0 Off |                  N/A |
|  0%   45C    P8               23W / 200W|     64MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1245      G   /usr/lib/xorg/Xorg                           56MiB |
|    0   N/A  N/A      1437      G   /usr/bin/gnome-shell                          6MiB |
+---------------------------------------------------------------------------------------+
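
quant_cuda is a compiled CUDA extension that gptq's kernels import, so this usually means the extension was never built or installed in the current environment. A hedged diagnostic to see whether PyTorch and a CUDA toolkit are even available for building it:

    import torch
    from torch.utils.cpp_extension import CUDA_HOME

    print("torch:", torch.__version__, "built for CUDA:", torch.version.cuda)
    print("cuda available:", torch.cuda.is_available())
    print("CUDA_HOME:", CUDA_HOME)  # None means no toolkit was found for compiling extensions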

Error when downloading models

Hi,

Here are the errors when I try to download from my Mac M1:

python3 -m llama.download        
Traceback (most recent call last):
  File "/Users/paulo/Library/Python/3.9/lib/python/site-packages/itree/__init__.py", line 5, in <module>
    from . import _itree
ImportError: cannot import name '_itree' from partially initialized module 'itree' (most likely due to a circular import) (/Users/paulo/Library/Python/3.9/lib/python/site-packages/itree/__init__.py)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/runpy.py", line 188, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/runpy.py", line 111, in _get_module_details
    __import__(pkg_name)
  File "/Users/paulo/Library/Python/3.9/lib/python/site-packages/llama/__init__.py", line 4, in <module>
    from .model_single import ModelArgs, Transformer
  File "/Users/paulo/Library/Python/3.9/lib/python/site-packages/llama/model_single.py", line 8, in <module>
    import hiq
  File "/Users/paulo/Library/Python/3.9/lib/python/site-packages/hiq/__init__.py", line 57, in <module>
    from .tree import (
  File "/Users/paulo/Library/Python/3.9/lib/python/site-packages/hiq/tree.py", line 9, in <module>
    import itree
  File "/Users/paulo/Library/Python/3.9/lib/python/site-packages/itree/__init__.py", line 7, in <module>
    import _itree
ImportError: dlopen(/Users/paulo/Library/Python/3.9/lib/python/site-packages/_itree.cpython-39-darwin.so, 0x0002): tried: '/Users/paulo/Library/Python/3.9/lib/python/site-packages/_itree.cpython-39-darwin.so' (mach-o file, but is an incompatible architecture (have (x86_64), need (arm64e)))
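
The dlopen message says the installed _itree binary was built for x86_64 while the running interpreter needs arm64, so the wheel and the Python build disagree about architecture. A quick hedged check of what the interpreter itself is:

    import platform
    import sysconfig

    print(platform.machine())        # "arm64" for a native Apple Silicon Python
    print(sysconfig.get_platform())  # e.g. "macosx-11.0-arm64" vs "macosx-10.9-x86_64"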

ModuleNotFoundError: No module named 'llama.hf'

Trying to run:
python -m llama.llama_quant decapoda-research/llama-7b-hf c4 --wbits 8 --save pyllama-7B8b.pt

Got an error:
Traceback (most recent call last):
  File "/home/user/miniconda3/envs/transformers/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/user/miniconda3/envs/transformers/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/user/miniconda3/envs/transformers/lib/python3.10/site-packages/llama/llama_quant.py", line 16, in <module>
    from llama.hf.modeling_llama import LLaMAForC
ModuleNotFoundError: No module named 'llama.hf'
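
A hedged first check here is to confirm which "llama" package is actually on the import path and whether it ships the hf subpackage, since an older or differently sourced llama install would explain the missing module:

    import importlib.util
    import llama

    print(llama.__file__)                        # which installed package is being used
    print(importlib.util.find_spec("llama.hf"))  # None means the subpackage is not there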

Error trying to quantize the 7B model to 2-bit

I have installed GPTQ as described at "https://pypi.org/project/gptq/#description", but the following error comes out after executing python -m llama.llama_quant D:\Repo\Llama\weights\7B c4 --wbits 2 --save pyllama-7B2b.pt:
Traceback (most recent call last):
  File "C:\Users\ASUS\.conda\envs\PyLlama\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\ASUS\.conda\envs\PyLlama\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\Repo\PyLlama\pyllama\llama\llama_quant.py", line 6, in <module>
    from gptq import (
  File "C:\Users\ASUS\.conda\envs\PyLlama\lib\site-packages\gptq\__init__.py", line 9, in <module>
    from .gptq import GPTQ
  File "C:\Users\ASUS\.conda\envs\PyLlama\lib\site-packages\gptq\gptq.py", line 5, in <module>
    from .quant import quantize
  File "C:\Users\ASUS\.conda\envs\PyLlama\lib\site-packages\gptq\quant.py", line 4, in <module>
    from quant_cuda import matvmul2, matvmul3, matvmul4, matvmul8, matvmul16
ModuleNotFoundError: No module named 'quant_cuda'
I am using Windows 11.

No module named 'hiq'

G:\ai\pyllama>python inference.py --ckpt_dir G:\model\7B --tokenizer_path G:\model/tokenizer.model
Traceback (most recent call last):
  File "G:\ai\pyllama\inference.py", line 6, in <module>
    from llama import ModelArgs, Transformer, Tokenizer, LLaMA
  File "G:\ai\pyllama\llama\__init__.py", line 5, in <module>
    from .model_single import ModelArgs, Transformer
  File "G:\ai\pyllama\llama\model_single.py", line 8, in <module>
    import hiq
ModuleNotFoundError: No module named 'hiq'

World size AssertionError

I'm trying to run the 7B model on my single-GPU server, and I get this error:

Traceback (most recent call last):
  File "inference.py", line 82, in <module>
    run(
  File "inference.py", line 50, in run
    generator = load(
  File "inference.py", line 17, in load
    assert world_size == len(
AssertionError: Loading a checkpoint for MP=0 but world size is 1

I used the community way to download the model files.

Where can I modify the MP setting, or do I have to run it in the multi-GPU way?
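
For context, the failing assert compares the launcher's world size with the number of *.pth checkpoint files found under --ckpt_dir, so "MP=0" means no .pth files were found there at all. A small hedged sketch (the path is hypothetical) to confirm what the loader will see:

    from pathlib import Path

    ckpt_dir = "/path/to/7B"  # hypothetical: should contain consolidated.00.pth for the 7B model
    print(sorted(Path(ckpt_dir).glob("*.pth")))  # the length of this list is the "MP" in the error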

How can I input a prompt when I use multiple GPUs?

Hello, I use 4 V100 GPUs and load the 30B model. I want to modify the example.py code to input my own prompts, but it does not work. My code is:

user_input = input("please enter your prompts (Ctrl+C to exit): ")
prompts = [user_input]
print("prompts", prompts)

It stops before reaching the print statement. How can I solve this?
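
Under torchrun every rank executes the script, and input() can hang on ranks that do not have a usable stdin. A common pattern, sketched here under that assumption, is to read the prompt on rank 0 only and broadcast it to the other ranks:

    import torch.distributed as dist

    def read_prompts():
        # Only rank 0 talks to the terminal; the others receive the prompt via broadcast.
        if dist.get_rank() == 0:
            obj = [input("please enter your prompts (Ctrl+C to exit): ")]
        else:
            obj = [None]
        dist.broadcast_object_list(obj, src=0)
        return [obj[0]]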
