Code Monkey home page Code Monkey logo

slowllama's Issues

Fine-tune other models

Hello,

Can we apply this method to fine-tune models other than llamas and codellama, such as mistral 7b?

Many thanks in advance!

RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'

In order to merge LoRA checkpoint for llama 2 7B model, I run python merge_lora.py.

But an error occured,

Traceback (most recent call last):
  File "/Users/xxx/llama/slowllama/merge_lora.py", line 14, in <module>
    add_lora(model_path, lora_path, out_model_path)
  File "/Users/xxx/llama/slowllama/loader.py", line 188, in add_lora
    lora = lora_weights[b_key].mm(lora_weights[a_key]) * lora_scale
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'

So I modified the code like below and I got the merged model file.

lora = lora_weights[b_key].to(torch.float32).mm(lora_weights[a_key].to(torch.float32)) * lora_scale

But I wonder it's okay or not.
Can you give the opinion or right solution?

RuntimeError: The size of tensor a (2560) must match the size of tensor b (5120) at non-singleton dimension 0

The tensor size mismatch error occurred when I tried to merge the LoRA checkpoint for the llama 2 13B model.

Traceback (most recent call last):
  File "/Users/xxx/llama/slowllama-13b/merge_lora.py", line 14, in <module>
    add_lora(model_path, lora_path, out_model_path)
  File "/Users/xxx/llama/slowllama-13b/loader.py", line 190, in add_lora
    checkpoint[checkpoint_key] = checkpoint[checkpoint_key] + lora[subset].to(torch.bfloat16)
RuntimeError: The size of tensor a (2560) must match the size of tensor b (5120) at non-singleton dimension 0

Llama 2 13B has the two consolidated checkpoints, so do we need to consider the shard when we merge the weights?

finetune.py segmentation fault

I am trying to run the finetune.py and getting a seg. fault. Can anyone help. I am on Apple M2 mac mini with 24G memory.

% python finetune.py 
loc("mps_transpose"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/75428952-3aa4-11ee-8b65-46d450270006/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":206:0)): error: 'anec.transpose' op Invalid configuration for the following reasons: Tensor dimensions N1D1C4096H1W32000 are not within supported range, N[1-65536]D[1-16384]C[1-65536]H[1-16384]W[1-16384].
loc("mps_matmul"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/75428952-3aa4-11ee-8b65-46d450270006/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":39:0)): error: 'anec.matmul' op Invalid configuration for the following reasons: Tensor dimensions N1D1C4096H1W32000 are not within supported range, N[1-65536]D[1-16384]C[1-65536]H[1-16384]W[1-16384].
zsh: segmentation fault  python finetune.py

Fine-tuning codellama dataset

Is there a particular dataset format required for finetuning codellama? I have the dataset in the OpenAI suggested format which is basically a jsonl with each entry having messages: [{role: 'system', content: ''}, {role: 'user', content: ''}, {role: 'assistant', content: ''}]} object. Will this format work?

run prepare_model.py error

when I use CodeLlama-7b to run prepare_model.py,An exception occurred

RuntimeError: The expanded size of the tensor (32000) must match the existing size (32016) at non-singleton dimension 0. Target sizes: [32000, 4096]. Tensor sizes: [32016, 4096]

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.