Comments (13)
@Vincent131499 @taishiciR Hi!
I was able to reproduce the issue and I've come up with a fix. PEFT LoRA is fully functional in #68 and will be merged into main soon. The only problem is that it only works with `peft>=0.3.0dev0`, which is the latest dev version (so install it with `pip install git+https://github.com/huggingface/peft.git@main`). You can check out the `lora` branch and work with LoRA right away.
P.S. The only model not supporting LoRA right now is vanilla `gpt2`, since it uses convolutions instead of `Linear` layers, and that really breaks everything.
@taishiciR Hi!
Are you using a `device_map`? You can create a device map from a meta model like this: `device_map = tensor_parallel.infer_sharded_device_map(model)` and then pass it to `accelerate.load_checkpoint_in_model` as `device_map=device_map`. Without it, `accelerate` doesn't know where to dispatch tensors and produces the error you're seeing.
It can also be caused by tied weights, but I've only seen that with the T5 model. LLaMA should definitely work.
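For reference, a minimal sketch of that flow; the model id and checkpoint path are placeholders, and the checkpoint is assumed to already be converted to tensor_parallel format:

```python
# Minimal sketch: build a device map from a meta model and let accelerate
# dispatch a checkpoint with it. Model id and paths are placeholders, and the
# checkpoint is assumed to already be in tensor_parallel format.
import tensor_parallel as tp
from accelerate import init_empty_weights, load_checkpoint_in_model
from transformers import AutoConfig, AutoModelForCausalLM

with init_empty_weights():
    model = AutoModelForCausalLM.from_config(AutoConfig.from_pretrained("huggyllama/llama-7b"))
model = tp.TensorParallelPreTrainedModel(model)

device_map = tp.infer_sharded_device_map(model)  # where each parameter should live
load_checkpoint_in_model(model, checkpoint="/path/to/converted_checkpoint", device_map=device_map)
```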
@Vincent131499 Hi!
Simply calling `tensor_parallel` on a model with already loaded adapters should just work: auto configuration will take care of the adapters. Exactly what problems have you run into?
Predefined `tensor_parallel` configs for `transformers` models might not work properly, but they won't be triggered on a `PeftModel` anyway. I'll try to come up with an elegant solution for this later.
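A minimal sketch of that usage; the base model id and LoRA hyperparameters below are placeholders:

```python
# Minimal sketch: load adapters first, then shard everything with tensor_parallel.
# The model id and LoRA hyperparameters are placeholders.
import tensor_parallel as tp
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b")
lora_config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])
model = get_peft_model(base_model, lora_config)

# Auto configuration shards the LoRA layers along with the base weights
model = tp.tensor_parallel(model, ["cuda:0", "cuda:1"])
```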
It would be better to support shard-loading the base model and LoRA directly to GPU, if that's possible.
@sgsdxzy It's supported.
You can create a `tensor_parallel` model from a meta model and then dispatch the weights however you like. For example, this demo does it like this (see the sketch after the list):

- Initialize a meta model with `accelerate.init_empty_weights()`
- Create a `TensorParallelPreTrainedModel` from the meta model
- Load model shards and convert them with `tensor_parallel.convert_state_dict`
- Dispatch them with `accelerate.load_checkpoint_in_model`

You could add LoRA to the meta model after step 1, either as real weights or as meta weights to be initialized later.
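A sketch of those four steps; the shard paths, the model id, and the exact `convert_state_dict` arguments are assumptions based on the linked demo, not guaranteed API:

```python
# Sketch of the four steps above. Shard paths, the model id, and the
# convert_state_dict arguments are assumptions based on the linked demo.
import torch
import tensor_parallel as tp
from accelerate import init_empty_weights, load_checkpoint_in_model
from transformers import AutoConfig, AutoModelForCausalLM

# 1) meta model: no real weights are allocated yet
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(AutoConfig.from_pretrained("huggyllama/llama-7b"))

# 2) wrap it for tensor parallelism (LoRA could be added to the meta model here)
model = tp.TensorParallelPreTrainedModel(model, device_ids=["cuda:0", "cuda:1"])
device_map = tp.infer_sharded_device_map(model)

# 3) + 4) convert each checkpoint shard, then let accelerate dispatch it
for shard_path in ["/path/to/pytorch_model-00001.bin"]:  # placeholder shard list
    converted = tp.convert_state_dict(
        torch.load(shard_path),
        model.tensor_parallel_config,
        world_size=2,
        for_pretrained=True,
    )
    torch.save(converted, "/tmp/converted_shard.bin")
    load_checkpoint_in_model(model, checkpoint="/tmp/converted_shard.bin", device_map=device_map)
```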
I'm not sure how LoRA is natively dispatched, but dispatching it by hand should always work.
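If it comes to that, a hypothetical by-hand dispatch (assuming the adapters were added as real, non-meta tensors, and a single target device) could look like this:

```python
# Hypothetical by-hand dispatch: pin every LoRA parameter to a concrete device.
# Assumes the adapters exist as real (non-meta) tensors.
from accelerate.utils import set_module_tensor_to_device

for name, param in model.named_parameters():
    if "lora_" in name:
        set_module_tensor_to_device(model, name, "cuda:0", value=param.data)
```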
Hi @BlackSamorez, any plan to support PEFT LoRA models?
I came across the same problem using another TP strategy: when shard-initializing a model, the LoRA layers just don't work well in the forward step.
Are there any examples for fine-tuning LLaMA with LoRA?
Looking forward to your reply, thanks!
When I tried to apply tensor parallelism to LLaMA with LoRA in https://github.com/tloen/alpaca-lora/blob/main/finetune.py, I came across a runtime error in huggingface transformers/trainer.py.
Looking at it in detail, there is no such attribute on a `TensorParallelPreTrainedModel`:
`AttributeError: 'TensorParallelPreTrainedModel' object has no attribute '_is_int8_training_enabled'`
Hi~ any workaround for this issue, @BlackSamorez?
Besides https://www.kaggle.com/code/blacksamorez/tensor-parallel-int8-llm, I also tried the example above with both the main and lora branches; it just doesn't work (changing the model to llama-7b-hf + LoRA).
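One untested workaround sketch, on the assumption that the `Trainer` only reads this private flag (the attribute name is taken straight from the traceback):

```python
# Untested workaround sketch: the TP wrapper doesn't proxy this private flag,
# so set it manually before handing the model to the Trainer.
# The attribute name comes straight from the traceback above.
model._is_int8_training_enabled = False
```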
@BlackSamorez @Vincent131499 @taishiciR I think you can just merge the LoRA before applying TP; that would give the best performance:

```python
from peft import PeftModel

lora_model = PeftModel.from_pretrained(base_model, lora_path)
# merge weights - new merging method from peft, requires peft > 0.2.0
lora_model = lora_model.merge_and_unload()
# apply TP here
```

Though you need to load the entire model first, you can load it on CPU and let TP dispatch it.
@sgsdxzy If we're talking about inference, then yes, merging is the way to go. But for adapter training, merging is obviously not an option.
@BlackSamorez hi~
With the updated dependencies, I tried the example from the demo again, following these steps:
1) Initialize a meta model with `accelerate.init_empty_weights()`
2.1) `get_peft_model(model, lora_config)` # wrap llama-7b with LoRA
2.2) Create a `TensorParallelPreTrainedModel` from the meta model
3) Load model shards and convert them with `tensor_parallel.convert_state_dict`
4) Dispatch them with `accelerate.load_checkpoint_in_model`
I came across an error from accelerate/utils/modeling.py:935 (accelerate 0.18.0): `wrapped_model.module_shards.0.model.layers.0.self_attn.q_proj.weight` doesn't have any device set.
The device_map of the model turns out to contain: `'wrapped_model.module_shards.0.base_model.model.model.layers.0.self_attn.q_proj.tp_wrapped_module.weight': device(type='cuda', index=0)`
Can you please check this?
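A quick debugging sketch to surface the name mismatch; it assumes `device_map` came from `infer_sharded_device_map` and that its keys are parameter names:

```python
# Debugging sketch (assumed variables): list parameters the inferred
# device_map doesn't cover. Here the mismatch is the "base_model.model."
# prefix and the "tp_wrapped_module" wrapper present in only one name set.
uncovered = [name for name, _ in model.named_parameters() if name not in device_map]
print(uncovered[:5])
```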
thank you~
Since there seems to be no more activity, I'm closing this issue.
Feel free to reopen it if anything comes up.