Comments (13)

BlackSamorez commented on May 18, 2024

@Vincent131499 @taishiciR Hi!
I was able to reproduce the issue and I've come up with a fix. PEFT LoRA is fully functional in #68 and will be merged into main soon enough. The only problem is that it only works with peft>=0.3.0dev0, which is the latest dev version (so install with pip install git+https://github.com/huggingface/peft.git@main).
You can check out the lora branch and work with LoRA right away.

P.S. The only model not supporting LoRA right now is vanilla gpt2, since it uses convolutions instead of Linear layers and that really breaks everything.

BlackSamorez commented on May 18, 2024

@taishiciR Hi!
Are you using device_map?
You can create a device map from a meta model like this: device_map = tensor_parallel.infer_sharded_device_map(model) and then pass it to accelerate.load_checkpoint_in_model as device_map=device_map. Without it, accelerate doesn't know where to dispatch tensors and produces the error you're seeing.
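
For reference, a minimal sketch of that flow (model is assumed to be a tensor_parallel model built from a meta model, and checkpoint_path is a placeholder):

import accelerate
import tensor_parallel as tp

# build the device map from the sharded (still meta) model
device_map = tp.infer_sharded_device_map(model)

# tell accelerate where each tensor should be dispatched
accelerate.load_checkpoint_in_model(model, checkpoint=checkpoint_path, device_map=device_map)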

It can also be caused by tied weights, but I've only seen that for the T5 model. LLaMA should definitely work.

BlackSamorez commented on May 18, 2024

@Vincent131499 Hi!
Simply calling tensor_parallel on a model with already loaded adapters should just work. Auto configuration will take care of the adapters. What problems exactly have you run into?
Predefined tensor_parallel configs for transformers models might not work properly, but they won't be triggered on a PeftModel anyway. I'll try to come up with an elegant solution for this later.
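
For example, a rough sketch of that usage (llama_path, the LoraConfig values and the device ids are placeholders, not something from this thread):

import tensor_parallel as tp
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(llama_path)  # placeholder path to a llama checkpoint
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"]))

# auto configuration should pick up the LoRA layers inside the wrapped model
model = tp.tensor_parallel(model, ["cuda:0", "cuda:1"])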

sgsdxzy commented on May 18, 2024

It would be better to support shard-loading the base model and LoRA directly to GPU, if that's possible.

BlackSamorez commented on May 18, 2024

@sgsdxzy It's supported.
You can create a tensor_parallel model from a meta model and then dispatch the weights however you like.
For example, this demo does it as follows:

  1. Initialize a meta model with accelerate.init_empty_weights()
  2. Create a TensorParallelPreTrainedModel from the meta model
  3. Load model shards and convert them with tensor_parallel.convert_state_dict
  4. Dispatch them with accelerate.load_checkpoint_in_model

You could add LoRA to the meta model after step 1, either as real weights or as meta weights to be initialized later.
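
A rough Python sketch of those four steps, loosely following the linked demo (model_name and checkpoint_dir are placeholders; the checkpoint layout, the world size and the exact convert_state_dict arguments are assumptions here and should be checked against the demo):

import json
import torch
import accelerate
import tensor_parallel as tp
from transformers import AutoConfig, AutoModelForCausalLM

# 1. meta model with no weights allocated
config = AutoConfig.from_pretrained(model_name)
with accelerate.init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)

# 2. wrap it into a TensorParallelPreTrainedModel (still on meta)
model = tp.TensorParallelPreTrainedModel(model, device_ids=["cuda:0", "cuda:1"])
device_map = tp.infer_sharded_device_map(model)

# 3 + 4. convert each checkpoint shard and dispatch it
with open(f"{checkpoint_dir}/pytorch_model.bin.index.json") as f:
    shard_files = set(json.load(f)["weight_map"].values())
for shard_file in shard_files:
    converted = tp.convert_state_dict(
        torch.load(f"{checkpoint_dir}/{shard_file}"),
        model.tensor_parallel_config,
        world_size=2,
        for_pretrained=True,
    )
    torch.save(converted, "/tmp/converted_shard.bin")
    accelerate.load_checkpoint_in_model(model, "/tmp/converted_shard.bin", device_map=device_map)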

BlackSamorez commented on May 18, 2024

I'm not sure how LoRA is natively dispatched but dispatching it by hand should always work.

taishiciR commented on May 18, 2024

Hi @BlackSamorez, any plan to support PEFT LoRA models?
I came across the same problem using another TP strategy: when shard-initializing a model, the LoRA layers just do not work correctly in the forward step.

Are there any examples of fine-tuning LLaMA with LoRA?
Looking forward to your reply, thanks!

taishiciR commented on May 18, 2024

When I tried to apply tensor parallelism to LLaMA with LoRA in https://github.com/tloen/alpaca-lora/blob/main/finetune.py, I ran into a runtime error in huggingface transformers/trainer.py.
Looking at it in detail, a TensorParallelPreTrainedModel has no such attribute:
AttributeError: 'TensorParallelPreTrainedModel' object has no attribute '_is_int8_training_enabled'
Hi, is there any workaround for this issue @BlackSamorez?

Besides, I also tried the example at https://www.kaggle.com/code/blacksamorez/tensor-parallel-int8-llm on both the main and llama branches; it just does not work (after changing the model to llama-7b-hf + LoRA).

sgsdxzy commented on May 18, 2024

@BlackSamorez @Vincent131499 @taishiciR I think you can just merge the LoRA weights before applying TP; that would give the best performance:

from peft import PeftModel
import tensor_parallel as tp

lora_model = PeftModel.from_pretrained(base_model, lora_path)

# merge weights - new merging method from peft, requires peft > 0.2.0
lora_model = lora_model.merge_and_unload()

# apply TP here, for example (device ids are just an example):
model = tp.tensor_parallel(lora_model, ["cuda:0", "cuda:1"])

Though you need to load the entire model first, you can load it on CPU and let TP dispatch it.

BlackSamorez commented on May 18, 2024

@sgsdxzy If we're talking about inference then yes - merging is the way to go. But for adapter training merging is, obviously, not an option.

taishiciR commented on May 18, 2024

> @Vincent131499 @taishiciR Hi! I was able to reproduce the issue and I've come up with the fix. PEFT LoRA is fully functional in #68 and will be merged into main soon enough. The only problem is that it only works with peft>=0.3.0dev0, which is the latest dev version (so install with pip install git+https://github.com/huggingface/peft.git@main). You can checkout lora branch and work with LoRA right away.
>
> P.S. The only model not supporting LoRA right now is vanilla gpt2 since it uses convolutions instead of Linear layers and it really breaks everything.
@BlackSamorez Hi!
With the updated dependencies, I tried the example again, following these steps:

1) Initialize a meta model with accelerate.init_empty_weights()
2.1) get_peft_model(model, lora_config) # wrap llama 7b with LoRA
2.2) Create a TensorParallelPreTrainedModel from the meta model
3) Load model shards and convert them with tensor_parallel.convert_state_dict
4) Dispatch them with accelerate.load_checkpoint_in_model

I came across an error from accelerate/utils/modeling.py:935 (accelerate 0.18.0): wrapped_model.module_shards.0.model.layers.0.self_attn.q_proj.weight doesn't have any device set.
The device_map of the model turns out to contain keys like 'wrapped_model.module_shards.0.base_model.model.model.layers.0.self_attn.q_proj.tp_wrapped_module.weight': device(type='cuda', index=0).

Can you please check this?

taishiciR commented on May 18, 2024

thank you~

BlackSamorez commented on May 18, 2024

Since there seems to be no more activity, I'm closing this issue.
Feel free to reopen it if anything comes up.
