
llama-trl's Introduction

Jason Van

Oh, hi there,

I am Jason, an Algorithm (NLP) Engineer; my graduate research area is AI & ML.




⚙️ Development Stack
       - AI: Natural Language Processing, Machine Learning
       - Programming: Python, C/C++, PHP, SQL, Shell, JavaScript
       - Frameworks: PyTorch, Paddle
☕️ Coffee Geek: Espresso, Pour-over, Japanese Dark Roast

📫 Gmail · Instagram: @jasonvanf




llama-trl's People

Contributors

dkqkxx, jasonvanf


llama-trl's Issues

Is there a complete, successful end-to-end run of tuning_lm_with_rl?

accelerate launch --multi_gpu --num_machines 1 --num_processes 8 \
    tuning_lm_with_rl.py \
    --log_with wandb \
    --model_name <LLAMA_FINETUNED_MODEL> \
    --reward_model_name <LLAMA_RM_MODEL> \
    --adafactor False \
    --tokenizer_name <LLAMA_TOKENIZER> \
    --save_freq 100 \
    --output_max_length 128 \
    --batch_size 8 \
    --gradient_accumulation_steps 8 \
    --batched_gen True \
    --ppo_epochs 4 \
    --learning_rate 1.4e-5 \
    --early_stopping True \
    --output_dir './checkpoints/tuning_llama_rl/'

Which file does <LLAMA_RM_MODEL> refer to? Is it "Wenzhong-GPT2-110M_peft_gpt-4-llm_rm_xxx_xx" or the original base model? Any help would be appreciated.
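
Not an answer from the thread, but one quick sanity check: a checkpoint written through PEFT normally contains adapter_config.json (and no config.json), while a full base model ships config.json plus its weights. A minimal sketch under that assumption, reusing the folder name quoted above as a hypothetical path:

import os

# Sketch: report whether a directory looks like a PEFT adapter checkpoint
# (what training_reward_model.py typically writes) or a full Hugging Face model.
def describe_checkpoint(path: str) -> str:
    if os.path.isfile(os.path.join(path, "adapter_config.json")):
        return "PEFT adapter checkpoint (load it on top of its base model)"
    if os.path.isfile(os.path.join(path, "config.json")):
        return "full Hugging Face model checkpoint"
    return "no adapter_config.json or config.json found"

# Hypothetical path reusing the folder name quoted in the question.
print(describe_checkpoint(
    "./checkpoints/training_reward_model/Wenzhong-GPT2-110M_peft_gpt-4-llm_rm_xxx_xx"))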

I would like to get in touch and collaborate with you, thank you. My WeChat: xyj15764222030

tuning_lm_with_rl.py does not appear to have a file named config.json

Hi Jason,

I followed these steps.
Step 1 - Supervised Fine-tuning generated "/checkpoints/supervised_llama/", which contains the folders:

checkpoint-2000
checkpoint-3000
checkpoint-4000
final_checkpoint

Step 2 - Training Reward Model generated "/checkpoints/training_reward_model/", which contains the folders:

llama-7b-hf_peft_gpt-4-llm_rm_0_2e-05
peft_last_checkpoint

Step 3 - Tuning LM with PPO:

accelerate launch --multi_gpu --num_machines 1 --num_processes 2 \
    tuning_lm_with_rl.py \
    --log_with wandb \
    --model_name ./checkpoints/supervised_llama/ \
    --reward_model_name ./checkpoints/training_reward_model/ \
    --adafactor False \
    --tokenizer_name ./data/model/ \
    --save_freq 100 \
    --output_max_length 128 \
    --batch_size 8 \
    --gradient_accumulation_steps 8 \
    --batched_gen True \
    --ppo_epochs 4 \
    --learning_rate 1.4e-5 \
    --early_stopping True \
    --output_dir './checkpoints/tuning_llama_rl/'

But there is an error:

CUDA SETUP: Loading binary /usr/local/lib/python3.8/dist-packages/bitsandbytes/libbitsandbytes_cuda117_nocublaslt.so...
Traceback (most recent call last):
  File "tuning_lm_with_rl.py", line 159, in <module>
    tokenizer = AutoTokenizer.from_pretrained(script_args.model_name)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/auto/tokenization_auto.py", line 657, in from_pretrained
    config = AutoConfig.from_pretrained(
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/auto/configuration_auto.py", line 916, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/configuration_utils.py", line 573, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/configuration_utils.py", line 628, in _get_config_dict
    resolved_config_file = cached_file(
  File "/usr/local/lib/python3.8/dist-packages/transformers/utils/hub.py", line 380, in cached_file
    raise EnvironmentError(
OSError: ./checkpoints/supervised_llama does not appear to have a file named config.json. Checkout 'https://huggingface.co/./checkpoints/supervised_llama/None' for available files.

There is no config.json under supervised_llama or training_reward_model.
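
Not from the thread, but a common cause with PEFT-based fine-tuning: the SFT step saves only a LoRA adapter (adapter_config.json, adapter_model.bin), not a full model, so there is no config.json for AutoTokenizer/AutoModel to read. One possible workaround, sketched below under that assumption, is to merge the adapter into the base LLaMA weights and point --model_name at the merged directory (the base-model path here is a placeholder):

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical paths: <PATH_TO_BASE_LLAMA> stands in for your llama-7b-hf directory.
base_model_path = "<PATH_TO_BASE_LLAMA>"
adapter_path = "./checkpoints/supervised_llama/final_checkpoint"
merged_path = "./checkpoints/supervised_llama_merged"

# Load the base model, attach the LoRA adapter, then fold the adapter weights in.
base = AutoModelForCausalLM.from_pretrained(base_model_path, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, adapter_path)
model = model.merge_and_unload()

# Saving the merged model writes config.json plus full weights, which is what
# AutoTokenizer/AutoModel expect when the directory is passed as --model_name.
model.save_pretrained(merged_path)
AutoTokenizer.from_pretrained(base_model_path).save_pretrained(merged_path)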

IndexError: index out of range in self on training_reward_model

(gh_llama-trl) ub2004@ub2004-B85M-A0:~/llm_dev/llama-trl$ python training_reward_model.py --model_name '/data-ssd-1t/hf_model/llama-7b-hf' --dataset_name './data/comparison_data.json' --output_dir './checkpoints/training_reward_model/'

/home/ub2004/anaconda3/envs/gh_llama-trl/lib/python3.10/site-packages/transformers/optimization.py:407: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set no_deprecation_warning=True to disable this warning
warnings.warn(
wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice: 3
wandb: You chose "Don't visualize my results"
wandb: Tracking run with wandb version 0.15.2
wandb: W&B syncing is set to offline in this directory.
wandb: Run wandb online or set WANDB_MODE=online to enable cloud syncing.
0%| | 0/11166 [00:00<?, ?it/s]You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the __call__ method is faster than using a method to encode the text followed by a call to the pad method to get a padded encoding.
/home/ub2004/anaconda3/envs/gh_llama-trl/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:2382: UserWarning: max_length is ignored when padding=True and there is no truncation strategy. To pad to max length, use padding='max_length'.
warnings.warn(
Traceback (most recent call last):
File "/home/ub2004/llm_dev/llama-trl/training_reward_model.py", line 307, in
trainer.train(script_args.resume_from_checkpoint)
File "/home/ub2004/anaconda3/envs/gh_llama-trl/lib/python3.10/site-packages/transformers/trainer.py", line 1664, in train
return inner_training_loop(
File "/home/ub2004/anaconda3/envs/gh_llama-trl/lib/python3.10/site-packages/transformers/trainer.py", line 1940, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/ub2004/anaconda3/envs/gh_llama-trl/lib/python3.10/site-packages/transformers/trainer.py", line 2735, in training_step
loss = self.compute_loss(model, inputs)
File "/home/ub2004/llm_dev/llama-trl/training_reward_model.py", line 198, in compute_loss
rewards_j = model(input_ids=inputs["input_ids_j"], attention_mask=inputs["attention_mask_j"])[0]
File "/home/ub2004/anaconda3/envs/gh_llama-trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ub2004/anaconda3/envs/gh_llama-trl/lib/python3.10/site-packages/peft/peft_model.py", line 566, in forward
return self.base_model(
File "/home/ub2004/anaconda3/envs/gh_llama-trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ub2004/anaconda3/envs/gh_llama-trl/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 821, in forward
transformer_outputs = self.model(
File "/home/ub2004/anaconda3/envs/gh_llama-trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ub2004/anaconda3/envs/gh_llama-trl/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 531, in forward
inputs_embeds = self.embed_tokens(input_ids)
File "/home/ub2004/anaconda3/envs/gh_llama-trl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ub2004/anaconda3/envs/gh_llama-trl/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 160, in forward
return F.embedding(
File "/home/ub2004/anaconda3/envs/gh_llama-trl/lib/python3.10/site-packages/torch/nn/functional.py", line 2210, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self
wandb: Waiting for W&B process to finish... (failed 1).
wandb: You can sync this run to the cloud by running:
wandb: wandb sync /home/ub2004/llm_dev/llama-trl/wandb/offline-run-20230612_091914-tynnd9g5
wandb: Find logs at: ./wandb/offline-run-20230612_091914-tynnd9g5/logs
(gh_llama-trl) ub2004@ub2004-B85M-A0:~/llm_dev/llama-trl$
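
Not a confirmed fix for this repo, but "IndexError: index out of range in self" inside F.embedding usually means some token id is greater than or equal to the model's embedding table size, most often because a pad token was added to the tokenizer without resizing the model's embeddings. A minimal sketch of the check and the usual remedies, reusing the model path from the command above (the scalar-reward head setup is an assumption):

from transformers import AutoConfig, AutoTokenizer, AutoModelForSequenceClassification

# Path taken from the command above; the cause below is an assumption, not verified.
model_path = "/data-ssd-1t/hf_model/llama-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_path)
config = AutoConfig.from_pretrained(model_path)
print(len(tokenizer), config.vocab_size)  # a mismatch here would explain the IndexError

# num_labels=1 assumed, as is typical for a scalar reward head.
model = AutoModelForSequenceClassification.from_pretrained(model_path, num_labels=1)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # reuse eos so no new token id is introduced
if len(tokenizer) > model.config.vocab_size:
    model.resize_token_embeddings(len(tokenizer))  # grow embeddings if extra tokens were added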
