
openrlhf's People

Contributors

catqaq, chuqi9527, dabney777, dylancer1998, eltociear, haicaihi, hijkzzz, jovany-wang, kajyuuen, kfertakis, li-plus, mgerstgrasser, pikaqqqqqq, pre-commit-ci[bot], stwaynexg, suc16, thecats-jfm, tsaoyu, vanesh37, vyksi, wuxibin89, wwxfromtju, xffxff


openrlhf's Issues

Inquiry regarding the feasibility of fine-tuning LLaMA2-7B with a single A100

Hi team,
Great work! I have a question to ask.
I used the --adam_offload option in https://github.com/OpenLLMAI/OpenRLHF/blob/main/examples/scripts/train_sft_llama.sh, which is mentioned in your Zhihu blog (https://zhuanlan.zhihu.com/p/650758507), to make it possible to fine-tune a 7B model on a single A100 (80G).
However, after enabling this option, the script seemed to get stuck.
Could you provide more details about it and the recommended practice for fine-tuning LLaMA2-7B on a single A100 (80G)? :)
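
For reference, a minimal sketch of the DeepSpeed settings that optimizer offloading generally corresponds to; the exact config that OpenRLHF builds from --adam_offload is an assumption here:

# Illustrative DeepSpeed config with Adam optimizer states offloaded to CPU
# (field values are placeholders, not the repo's actual defaults).
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
}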

A few questions

  1. Why is the model saved as a .pt file instead of directly using save_pretrained to save it in HF format? If I want to use an HF-format SFT model trained with another framework for RM and PPO, I have to modify the code myself to support loading HF-format checkpoints.
  2. It would be nice to define a few more prompt templates, e.g. the common alpaca, vicuna, and llama2 formats.
  3. The RM and PPO log output is a bit hard to read; for example, where is the RM acc, and which quantity is the PPO mean reward? For instance:
Train epoch [1/1]: 100%|█| 146/146 [09:37<00:00,  3.95s/it, pg=0.511, cri=0.0207, vals=0
{'pg': -0.007849020959988032, 'cri': 0.009956638121416103, 'vals': 0.28283763604481027, 'kl': -0.0015669609786402971, 'rm': 0.380661175522494, 'ret': 0.3954394154046496, 'glen': 942.7006952991225, 'tlen': 1071.2466288527398, 'k_coef': 0.01}

What do vals, rm, and ret stand for here?
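
Regarding point 1, a minimal sketch of converting a saved .pt checkpoint back into HF format, assuming the file is a plain state_dict of the actor and the base model name is known (both are assumptions):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "meta-llama/Llama-2-7b-hf"  # assumed base model
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)
state_dict = torch.load("sft_model.pt", map_location="cpu")  # assumed checkpoint path
model.load_state_dict(state_dict, strict=False)  # strict=False tolerates extra keys such as a value head
model.save_pretrained("./sft_model_hf")
AutoTokenizer.from_pretrained(base).save_pretrained("./sft_model_hf")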

AssertionError: backward pass is invalid for module in evaluation mode

Hi, thank you for making this repo!

An error occurred while training the PPO model. Polyglot 12.8B was used as the SFT model and Polyglot 5.8B as the reward model. I changed the model code by applying quantization and adding LoRA, but an error occurred during the backward pass.

Does anyone know how to solve this?

The contents below show the part where the error occurred, the parts where the SFT and RM models were changed, and the changes made to the training code.

thank you

Train epoch [1/1]:   0%|                                                                                                                                                                              | 0/1 [00:18<?, ?it/s]
Episode [1/1]:   0%|                                                                                                                                                                               | 0/2367 [12:56<?, ?it/s]
Traceback (most recent call last):
  File "ppo_test.py", line 244, in <module>
    trainer.fit(
  File "/raid2/baekig/OpenLLaMA2/openllama2/trainer/ppo_trainer.py", line 184, in fit
    status = self.ppo_train()
  File "/raid2/baekig/OpenLLaMA2/openllama2/trainer/ppo_trainer.py", line 223, in ppo_train
    status = self.training_step(experience)
  File "/raid2/baekig/OpenLLaMA2/openllama2/trainer/ppo_trainer.py", line 304, in training_step
    self.strategy.backward(critic_loss, self.critic, self.critic_optim)
  File "/raid2/baekig/OpenLLaMA2/openllama2/utils/deepspeed.py", line 97, in backward
    model.backward(loss)
  File "/raid2/baekig/anaconda3/envs/nlp/lib/python3.8/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/raid2/baekig/anaconda3/envs/nlp/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1890, in backward
    self.optimizer.backward(loss, retain_graph=retain_graph)
  File "/raid2/baekig/anaconda3/envs/nlp/lib/python3.8/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/raid2/baekig/anaconda3/envs/nlp/lib/python3.8/site-packages/deepspeed/runtime/zero/stage3.py", line 2029, in backward
    self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
  File "/raid2/baekig/anaconda3/envs/nlp/lib/python3.8/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
    scaled_loss.backward(retain_graph=retain_graph)
  File "/raid2/baekig/anaconda3/envs/nlp/lib/python3.8/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/raid2/baekig/anaconda3/envs/nlp/lib/python3.8/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/raid2/baekig/anaconda3/envs/nlp/lib/python3.8/site-packages/torch/autograd/function.py", line 274, in apply
    return user_fn(self, *args)
  File "/raid2/baekig/anaconda3/envs/nlp/lib/python3.8/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 169, in backward
    ctx.pre_backward_function(ctx.module)
  File "/raid2/baekig/anaconda3/envs/nlp/lib/python3.8/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/raid2/baekig/anaconda3/envs/nlp/lib/python3.8/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 436, in _run_before_backward_function
    self.pre_sub_module_backward_function(sub_module)
  File "/raid2/baekig/anaconda3/envs/nlp/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/raid2/baekig/anaconda3/envs/nlp/lib/python3.8/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 512, in pre_sub_module_backward_function
    assert sub_module.training, "backward pass is invalid for module in evaluation mode"
strategy = get_strategy(args)
...
sft_id ='EleutherAI/polyglot-ko-12.8b' 
rm_id = 'EleutherAI/polyglot-ko-5.8b'
actor = Actor(sft_id, bnbconfig)  # SFT
critic = Critic(rm_id, bnbconfig)  # RM
reward_model = RewardModel(rm_id, bnbconfig)  # RM

actor.gradient_checkpointing_enable()
critic.gradient_checkpointing_enable()
actor.model.config.use_cache = False
reward_model.model.config.use_cache = False

initial_model = deepcopy(actor)  # SFT
critic.model = deepcopy(reward_model.model)
critic.value_head = deepcopy(reward_model.value_head)
critic.mean = deepcopy(reward_model.mean)
critic.std = deepcopy(reward_model.std)
initial_model.gradient_checkpointing_enable()
reward_model.gradient_checkpointing_enable()
critic.model.config.use_cache = False
initial_model.model.config.use_cache = False

actor.train()
critic.train()
initial_model.train()
reward_model.train()
...
tokenizer = get_tokenizer(args.pretrain, actor.model, "left", strategy)
get_tokenizer(args.critic_pretrain, critic.model, "left", strategy)
get_tokenizer(args.critic_pretrain, reward_model.model, "left", strategy)

dataset = PromptDataset(data, strategy) 
prompts_dataloader = strategy.setup_dataloader(dataset, 
                args.micro_rollout_batch_size, True, True)

actor_optim = strategy.create_optimizer(
    actor, lr=args.actor_learning_rate, betas=(0.9, 0.95), weight_decay=args.l2
...
)
class Actor(nn.Module):
    """
    Actor model base class.

    Args:
        model (nn.Module): Actor Model.
        lora_rank (int): LoRA rank.
        lora_train_bias (str): LoRA bias training mode.
    """

    def __init__(
        self,
        pretrain_or_model,
        bnbconfig,
        from_config=False,
        lora_rank: int = 0,
        lora_train_bias: str = "none",
    ) -> None:
        super().__init__()

        self.model = AutoModelForCausalLM.from_pretrained(
            pretrain_or_model, torch_dtype=torch.bfloat16, 
            quantization_config = bnbconfig,
            trust_remote_code=True,
            device_map = {"":0}
        )
        self.model = PeftModel.from_pretrained(self.model, 'ingeol/sft_adapter', is_trainable=True)

class Critic(nn.Module):
# ...This part is the same as the SFT code, except for the adapter.

class RewardModel(nn.Module):
# ... This part is the same as the SFT code, except for the adapter.

Thank you for creating a great repo.
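
For reference, the assertion means that some submodule is still in eval mode when the critic's backward runs; a minimal diagnostic sketch (a hypothetical helper, not code from the repo) that can be run on each model right before training:

# Print any submodule left in eval mode (these trigger the DeepSpeed assertion) and
# force the whole module tree back into training mode.
def ensure_training_mode(model):
    for name, module in model.named_modules():
        if not module.training:
            print(f"submodule still in eval mode: {name}")
    model.train()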

DeepSpeed Training and Inference

It seems that in your scripts, local_rank always equals -1, so you are not actually using DeepSpeed's parallelism (e.g., data parallelism or model parallelism)?
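
For context, a minimal sketch of how the launcher supplies local_rank per worker (illustrative only; the repo's own argument handling may differ):

# Each process started by the `deepspeed` launcher receives its own --local_rank value;
# a default of -1 in the script only means "not launched via the launcher yet".
import argparse
import deepspeed

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=-1)
args = parser.parse_args()
deepspeed.init_distributed()  # sets up the data-parallel process group across the spawned workers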

[Severity] High similarity with Colossal-AI

Dear OpenLLMAI Team,

This is the Colossal-AI team.
Thank you for your contributions to the open source community.
But it looks like your open source content is highly similar to Colossal-AI and not properly referenced.

For example, the overall structure of your repos within the organization is very similar to ColossalAI/applications.

There are also many highly similar details in the code; here are just a few simple examples.

There are still many similarities that will not be listed one by one.
We hope that you follow the corresponding open-source, academic, and commercial norms and immediately take the corresponding corrective measures.
This includes but is not limited to prominently referencing the Colossal-AI project on the homepage and LICENSE file of each project, and prominently indicating in each relevant code file which code of the Colossal AI project was referenced for implementation.

Thank you very much.
Colossal-AI team

Discussion on our 1st release.

Hi team,
as many features are now usable, let's discuss our 1st alpha release. Please propose the items that you think need to be closed before it. Thanks.

How do the models perform?

I see that both DPO and PPO are implemented. Are there any ready-made results, e.g. how much PPO improves over SFT, and how much DPO improves over PPO? Adding such results would be very helpful for researchers who want to work in this direction.

Vocabulary overflow Issue with [PAD] for SFT

When expanding the tokenizer's vocabulary with [PAD] during SFT (https://github.com/OpenLLMAI/OpenLLaMA2/blob/main/examples/utils.py#L22), the tokenizer's vocabulary size becomes 32001 (e.g., print(len(tokenizer.vocab))).


However, the model's embedding can only handle up to 32000 entries.


This issue becomes apparent when the startup parameter args.micro_train_batch_size=2 is used.

With the default args.micro_train_batch_size=1, the dataloader does not invoke dataset.collate_fn, which bypasses the padding process (and the potential overflow of the [PAD] token).

I would appreciate attention to this matter as it affects cases where larger batch sizes are used during SFT.
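
A minimal sketch of the usual remedy, resizing the embedding matrix after adding the token; this is the generic Transformers pattern, offered as an assumption about the intended fix rather than the repo's actual code:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # assumed base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
num_added = tokenizer.add_special_tokens({"pad_token": "[PAD]"})  # vocab grows to 32001
if num_added > 0:
    model.resize_token_embeddings(len(tokenizer))  # grow embeddings (and lm_head) to match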

Checkpoint download

Could you provide an alternative download link for the checkpoints? Downloading from Hugging Face within mainland China is very inconvenient.

Loading RM ckpt bug: AttributeError: 'NoneType' object has no attribute 'load'

Launch script:

set -x 

read -r -d '' training_commands <<EOF
../train_rm.py \
     --save_path ./ckpt_test/TinyLlama \
     --train_batch_size 128 \
     --micro_train_batch_size 1 \
     --pretrain TinyLlama/TinyLlama-1.1B-Chat-v0.1 \
     --bf16 \
     --max_epochs 1 \
     --max_len 2048 \
     --zero_stage 3 \
     --learning_rate 5e-7 \
     --dataset tasksource/oasst1_pairwise_rlhf_reward \
     --dataset_probs 1.0 \
     --gradient_checkpointing \
     --adam_offload \
     --use_wandb xxxxxxxxx \
     --eval_steps 2 \
     --save_steps 2

EOF
     # --wandb [WANDB_TOKENS]

if [[ ${1} != "slurm" ]]; then
    export PATH=$HOME/.local/bin/:$PATH
    deepspeed $training_commands
    # deepspeed --include localhost:1 $training_commands
fi
root@xxx:../OpenLLaMA2/examples/scripts/ckpt/checkpoints_rm_llama# du -ah
5.8G    ./global_step22/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
5.8G    ./global_step22/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt
104K    ./global_step22/zero_pp_rank_0_mp_rank_00_model_states.pt
104K    ./global_step22/zero_pp_rank_1_mp_rank_00_model_states.pt
12G     ./global_step22
5.8G    ./global_step24/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
5.8G    ./global_step24/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt
104K    ./global_step24/zero_pp_rank_0_mp_rank_00_model_states.pt
104K    ./global_step24/zero_pp_rank_1_mp_rank_00_model_states.pt
12G     ./global_step24
5.8G    ./global_step26/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
5.8G    ./global_step26/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt
104K    ./global_step26/zero_pp_rank_0_mp_rank_00_model_states.pt
104K    ./global_step26/zero_pp_rank_1_mp_rank_00_model_states.pt
12G     ./global_step26
4.0K    ./latest
24K     ./zero_to_fp32.py
35G     .

Model file: zero_pp_rank_*_mp_rank_00_model_states.pt is only 104K; something seems to have gone wrong when saving.
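
For what it's worth, with ZeRO stage 3 the partitioned parameters are stored in the *_optim_states.pt shards, so tiny model_states files are not necessarily a failed save. A minimal sketch of consolidating such a checkpoint into a full state dict with the zero_to_fp32 helper that ships with DeepSpeed (the directory and tag below are taken from the listing above; whether this matches the repo's intended loading path is an assumption):

from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

# Reconstruct full fp32 weights from the partitioned ZeRO checkpoint shards.
state_dict = get_fp32_state_dict_from_zero_checkpoint("./ckpt/checkpoints_rm_llama", tag="global_step22")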

The code for loading the checkpoint:

import argparse
import os
from datetime import timedelta

import jsonlines
import torch
from torch import distributed as dist
from tqdm import tqdm

from openrlhf.datasets import PromptDataset, SFTDataset
from openrlhf.models import Actor, RewardModel
from openrlhf.utils import blending_datasets, get_processor, get_strategy, get_tokenizer
import pdb

parser = argparse.ArgumentParser()
parser.add_argument("--eval_task", type=str, default="rm", help="set to generate or rm")
parser.add_argument("--pretrain", type=str, default="TinyLlama/TinyLlama-1.1B-Chat-v0.1")
parser.add_argument("--load_model", type=str, default=None)
parser.add_argument("--max_len", type=int, default=2048)
parser.add_argument("--zero_stage", type=int, default=3)
parser.add_argument("--local_rank", type=int, default=-1, help="local_rank for deepspeed")
parser.add_argument("--bf16", action="store_true", default=True)
parser.add_argument("--flash_attn", action="store_true", default=False)
parser.add_argument("--micro_batch_size", type=int, default=1)
parser.add_argument("--dataset", type=str, default=None)
parser.add_argument("--dataset_probs", type=str, default="1.0")
parser.add_argument("--output_path", type=str, default="./")
parser.add_argument("--max_samples", type=int, default=500000)
parser.add_argument("--seed", type=int, default=1234)

# for generation
parser.add_argument("--inference_tp_size", type=int, default=1)
parser.add_argument("--ta_prompt", type=str, default="")
parser.add_argument("--prompt_max_len", type=int, default=1024)
parser.add_argument("--greedy_sampling", action="store_true", default=False)
parser.add_argument("--top_p", type=float, default=0.9)
parser.add_argument("--temperature", type=float, default=1.0)
parser.add_argument("--repetition_penalty", type=float, default=1.2)
parser.add_argument("--best_of_n", type=int, default=1)
parser.add_argument(
    "--post_processor",
    type=str,
    default=None,
    help="set to rs (Rejection Sampling), dt (Decision Transformer) or None",
)

# for Iterative generation and Rejection Sampling
parser.add_argument("--iter", type=int, default=None)
parser.add_argument("--rollout_batch_size", type=int, default=2048)

# for Decision Transformer (DT) generation
parser.add_argument("--normalize_reward", action="store_true", default=False)
parser.add_argument("--reward_template", type=str, default=None)
# for DT evaluation
parser.add_argument("--enable_dt", action="store_true", default=False)
parser.add_argument("--dt_prompt", type=str, default="<rm_score>: 5.00", help="decision transformer prompt")

args = parser.parse_args()


# configure strategy
strategy = get_strategy(args)
strategy.setup_distributed(timeout=timedelta(seconds=9999999))

# configure model
# load huggingface model/config
from_config = bool(args.load_model)
model = RewardModel(args.pretrain, from_config, use_flash_attention_2=args.flash_attn)
# prepare models

model = strategy.prepare(model)
model.eval()

load_dir = "./ckpt/checkpoints_rm_llama"
tag = "global_step2"
model = model.load_checkpoint(load_dir=load_dir, tag=tag)

# model = strategy.load_ckpt(model=model, load_dir=load_dir,
#     tag=tag,
#     load_module_strict=True,
#     load_optimizer_states=True,
#     load_lr_scheduler_states=True,
#     load_module_only=False)

Bug log:

root@xxx:../OpenLLaMA2/examples/scripts# bash inference_rm.sh 
+ read -r -d '' training_commands
+ [[ '' != \s\l\u\r\m ]]
+ export PATH=/root/.local/bin/:/usr/local/nvm/versions/node/v16.20.0/bin:/usr/local/nvm/versions/node/v16.20.0/bin:/etc/dsw/code-server/lib/vscode/bin/remote-cli:/usr/local/nvm/versions/node/v16.20.0/bin:/usr/local/lib/python3.10/dist-packages/torch_tensorrt/bin:/usr/local/mpi/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/ucx/bin:/opt/tensorrt/bin:/usr/local/nvm/versions/node/v16.20.0/bin:/etc/dsw/node/bin:/usr/local/lib/python3.10/dist-packages/torch_tensorrt/bin:/usr/local/mpi/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/ucx/bin:/opt/tensorrt/bin
+ PATH=/root/.local/bin/:/usr/local/nvm/versions/node/v16.20.0/bin:/usr/local/nvm/versions/node/v16.20.0/bin:/etc/dsw/code-server/lib/vscode/bin/remote-cli:/usr/local/nvm/versions/node/v16.20.0/bin:/usr/local/lib/python3.10/dist-packages/torch_tensorrt/bin:/usr/local/mpi/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/ucx/bin:/opt/tensorrt/bin:/usr/local/nvm/versions/node/v16.20.0/bin:/etc/dsw/node/bin:/usr/local/lib/python3.10/dist-packages/torch_tensorrt/bin:/usr/local/mpi/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/ucx/bin:/opt/tensorrt/bin
+ deepspeed ../inference_rm.py --pretrain TinyLlama/TinyLlama-1.1B-Chat-v0.1
[2023-12-14 03:23:20,923] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-14 03:23:26,717] [WARNING] [runner.py:203:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-12-14 03:23:26,718] [INFO] [runner.py:570:main] cmd = /usr/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMV19 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None ../inference_rm.py --pretrain TinyLlama/TinyLlama-1.1B-Chat-v0.1 --bf16
[2023-12-14 03:23:28,724] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-14 03:23:34,520] [INFO] [launch.py:138:main] 0 NCCL_VERSION=2.18.3
[2023-12-14 03:23:34,520] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0, 1]}
[2023-12-14 03:23:34,520] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=2, node_rank=0
[2023-12-14 03:23:34,520] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1]})
[2023-12-14 03:23:34,520] [INFO] [launch.py:163:main] dist_world_size=2
[2023-12-14 03:23:34,520] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0,1
[2023-12-14 03:23:41,742] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-14 03:23:41,889] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
123
[2023-12-14 03:23:42,858] [INFO] [comm.py:637:init_distributed] cdb=None
123
[2023-12-14 03:23:42,904] [INFO] [comm.py:637:init_distributed] cdb=None
[2023-12-14 03:23:42,904] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2023-12-14 03:23:53,919] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.11.1, git-hash=unknown, git-branch=unknown
[2023-12-14 03:23:53,919] [INFO] [comm.py:662:init_distributed] Distributed backend already initialized
[2023-12-14 03:23:58,729] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2023-12-14 03:23:58,730] [INFO] [logging.py:96:log_dist] [Rank 0] Creating ZeRO Offload
[2023-12-14 03:23:58,853] [INFO] [utils.py:802:see_memory_usage] DeepSpeedZeRoOffload initialize [begin]
[2023-12-14 03:23:58,853] [INFO] [utils.py:803:see_memory_usage] MA 1.94 GB         Max_MA 1.94 GB         CA 2.03 GB         Max_CA 2 GB 
[2023-12-14 03:23:58,854] [INFO] [utils.py:810:see_memory_usage] CPU Virtual Memory:  used = 14.45 GB, percent = 4.3%
Parameter Offload: Total persistent parameters: 94209 in 47 params
[2023-12-14 03:23:59,037] [INFO] [torch_checkpoint_engine.py:27:load] [Torch] Loading checkpoint from /xxxx/OpenLLaMA2/examples/scripts/ckpt/checkpoints_rm_llama/global_step22/zero_pp_rank_1_mp_rank_00_model_states.pt...
[2023-12-14 03:23:59,042] [INFO] [torch_checkpoint_engine.py:29:load] [Torch] Loaded checkpoint from /xxxx/OpenLLaMA2/examples/scripts/ckpt/checkpoints_rm_llama/global_step22/zero_pp_rank_1_mp_rank_00_model_states.pt.
[2023-12-14 03:23:59,042] [INFO] [torch_checkpoint_engine.py:27:load] [Torch] Loading checkpoint from /xxxx/OpenLLaMA2/examples/scripts/ckpt/checkpoints_rm_llama/global_step22/zero_pp_rank_1_mp_rank_00_model_states.pt...
[2023-12-14 03:23:59,047] [INFO] [torch_checkpoint_engine.py:29:load] [Torch] Loaded checkpoint from /xxxx/OpenLLaMA2/examples/scripts/ckpt/checkpoints_rm_llama/global_step22/zero_pp_rank_1_mp_rank_00_model_states.pt.
Traceback (most recent call last):
  File "/xxxx/OpenLLaMA2/examples/scripts/../inference_rm.py", line 86, in <module>
    model = model.load_checkpoint(load_dir=load_dir, tag=tag)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/engine.py", line 2708, in load_checkpoint
    success = self._load_zero_checkpoint(load_dir, tag, load_optimizer_states=load_optimizer_states)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/engine.py", line 2875, in _load_zero_checkpoint
    zero_sd_list = self._get_all_zero_checkpoints(load_dir, tag)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/engine.py", line 2950, in _get_all_zero_checkpoints
    return self._get_all_zero_checkpoint_state_dicts(zero_ckpt_names)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/engine.py", line 2929, in _get_all_zero_checkpoint_state_dicts
    _state = self.checkpoint_engine.load(
AttributeError: 'NoneType' object has no attribute 'load'

@hijkzzz @catqaq

Error occurred when loading datasets from disk

I downloaded the Open-Orca/OpenOrca dataset to my disk and then set --dataset to the saved path. However, an error occurred:

(lzy-rlhf) root@di-20231110113227-9fqgm:/alg_vepfs/public/LZY/mycodes/OpenRLHF/examples/pyscripts# bash train_sft_llama.sh 
[2023-11-13 15:00:47,432] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-11-13 15:00:49,850] [INFO] [comm.py:637:init_distributed] cdb=None
[2023-11-13 15:00:49,850] [INFO] [comm.py:652:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2023-11-13 15:00:50,267] [INFO] [comm.py:702:mpi_discovery] Discovered MPI settings of world_rank=0, local_rank=0, world_size=1, master_addr=192.18.56.49, master_port=29500
[2023-11-13 15:00:50,267] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:16<00:00,  5.52s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Using pad_token, but it is not set yet.
add pad_token
Actor(
  (model): LlamaForCausalLM(
    (model): LlamaModel(
      (embed_tokens): Embedding(32001, 4096)
      (layers): ModuleList(
        (0-31): 32 x LlamaDecoderLayer(
          (self_attn): LlamaAttention(
            (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
            (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
            (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
            (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
            (rotary_emb): LlamaRotaryEmbedding()
          )
          (mlp): LlamaMLP(
            (gate_proj): Linear(in_features=4096, out_features=11008, bias=False)
            (up_proj): Linear(in_features=4096, out_features=11008, bias=False)
            (down_proj): Linear(in_features=11008, out_features=4096, bias=False)
            (act_fn): SiLUActivation()
          )
          (input_layernorm): LlamaRMSNorm()
          (post_attention_layernorm): LlamaRMSNorm()
        )
      )
      (norm): LlamaRMSNorm()
    )
    (lm_head): Linear(in_features=4096, out_features=32001, bias=False)
  )
)
Installed CUDA version 11.8 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /root/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py39_cu117/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_adam...
Time to load fused_adam op: 3.212965488433838 seconds
dataset: /alg_vepfs/public/LZY/dataset/OpenOrca
load local data file: /alg_vepfs/public/LZY/dataset/OpenOrca
script: []
files: ['/alg_vepfs/public/LZY/dataset/OpenOrca/dataset_dict.json', '/alg_vepfs/public/LZY/dataset/OpenOrca/train/state.json', '/alg_vepfs/public/LZY/dataset/OpenOrca/train/dataset_info.json']
Downloading data files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 5599.87it/s]
Extracting data files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 534.51it/s]
Generating train split: 1 examples [00:00, 251.35 examples/s]
Traceback (most recent call last):
  File "/alg_vepfs/public/miniconda_dirs/envs/lzy-rlhf/lib/python3.9/site-packages/datasets/builder.py", line 1940, in _prepare_split_single
    writer.write_table(table)
  File "/alg_vepfs/public/miniconda_dirs/envs/lzy-rlhf/lib/python3.9/site-packages/datasets/arrow_writer.py", line 572, in write_table
    pa_table = table_cast(pa_table, self._schema)
  File "/alg_vepfs/public/miniconda_dirs/envs/lzy-rlhf/lib/python3.9/site-packages/datasets/table.py", line 2328, in table_cast
    return cast_table_to_schema(table, schema)
  File "/alg_vepfs/public/miniconda_dirs/envs/lzy-rlhf/lib/python3.9/site-packages/datasets/table.py", line 2286, in cast_table_to_schema
    raise ValueError(f"Couldn't cast\n{table.schema}\nto\n{features}\nbecause column names don't match")
ValueError: Couldn't cast
_data_files: list<item: struct<filename: string>>
  child 0, item: struct<filename: string>
      child 0, filename: string
_fingerprint: string
_format_columns: null
_format_kwargs: struct<>
_format_type: null
_output_all_columns: bool
_split: string
to
{'splits': Sequence(feature=Value(dtype='string', id=None), length=-1, id=None)}
because column names don't match

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/alg_vepfs/public/LZY/mycodes/OpenRLHF/examples/pyscripts/../train_sft.py", line 146, in <module>
    train(args)
  File "/alg_vepfs/public/LZY/mycodes/OpenRLHF/examples/pyscripts/../train_sft.py", line 42, in train
    train_data, eval_data = blending_datasets(args.dataset, args.dataset_probs, strategy, args.seed)
  File "/root/.local/lib/python3.9/site-packages/openrlhf/utils/utils.py", line 119, in blending_datasets
    data = load_dataset(data_type, data_files=files)
  File "/alg_vepfs/public/miniconda_dirs/envs/lzy-rlhf/lib/python3.9/site-packages/datasets/load.py", line 2153, in load_dataset
    builder_instance.download_and_prepare(
  File "/alg_vepfs/public/miniconda_dirs/envs/lzy-rlhf/lib/python3.9/site-packages/datasets/builder.py", line 954, in download_and_prepare
    self._download_and_prepare(
  File "/alg_vepfs/public/miniconda_dirs/envs/lzy-rlhf/lib/python3.9/site-packages/datasets/builder.py", line 1049, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "/alg_vepfs/public/miniconda_dirs/envs/lzy-rlhf/lib/python3.9/site-packages/datasets/builder.py", line 1813, in _prepare_split
    for job_id, done, content in self._prepare_split_single(
  File "/alg_vepfs/public/miniconda_dirs/envs/lzy-rlhf/lib/python3.9/site-packages/datasets/builder.py", line 1958, in _prepare_split_single
    raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.builder.DatasetGenerationError: An error occurred while generating the dataset
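
Judging from the dataset_dict.json and state.json files in the listing, the local copy was written with datasets.save_to_disk, which load_dataset cannot parse as raw data files; a minimal sketch of loading it with load_from_disk instead (assuming the dataset really was saved that way):

from datasets import load_from_disk

# Path taken from the log above; load_from_disk reads save_to_disk output directly.
data = load_from_disk("/alg_vepfs/public/LZY/dataset/OpenOrca")
print(data)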

feature: add api support for hosting a reward model

I want to use a 70B-parameter model as my reward model. It is inefficient to load such a model from pretrained weights locally, and ideally it should be queried through an API. However, the existing class does not support such usage.

Could this feature be implemented?

Or, if it already exists, could someone point me to its usage, as I cannot find it.

Kind thanks
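
For illustration only, a minimal sketch of the kind of remote scoring endpoint this request has in mind; the framework and route names are assumptions, not an existing OpenRLHF API:

# Hypothetical reward-scoring service wrapped around a large RM hosted elsewhere.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Query(BaseModel):
    prompt: str
    response: str

@app.post("/reward")
def score(q: Query) -> dict:
    # A real implementation would tokenize q.prompt + q.response and run the reward model here.
    return {"reward": 0.0}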

HfDeepSpeedConfig must be kept during AutoModel.from_pretrained if using ZeRO-3

According to Non-Trainer Deepspeed Integration:

The HfDeepSpeedConfig is used to integrate Deepspeed into the 🤗 Transformers core functionality, when Trainer is not used. The only thing that it does is handling Deepspeed ZeRO-3 param gathering and automatically splitting the model onto multiple gpus during from_pretrained call.

from transformers.integrations import HfDeepSpeedConfig
from transformers import AutoModel
import deepspeed

ds_config = {...}  # deepspeed config object or path to the file
# must run before instantiating the model to detect zero 3
dschf = HfDeepSpeedConfig(ds_config)  # keep this object alive
model = AutoModel.from_pretrained("gpt2")
engine = deepspeed.initialize(model=model, config_params=ds_config, ...)

But we seem to be missing HfDeepSpeedConfig when initializing the Actor, Critic, and Reward models.

Local datasets: please perform appropriate preprocessing on your local dataset.

We use Hugging Face's load_dataset to support common local data formats, but due to the diversity of datasets it is impossible to cover preprocessing for all of them.
So we recommend that you perform appropriate preprocessing on your local data or provide appropriate preprocessing scripts (see the sketch below).
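
For illustration, a minimal sketch of the kind of preprocessing meant here, mapping a local JSON file onto prompt/chosen/rejected columns; the input file and target column names are assumptions, not a fixed schema of the repo:

from datasets import load_dataset

# Hypothetical local file with question/good/bad fields.
raw = load_dataset("json", data_files="my_local_data.jsonl")["train"]
raw = raw.map(
    lambda x: {"prompt": x["question"], "chosen": x["good"], "rejected": x["bad"]},
    remove_columns=raw.column_names,
)
raw.to_json("my_local_data.preprocessed.jsonl")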

Data set format issues:
#134

Larger models

Hello, are larger LLaMA models supported, e.g. 13B, 30B, etc.?

PPO OOM

8*A100-80G:
[progress bar at failure: 02:06<01:20, 13.44s/it, pg=-.0119, cri=0.0702, vals=-.0352, kl=0, rm=0.0909, ret=0.0909, glen=1...]
Traceback (most recent call last):
  File "../train_ppo.py", line 239, in <module>
    train(args)
  File "../train_ppo.py", line 164, in train
    trainer.fit(prompts_dataloader,
  File "/opt/conda/envs/llama2/lib/python3.8/site-packages/openllama2/trainer/ppo_trainer.py", line 143, in fit
    status = self.ppo_train()
  File "/opt/conda/envs/llama2/lib/python3.8/site-packages/openllama2/trainer/ppo_trainer.py", line 166, in ppo_train
    status = self.training_step(experience)
  File "/opt/conda/envs/llama2/lib/python3.8/site-packages/openllama2/trainer/ppo_trainer.py", line 209, in training_step
    self.strategy.backward(actor_loss, self.actor, self.actor_optim)
  File "/opt/conda/envs/llama2/lib/python3.8/site-packages/openllama2/utils/deepspeed.py", line 81, in backward
    model.backward(loss)
  File "/opt/conda/envs/llama2/lib/python3.8/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/opt/conda/envs/llama2/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1895, in backward
    self.optimizer.backward(loss, retain_graph=retain_graph)
  File "/opt/conda/envs/llama2/lib/python3.8/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 1902, in backward
    self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
  File "/opt/conda/envs/llama2/lib/python3.8/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
    scaled_loss.backward(retain_graph=retain_graph)
  File "/opt/conda/envs/llama2/lib/python3.8/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/opt/conda/envs/llama2/lib/python3.8/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/opt/conda/envs/llama2/lib/python3.8/site-packages/torch/autograd/function.py", line 274, in apply
    return user_fn(self, *args)
  File "/opt/conda/envs/llama2/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 157, in backward
    torch.autograd.backward(outputs_with_grad, args_with_grad)
  File "/opt/conda/envs/llama2/lib/python3.8/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.00 GiB (GPU 2; 79.35 GiB total capacity; 66.47 GiB already allocated; 3.87 GiB free; 72.93 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
[2023-08-31 02:44:16,573] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 10230
[2023-08-31 02:44:22,676] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 10231
[2023-08-31 02:44:29,025] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 10232
[2023-08-31 02:44:29,025] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 10233
[2023-08-31 02:44:37,723] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 10235
[2023-08-31 02:44:45,725] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 10237
[2023-08-31 02:44:54,060] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 10239
[2023-08-31 02:45:01,505] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 10241
[2023-08-31 02:45:08,532] [ERROR] [launch.py:321:sigkill_handler]
['/opt/conda/envs/llama2/bin/python3', '-u', '../train_ppo.py', '--local_rank=7', '--pretrain', './models/Llama-2-7b-hf', '--critic_pretrain', './models/Llama-2-7b-hf', '--reward_model_path', './ckpt/7b_llama/rm_model.pt', '--sft_model_path', './ckpt/7b_llama/sft_model.pt', '--save_path', './ckpt/7b_llama', '--micro_train_batch_size', '1', '--train_batch_size', '128', '--micro_rollout_batch_size', '1', '--rollout_batch_size', '1024', '--max_epochs', '1', '--prompt_max_len', '1024', '--generate_max_len', '1024', '--zero_stage', '2', '--bf16', '--actor_learning_rate', '5e-7', '--critic_learning_rate', '9e-6', '--inference_tp_size', '1', '--init_kl_coef', '0.01', '--prompt_data', 'yahma/alpaca-cleaned,Dahoas/full-hh-rlhf,tasksource/oasst1_pairwise_rlhf_reward', '--prompt_data_probs', '0.3,0.6,0.1', '--normalize_reward', '--adam_offload', '--gradient_checkpointing'] exits with return code = 1
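
As the error message itself suggests, one thing to try before re-launching is tuning the CUDA caching allocator to reduce fragmentation; a minimal sketch (the value is illustrative, and it must take effect before CUDA is initialized):

import os

# Must be set before torch initializes CUDA (e.g. at the very top of train_ppo.py or in the shell).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"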

Fixing typos

Need to fix the typos (manualy → manually, sepecified → specified) shown in the attached screenshot; please assign this to me.

Enabling ppo-ptx raises a duplicate gradient computation error

Launch script:

../train_ppo.py \
    --pretrain /data/chuxiong/chinese-llama-2-7b-eot \
    --critic_pretrain /data/chuxiong/chinese-llama-2-7b-eot \
    --reward_model_path ./ckpt/chinese-llama-2-7b-openchat-rm/rm_model.pt \
    --sft_model_path /data/chuxiong/openchat/outputs/chinese-llama-2-7b-openchat/ep_4 \
    --save_path ./ckpt/chinese-llama-2-7b-openchat-ppo \
    --micro_train_batch_size 1 \
    --train_batch_size 126 \
    --micro_rollout_batch_size 1 \
    --rollout_batch_size 1024 \
    --max_epochs 1 \
    --prompt_max_len 1024 \
    --generate_max_len 1024 \
    --zero_stage 2 \
    --bf16 \
    --actor_learning_rate 5e-7 \
    --critic_learning_rate 9e-6 \
    --inference_tp_size 1 \
    --init_kl_coef 0.01 \
    --prompt_data /data/chuxiong/hh_rlhf_cn_prompt \
    --prompt_data_probs 1. \
    --pretrain_data dlwh/wikitext_103_detokenized \
    --pretrain_data_probs 1. \
    --ptx_coef 1. \
    --normalize_reward \
    --adam_offload \
    --gradient_checkpointing

Error:

File "/data/conda3/usr/chuxiong/envs/scx_llm/lib/python3.10/site-packages/openllama2/trainer/ppo_trainer.py", line 182, in ppo_train
    status = self.training_step(experience)
  File "/data/conda3/usr/chuxiong/envs/scx_llm/lib/python3.10/site-packages/openllama2/trainer/ppo_trainer.py", line 237, in training_step
    self.strategy.backward(actor_loss, self.actor, self.actor_optim)
  File "/data/conda3/usr/chuxiong/envs/scx_llm/lib/python3.10/site-packages/openllama2/utils/deepspeed.py", line 94, in backward
    model.backward(loss)
  File "/data/conda3/usr/chuxiong/envs/scx_llm/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/data/conda3/usr/chuxiong/envs/scx_llm/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1895, in backward
    self.optimizer.backward(loss, retain_graph=retain_graph)
  File "/data/conda3/usr/chuxiong/envs/scx_llm/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 1902, in backward
    self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)   
  File "/data/conda3/usr/chuxiong/envs/scx_llm/lib/python3.10/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
    scaled_loss.backward(retain_graph=retain_graph)
  File "/data/conda3/usr/chuxiong/envs/scx_llm/lib/python3.10/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/data/conda3/usr/chuxiong/envs/scx_llm/lib/python3.10/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/data/conda3/usr/chuxiong/envs/scx_llm/lib/python3.10/site-packages/torch/autograd/function.py", line 274, in apply
    return user_fn(self, *args)
  File "/data/conda3/usr/chuxiong/envs/scx_llm/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 157, in backward
    torch.autograd.backward(outputs_with_grad, args_with_grad)
  File "/data/conda3/usr/chuxiong/envs/scx_llm/lib/python3.10/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/data/conda3/usr/chuxiong/envs/scx_llm/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 814, in reduce_partition_and_remove_grads
    self.reduce_ready_partitions_and_remove_grads(param, i)
  File "/data/conda3/usr/chuxiong/envs/scx_llm/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 1262, in reduce_ready_partitions_and_remove_grads
    self.reduce_independent_p_g_buckets_and_remove_grads(param, i)
  File "/data/conda3/usr/chuxiong/envs/scx_llm/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 847, in reduce_independent_p_g_buckets_and_remove_grads
    assert self.params_already_reduced[param_id] == False, \
AssertionError: The parameter 286 has already been reduced.             Gradient computed twice for this partition.             Multiple gradient reduction is currently not supported

After updating to the latest DeepSpeed, the following error is raised instead:

  File "/data/conda3/usr/chuxiong/envs/scx_llm/lib/python3.10/site-packages/openllama2/trainer/ppo_trainer.py", line 
237, in training_step
    self.strategy.backward(actor_loss, self.actor, self.actor_optim)
  File "/data/conda3/usr/chuxiong/envs/scx_llm/lib/python3.10/site-packages/openllama2/utils/deepspeed.py", line 94, 
in backward
    model.backward(loss)
  File "/data/conda3/usr/chuxiong/envs/scx_llm/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wra
pped_fn
    ret_val = func(*args, **kwargs)
  File "/data/conda3/usr/chuxiong/envs/scx_llm/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1890, 
in backward
    self.optimizer.backward(loss, retain_graph=retain_graph)
  File "/data/conda3/usr/chuxiong/envs/scx_llm/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py",
 line 1953, in backward
    self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
  File "/data/conda3/usr/chuxiong/envs/scx_llm/lib/python3.10/site-packages/deepspeed/runtime/fp16/loss_scaler.py", l
ine 63, in backward
    scaled_loss.backward(retain_graph=retain_graph)
  File "/data/conda3/usr/chuxiong/envs/scx_llm/lib/python3.10/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/data/conda3/usr/chuxiong/envs/scx_llm/lib/python3.10/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/data/conda3/usr/chuxiong/envs/scx_llm/lib/python3.10/site-packages/torch/autograd/function.py", line 274, in
 apply
    return user_fn(self, *args)
  File "/data/conda3/usr/chuxiong/envs/scx_llm/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 157, in 
backward
    torch.autograd.backward(outputs_with_grad, args_with_grad)
  File "/data/conda3/usr/chuxiong/envs/scx_llm/lib/python3.10/site-packages/torch/autograd/__init__.py", line 200, in
 backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/data/conda3/usr/chuxiong/envs/scx_llm/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py",
 line 871, in reduce_partition_and_remove_grads
    self.reduce_ready_partitions_and_remove_grads(param, i)
  File "/data/conda3/usr/chuxiong/envs/scx_llm/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 1332, in reduce_ready_partitions_and_remove_grads
    self.reduce_independent_p_g_buckets_and_remove_grads(param, i)
  File "/data/conda3/usr/chuxiong/envs/scx_llm/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 899, in reduce_independent_p_g_buckets_and_remove_grads
    self.reduce_ipg_grads()
  File "/data/conda3/usr/chuxiong/envs/scx_llm/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 1319, in reduce_ipg_grads
    self.copy_grads_in_partition(param)
  File "/data/conda3/usr/chuxiong/envs/scx_llm/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 1239, in copy_grads_in_partition
    self.async_accumulate_grad_in_cpu_via_gpu(param)
  File "/data/conda3/usr/chuxiong/envs/scx_llm/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 1143, in async_accumulate_grad_in_cpu_via_gpu
    accumulate_gradients()
  File "/data/conda3/usr/chuxiong/envs/scx_llm/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 1122, in accumulate_gradients
    param.grad_accum.data.view(-1).add_(dest_buffer)
AttributeError: 'NoneType' object has no attribute 'data'

AttributeError: 'LlamaModel' object has no attribute 'backward'

Hi, thank you for making this repo!
I'm building a reward model and I ran into this error.

model_id = 'meta-llama/Llama-2-7b-hf'
model = RewardModel(model_id)
model.lora_enable(args.lora_rank)

tokenizer = get_tokenizer(model_id, model.model, "left", strategy)
train_dataset = RewardDataset(data_2, tokenizer, 512, strategy)

...


train_dataloader = strategy.setup_dataloader(
    train_dataset, batch_size=3, pin_memory=False, shuffle=False, collate_fn=train_dataset.collate_fn,
)
num_update_steps_per_epoch = len(
    train_dataloader) * args.max_epochs // strategy.accumulated_gradient
max_steps = math.ceil(args.max_epochs * num_update_steps_per_epoch)
optim = strategy.create_optimizer(
    model, lr=args.learning_rate, betas=(0.9, 0.95), weight_decay=args.l2)
scheduler = get_scheduler(
    "cosine",
    optim,
    num_warmup_steps=math.ceil(max_steps * 0.03),
    num_training_steps=max_steps,
)
...

        chosen_reward = model(chosen_ids, attention_mask=c_mask)
        reject_reward = model(reject_ids, attention_mask=r_mask)

        loss = loss_fn(chosen_reward, reject_reward)

        acc_mean = acc_mean * 0.9 + 0.1 * \
            (chosen_reward > reject_reward).float().mean().item()
        loss_mean = loss_mean * 0.9 + 0.1 * loss.item()
        reward_diff_mean = reward_diff_mean * 0.9 + 0.1 * \
            (chosen_reward - reject_reward).mean().item()
        
        print(loss) 
        strategy.backward(loss, model, optim)
        strategy.optimizer_step(
            optim, model, scheduler)
        


deepspeed rm_test.py \
     --save_path ./ckpt/7b_llama \
     --train_batch_size 1 \
     --micro_train_batch_size 1 \
     --pretrain meta-llama/Llama-2-7b-hf \
     --max_epochs 1 \
     --max_len 1024 \
     --zero_stage 2 \
     --learning_rate 9e-6

Please help me understand what this error is and how to fix it.

I also have a question about how model.backward(loss) works.
The relevant file is OpenLLaMA2/openllama2/utils/deepspeed.py (DeepspeedStrategy.backward).

Thank you. This is my first time posting a question on GitHub, so it may differ from the usual question format; please let me know if there are any problems.
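
A guess at the cause, inferred from the traceback rather than confirmed: strategy.backward calls model.backward(loss), which only exists on a DeepSpeed engine, so the raw module must first be wrapped by strategy.prepare (which calls deepspeed.initialize under the hood). A minimal sketch of what that wrapping does conceptually, reusing the variables from the snippet above (ds_config and the exact initialize arguments are assumptions):

import deepspeed

# Wrapping turns the plain nn.Module into an engine that exposes backward()/step(),
# which is what DeepspeedStrategy.backward ultimately dispatches to.
engine, optim, _, scheduler = deepspeed.initialize(
    model=model, optimizer=optim, lr_scheduler=scheduler, config=ds_config
)
loss = loss_fn(chosen_reward, reject_reward)
engine.backward(loss)
engine.step()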
