
llm-rlhf-tuning's Introduction

LLM-RLHF-Tuning

This project implements all three stages of RLHF training from scratch, with the implementation details written up in the accompanying docs. Questions and discussion are welcome via WeChat.

Main features:

  • Supports instruction fine-tuning of Alpaca-style models
  • Supports reward model (RM) training
  • Supports PPO training of the RL model (a loading sketch for the single-base-model setups follows this list)
    • Two base models with two LoRA adapters, loading all four models (RM, SFT, Actor, Critic) at once; supports accelerate distributed training (see the PPO implementation notes)
    • One base model with two LoRA adapters, loading all four models (RM, SFT, Actor, Critic) at once; supports accelerate and deepspeed training
    • One base model with one LoRA adapter, where the Actor and Critic share the base model and a single model serves all four roles (RM, SFT, Actor, Critic); supports accelerate and deepspeed training
  • Supports DPO training
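
A minimal sketch of the single-base-model, two-adapter setup using the pinned peft==0.4.0 API. All paths and adapter names are illustrative (not the repo's actual config), and the scalar value head the Critic needs is omitted; this only shows how one set of base weights can be switched between roles:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Base weights; assumed here to be the SFT-merged LLaMA checkpoint.
base = AutoModelForCausalLM.from_pretrained(
    "path/to/llama-sft-merged",  # hypothetical path
    torch_dtype=torch.float16,
)

# Adapter 1: the Actor, initialized from the SFT LoRA weights.
model = PeftModel.from_pretrained(
    base, "path/to/sft-lora", adapter_name="actor", is_trainable=True
)
# Adapter 2: the Critic, initialized from the reward-model LoRA weights.
model.load_adapter("path/to/rm-lora", adapter_name="critic", is_trainable=True)

model.set_adapter("actor")   # forward passes now act as the Actor policy
# ... compute action log-probs for PPO here ...

model.set_adapter("critic")  # the same base weights now act as the Critic
# ... compute values / rewards here (a value head would be attached) ...

# With all adapters disabled, the bare base model can serve as the
# frozen SFT/reference policy for the KL penalty.
with model.disable_adapter():
    pass  # reference log-probs here
```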

Updates

  • [23/8/23] Added LLaMA2 model training; added DPO training; added PPO training on a single base model with a choice of one or two LoRA adapters; added accelerate and deepspeed training support
  • [23/8/13] Added LLaMA model training; added PPO training with two base models and two LoRA adapters; added accelerate distributed training

Features

Feature comparison with open-source RLHF training frameworks (as of August 2023):

| Framework | SFT Train | RM Train | PPO Train | DPO Train |
| --- | --- | --- | --- | --- |
| Ours | ✓ | ✓ | ✓ | ✓ |
| Deepspeed-chat | ✓ | ✓ | ✓ | |
| trl | ✓ | ✓ | ✓ | ✓ |
| MOSS-RLHF | | | ✓ | |
PPO Train

| Framework | Accelerate | Deepspeed | Multi LoRA | Min. loaded parameters (7B models as an example) |
| --- | --- | --- | --- | --- |
| Ours | ✓ | ✓ | ✓ | single model size ~ 7B |
| Deepspeed-chat | | ✓ | | sft + rm + actor + critic ~ 28B |
| trl | ✓ | | | single model size (no separate ref model) ~ 7B |
| MOSS-RLHF | actor model, critic model | sft model, rm model | | sft + rm + actor + critic ~ 28B |

Sharing one base model across the Actor, Critic, RM, and SFT roles via LoRA adapters is what shrinks the footprint: instead of loading four separate 7B models (≈ 28B parameters in total), a single ≈ 7B base plus small adapter weights suffices.

Usage Guide

Environment Setup

accelerate==0.21.0
datasets==2.13.1
scikit-learn==1.3.0
sentencepiece==0.1.99
tqdm==4.65.0
transformers==4.31.0
wandb==0.15.8
peft==0.4.0
torch==2.0.1
trl==0.5.0
deepspeed==0.10.0

Supported Models

  • LLaMA
  • LLaMA2

Supported Training Methods

  • LoRA
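
A minimal LoRA setup sketch with the pinned peft==0.4.0; the rank, alpha, dropout, and target modules shown here are illustrative defaults, not necessarily the repo's configuration:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

model = AutoModelForCausalLM.from_pretrained("path/to/llama-7b")  # hypothetical path
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                   # adapter rank (illustrative)
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # LLaMA attention projections
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```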

Training Details

Instruction Fine-Tuning (SFT)

Reward Model Training
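
For reference, a minimal sketch of the standard pairwise (Bradley–Terry style) reward loss commonly used at this stage; the repo's exact formulation is described in its training docs:

```python
import torch
import torch.nn.functional as F

def reward_loss(chosen_rewards: torch.Tensor,
                rejected_rewards: torch.Tensor) -> torch.Tensor:
    """chosen/rejected_rewards: scalar reward per preference pair, shape (batch,)."""
    # Maximize the margin between preferred and rejected responses:
    # -log(sigmoid(r_chosen - r_rejected)).
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```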

PPO Training
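
For reference, a minimal sketch of the clipped PPO policy objective; the full recipe (KL penalty against the SFT reference, value loss, and the ppo-max stability tricks listed in the TODO) is beyond this sketch:

```python
import torch

def ppo_policy_loss(logprobs: torch.Tensor,
                    old_logprobs: torch.Tensor,
                    advantages: torch.Tensor,
                    clip_eps: float = 0.2) -> torch.Tensor:
    # Probability ratio pi_new / pi_old in log space.
    ratio = torch.exp(logprobs - old_logprobs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Pessimistic (min) bound, negated because optimizers minimize.
    return -torch.min(unclipped, clipped).mean()
```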

DPO Training
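
For reference, a minimal sketch of the DPO loss (Rafailov et al., 2023), which optimizes the policy directly from preference pairs against a frozen reference model; the beta value is illustrative:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # Implicit reward margins of the policy relative to the frozen reference.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    logits = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(logits).mean()
```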

TODO

  • Support the LLaMA2 model
  • Support deepspeed training
  • Support DPO training
  • Improve PPO training stability; implement ppo-max
  • Support DDPO training
  • Support RRHF
  • Support RAFT
  • Support rejection sampling (RFT)
  • Support the BLOOM model
  • Support the Baichuan model
  • Support QLoRA training

Join the WeChat group for discussion.

llm-rlhf-tuning's People

Contributors

joyce94


llm-rlhf-tuning's Issues

Sharing RLHF experience

Hello, I saw that you open-sourced your RLHF approach.
Could I ask: in your actual tuning runs, after PPO converges, are the output samples, under human inspection, consistently better than those of the original SFT model? (outside of safety-related domains)
