
easy-to-hard's Introduction

Easy-to-Hard Generalization

Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision

Guided by the observation that evaluation is easier than generation, we enable large language models to excel on hard math problems that exceed human evaluation capabilities, through easy-to-hard generalization of evaluators (e.g., process reward models). For comprehensive details and insights, we kindly direct you to our paper.
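To make the core idea concrete, here is a minimal sketch of best-of-n reranking with a process reward model (PRM); the names here (rerank_best_of_n, prm_score) are hypothetical placeholders, not this repository's actual API:

# A minimal sketch of best-of-n reranking with a process reward model (PRM).
# `prm_score` is a hypothetical callable returning one correctness score in
# [0, 1] per reasoning step; it stands in for a trained PRM.
def rerank_best_of_n(problem, candidates, prm_score):
    """Pick the candidate whose weakest step the PRM scores highest.

    Each candidate is a list of reasoning steps. Aggregating per-step
    scores with `min` follows the common practice of judging a solution
    by its least reliable step.
    """
    return max(candidates, key=lambda steps: min(prm_score(problem, steps)))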


Downloading pre-tuned PRM800K / MetaMath models

We provide model checkpoints for the supervised fine-tuned models and reward models. The current list of models includes:

Reproduction

Please see the examples directory for the training scripts and the data directory for data preparation.

Citation

Please consider citing our work if you use the data or code in this repo.

@article{sun2024easy,
  title={Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision},
  author={Sun, Zhiqing and Yu, Longhui and Shen, Yikang and Liu, Weiyang and Yang, Yiming and Welleck, Sean and Gan, Chuang},
  journal={arXiv preprint arXiv:2403.09472},
  year={2024}
}

The following is copied from the GPT-Accelera GitHub repository as of 2024-03-19.

gpt-accelera

Simple and efficient pytorch-native transformer training and inference (batched).

gpt-accelera is a codebase based on gpt-fast -- the state-of-the-art pytorch-native tensor-parallel implementation of transformer text generation that minimizes latency (i.e., batch size = 1) -- with the following improvements:

  • Batched (i.e., batch size > 1) inference with compiled graph (i.e., torch.compile)
  • 2-D parallelism (Tensor-Parallel (TP) + Fully Sharded Data Parallel (FSDP)) training with mixed precision (i.e., torch.cuda.amp)
  • Supports both LLaMA and DeepSeek models
  • Supports training policy models with Supervised Fine-Tuning (SFT)
  • Supports training reward models (RM) with pointwise and pairwise losses (see the loss sketch after this list)
  • Supports on-policy (PPO) and off-policy (DPO) reinforcement learning (RL) training
  • Supports full fine-tuning of 7B-34B LLaMA/Llemma models in all of the above training modes
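As referenced in the list above, here is a minimal sketch of the two reward-model loss formulations, assuming the model emits one scalar reward per sequence; this illustrates the standard objectives, not the exact contents of train_rm_pointwise.py / train_rm_pairwise.py:

import torch
import torch.nn.functional as F

def pointwise_rm_loss(rewards: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # Binary cross-entropy of scalar reward logits against 0/1 correctness labels.
    return F.binary_cross_entropy_with_logits(rewards, labels.float())

def pairwise_rm_loss(chosen: torch.Tensor, rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry style objective: push the chosen reward above the rejected one.
    return -F.logsigmoid(chosen - rejected).mean()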

Shared features w/ gpt-fast:

  • Very low latency (on inference, batched inference, SFT, and PPO)
  • No dependencies other than PyTorch and sentencepiece
  • int8/int4 quantization (for inference and ref_policy / reward_model in PPO)
  • Supports NVIDIA GPUs; AMD support is untested (TODO: test the codebase on AMD)

Following the spirit of gpt-fast, this repository is NOT intended to be a "framework" or "library", but to show off what kind of performance you can get with native PyTorch. Please copy-paste and fork as you desire.
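To give a flavor of what "native PyTorch performance" means here, below is a toy sketch of compiled, batched greedy decoding; the tiny embedding-plus-linear stand-in model is illustrative only, as the repo's actual decode step runs a KV-cached transformer:

import torch

torch.manual_seed(0)
vocab, dim, batch = 128, 64, 4
embed = torch.nn.Embedding(vocab, dim)
head = torch.nn.Linear(dim, vocab)

@torch.compile  # compile the per-token decode step into one graph
def decode_one_token(tokens: torch.Tensor) -> torch.Tensor:
    logits = head(embed(tokens))  # (batch, vocab)
    return logits.argmax(dim=-1)  # greedy next token for every sequence

tokens = torch.randint(0, vocab, (batch,))
for _ in range(8):  # batched autoregressive decoding loop
    tokens = decode_one_token(tokens)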

Installation

Install torch==2.2.0, sentencepiece, and huggingface_hub:

pip install torch==2.2.0 sentencepiece huggingface_hub

Downloading Weights

Models tested/supported

meta-llama/Llama-2-7b-chat-hf
meta-llama/Llama-2-13b-chat-hf
meta-llama/Llama-2-70b-chat-hf
codellama/CodeLlama-7b-Python-hf
codellama/CodeLlama-34b-Python-hf
EleutherAI/llemma_7b
EleutherAI/llemma_34b
deepseek-ai/deepseek-llm-7b-base
deepseek-ai/deepseek-coder-6.7b-base
deepseek-ai/deepseek-math-7b-base
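A minimal sketch for fetching one of the checkpoints listed above with huggingface_hub; note that gated models (e.g., meta-llama) additionally require accepting the license and authenticating, e.g., via huggingface-cli login:

from huggingface_hub import snapshot_download

# Download all files of a supported checkpoint into a local directory.
snapshot_download(
    repo_id="EleutherAI/llemma_7b",
    local_dir="checkpoints/EleutherAI/llemma_7b",
)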

Benchmarks

TODO: Add benchmarks

Running reference methods

TODO: Add reference methods

License

Following gpt-fast, gpt-accelera is licensed under the BSD 3 license. See the LICENSE file for details.

Community

The gpt-accelera codebase was developed during the research and development of the Easy-to-Hard Generalization project.

Citation

Please consider citing our work if you use the data or code in this repo.

@misc{gpt_accelera,
  author = {Zhiqing Sun},
  title = {GPT-Accelera: Simple and efficient pytorch-native transformer training and inference (batched)},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/Edward-Sun/gpt-accelera}}
}

Acknowledgements

We thank the authors of the following works for their open-source efforts in democratizing large language models.

  • The compiled generation part of gpt-accelera is adapted from gpt-fast
  • The RL part of gpt-accelera is adapted from SALMON, which in turn builds on alpaca_farm
  • The tokenization part of gpt-accelera is adapted from transformers


easy-to-hard's Issues

About the training scripts.

I have two questions about the script that trains the reward model:

  1. In the example, the entry point for training the reward model is finetune_rm.py. In the actual project code, does this refer to train_rm_pointwise.py or train_rm_pairwise.py?
  2. train_rm_pointwise.py and train_rm_pairwise.py look exactly the same, so what is the difference in their training objectives? If the objectives differ, how does identical code implement the two different training modes?

Looking forward to the answer. Thanks.

PRM loss behavior

A question: how should the loss change while training the PRM?

Two questions about the article

Great article, thank you for sharing the code. I have two questions.

  1. How do you control the Generator model to only generate one step at a time?
  2. How does the Evaluator model apply the evaluation results of each step to the generation of the next step? Is it done by iteratively appending the selected step to the prompt?

SFT error

When I run the train_sft.py, the following error occurred:

[2024-04-15 18:07:39,652] torch.distributed.run: [WARNING] master_addr is only used for static rdzv_backend and when rdzv_endpoint is not specified.
Traceback (most recent call last):
  File "/media/cc-4090x2/系统/zxy/easy-to-hard/train_sft.py", line 43, in <module>
    from models.tokenizer_utils import AcceleraTokenizer
  File "/media/cc-4090x2/系统/zxy/easy-to-hard/models/tokenizer_utils.py", line 27, in <module>
    from sentencepiece import SentencePieceProcessor
ImportError: cannot import name 'SentencePieceProcessor' from 'sentencepiece' (unknown location)
(the same traceback is printed by the second worker process)

However, I have already run pip install sentencepiece:

Looking in indexes: https://mirrors.aliyun.com/pypi/simple/
Requirement already satisfied: sentencepiece in /home/cc-4090x2/anaconda3/envs/Phi2/lib/python3.12/site-packages (0.2.0)

How can I solve this problem?

readme for data

Dear authors,

Thanks for sharing the code. I wonder if you could update data/readme.md with instructions for preparing the data.

Best regards,

RC

Question about the results of PRM and ORM

Dear Authors,

Firstly, I would like to express my appreciation for this outstanding work.

Upon reviewing your results, particularly those illustrated in Figures 8 and 9, I observed something intriguing. In the findings of the original PRM800K paper, the PRM considerably outperformed the ORM, yet in the results you've shared, PRM and ORM appear to perform on par with each other.

Could you please provide further insight into this discrepancy?
