reproduce-dpo by qiyaowei

reproduce-dpo's Introduction

Ongoing work to reproduce Direct Preference Optimization (DPO).
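For orientation, here is a minimal sketch of the per-example DPO objective from Rafailov et al., written in plain Python so it runs without a deep-learning framework. It is not code from this repository: the function names `dpo_loss` and `softplus`, the argument names, and the default `beta=0.1` are illustrative assumptions. The inputs are summed log-probabilities of the chosen and rejected responses under the policy being trained and under the frozen reference model.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss from sequence log-probabilities.

    beta=0.1 is an assumed default for illustration, not a value
    taken from this repository.
    """
    # Implicit rewards: beta-scaled log-ratio of policy to reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) is the same as softplus(-margin).
    return softplus(-margin)


if __name__ == "__main__":
    # When the policy equals the reference, the margin is zero
    # and the loss is log(2).
    print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
```

Because the loss depends only on log-ratios against the reference model, a reproduction needs the same reference (SFT) checkpoint to get comparable numbers.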

Cite:

@article{rafailov2024direct,
  title={Direct preference optimization: Your language model is secretly a reward model},
  author={Rafailov, Rafael and Sharma, Archit and Mitchell, Eric and Manning, Christopher D and Ermon, Stefano and Finn, Chelsea},
  journal={Advances in Neural Information Processing Systems},
  volume={36},
  year={2024}
}

@article{wang2023beyond,
  title={Beyond reverse {KL}: Generalizing direct preference optimization with diverse divergence constraints},
  author={Wang, Chaoqi and Jiang, Yibo and Yang, Chenghao and Liu, Han and Chen, Yuxin},
  journal={arXiv preprint arXiv:2309.16240},
  year={2023}
}

reproduce-dpo's Issues

reference model weights

Hi,

We are also reproducing DPO and are wondering whether you have saved the weights of the gpt2-large model fine-tuned on the preference dataset for 3 epochs, as the paper suggests. Is this the model? https://huggingface.co/QiyaoWei/Vanilla-SFT-GPT2Large/tree/main

Thank you so much!!

KL Divergence
