The remax from liziniu

Bugs when using zero-stage3

Great work!
I hit the following error when running your code with REFERENCE_ZERO_STAGE=3:
AttributeError: 'LlamaAttention' object has no attribute 'rope_theta'
I believe this is inherited from DS-Chat, but I still wonder: how did you fix it?

Mistral as the backbone of reward model

I got the error below when using Mistral as the backbone of the reward model:

Traceback (most recent call last):
  File "/mntcephfs/lab_data/kongchuyi/A-rlsimu/ReMax/step2_reward_model_finetuning/main.py", line 466, in <module>
    main()
  File "/mntcephfs/lab_data/kongchuyi/A-rlsimu/ReMax/step2_reward_model_finetuning/main.py", line 402, in main
    reward_score, acc = evaluation_reward(rm_model, eval_dataloader)
  File "/mntcephfs/lab_data/kongchuyi/A-rlsimu/ReMax/step2_reward_model_finetuning/main.py", line 344, in evaluation_reward
    outputs = model(**batch)
  File "/mntcephfs/lab_data/kongchuyi/miniconda3/envs/remaxx/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mntcephfs/lab_data/kongchuyi/miniconda3/envs/remaxx/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/mntcephfs/lab_data/kongchuyi/miniconda3/envs/remaxx/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1801, in forward
    loss = self.module(*inputs, **kwargs)
  File "/mntcephfs/lab_data/kongchuyi/miniconda3/envs/remaxx/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mntcephfs/lab_data/kongchuyi/A-rlsimu/ReMax/utils/model/reward_model.py", line 56, in forward
    transformer_outputs = self.rwtranrsformer(
  File "/mntcephfs/lab_data/kongchuyi/miniconda3/envs/remaxx/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
TypeError: MistralModel.forward() got an unexpected keyword argument 'head_mask'

If I change line 51 in ReMax/utils/model/reward_model.py to:

        if self.config.model_type == "llama" or "mistral":

the script runs well.
Is this a reasonable fix?
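
A note on the condition quoted above: Python parses `x == "llama" or "mistral"` as `(x == "llama") or "mistral"`, and a non-empty string is truthy, so that branch is taken for every model type, not only for llama and mistral. That would explain why the error disappears, but it also silently changes behavior for any other backbone. The sketch below contrasts the two forms; the function names are hypothetical and only illustrate the check, they are not part of the ReMax code.

```python
def condition_as_written(model_type: str) -> bool:
    # Same expression as the revised line: (model_type == "llama") or "mistral".
    # The right-hand operand is a non-empty string, so the result is always truthy.
    return bool(model_type == "llama" or "mistral")


def membership_check(model_type: str) -> bool:
    # Probably the intended check: true only for the listed model types.
    return model_type in ("llama", "mistral")


# "opt" is just an example of some other model type.
for name in ("llama", "mistral", "opt"):
    print(name, condition_as_written(name), membership_check(name))
```

Assuming the intent is to skip the `head_mask` argument only for llama and mistral backbones, the membership form is the safer way to write line 51.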

Reproducing llama-2-7b results

Hello! Thank you for sharing your work.
I'm trying to reproduce the results with the llama-2-7b model, but I only get an eval reward score of 0.75 after ReMax training.

Appendix C of the paper gives a learning rate of 1e-6 (although this differs from the other lr values) and a KL penalty coefficient of 0.1.
Apart from these, what else should I consider? (Maybe the batch size? I'm using 8*24.)

I'm also currently training on 8 GPUs. Do you think that could explain the difference in results?
