Comments (1)
@kwonmha - I thought it was just a minor choice, like one person preferring loguru and another preferring the native logging module. That said, using the peft package from HF does cause some trouble. First, peft needs either a regex to match module names, which makes the code less readable, or a list of all the full module names, which loses generality. Second, there is gradient checkpointing: peft wraps the module, so the model loses functions like gradient_checkpointing_enable, and you must enable it as soon as the base transformer model is initialized, which separates the transformer model from the v_head. The DS team may prefer a model without any wrapper, just as it says in the comments of deepspeed initialize.
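To make the first point concrete, here is a minimal sketch of the two ways of selecting modules by name. It uses only the standard library and does not call peft; the module names and both helper functions are hypothetical, and peft's own matching rules may differ in detail.

```python
import re

# Hypothetical module names, loosely modeled on a decoder-only transformer
# with an extra value head. Real names depend on the model.
module_names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.0.mlp.up_proj",
    "model.layers.1.self_attn.q_proj",
    "model.layers.1.self_attn.v_proj",
    "model.layers.1.mlp.up_proj",
    "v_head",
]


def select_by_regex(names, pattern):
    """Return the names that match the pattern in full (compact, but harder to read)."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.fullmatch(name)]


def select_by_full_names(names, wanted):
    """Return the names that appear in an explicit list (readable, but model-specific)."""
    wanted_set = set(wanted)
    return [name for name in names if name in wanted_set]


# Option 1: a single regex covers every layer, whatever the layer count.
regex_hits = select_by_regex(
    module_names, r"model\.layers\.\d+\.self_attn\.(q|v)_proj"
)

# Option 2: an explicit list must be rewritten whenever the layer count
# or the naming scheme changes.
list_hits = select_by_full_names(
    module_names,
    [
        "model.layers.0.self_attn.q_proj",
        "model.layers.0.self_attn.v_proj",
        "model.layers.1.self_attn.q_proj",
        "model.layers.1.self_attn.v_proj",
    ],
)

print(regex_hits == list_hits)
```

Both options pick the same four attention projections here. The regex keeps working if more layers are added, while the explicit list is easier to read but tied to one model.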
from deepspeedexamples.