Comments (8)
@kmehant while the fix is as you described, now that #53 is merged, I think it may be best to switch to accelerate, which uses a YAML config defaults file. With YAML, explicitly enclosing strings in " and ' is not necessary, so it will be more robust to such issues. I suggest the following changes:
- Update the README.md, removing the instructions for torch.run and replacing them with accelerate.launch.
- Replace the FSDP JSON with a config YAML like this.
- BTW, I suggest moving fsdp_config.json out of tuning/config (which houses code) into somewhere that only houses config fixtures.
from fms-hf-tuning.
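As a rough illustration of the launch change suggested above, here is a minimal sketch that assembles an `accelerate launch` command pointing at a YAML config defaults file. The `--config_file` flag is part of the accelerate CLI; the YAML path, the helper name `build_accelerate_command`, and the `--output_dir out` script argument are assumptions made up for this example, not taken from the repository.

```python
import shlex
from typing import List, Sequence


def build_accelerate_command(
    config_file: str, script: str, script_args: Sequence[str] = ()
) -> List[str]:
    """Build argv for `accelerate launch` using a YAML config defaults file.

    Hypothetical helper for illustration; it only assembles the command
    and does not run it.
    """
    return ["accelerate", "launch", "--config_file", config_file, script, *script_args]


# Hypothetical paths and arguments (assumptions, not from the repo docs).
cmd = build_accelerate_command(
    config_file="fixtures/accelerate_fsdp_defaults.yaml",
    script="tuning/sft_trainer.py",
    script_args=["--output_dir", "out"],
)

# Render the command as a single shell-safe string for a README or log.
print(shlex.join(cmd))
```

Building the command as a list keeps each argument separate, so no extra string quoting is needed until the command is rendered for display.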
Thanks Fabian, I created issue #87 for the README updates. We will prioritize it at the earliest.
> I think it may be best to switch to accelerate, which uses a yaml config defaults file.
Thanks @fabianlim, I am aware of this, isn't accelerate a wrapper over torch.distributed?
> I suggest the following changes
I guess @Ssukriti is tracking them in a different issue, #87.
@kmehant I was planning to get to issue #87 in the next 2 days as it's high priority for our deliverables, but if you are interested and want to contribute instead, feel free to do so. Just let me know so I can plan accordingly :)
We do need it completed at the earliest so we can start some testing with multi-GPU on our end as well.
@Ssukriti I will be glad to raise a PR in a couple of hours.
@kmehant it's up to you, but I should be able to get to #87 pretty soon.
@Ssukriti @fabianlim I have raised a PR here: #92. Thanks.
> @Ssukriti @fabianlim I have raised a PR here: #92. Thanks.
@kmehant OK, looks like we duplicated work, see #91.