@nairbv @thoangtrvn @JRosenkranz
from fms-fsdp.
Stateless Implementation
Although the LCG provides the desired random permutation, this approach introduces extra state to be tracked (our position in the recursively generated permutation sequence and/or our position in the shard file). A much cleaner implementation is to use the LCG as a stateless, randomized bijective map from a contiguous range of doc indices to a shuffled, noncontiguous set of doc indices.
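To make the "stateless bijective map" idea concrete: when an LCG has full period, a single step x -> (a*x + c) mod m is itself a permutation of range(m). Hashing each position independently therefore needs no tracked state, while feeding the LCG its own output reproduces the recursive sequence. This is a minimal sketch that assumes a power-of-two modulus and the Numerical Recipes constants (a = 1664525, c = 1013904223), which satisfy the Hull-Dobell full-period conditions for any power-of-two m (c odd, a - 1 divisible by 4). These constants and the function names are illustrative assumptions, not necessarily what fms-fsdp uses.

```python
# Hypothetical constants (Numerical Recipes), not necessarily fms-fsdp's.
# For a power-of-two modulus m, Hull-Dobell guarantees a full period because
# C is odd and A - 1 is divisible by 4.
A = 1664525
C = 1013904223


def lcg_step(x: int, m: int) -> int:
    """One LCG step: maps x in range(m) to another value in range(m)."""
    return (A * x + C) % m


def lcg_cycle(seed: int, m: int) -> list[int]:
    """Values emitted by repeatedly feeding the LCG its own last output."""
    out, x = [], seed
    for _ in range(m):
        x = lcg_step(x, m)
        out.append(x)
    return out


m = 2 ** 10
# Stateless use: each position is hashed independently, and no two collide.
shuffled = [lcg_step(pos, m) for pos in range(m)]
assert sorted(shuffled) == list(range(m))
# Stateful use: the recursive sequence visits every value exactly once per cycle.
assert sorted(lcg_cycle(0, m)) == list(range(m))
```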
We can build this map by leveraging the fact that the state of the LCG above is always set to its last emitted value. Since the LCG emits every value in the desired range exactly once per cycle, each seeded by the previous one, we can instead re-seed the LCG on every call with a position index, and it will hash that index to a new position with no collisions. At runtime, then, we simply iterate sequentially through the range of documents in a file shard owned by a given worker (possibly a subset of the full shard), and the LCG maps them to a new, shuffled, noncontiguous set of documents.
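A minimal sketch of the re-seeding scheme for a shard of n documents, assuming n <= m and reusing the hypothetical power-of-two modulus and Numerical Recipes constants as a stand-in for the moduli discussed earlier. Because a shard usually holds fewer than m documents, the hypothetical `lcg_map` keeps stepping along the length-m cycle until it lands below n (the walking rule spelled out in the Cons paragraph below); the result is still a permutation of range(n).

```python
A = 1664525      # hypothetical multiplier (Numerical Recipes)
C = 1013904223   # hypothetical increment; odd, so the period mod 2**k is full


def lcg_step(x: int, m: int) -> int:
    return (A * x + C) % m


def lcg_map(index: int, n: int, m: int) -> int:
    """Statelessly map position `index` in range(n) to a shuffled doc index.

    The LCG is re-seeded with `index` itself; values >= n are skipped by
    following the length-m cycle until one falls below n. The walk ends
    within m steps because the cycle eventually returns to `index`.
    """
    if m & (m - 1) or not 0 <= index < n <= m:
        raise ValueError("need m a power of two and 0 <= index < n <= m")
    x = lcg_step(index, m)
    while x >= n:
        x = lcg_step(x, m)
    return x


n, m = 1000, 1024
# Iterate sequentially over shard positions; only the loop counter is state.
order = [lcg_map(pos, n, m) for pos in range(n)]
assert sorted(order) == list(range(n))  # every doc visited exactly once
```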
Pros: Introduces no extra state to track and avoids materializing long shuffled lists of position indices. It also lets workers partition shard files non-contiguously when a file is split across multiple workers.
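The partitioning benefit can be sketched as follows, again assuming the hypothetical power-of-two LCG constants. Each worker owns a contiguous slice of *positions* in a shard; the LCG scatters that slice across the shard, so each worker reads a non-contiguous set of documents while the workers together still cover every document exactly once. The even slicing rule and the name `worker_docs` are illustrative assumptions, not fms-fsdp's actual worker assignment.

```python
A = 1664525      # hypothetical LCG constants for a full-period
C = 1013904223   # power-of-two LCG (Hull-Dobell: C odd, A % 4 == 1)


def lcg_map(index: int, n: int, m: int) -> int:
    """Shuffled doc index for position `index` in range(n), with n <= m."""
    x = (A * index + C) % m
    while x >= n:
        x = (A * x + C) % m
    return x


def worker_docs(worker_id: int, num_workers: int, n: int, m: int):
    """Yield the doc indices one worker reads from a shard of n docs.

    The worker's positions form a contiguous slice, but the docs they map to
    are scattered; a checkpoint only needs the current position.
    """
    lo = n * worker_id // num_workers
    hi = n * (worker_id + 1) // num_workers
    for pos in range(lo, hi):
        yield lcg_map(pos, n, m)


n, m, workers = 1000, 1024, 4
parts = [list(worker_docs(w, workers, n, m)) for w in range(workers)]
# The workers' doc sets are disjoint and together cover the whole shard.
assert sorted(d for part in parts for d in part) == list(range(n))
```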
Cons: Produces similar shuffles across different shard files. The algorithm for finding the bijective mapping provided by the LCG for a given index is: "Take the length-m cycle of indices produced by the given choice of m (2^16+1, 2^23, 2^32 above), find the given index, and proceed through the cycle until you land on a new index below your size threshold". This means that two shard files with the same number of documents will receive the same mapping, since they step through the same cycle in the same way. Furthermore, two shard files of sizes n1 < n2 that use the same modulus m will also share the same mapping, up to the insertion of the new indices from n1 to n2 - 1. Thus our LCG mapping is clearly less random than the original shuffled-doc-list implementation.
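The shared-mapping drawback can be checked directly with a small sketch, assuming the same hypothetical power-of-two LCG. Two shards of equal size compute the identical list `[lcg_map(i, n, m) for i in range(n)]`, since nothing but (i, n, m) enters the computation. For sizes n1 < n2 on the same cycle, whenever the size-n2 walk from position i stops below n1, the size-n1 walk must stop at that same index. With n1 = 700 and n2 = 1000, at least 400 of the 700 positions must agree, because a permutation of range(1000) can send at most 300 of its first 700 positions into [700, 1000). The helper `shared_fraction` is hypothetical.

```python
A = 1664525      # hypothetical full-period LCG constants for a
C = 1013904223   # power-of-two modulus (not necessarily fms-fsdp's)


def lcg_map(index: int, n: int, m: int) -> int:
    """Walk from `index` to the next value in the length-m cycle below n."""
    x = (A * index + C) % m
    while x >= n:
        x = (A * x + C) % m
    return x


def shared_fraction(n1: int, n2: int, m: int) -> float:
    """Fraction of positions i < n1 mapped identically for shards of sizes
    n1 < n2 that use the same modulus m."""
    same = 0
    for i in range(n1):
        small, large = lcg_map(i, n1, m), lcg_map(i, n2, m)
        if large < n1:
            # The size-n2 walk stopped below n1, so the size-n1 walk stops there too.
            assert small == large
        same += int(small == large)
    return same / n1


print(shared_fraction(700, 1000, 1024))
```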