git-theta
git extension for collaborative, continual, and communal model development.
How to use this repository?
LFS installation
Download the Git LFS package from the Git LFS website. On Linux (x86-64), download the amd64 build from the list of assets on the website.
Getting started
Clone the repository:
git clone https://github.com/r-three/git-theta.git
Install the package and its dependencies in editable mode by running:
cd git-theta
pip install -e .
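After the editable install, one way to confirm that everything is in place is to check that the package is importable and that its command-line entry points are on PATH. This is a minimal sketch that assumes the import name `git_theta` and the executables `git-theta` (git runs a `git-<name>` executable from PATH for an unknown subcommand such as `git theta`) and `git-theta-filter` (the command named in the theta filter configuration). `check_install` is a hypothetical helper, not part of git-theta.

```python
from __future__ import annotations

import importlib.util
import shutil

# Assumed names: the importable package and the two console scripts.
PACKAGE = "git_theta"
COMMANDS = ("git-theta", "git-theta-filter")


def check_install() -> dict[str, bool]:
    """Return which installed pieces are visible from this environment."""
    status = {PACKAGE: importlib.util.find_spec(PACKAGE) is not None}
    for command in COMMANDS:
        # git resolves `git theta ...` by looking up a `git-theta` executable on PATH.
        status[command] = shutil.which(command) is not None
    return status


if __name__ == "__main__":
    for name, found in check_install().items():
        print(f"{name}: {'ok' if found else 'missing'}")
```

A "missing" entry may mean that the environment where `pip install -e .` was run is not the active one.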
Installing git theta
You can initialize git theta in the root directory of the codebase to track code and models as follows:
git theta install
After a successful installation, the following lines are added to the .gitconfig file in the user's home directory:
[filter "lfs"]
	smudge = git-lfs smudge -- %f
	required = true
	clean = git-lfs clean -- %f
[filter "theta"]
	clean = git-theta-filter clean %f
	smudge = git-theta-filter smudge %f
	required = true
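Git runs a filter driver's `clean` command when a matching file is staged and its `smudge` command when the file is written to the working tree, substituting the file's path for `%f`; with `required = true`, a failing filter is treated as an error instead of silently passing the content through. The sketch below parses the small subset of gitconfig syntax used in these entries to check whether the theta driver is configured, which can help tell whether `git theta install` has been run. The helper names are hypothetical, and a real check could instead ask git directly with `git config --global --get filter.theta.clean`.

```python
from __future__ import annotations

import re

# Only a small subset of gitconfig syntax is handled:
# section headers like [filter "theta"] and `key = value` lines.
SECTION_RE = re.compile(r'^\[(\w+)\s+"([^"]*)"\]$')
ENTRY_RE = re.compile(r"^([A-Za-z][\w-]*)\s*=\s*(.*)$")

EXAMPLE = """\
[filter "lfs"]
\tsmudge = git-lfs smudge -- %f
\trequired = true
\tclean = git-lfs clean -- %f
[filter "theta"]
\tclean = git-theta-filter clean %f
\tsmudge = git-theta-filter smudge %f
\trequired = true
"""


def parse_filters(text: str) -> dict[str, dict[str, str]]:
    """Map each filter driver name to its key/value settings."""
    filters: dict[str, dict[str, str]] = {}
    current: dict[str, str] | None = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        header = SECTION_RE.match(line)
        if header:
            kind, name = header.groups()
            # Only [filter "..."] sections are collected; other sections are skipped.
            current = filters.setdefault(name, {}) if kind == "filter" else None
            continue
        entry = ENTRY_RE.match(line)
        if entry and current is not None:
            current[entry.group(1)] = entry.group(2).strip()
    return filters


def theta_filter_configured(text: str) -> bool:
    """True if the theta driver has clean and smudge commands and is required."""
    theta = parse_filters(text).get("theta", {})
    return "clean" in theta and "smudge" in theta and theta.get("required") == "true"
```

For example, `theta_filter_configured((Path.home() / ".gitconfig").read_text())` (with `from pathlib import Path`) checks the user's global configuration.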
Example Usage
First, initialize git in the root directory of the codebase:
git init
To start tracking the model with git theta, run:
git theta track {path_to_model_checkpoint}
The above command adds the following lines to the .gitattributes file in the root directory of the repository:
".git_theta/{path_to_model_checkpoint}/**/params/[0-9]*" filter=lfs diff=lfs merge=lfs -text
{path_to_model_checkpoint} filter=theta
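In these lines, the first pattern routes the parameter files written under .git_theta/ through Git LFS (`-text` also turns off line-ending conversion), and the second routes the checkpoint itself through the theta filter configured during installation. The hypothetical helpers below reproduce the two lines for a given checkpoint path and append them only if they are missing; this is a minimal sketch of the bookkeeping on .gitattributes, not git-theta's actual implementation.

```python
from __future__ import annotations


def tracking_attributes(checkpoint_path: str) -> list[str]:
    """Hypothetical helper: the two .gitattributes lines for one checkpoint."""
    params_pattern = f".git_theta/{checkpoint_path}/**/params/[0-9]*"
    return [
        # Serialized parameter files are stored with Git LFS.
        f'"{params_pattern}" filter=lfs diff=lfs merge=lfs -text',
        # The checkpoint itself goes through the theta clean/smudge filter.
        f"{checkpoint_path} filter=theta",
    ]


def add_tracking(gitattributes: str, checkpoint_path: str) -> str:
    """Append the tracking lines, skipping any that are already present."""
    lines = gitattributes.splitlines()
    for line in tracking_attributes(checkpoint_path):
        if line not in lines:
            lines.append(line)
    return "\n".join(lines) + "\n"
```

Because `add_tracking` skips lines that already exist, tracking the same checkpoint twice leaves .gitattributes unchanged.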
Once the model is tracked, stage any changes made to it by running:
git theta add {path_to_model_checkpoint}
This creates a .git_theta/{path_to_model_checkpoint} folder in the root directory of the codebase, where the parameters of the model are stored in the TensorStore format. For example, a parameter named decoder.block.0.layer.0.SelfAttention.k.weight in the checkpoint pytorch_model.bin is stored under .git_theta/pytorch_model.bin/decoder.block.0.layer.0.SelfAttention.k.weight.
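The mapping from parameter names to storage folders can be sketched as a path join. This assumes, as in the example, that the dotted parameter name becomes a single folder name under .git_theta/<checkpoint path>; the LFS pattern in .gitattributes suggests each such folder holds further files (under params/), which this sketch does not model. `parameter_dir` and `parameter_dirs` are hypothetical helpers.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def parameter_dir(checkpoint_path: str, param_name: str) -> PurePosixPath:
    """Storage folder for one parameter, following the README example.

    The dotted parameter name is kept as a single path component; it is not
    split into nested directories on ".".
    """
    return PurePosixPath(".git_theta") / checkpoint_path / param_name


def parameter_dirs(checkpoint_path: str, param_names: list[str]) -> list[str]:
    """Sorted storage paths for every parameter name in a checkpoint."""
    return sorted(str(parameter_dir(checkpoint_path, n)) for n in param_names)
```

Applied to the keys of a checkpoint, `parameter_dirs` lists the .git_theta/{path_to_model_checkpoint}/{parameter_name} paths that should appear as staged changes.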
If you run git status at this point, you should see all of the .git_theta/{path_to_model_checkpoint}/{parameter_name} files listed under "Changes to be committed", along with the model checkpoint file and the .gitattributes file.
After adding the model checkpoint, use git add to stage any other modified code or text files. You can then commit the changes and push them to the remote.
On the remote, the .git_theta/{path_to_model_checkpoint} folder does not contain the actual parameter values; instead, the remote shows that the parameters are stored as LFS objects. A metadata file describing each parameter's contents, such as its shape, dtype, and hash, is stored inside .git_theta/{path_to_model_checkpoint}/{parameter_name}. The model checkpoint path itself, as seen on the remote, is a file containing the hash, shape, and dtype of each key in the checkpoint.
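The per-parameter metadata can be pictured as a small record of shape, dtype, and content hash. The sketch below builds such records with NumPy and serializes one entry per checkpoint key as JSON; the JSON layout and the use of SHA-256 over the raw bytes are assumptions for illustration, not git-theta's actual file format or hashing scheme. Because the hash covers only the bytes, shape and dtype are recorded alongside it, so that a 2×3 and a 3×2 tensor of zeros remain distinguishable.

```python
from __future__ import annotations

import hashlib
import json

import numpy as np


def parameter_metadata(value: np.ndarray) -> dict:
    """Shape, dtype, and a content hash for one parameter.

    SHA-256 over the raw bytes is an assumption for illustration; git-theta's
    own hashing scheme may differ.
    """
    arr = np.ascontiguousarray(value)
    return {
        "shape": list(arr.shape),
        "dtype": str(arr.dtype),
        "hash": hashlib.sha256(arr.tobytes()).hexdigest(),
    }


def checkpoint_metadata(params: dict[str, np.ndarray]) -> str:
    """A JSON document with one metadata entry per checkpoint key."""
    return json.dumps(
        {name: parameter_metadata(value) for name, value in params.items()},
        indent=2,
        sort_keys=True,
    )
```

Changing a single value in a tensor changes its hash, while shape and dtype stay the same.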
TBA
- git diff on the model checkpoint will identify which parameter groups are modified, added, or removed.
- git merge will assume that all merges to the checkpoint (i.e., to parameter group files) result in merge conflicts, and will offer various automated merging strategies that can be tried and vetted.
- git checkout to a commit will construct a checkpoint from the contents of .git_theta/<model_checkpoint_name> at that commit.
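These planned commands work at the level of parameter groups. As a rough sketch, assuming each checkpoint is summarized by a mapping from parameter name to its metadata (shape, dtype, hash), a diff reduces to set operations on the names plus a comparison of the metadata, and the groups a merge would have to resolve are those changed differently on both branches. `changed`, `diff_params`, and `merge_conflicts` are hypothetical helpers, not git-theta functions.

```python
from __future__ import annotations


def changed(base: dict[str, dict], other: dict[str, dict]) -> set[str]:
    """Parameter names whose metadata differs, counting additions and removals."""
    return {k for k in set(base) | set(other) if base.get(k) != other.get(k)}


def diff_params(old: dict[str, dict], new: dict[str, dict]) -> dict[str, list[str]]:
    """Classify parameter groups as added, removed, or modified."""
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "modified": sorted(k for k in set(old) & set(new) if old[k] != new[k]),
    }


def merge_conflicts(
    base: dict[str, dict], ours: dict[str, dict], theirs: dict[str, dict]
) -> list[str]:
    """Parameter groups changed on both branches in different ways.

    Hypothetical rule: a group changed on only one side takes that side,
    while a group changed differently on both sides is a conflict that
    would need one of the merge strategies.
    """
    both = changed(base, ours) & changed(base, theirs)
    return sorted(k for k in both if ours.get(k) != theirs.get(k))
```

Comparing metadata rather than tensor values keeps the diff cheap, since only hashes, shapes, and dtypes need to be read.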
from git-theta.
Make a pull request please (after updating the name).
Outline
- Brief overview of what git-theta is
- How and why it's different from treating the checkpoint as a blob of data and what it supports with lots of links
- Somewhat comprehensive usage example
  - Use the example from the paper
- Specific usage examples
  - Parameter-efficient updates
    - Enumerate the different ways that we support this
  - Performing a merge
  - Trying out a different version of a model on a branch
- Extending git-theta
  - Adding update types (talk to Muqeeth)
  - Adding merge methods
  - Adding checkpoint formats
- Why do I need git-theta (how the internals work)
  - Why not git-lfs
  - How we handle parameter-efficient updates
  - How we handle merging
  - How we do hashing