Comments (8)
hi @manuelknott, the code for the stage 2 transformer is currently buggy, so once I have fixed everything I will try to train and release a pretrained model. That may take a while, though, since I'm still learning about autoregressive modeling with transformers.
from enhancing-transformers.
Awesome! Already looking forward to it.
Then, if you don't have any further questions, I will close this issue. Feel free to reopen it.
Is there any update on this? I have seen that there have been some modifications to the stage 2 code. If the bugs are fixed, I can also try to train it on my own. Thank you!
hi @manuelknott, sorry for the late reply. The issue should be fixed now, so you can train a stage 2 transformer.
Awesome, thank you. Just to be sure: the main.py file only covers stage 1 training for now, right? Is there any chance you could share the code for training stage 2 if it is not part of the repo yet?
hi @manuelknott, main.py supports training the stage 2 model; you just need to use the correct config. For example, you can refer to imagenet_gpt_vitvq_base.yaml to get a glimpse of a stage 2 config. Note that this example stage 2 config is quite big and only fits on at least 8 A100s, so you might want to reduce the parameters.
Thanks for the explanation! I will try to train a model with my limited resources (2x A5000). Let's see how it goes. Do you plan to publish a pretrained stage 2 model anytime soon?