Comments (6)
Hi @JialeTao, training ViT-VQGAN small is faster than I expected, and it has just been released. The speed is 1.05s per iteration at a batch size of 8; it can even go up to 16, but 8 is good enough. Again, this is on an A100 40GB. Also, if you don't have any further questions, I will close this issue. Feel free to reopen.
Hi @thuanz123, thanks for sharing the checkpoint of ViT-VQGAN small. For this small model, how many GPUs did you use and how many iterations did you train?
For quick training, I use 32 A100 40GB GPUs and train for 500,000 iterations on ImageNet. But I think a decent GPU with 8GB VRAM is enough; just lower the batch size and train longer.
from enhancing-transformers.
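To put some rough numbers on the "lower the batch size and train longer" advice, here is a back-of-envelope sketch. The reference-run numbers (32 GPUs, batch 8 per GPU, 500,000 iterations) come from this thread; the single-GPU local batch size of 2 is purely an assumption for illustration.

```python
# Rough arithmetic for matching the quoted small run on a low-VRAM GPU.
# Reference numbers are from this thread; local_batch is an assumption.
ref_gpus, ref_batch, ref_iters = 32, 8, 500_000  # quoted quick-training run
local_batch = 2                                  # assumed fit for 8GB VRAM

images_seen = ref_gpus * ref_batch * ref_iters   # images in the reference run
local_iters = images_seen // local_batch         # iterations to see as many images

print(f"reference run sees {images_seen:,} images")
print(f"one GPU at batch {local_batch} needs {local_iters:,} iterations to match")
```

This only equalizes the number of images seen; with a much smaller batch you would typically also retune the learning rate, so treat it as a lower bound on the extra training needed.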
Hi @JialeTao, depending on the config, training can be fast or slow. For the config in this repo, which is ViT-VQGAN base, it takes about 1.45s per iteration with a batch size of 4 on an A100, so it is quite demanding. If you don't have many GPUs, I recommend training a much smaller config than the one in this repo. There are also plans to train smaller models, so you can wait if you want, but that will be a long time from now since I'm too busy these days 😭
Thanks for the reply. For ViT-VQGAN base, how many iterations did you train? Does the 1.45s refer to stage 1 or stage 2 training? And lastly, is that an A100 with 40GB or 80GB of memory?
Hi @JialeTao, 1.45s per iteration is for stage 1 training, and the GPU is an A100 40GB. I trained ViT-VQGAN base for 1,000,000 iterations on 32 A100s, with a batch size of 4 per GPU. Stage 2 training is currently buggy, so I don't have any estimates or numbers for it 😅
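From the numbers quoted for the base run, a quick wall-clock estimate can be derived. This assumes standard data parallelism (1.45s per global step regardless of GPU count) and ignores data-loading, logging, and checkpointing overhead, so it is a sketch rather than a measured figure.

```python
# Back-of-envelope wall-clock estimate for the stage-1 ViT-VQGAN base
# run described above. Assumes data-parallel training, where each of
# the 1,000,000 iterations is one global step taking ~1.45s.
iterations = 1_000_000
sec_per_iter = 1.45
gpus, batch_per_gpu = 32, 4

effective_batch = gpus * batch_per_gpu           # global batch per step
wall_clock_days = iterations * sec_per_iter / 86_400

print(f"effective batch size: {effective_batch}")
print(f"estimated wall clock: {wall_clock_days:.1f} days")
```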