Hi @rwightman
Thank you for the insightful comments/questions. For starters, our work is built on top of timm==0.5.4 (default settings, etc.). In addition, all experiments used 4 nodes (4 x 8 V100 = 32 GPUs). More details/answers regarding each question:
- For GC ViT Tiny, the uploaded model weights/logs use a total batch size of 32 x 128 (N_gpus * batch_size_per_gpu) = 4096. However, we have also trained with a total batch size of 32 x 32 = 1024 and achieved very similar results (please see the table below). With a local batch size of 128 (total 4096), we use a learning rate of 0.005 (as specified here); with a local batch size of 32 (total 1024), we use a learning rate of 0.001.
- The global batch size is 32 x 128 = 4096 when the local batch size is 128. We used 32 GPUs as specified above.
- We did not run into any sensitivities (epsilons, etc.) and used all the defaults from timm==0.5.4. In fact, I have uploaded the entire config file, as generated by timm, in this link for a thorough overview of all hyper-parameters.
- Yes, we achieve a slight improvement using EMA, and generally find EMA to be useful. Results for experiments with and without EMA are listed below.
- We actually did use AMP for all experiments, as indicated in the config file. But for clarity, I have also added --amp to the training commands.
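The two learning rates quoted above are roughly consistent with the common linear-scaling heuristic (LR proportional to global batch size). A minimal sketch with the thread's numbers; note the heuristic itself is an assumption for illustration, not something the authors state they applied:

```python
def scale_lr(base_lr: float, base_batch: int, global_batch: int) -> float:
    """Linear learning-rate scaling: LR grows proportionally with global batch size."""
    return base_lr * global_batch / base_batch

# Starting from the smaller setup (LR 0.001 at global batch 1024), linear
# scaling predicts 0.004 at global batch 4096 -- close to, but not exactly,
# the 0.005 actually used for the larger setup.
print(scale_lr(0.001, 1024, 4096))  # -> 0.004
```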
model | top-1 | local batch size | global batch size | EMA | AMP |
---|---|---|---|---|---|
GCViT-T | 83.40 | 128 | 4096 | Yes | Yes |
GCViT-T | 83.38 | 128 | 4096 | No | Yes |
GCViT-T | 83.39 | 32 | 1024 | Yes | Yes |
GCViT-T | 83.37 | 32 | 1024 | No | Yes |
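As background for the EMA column above, here is a minimal plain-Python sketch of how a weight EMA works in principle (the decay value is illustrative; the actual runs used timm's model-EMA implementation with its defaults, not this class):

```python
class WeightEMA:
    """Keeps an exponential moving average of model weights.

    After each optimizer step: ema = decay * ema + (1 - decay) * current.
    At evaluation time, the EMA weights are used instead of the raw weights.
    """

    def __init__(self, weights: dict, decay: float = 0.9998):
        self.decay = decay
        self.shadow = dict(weights)  # EMA copy, initialized from the model

    def update(self, weights: dict) -> None:
        d = self.decay
        for name, value in weights.items():
            self.shadow[name] = d * self.shadow[name] + (1.0 - d) * value

# Toy usage: one scalar "weight" that has jumped to 1.0; the EMA trails it.
ema = WeightEMA({"w": 0.0}, decay=0.9)
for _ in range(3):
    ema.update({"w": 1.0})
print(round(ema.shadow["w"], 3))  # 1 - 0.9**3 = 0.271
```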
In addition to the above, we also used the Swin Transformer epoch-based scheduler, by slightly modifying timm's iteration-based scheduler (link here). Our motivation was to be directly comparable with the Swin training settings. We will update the arXiv manuscript to reflect this information very soon.
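A minimal sketch of the difference between the two scheduler granularities being discussed (pure Python; the epoch/step counts and peak LR are illustrative, not the paper's exact settings):

```python
import math

def cosine_lr(progress: float, lr_max: float, lr_min: float = 0.0) -> float:
    """Cosine annealing from lr_max down to lr_min as progress goes 0 -> 1."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))

EPOCHS, STEPS_PER_EPOCH, LR_MAX = 300, 10, 5e-3  # made-up counts for illustration

def lr_epoch_based(epoch: int, step: int) -> float:
    """Swin-style: progress advances once per epoch, so LR is flat within an epoch."""
    return cosine_lr(epoch / EPOCHS, LR_MAX)

def lr_step_based(epoch: int, step: int) -> float:
    """Iteration-style: progress advances every training step."""
    total = EPOCHS * STEPS_PER_EPOCH
    return cosine_lr((epoch * STEPS_PER_EPOCH + step) / total, LR_MAX)

# Within epoch 0 the epoch-based LR does not move; the step-based LR is already decaying.
print(lr_epoch_based(0, 0) == lr_epoch_based(0, 9))  # True
print(lr_step_based(0, 0) > lr_step_based(0, 9))     # True
```

Over a long schedule the two curves are nearly identical, which matches the observation below that the change makes very little difference in practice.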
Given my previous experience, I believe the timm library is the most effective and efficient way to train on ImageNet, and an easy path to reaching or surpassing SOTA without needing to change much.
@ahatamiz thank you for the detailed response. My LR needs a bit of adjustment based on that info, so I'll try another run with that and a new seed. I noticed the scheduler change; for long training runs I've found it makes very little difference (which is why I have been slow to support per-step updates).
Hi @rwightman
Sure. I totally agree that the scheduler would not make a big difference. Looking forward to seeing the results, and I would be happy to provide more details if needed.
Thanks.