
Comments (3)

ahatamiz commented on May 26, 2024

Hi @rwightman

Thank you for the insightful comments/questions. For starters, our work is built on top of timm==0.5.4 (default settings, etc.). In addition, for all experiments, we used 4 nodes (4 x 8 V100 = 32 GPUs). I'd like to provide more details/answers regarding the questions:

  • For GC ViT Tiny, the uploaded model weights/logs use a total batch size of 32 x 128 (N_gpus * batch_size_per_gpu) = 4096. However, we have also trained with a total batch size of 32 x 32 = 1024 and achieved very similar results (please see the table below). When using a local batch size of 128 (total 4096), we use a learning rate of 0.005 (as specified here). Otherwise, we use a learning rate of 0.001 when the local batch size is 32 (total 1024).
  • The global batch size is 32 x 128 = 4096 when the local batch size is 128. We used 32 GPUs as specified above.
  • We did not run into any sensitivities (epsilons, etc.) and used all the defaults from timm==0.5.4. In fact, I have uploaded the entire config file, as generated by timm, in this link for a thorough overview of all hyper-parameters.
  • Yes. We achieve a slight improvement using EMA, and generally find EMA to be useful. Results for experiments with and without EMA are listed below.
  • We actually did use AMP for all experiments, as indicated in the config file. But for clarity, I have also added --amp to the training commands.
| model   | top-1 | local batch size | global batch size | EMA | AMP |
|---------|-------|------------------|-------------------|-----|-----|
| GCViT-T | 83.40 | 128              | 4096              | Yes | Yes |
| GCViT-T | 83.38 | 128              | 4096              | No  | Yes |
| GCViT-T | 83.39 | 32               | 1024              | Yes | Yes |
| GCViT-T | 83.37 | 32               | 1024              | No  | Yes |
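The two learning rates above are consistent with the usual linear LR-scaling heuristic (scale the base LR with the global batch size, then round). This is an illustrative sketch of that rule, not code from the GC ViT repo; the base values are assumptions taken from the numbers quoted above:

```python
# Linear LR scaling sketch: lr = base_lr * (global_batch / base_batch).
# Assumption: 0.001 @ 1024 is treated as the base setting; 0.005 @ 4096
# is that base scaled by 4x (with rounding from 0.004 up to 0.005 in
# the actual config).
def scaled_lr(base_lr: float, base_batch: int, global_batch: int) -> float:
    """Linearly scale the learning rate with the global batch size."""
    return base_lr * global_batch / base_batch

n_gpus = 32            # 4 nodes x 8 V100
local_batch = 128
global_batch = n_gpus * local_batch
print(global_batch)                              # 4096
print(scaled_lr(0.00125, 1024, global_batch))    # 0.005
```

In practice the paper's settings (0.001 at 1024, 0.005 at 4096) follow this rule only approximately, which is common since learning rates are typically rounded to convenient values.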

In addition to the above, we have also used the Swin Transformer epoch-based scheduler by slightly modifying timm's iteration-based scheduler (link here). Our motivation was to be comparable with the Swin training settings. We will update the arXiv manuscript to reflect this information very soon.
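The difference being discussed is only *when* the schedule is evaluated: once per epoch (Swin-style) versus once per optimizer step (timm's default). A minimal, self-contained sketch of an epoch-based cosine schedule with linear warmup is below; it is not timm's or Swin's implementation, and the epoch counts and LR bounds are illustrative assumptions, not the paper's exact config:

```python
import math

def cosine_lr(epoch: int, total_epochs: int = 310, warmup_epochs: int = 20,
              base_lr: float = 0.005, warmup_lr: float = 1e-6,
              min_lr: float = 1e-5) -> float:
    """Cosine decay with linear warmup, evaluated once per (integer) epoch."""
    if epoch < warmup_epochs:
        # Linear warmup from warmup_lr up to base_lr.
        return warmup_lr + (base_lr - warmup_lr) * epoch / warmup_epochs
    # Cosine anneal from base_lr down to min_lr over the remaining epochs.
    t = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * t))

print(cosine_lr(0))    # 1e-06 (start of warmup)
print(cosine_lr(20))   # 0.005 (warmup finished, peak LR)
print(cosine_lr(310))  # 1e-05 (end of schedule)
```

An iteration-based variant would call the same function with a fractional "epoch" (`num_updates / updates_per_epoch`) every step; for long schedules the two curves are nearly identical, which matches the observation below that the choice makes little difference.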

Given my previous experience, I believe the timm library is the most effective and efficient way to train on ImageNet, and an easy way to reach or surpass SOTA without needing to change much.
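For concreteness, a single-node launch with timm's `train.py` along the lines of the settings above might look like the following. This is a hedged sketch only: the model name `gc_vit_tiny` and the dataset path are assumptions (the authoritative commands are the ones in the GC ViT repo), while `--amp`, `--model-ema`, `-b`, and `--lr` are standard timm 0.5.x `train.py` flags:

```shell
# Illustrative single-node (8-GPU) launch; the paper used 4 such nodes.
# Assumptions: model is registered with timm's train.py as gc_vit_tiny,
# and ImageNet lives at /data/imagenet.
python -m torch.distributed.launch --nproc_per_node=8 train.py \
    /data/imagenet \
    --model gc_vit_tiny \
    -b 128 \
    --lr 5e-3 \
    --amp \
    --model-ema
```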

from gcvit.

rwightman commented on May 26, 2024

@ahatamiz thank you for the detailed response. My LR needs a bit of adjustment based on that info, so I'll try another run with that and a new seed. I noticed the scheduler change; for long training runs I've found it makes very little difference (which is why I have been slow to support per-step update)...


ahatamiz commented on May 26, 2024

Hi @rwightman

Sure. I totally agree that the scheduler would not make a big difference. Looking forward to seeing the results, and I'd be happy to provide more details if needed.

Thanks.

