cmt-pytorch's Issues

Results on ImageNet

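Hi, some notes on how the results were trained. All models were trained for 300 epochs except CMT-T (600e-1000e) and CMT-XS (350e-400e), which need more than 300e. CMT-T was trained that way to match EfficientNet's training at the time: 300 epochs could not reach 79, and it took 600e or more to get above 78. The approximate parameters are listed below; everything else should be about the same as in the paper. Some details, such as R=3.8 actually being R=3.77 if you want exactly the same FLOPs, seemed unimportant, so I did not post them in this issue. The submission experience was terrible; I had planned to release the code, but that got delayed as well. I hope these parameters are helpful to you.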

CMT-Tiny (600e-1000e is better) Top-1: 79.2
python -m torch.distributed.launch --nproc_per_node=8 train_deit.py --model cmt_tiny --batch-size 256 --apex-amp --input-size 160 --weight-decay 0.04 --drop-path 0.1 --epochs 1000 --warmup-lr 1e-7 --warmup-epochs 20 --lr 8e-4 --min-lr 2e-5 --no-model-ema

CMT-XS (350e-400e is better) Top-1: 81.8
python -m torch.distributed.launch --nproc_per_node=8 train_deit.py --model cmt_extra_small --batch-size 128 --apex-amp --input-size 192 --weight-decay 0.08 --drop-path 0.1 --epochs 400 --warmup-epochs 20 --lr 7e-4 --min-lr 2e-5 --model-ema-decay 0.9998

CMT-Small (300e) Top-1: 83.5
python -m torch.distributed.launch --nproc_per_node=8 train_deit.py --model cmt_small --batch-size 128 --apex-amp --input-size 224 --weight-decay 0.05 --drop-path 0.1 --epochs 300 --model-ema-decay 0.99996

CMT-Base (300e, FC Drop=0.3) Top-1: 84.5
python -m torch.distributed.launch --nproc_per_node=8 train_deit.py --model cmt_base --batch-size 128 --apex-amp --input-size 256 --weight-decay 0.05 --drop-path 0.25 --epochs 300 --min-lr 2e-5 --model-ema-decay 0.99996

CMT-Large (300e, FC Drop=0.3) Top-1: 84.8
python -m torch.distributed.launch --nproc_per_node=8 train_deit.py --model cmt_large --batch-size 128 --apex-amp --input-size 288 --weight-decay 0.05 --drop-path 0.4 --epochs 300 --model-ema-decay 0.99996
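To make the learning-rate flags above easier to read, here is a minimal sketch of a linear-warmup plus cosine-decay schedule, using the CMT-Tiny values (`--lr 8e-4`, `--min-lr 2e-5`, `--warmup-lr 1e-7`, `--warmup-epochs 20`, `--epochs 1000`). It assumes `train_deit.py` uses a cosine schedule as DeiT does by default, which the issue does not confirm; `lr_at_epoch` is a hypothetical helper and not the scheduler the script actually uses, so exact per-epoch values may differ.

```python
import math


def lr_at_epoch(epoch, epochs, base_lr, min_lr, warmup_epochs, warmup_lr):
    """Hypothetical helper: linear warmup, then cosine decay down to min_lr."""
    if epoch < warmup_epochs:
        # Ramp linearly from warmup_lr to base_lr over the warmup epochs.
        return warmup_lr + (base_lr - warmup_lr) * epoch / warmup_epochs
    # Fraction of the post-warmup schedule that has elapsed, in [0, 1].
    progress = (epoch - warmup_epochs) / max(1, epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))


# Flag values taken from the CMT-Tiny command above.
tiny = dict(epochs=1000, base_lr=8e-4, min_lr=2e-5, warmup_epochs=20, warmup_lr=1e-7)

for epoch in (0, 10, 20, 500, 1000):
    print(epoch, lr_at_epoch(epoch, **tiny))
```

Under these assumptions the rate starts at the warmup value, peaks at `--lr` when warmup ends, and finishes at `--min-lr` in the last epoch.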

issues on current architecture

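  • Places where the current implementation differs from the description and figures in the paper:

    • In LightMultiHeadSelfAttention, self.sr = nn.Conv2d(...) should be a DW (depthwise) Conv. With DW here, the total parameter count should come close to what the paper reports.
    • The DW part of InvertedResidualFeedForward should look like F(X) = Norm(GELU(DWConv(X) + X)), whereas the current implementation looks like F(X) = Norm(GELU(DWConv(X))) + X
    • According to the paper, the downsampling Conv2D of each stage is followed by a layer_norm
  • Other points I am unsure about:

    • Most network models I have come across do not use bias in conv2d; I do not know what the author uses here. @ggjy
  • I have written a rough Tensorflow implementation, Keras CMT; the RelativePostional part is not written yet. I will try training it when I have time. The models I wrote recently, such as Halonet / CoAtNet, all take a lot of GPU memory to train and are hard to run.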
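The point about `self.sr` being a depthwise conv can be checked with simple arithmetic. The sketch below counts the parameters of a square-kernel 2D convolution the way `nn.Conv2d` lays out its weight, `(out_channels, in_channels / groups, k, k)`, plus an optional bias. The channel width `dim = 64` and kernel size `sr_ratio = 8` are illustrative assumptions, not values taken from the repository or the paper, and `conv2d_params` is a hypothetical helper.

```python
def conv2d_params(in_ch, out_ch, kernel, groups=1, bias=True):
    """Parameter count of a 2D convolution with a square kernel."""
    assert in_ch % groups == 0 and out_ch % groups == 0
    # Each output channel only sees in_ch / groups input channels.
    weights = out_ch * (in_ch // groups) * kernel * kernel
    return weights + (out_ch if bias else 0)


# Illustrative values only (assumed, not from the issue).
dim, sr_ratio = 64, 8

standard = conv2d_params(dim, dim, sr_ratio)               # groups=1
depthwise = conv2d_params(dim, dim, sr_ratio, groups=dim)  # one filter per channel

print(standard)   # 262208
print(depthwise)  # 4160
print(conv2d_params(dim, dim, sr_ratio, groups=dim, bias=False))  # 4096
```

With these assumed sizes the depthwise version has roughly 1/63 of the parameters of the standard one, which is why the choice matters for the total count; dropping the bias changes the count only by `out_ch`.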
