Adaptive Attention Span in Computer Vision

Official implementation of Adaptive Attention Span in Computer Vision.

In this work, we first attempt to replicate the results from Stand-Alone Self-Attention in Vision Models.

Next, we propose a novel method, based on Adaptive Attention Span, for learning the kernel size of local self-attention. We compare it with local attention kernels and with convolution kernels on CIFAR100. Our code for Adaptive Attention Span in 2D was inspired by FAIR's implementation. The code for self-attention in convolutions is loosely based on this repo by leaderj1001.
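To make the idea of a learnable span concrete, here is a minimal sketch of the soft masking function from the original 1D Adaptive Attention Span formulation, m(x) = min(max((R + z - x) / R, 0), 1), where z is the learned span, R the ramp length and x the distance from the query position. It is an illustration only, not this repository's 2D code: the function names are hypothetical, and how distance is measured on a 2D grid is left open.

```python
def soft_span_mask(distance: float, span: float, ramp: float) -> float:
    """Soft mask: 1 inside the span, 0 beyond span + ramp, linear in between.

    The linear ramp keeps the mask differentiable with respect to `span`,
    which is what allows the span to be learned by gradient descent.
    """
    return min(max((ramp + span - distance) / ramp, 0.0), 1.0)


def masked_attention(weights, distances, span, ramp):
    """Scale attention weights by the soft mask, then renormalise them."""
    masked = [w * soft_span_mask(d, span, ramp) for w, d in zip(weights, distances)]
    total = sum(masked)
    if total == 0:
        # Every position was masked out; return the zeros unchanged.
        return masked
    return [m / total for m in masked]
```

For example, with a span of 1 and a ramp of 2, a position at distance 2 keeps half of its weight, and a position at distance 3 or more is masked out entirely. In the original formulation, a penalty on the learned spans is added to the loss so that spans grow only where they help.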

Steps to replicate

  1. Clone this repository
  2. Install the requirements: pip install -r requirements.txt

Execution notes:

  • Our Adaptive implementation takes 3, 6 and 11 hours for small, medium and large models respectively on 2 P100 GPUs for 100 epochs on CIFAR100.
  • Some important flags:
    • To run on GPU, pass --cuda True; otherwise, omit this flag.
    • Pass --smallest_version True to run the small model or --small_version True to run the medium model. With neither flag, the large model is used.
    • The small, medium and large models are each described in Appendix A.3 of our paper.
  • For more details on other flags, see the file config.py which has descriptions for each.
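The boolean flags above take an explicit value, as in --cuda True. The sketch below shows one way such flags can be parsed with argparse and mapped to a model size. It is not taken from config.py: str2bool, build_parser and model_size are hypothetical names, and the defaults are assumptions.

```python
import argparse


def str2bool(value: str) -> bool:
    """Parse 'True'/'False' style strings; plain bool() would treat 'False' as True."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser covering only the three flags discussed above.
    parser = argparse.ArgumentParser(description="Illustrative flag parser")
    parser.add_argument("--cuda", type=str2bool, default=False)
    parser.add_argument("--smallest_version", type=str2bool, default=False)
    parser.add_argument("--small_version", type=str2bool, default=False)
    return parser


def model_size(args: argparse.Namespace) -> str:
    """Map the two size flags to the model names used in this README."""
    if args.smallest_version:
        return "small"
    if args.small_version:
        return "medium"
    return "large"


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(model_size(args), "model, cuda =", args.cuda)
```

A custom converter is used because argparse's type=bool turns any non-empty string, including "False", into True.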

Snippets

Best performing medium adaptive attention span model on CIFAR100:

python main.py --all_attention True --eta_min 0 --warmup_epochs 10 \
--lr 0.05 --batch-size 50 --small_version True --cuda True \
--num-workers 2 --xpid best_adaptive_medium --groups 4 \
--attention_kernel 5 --epochs 100 --dataset CIFAR100 --weight-decay 0.0005 \
--adaptive_span True --R 2 --span_penalty 0.01

Best performing medium local attention model on CIFAR100:

python main.py --all_attention True --eta_min 0 --warmup_epochs 10 \
--lr 0.05 --batch-size 50 --small_version True --cuda True \
--num-workers 2 --xpid best_local_medium --groups 4 \
--attention_kernel 5 --epochs 100 --dataset CIFAR100 --weight-decay 0.0005

Best performing medium CNN model on CIFAR100:

python main.py --eta_min 0 --warmup_epochs 10 --lr 0.2 --batch-size 50 \
--small_version True --cuda True --num-workers 2 --T_max 100 --xpid best_cnn_medium \
--dataset CIFAR100 --force_cosine_annealing True --weight-decay 0.0001
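
The --warmup_epochs, --eta_min and --T_max flags in the commands above suggest a learning-rate schedule with warmup followed by cosine annealing. The sketch below shows the general shape of such a schedule. It is an assumption, not this repository's scheduler: the linear warmup, the choice to anneal over the epochs remaining after warmup, and the function name learning_rate are all illustrative.

```python
import math


def learning_rate(epoch: int, base_lr: float, warmup_epochs: int,
                  t_max: int, eta_min: float = 0.0) -> float:
    """Linear warmup to base_lr, then cosine annealing down to eta_min."""
    if epoch < warmup_epochs:
        # Assumed linear warmup: reach base_lr at the last warmup epoch.
        return base_lr * (epoch + 1) / warmup_epochs
    # Fraction of the annealing phase completed, clamped to [0, 1].
    progress = (epoch - warmup_epochs) / max(1, t_max - warmup_epochs)
    progress = min(progress, 1.0)
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for epoch in (0, 9, 10, 55, 100):
        print(epoch, learning_rate(epoch, base_lr=0.2, warmup_epochs=10, t_max=100))
```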

Reference

If you find this repository useful, please cite it with

@misc{parker2020adaptive,
    title={Adaptive Attention Span in Computer Vision},
    author={Jerrod Parker and Shakti Kumar and Joe Roussy},
    year={2020},
    eprint={2004.08708},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
