
aiatrack's Introduction

AiATrack

The official PyTorch implementation of our ECCV 2022 paper:

AiATrack: Attention in Attention for Transformer Visual Tracking

Shenyuan Gao, Chunluan Zhou, Chao Ma, Xinggang Wang, Junsong Yuan

[ECVA Open Access] [ArXiv Preprint] [YouTube Video] [Trained Models] [Raw Results] [SOTA Paper List]

Highlight

🔖Brief Introduction

Transformer trackers have achieved impressive advancements recently, where the attention mechanism plays an important role. However, the independent correlation computation in the attention mechanism could result in noisy and ambiguous attention weights, which inhibits further performance improvement. To address this issue, we propose an attention in attention module (named AiA), which enhances appropriate correlations and suppresses erroneous ones by seeking consensus among all correlation vectors. Our AiA module can be readily applied to both self-attention blocks and cross-attention blocks to facilitate feature aggregation and information propagation for visual tracking. Moreover, we propose a streamlined Transformer tracking framework (dubbed AiATrack), by introducing efficient feature reuse and target-background embeddings to make full use of temporal references. Experiments show that our tracker achieves state-of-the-art performance on several tracking benchmarks while running at a real-time speed.
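
To make the idea concrete, below is a minimal PyTorch sketch of an attention-in-attention block: an inner attention operates on the correlation (attention-weight) vectors themselves to seek consensus among them before they are used for aggregation. This is an illustrative sketch, not the module implemented in this repository; all class, function, and parameter names are made up.

    import torch
    import torch.nn as nn


    class InnerAttention(nn.Module):
        """Hypothetical sketch: refine raw correlation vectors by letting them
        attend to each other, so that consistent correlations are enhanced and
        spurious ones are suppressed."""

        def __init__(self, corr_len, hidden_dim=64):
            super().__init__()
            self.query = nn.Linear(corr_len, hidden_dim)
            self.key = nn.Linear(corr_len, hidden_dim)
            self.value = nn.Linear(corr_len, corr_len)

        def forward(self, corr):
            # corr: (batch, num_queries, corr_len) raw attention weights
            q, k, v = self.query(corr), self.key(corr), self.value(corr)
            consensus = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1) @ v
            return corr + consensus  # residual refinement of the correlations


    class AiAAttention(nn.Module):
        """Hypothetical outer attention whose weights are refined by InnerAttention."""

        def __init__(self, dim, corr_len):
            super().__init__()
            self.q_proj = nn.Linear(dim, dim)
            self.k_proj = nn.Linear(dim, dim)
            self.v_proj = nn.Linear(dim, dim)
            self.inner = InnerAttention(corr_len)

        def forward(self, query, key, value):
            # query: (B, Nq, dim), key/value: (B, Nk, dim) with Nk == corr_len
            q, k, v = self.q_proj(query), self.k_proj(key), self.v_proj(value)
            corr = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5  # (B, Nq, Nk)
            corr = self.inner(corr)                              # attention in attention
            return torch.softmax(corr, dim=-1) @ v

The same refinement can be plugged into both self-attention and cross-attention blocks, which is the use case described above.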

🔖Strong Performance

The proposed AiATrack sets state-of-the-art results on 8 widely used benchmarks. Using a ResNet-50 backbone pre-trained on ImageNet-1k, we obtain:

Benchmark (Metrics) | AiATrack | Leaderboard
--- | --- | ---
LaSOT (AUC / Norm P / P) | 69.0 / 79.4 / 73.8 | PWC
LaSOT Extension (AUC / Norm P / P) | 47.7 / 55.6 / 55.4 | -
TrackingNet (AUC / Norm P / P) | 82.7 / 87.8 / 80.4 | PWC
GOT-10k (AO / SR 0.75 / SR 0.5) | 69.6 / 63.2 / 80.0 | PWC
NfS30 (AUC) | 67.9 | PWC
OTB100 (AUC) | 69.6 | PWC
UAV123 (AUC) | 70.6 | PWC
VOT2020 (EAO / A / R) | 0.530 / 0.764 / 0.827 | -

🔖Inference Speed

The proposed AiATrack can run at 38 fps (frames per second) on a single NVIDIA GeForce RTX 2080 Ti.
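
If you want to reproduce a rough speed number on your own hardware, a simple timing loop like the one below can be used. This is only a sketch: track_one_frame stands for the per-frame tracking step driven by tracking/test.py and is a hypothetical callable, and a CUDA device is assumed.

    import time

    import torch


    @torch.no_grad()
    def measure_fps(track_one_frame, frames, warmup=10):
        # track_one_frame: hypothetical callable that runs the tracker on one frame
        # frames: a list of pre-loaded video frames
        for frame in frames[:warmup]:      # warm up CUDA kernels and caches
            track_one_frame(frame)
        torch.cuda.synchronize()
        start = time.time()
        for frame in frames[warmup:]:
            track_one_frame(frame)
        torch.cuda.synchronize()           # make sure all GPU work has finished
        return (len(frames) - warmup) / (time.time() - start)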

🔖Training Cost

It takes nearly two days to train our model on 8 NVIDIA GeForce RTX 2080 Ti GPUs (each with 11 GB of GPU memory).

🔖Model Complexity

The proposed AiATrack has 15.79M (million) model parameters.
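
The parameter count can be verified with a few lines of PyTorch once the model has been constructed. The commented-out constructor below is only a placeholder; build the model the same way the training or testing scripts in this repository do.

    import torch

    # Placeholder: construct the model as the training/testing scripts do.
    # The exact import path and constructor in this repository may differ.
    # model = build_aiatrack(cfg)


    def count_parameters_m(model: torch.nn.Module) -> float:
        # Number of trainable parameters, in millions.
        return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6


    # print(f'{count_parameters_m(model):.2f}M parameters')  # expect about 15.79M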

Release

Trained Models (containing the model we trained on four datasets and the model we trained on GOT-10k only) [download zip file]

Raw Results (containing raw tracking results on the datasets we benchmarked in the paper) [download zip file]

Download and unzip these two zip files under the AiATrack project path; both of them can then be used directly by our code.

Let's Get Started

  • Environment

    Our experiments are conducted with Ubuntu 18.04 and CUDA 10.1.

  • Preparation

    • Clone our repository to your local project directory.

• Download the training datasets (LaSOT, TrackingNet, GOT-10k, COCO2017) and testing datasets (NfS, OTB, UAV123) to your disk. The organized directories should look like:

      --LaSOT/
      	|--airplane
      	|...
      	|--zebra
      --TrackingNet/
      	|--TRAIN_0
      	|...
      	|--TEST
      --GOT10k/
      	|--test
      	|--train
      	|--val
      --COCO/
      	|--annotations
      	|--images
      --NFS30/
      	|--anno
      	|--sequences
      --OTB100/
      	|--Basketball
      	|...
      	|--Woman
      --UAV123/
      	|--anno
      	|--data_seq
      
• Edit the paths in lib/test/evaluation/local.py and lib/train/admin/local.py to the proper absolute paths (see the sketch below).
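
      A minimal sketch of what the edited test-side file may look like is shown below. The import and attribute names are only illustrative of the auto-generated local.py in STARK-style codebases; keep whatever names your generated file already contains and replace only the path strings.

      # Illustrative excerpt of lib/test/evaluation/local.py; attribute names are assumptions.
      from lib.test.evaluation.environment import EnvSettings


      def local_env_settings():
          settings = EnvSettings()
          settings.lasot_path = '/absolute/path/to/LaSOT'
          settings.trackingnet_path = '/absolute/path/to/TrackingNet'
          settings.got10k_path = '/absolute/path/to/GOT10k'
          settings.nfs_path = '/absolute/path/to/NFS30'
          settings.otb_path = '/absolute/path/to/OTB100'
          settings.uav_path = '/absolute/path/to/UAV123'
          settings.show_result = False  # set to True to display predictions (see Evaluation)
          return settings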

  • Installation

    We use conda to manage the environment.

    conda create --name aiatrack python=3.6
    conda activate aiatrack
    sudo apt-get install ninja-build
    sudo apt-get install libturbojpeg
    bash install.sh
    

    Note that your PyTorch version must be <= 1.10.1 to successfully compile PreciseRoIPooling, since <THC/THC.h> was removed in PyTorch 1.11.
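
    As a quick sanity check before compiling PreciseRoIPooling, you can verify the installed PyTorch version from Python (a small sketch, not part of the repository):

    import torch

    # <THC/THC.h> was removed in PyTorch 1.11, so PreciseRoIPooling only
    # compiles against PyTorch <= 1.10.1 (see the note above).
    major, minor = (int(v) for v in torch.__version__.split('.')[:2])
    if (major, minor) > (1, 10):
        raise RuntimeError(
            f'PyTorch {torch.__version__} is too new to compile PreciseRoIPooling; '
            'please install a version <= 1.10.1.'
        )
    print(f'PyTorch {torch.__version__} should be able to compile PreciseRoIPooling.')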

  • Training

• Multiple GPU training by DDP (suppose you have 8 GPUs)

      python tracking/train.py --mode multiple --nproc 8
      
    • Single GPU debugging (too slow, not recommended for training)

      python tracking/train.py
      
    • For GOT-10k evaluation, remember to set --config baseline_got.

  • Evaluation

    • Make sure you have prepared the trained model.

    • On large-scale benchmarks:

      • LaSOT

        python tracking/test.py --dataset lasot
        python tracking/test.py --dataset lasot_ext
        

        Then evaluate the raw results using the official MATLAB toolkit.

      • TrackingNet

        python tracking/test.py --dataset trackingnet
        python lib/test/utils/transform_trackingnet.py --tracker_name aiatrack --cfg_name baseline
        

        Then upload test/tracking_results/aiatrack/baseline/trackingnet_submit.zip to the online evaluation server.

      • GOT-10k

        python tracking/test.py --param baseline_got --dataset got10k_test
        python lib/test/utils/transform_got10k.py --tracker_name aiatrack --cfg_name baseline_got
        

        Then upload test/tracking_results/aiatrack/baseline_got/got10k_submit.zip to the online evaluation server.

    • On small-scale benchmarks:

      • NfS30, OTB100, UAV123

        python tracking/test.py --dataset nfs
        python tracking/test.py --dataset otb
        python tracking/test.py --dataset uav
        python tracking/analysis_results.py
        

Following previous works, frames where the target object does not exist are excluded during the analysis.

• For multi-threaded inference, just add --threads 40 after tracking/test.py (suppose you want to use 40 threads in total).

• To display the prediction results during inference, set settings.show_result = True in lib/test/evaluation/local.py (this may be buggy if you try it on a remote server).

    • Please refer to STARK+Alpha-Refine for VOT integration and DETR Tutorial for correlation map visualization.

Acknowledgement

❤️❤️❤️ Our idea is implemented based on the following projects. We really appreciate their wonderful open-source work!

Citation

If any parts of our paper and code help your research, please consider citing us and giving a star to our repository.

@inproceedings{gao2022aiatrack,
  title={AiATrack: Attention in Attention for Transformer Visual Tracking},
  author={Gao, Shenyuan and Zhou, Chunluan and Ma, Chao and Wang, Xinggang and Yuan, Junsong},
  booktitle={European Conference on Computer Vision},
  pages={146--164},
  year={2022},
  organization={Springer}
}

Contact

If you have any questions or concerns, feel free to open issues or directly contact me through the ways on my GitHub homepage. Suggestions and collaborations are also highly welcome!


aiatrack's Issues

epoch numbers

Hi, is the main reason for training 500 epochs the learnable target-background embeddings? I noticed that the GOT-10k-only model is also trained for 500 epochs. Thank you.

NFS dataset

Hello, it is a great honor to see your excellent work! I downloaded the NfS dataset from its official website and would like to ask how to set up its path.

There are no anno and sequences folders in the original dataset, and I got the following ValueError when running the code: could not convert string '0 520 260 564 369 1 0 0 1 "person"' to float64 at row 0, column 1.

Have you encountered this before? Looking forward to your reply, thank you.

Training time on a 24 GB GPU

How long does training take on a single 24 GB GPU?

Training time on a single 2080 Ti

Hi, when training on the GOT-10k dataset alone with a single 2080 Ti, only 5 epochs finished in 12 hours. Is this normal?

Training Logs

Are there training logs for the models?

Thank you.

multigpu train

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [256]] is at version 7; expected version 6 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
Have you encountered this problem?

DEMO

It would be very useful if we could test the tracker on our own videos. Could you please share a demo script if one exists?

Stuck at "loading extension module _prroi_pooling"

Thank you for your excellent work, it really taught me a lot. However, I hit a problem when training AiATrack-GOT on two RTX 2080 Ti: the run gets stuck at "loading extension module _prroi_pooling" and makes no further progress. I have trained other projects that also use PrRoIPooling without meeting this problem. How should I adjust the training parameters?

Evaluate dataset

I have been using the evaluation servers you mentioned, but I cannot submit to TrackingNet, and the GOT-10k server did not respond to me. Could you also tell me how to evaluate on the VOT2020 dataset? Thank you.

congratulations!

Congratulations! 🎉
I have a question: running GOT-10k and TrackingNet went smoothly and I evaluated the results, but when running OTB the reported fps is always -1. Is this an anomaly, or is it expected?
Thanks! 😁

multiple result files for UAV123

Hi,

I have downloaded the raw results and am interested in the UAV123 dataset. I wonder about the multiple result files: there are several files for some sequences, such as:
uav_bird1 (uav_bird1_1 + uav_bird1_2 + uav_bird1_3)
uav_car1 (uav_car1_1 + uav_car1_2 + uav_car1_3 + uav_car1_s)
...

My questions are:

Should we combine them into a single file?
Did your tracker lose the target in these cases, and did you restart it?
