
attention-target-detection's Introduction

CVPR 2020 - Detecting Attended Visual Targets in Video

Overview

This repo provides a PyTorch implementation of our paper: 'Detecting Attended Visual Targets in Video' [paper]

We present a state-of-the-art method for predicting attention targets from a third-person point of view. The model takes the head bounding box of a person of interest and outputs an attention heatmap for that person.

We release our new dataset, training/evaluation code, demo code, and pre-trained models for the two main experiments reported in our paper. Please refer to the paper for details.

Getting Started

The code has been verified on Python 3.5 and PyTorch 0.4. We provide a conda environment.yml file which you can use to re-create the environment we used. Instructions on how to create an environment from an environment.yml file can be found here.
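For example, assuming you have conda installed, the environment can typically be created and activated as follows (the environment name is whatever is defined inside environment.yml):

conda env create -f environment.yml
conda activate <environment-name-from-yml>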

Download our model weights using:

sh download_models.sh

Quick Demo

You can try out our demo using the sample data included in this repo by running:

python demo.py
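demo.py also exposes command-line arguments for pointing it at your own frames and head bounding boxes (argument names taken from the script's argument parser; the values below are its defaults):

python demo.py --image_dir data/demo/frames --head data/demo/person1.txt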

Experiment on the GazeFollow dataset

Dataset

We use the extended GazeFollow annotation prepared by Chong et al. (ECCV 2018), which adds an annotation to the original GazeFollow dataset indicating whether each gaze target is inside or outside the frame. You can download the extended dataset from here (images and labels) or here (labels only).

Please adjust the dataset path accordingly in config.py.
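As a rough illustration, the dataset-path section of config.py might look like the sketch below; the variable names and file names here are illustrative placeholders, not necessarily the ones used in the repository, so check config.py for the actual names. The same file also holds the paths for the VideoAttentionTarget experiment described later.

# config.py -- illustrative sketch only; actual variable and file names may differ
# Extended GazeFollow dataset
gazefollow_train_data = "/path/to/gazefollow/images"
gazefollow_train_label = "/path/to/gazefollow/train_annotations.txt"
gazefollow_val_data = "/path/to/gazefollow/images"
gazefollow_val_label = "/path/to/gazefollow/test_annotations.txt"

# VideoAttentionTarget dataset (second experiment below)
videoattentiontarget_train_data = "/path/to/videoattentiontarget/images"
videoattentiontarget_train_label = "/path/to/videoattentiontarget/annotations/train"
videoattentiontarget_val_data = "/path/to/videoattentiontarget/images"
videoattentiontarget_val_label = "/path/to/videoattentiontarget/annotations/test"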

Evaluation

Run:

python eval_on_gazefollow.py

to get the model's performance on the GazeFollow test set.

Training

Run:

python train_on_gazefollow.py

to train the model. You can expect to see learning curves similar to ours.

Experiment on the VideoAttentionTarget dataset

Dataset

We created a new dataset, VideoAttentionTarget, with fully annotated attention targets in video for this experiment. Dataset details can be found in our paper. Download the VideoAttentionTarget dataset from here.

Please adjust the dataset path accordingly in config.py.

Evaluation

Run:

python eval_on_videoatttarget.py

to get the model's performance on the VideoAttentionTarget test set.

Training

Run:

python train_on_videoatttarget.py

to do the temporal training.

Citation

If you use our dataset and/or code, please cite

@inproceedings{Chong_2020_CVPR,
  title={Detecting Attended Visual Targets in Video},
  author={Chong, Eunji and Wang, Yongxin and Ruiz, Nataniel and Rehg, James M.},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2020}
}

If you only use the extended GazeFollow annotations, please cite

@InProceedings{Chong_2018_ECCV,
  author = {Chong, Eunji and Ruiz, Nataniel and Wang, Yongxin and Zhang, Yun and Rozga, Agata and Rehg, James M.},
  title = {Connecting Gaze, Scene, and Attention: Generalized Attention Estimation via Joint Modeling of Gaze and Scene Saliency},
  booktitle = {The European Conference on Computer Vision (ECCV)},
  month = {September},
  year = {2018}
}

References

We make use of the PyTorch ConvLSTM implementation provided by https://github.com/kamo-naoyuki/pytorch_convolutional_rnn.
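For readers unfamiliar with convolutional LSTMs, the sketch below is a minimal ConvLSTM cell in plain PyTorch, included only to illustrate the idea of replacing the LSTM's matrix multiplications with 2D convolutions; it is not the referenced implementation, which wraps convolutional RNNs behind a torch.nn-style interface.

import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """A single ConvLSTM cell: the four LSTM gates are computed with one 2D convolution."""
    def __init__(self, in_channels, hidden_channels, kernel_size=3):
        super().__init__()
        # One convolution produces all four gates (input, forget, cell, output).
        self.gates = nn.Conv2d(in_channels + hidden_channels, 4 * hidden_channels,
                               kernel_size, padding=kernel_size // 2)

    def forward(self, x, state):
        # x: (B, C_in, H, W); state: (h, c), each (B, C_hidden, H, W)
        h, c = state
        i, f, g, o = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        c = f * c + i * torch.tanh(g)
        h = o * torch.tanh(c)
        return h, c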

Contact

If you have any questions, please email Eunji Chong at [email protected].


attention-target-detection's Issues

Where is the demo data?

Hello,
as shown at line 21 of demo.py, running the demo requires a data/ directory. Where can it be downloaded?

parser.add_argument('--image_dir', type=str, help='images', default='data/demo/frames')
parser.add_argument('--head', type=str, help='head bounding boxes', default='data/demo/person1.txt')

No such file or directory: 'model_demo.pt'

I downloaded the code to my machine and tried to run it from PyCharm (with conda Python, after installing torch, on Windows without a GPU), but when running demo.py I get:

No such file or directory: 'model_demo.pt'

Where can I get that file? Is it possible to run it on Windows this way, without a GPU? Will it work?

Got an error while running train_on_videoatttarget.py

python train_on_videoatttarget.py
Loading Data
Constructing model
Loading weights
/home/anaconda3/envs/myenv/lib/python3.5/site-packages/torch/nn/functional.py:52: UserWarning: size_average and reduce args will be deprecated, please use reduction='none' instead.
warnings.warn(warning.format(ret))
Training in progress ...
(myenv) ge@Z370-AORUS-Ultra-Gaming:~/Documents/attention-target-detection-master$

Draw the sample image

How do you draw the visualization shown in the example image? After computing the evaluation metrics, I would like to draw the predicted gaze direction on top of the image.

Only one annotation per test image, and different evaluation for the GazeFollow and VideoAttentionTarget datasets

Dear authors,

Thanks for sharing your code and data.

I found that:

  1. Although it is claimed that two annotations are available for each test image, the released annotations seem to contain only one annotation per image. May I ask where we can download the full annotations for your test set?
  2. In your released code, you use different methods to compute AUC for the GazeFollow and VideoAttentionTarget datasets. On GazeFollow you use the original annotations (10 points) to build the multi-hot vector. On your own dataset, you place a Gaussian on top of the single annotation, set all values greater than 0 to 1, and then use that binary map as the multi-hot vector (see the sketch after this message). However, your paper only defines AUC once. Could you please confirm whether two different versions of AUC are used in the paper?

Cheers,
Yu
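The two AUC variants described in point 2 above can be sketched as follows, assuming scikit-learn and SciPy; this is only an illustration of the described procedure, not the repository's actual evaluation code:

import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.metrics import roc_auc_score

def auc_gazefollow(pred_heatmap, gt_points):
    # GazeFollow-style: multi-hot map built from the ~10 annotated gaze points
    # (points given in the heatmap's coordinate frame; pred_heatmap is assumed
    # to already be resized to the evaluation resolution).
    multi_hot = np.zeros_like(pred_heatmap)
    for x, y in gt_points:
        multi_hot[int(y), int(x)] = 1
    return roc_auc_score(multi_hot.flatten(), pred_heatmap.flatten())

def auc_videoattentiontarget(pred_heatmap, gt_point, sigma=3):
    # VideoAttentionTarget-style: place a Gaussian on the single annotation,
    # then binarize everything greater than 0 to form the multi-hot map.
    multi_hot = np.zeros_like(pred_heatmap)
    multi_hot[int(gt_point[1]), int(gt_point[0])] = 1
    multi_hot = gaussian_filter(multi_hot, sigma=sigma)
    multi_hot = (multi_hot > 0).astype(np.float32)
    return roc_auc_score(multi_hot.flatten(), pred_heatmap.flatten())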

How are the initial weights for training obtained?

Hello,

Thanks for the great work!
From the training code for GazeFollow and VideoAttentionTarget, I see the models are initialized with initial_weights_for_spatial_training.pt / initial_weights_for_temporal_training.pt. I see in your paper that for training on VideoAttentionTarget you only trained the layers after the encoder, so I think initial_weights_for_temporal_training.pt contains the weights obtained after training on GazeFollow, is that correct? But the spatial model for training on GazeFollow is also initialized with initial_weights_for_spatial_training.pt. How did you obtain those initial weights? Do they contain the weights of a pretrained ResNet-50 for the scene/head branches?

Thank you very much.
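As general background for the question above: one common way to initialize only part of a model from ImageNet-pretrained ResNet-50 weights is a partial state-dict load, sketched below with generic PyTorch/torchvision calls. The prefix name is a hypothetical placeholder, and this is not the authors' actual initialization code.

import torchvision

def load_resnet50_into_backbone(model, prefix="scene_backbone."):
    # `prefix` is hypothetical: it should match the name of the sub-module in
    # your model whose layers mirror ResNet-50's layer names.
    resnet_weights = torchvision.models.resnet50(pretrained=True).state_dict()
    own_state = model.state_dict()
    # Copy only parameters that exist in the model with matching shapes,
    # leaving everything else (deconvolution head, ConvLSTM, etc.) untouched.
    compatible = {prefix + k: v for k, v in resnet_weights.items()
                  if prefix + k in own_state and v.shape == own_state[prefix + k].shape}
    own_state.update(compatible)
    model.load_state_dict(own_state)
    return model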
