
Scanpath Prediction Using Inverse Reinforcement Learning

Official PyTorch implementation of the paper Predicting Goal-directed Human Attention Using Inverse Reinforcement Learning (CVPR 2020, oral).

We propose the first inverse reinforcement learning (IRL) model to learn the internal reward function and policy used by humans during visual search. The viewer's internal belief states were modeled as dynamic contextual belief maps of object locations. These maps were learned by IRL and then used to predict behavioral scanpaths for multiple target categories. To train and evaluate our IRL model we created COCO-Search18, which is now the largest dataset of high-quality search fixations in existence. COCO-Search18 has 10 participants searching for each of 18 target-object categories in 6202 images, making about 300,000 goal-directed fixations. When trained and evaluated on COCO-Search18, the IRL model outperformed baseline models in predicting search fixation scanpaths, both in terms of similarity to human search behavior and search efficiency.

If you are using this work, please cite:

@InProceedings{Yang_2020_CVPR_predicting,
author = {Yang, Zhibo and Huang, Lihan and Chen, Yupei and Wei, Zijun and Ahn, Seoyoung and Samaras, Dimitris and Zelinsky, Gregory and Hoai, Minh},
title = {Predicting Goal-directed Human Attention Using Inverse Reinforcement Learning},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}

Scripts

  • Train a model with
    python train.py <hparams> <dataset_root> [--cuda=<id>]
    
  • Model evaluation

Data Preparation

The dataset consists of two parts: image stimuli and fixations. For computational efficiency, we pre-compute the low- and high-resolution belief maps using the pretrained Panoptic FPN (with a ResNet-50 backbone) from Detectron2. For each image, we extract 134 belief maps at both low and high resolution and resize them to 20x32; hence, each image is represented by two 134x20x32 tensors. Please refer to the paper for more details. Fixations come in the form of individual scanpaths, each of which mainly consists of a list of (x, y) locations in image coordinates (see below for an example). Note that raw fixations may fall outside the image boundaries; we remove such fixations from the scanpaths.
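Below is a minimal sketch of how such belief maps could be pre-computed with the Panoptic FPN (ResNet-50) model from the Detectron2 model zoo. The channel layout (80 COCO "thing" classes, then the "stuff" classes, plus one background channel), the Gaussian blur used to approximate the low-resolution (peripheral) input, and the file names are illustrative assumptions; please refer to the paper and the released DCBs for the exact procedure.

import cv2
import numpy as np
import torch
import torch.nn.functional as F
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

NUM_MAPS = 134                # assumed: 80 "thing" + 53 "stuff" categories + 1 background
BELIEF_H, BELIEF_W = 20, 32   # target belief-map resolution

# Pretrained Panoptic FPN with a ResNet-50 backbone from the Detectron2 model zoo
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-PanopticSegmentation/panoptic_fpn_R_50_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-PanopticSegmentation/panoptic_fpn_R_50_3x.yaml")
predictor = DefaultPredictor(cfg)

def belief_maps(img_bgr):
    """Run panoptic segmentation and return a NUM_MAPS x 20 x 32 belief tensor."""
    panoptic_seg, segments_info = predictor(img_bgr)["panoptic_seg"]
    maps = torch.zeros(NUM_MAPS, *panoptic_seg.shape)
    for seg in segments_info:
        mask = (panoptic_seg == seg["id"]).float().cpu()
        # assumed channel layout: "thing" classes in channels 0-79, "stuff" from channel 80
        channel = seg["category_id"] if seg["isthing"] else 80 + seg["category_id"]
        maps[channel] += mask
    maps[-1] = (maps[:-1].sum(0) == 0).float()   # background: pixels covered by no segment
    return F.interpolate(maps[None], size=(BELIEF_H, BELIEF_W),
                         mode="bilinear", align_corners=False)[0]

img = cv2.imread("000000400966.jpg")                  # image from the example below
hr = belief_maps(img)                                 # high-resolution beliefs
lr = belief_maps(cv2.GaussianBlur(img, (0, 0), 2))    # blurred input -> low-resolution beliefs
np.save("DCBs/HR/000000400966.npy", hr.numpy())       # stored under <dataset_root>/DCBs (see layout below)
np.save("DCBs/LR/000000400966.npy", lr.numpy())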

The typical <dataset_root> should be structured as follows

<dataset_root>
    -- bbox_annos.npy                                # bounding box annotation for each image (available at COCO)
    -- coco_search18_fixations_TP_train.json         # train split of human scanpaths (ground-truth)
    -- coco_search18_fixations_TP_validation.json    # validation split of human scanpaths (ground-truth)
    -- ./DCBs
        -- ./HR                                      # high-resolution belief maps of each input image (pre-computed)
        -- ./LR                                      # low-resolution belief maps of each input image (pre-computed)

Each .json file is a list of human scanpaths; each scanpath is a dict object formatted as follows:

{
     'name': '000000400966.jpg',            # image name
     'subject': 2,                          # subject id (10 subjects in total, numbered 1-10)
     'task': 'microwave',                   # target name (18 target categories in total)
     'condition': 'present',                # target-present or target-absent
     'bbox': [67, 114, 78, 42],             # bounding box of the target object in the image
     'X': array([245.54666667, ...]),       # x coordinate of each fixation
     'Y': array([128.03047619, ...]),       # y coordinate of each fixation
     'T': array([190,  63, 180, 543]),      # duration of each fixation in milliseconds
     'length': 4,                           # length of the scanpath (i.e., number of fixations)
     'fixOnTarget': True,                   # if the scanpath lands on the target object
     'correct': 1,                          # 1 if the subject correctly located the target; 0 otherwise
     'split': 'train'                       # split of the image {'train', 'valid', 'test'}
 }
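As an example, the following snippet (a sketch only; the file path and image size are assumptions based on the notes in this README) loads the training scanpaths and removes fixations that fall outside the image boundaries, as mentioned above.

import json
import numpy as np

IMG_W, IMG_H = 512, 320   # assumed rescaled image size (see the note below)

with open("coco_search18_fixations_TP_train.json") as f:
    scanpaths = json.load(f)

for traj in scanpaths:
    X, Y, T = np.array(traj["X"]), np.array(traj["Y"]), np.array(traj["T"])
    inside = (X >= 0) & (X < IMG_W) & (Y >= 0) & (Y < IMG_H)   # keep in-bounds fixations only
    traj["X"], traj["Y"], traj["T"] = X[inside], Y[inside], T[inside]
    traj["length"] = int(inside.sum())

print(len(scanpaths), "scanpaths loaded; first task:", scanpaths[0]["task"])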

A sample <dataset_root> folder, including the pre-computed belief maps used in this paper, can be found at this link. Note that in this paper we rescaled the images, as well as the fixation locations, to 512x320. The original COCO-Search18 dataset was collected on a 1680x1050 display.
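If you work from the original (unscaled) fixations instead, the coordinates need to be rescaled together with the images. A minimal sketch, assuming the raw fixations are given in 1680x1050 display coordinates:

import numpy as np

DISPLAY_W, DISPLAY_H = 1680, 1050   # original display resolution
IMG_W, IMG_H = 512, 320             # image size used in this paper

def rescale_fixations(X, Y):
    """Map fixations from display coordinates to 512x320 image coordinates."""
    X = np.asarray(X, dtype=float) * IMG_W / DISPLAY_W
    Y = np.asarray(Y, dtype=float) * IMG_H / DISPLAY_H
    return X, Y

X, Y = rescale_fixations([806.0, 420.0], [420.0, 700.0])   # example values only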

COCO-Search18 Dataset

The COCO-Search18 dataset is available at https://sites.google.com/view/cocosearch/home. COCO-Search18 is also part of the MIT/Tuebingen Saliency Benchmark.
