
ASOD60K: Audio-Induced Salient Object Detection in Panoramic Videos

Authors: Yi Zhang, Fang-Yi Chao, Ge-Peng Ji, Deng-Ping Fan, Lu Zhang, Ling Shao.

Introduction


Figure 1: Annotation examples from the proposed ASOD60K dataset. (a) Illustration of head movement (HM). The subjects wear Head-Mounted Displays (HMDs) and observe 360° scenes by moving their head to control a field-of-view (FoV) in the range of 360°×180°. (b) Each subject (i.e., Subject 1 to Subject N) watches the video without restriction. (c) The HMD-embedded eye tracker records their eye fixations. (d) According to the fixations, we provide coarse-to-fine annotations for each FoV including (e) super/sub-classes, instance-level masks and attributes (e.g., GD-Geometrical Distortion).
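Fixations recorded in the HMD are naturally expressed as viewing directions on the sphere. As a minimal sketch (not the dataset's actual data format), the mapping from a fixation given as (longitude, latitude) in degrees to pixel coordinates on a W×H equirectangular frame can be written as:

```python
# Hypothetical helper: map a head/eye fixation, given as (longitude,
# latitude) in degrees with (0, 0) at the image centre, to pixel
# coordinates on a WxH equirectangular (ER) frame. The coordinate
# convention is an assumption, not the dataset's documented format.
def fixation_to_er_pixel(lon_deg, lat_deg, width, height):
    x = (lon_deg / 360.0 + 0.5) * width   # longitude spans the full width
    y = (0.5 - lat_deg / 180.0) * height  # latitude spans the full height
    return int(x) % width, min(int(y), height - 1)
```

For a 4K ER frame (3840×1920), the centre of view (0°, 0°) lands at the image centre (1920, 960).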

Exploring what humans pay attention to in dynamic panoramic scenes is useful for many fundamental applications, including augmented reality (AR) in retail, AR-powered recruitment, and visual language navigation. With this goal in mind, we propose PV-SOD, a new task that aims to segment salient objects from panoramic videos. In contrast to existing fixation-level or object-level saliency detection tasks, we focus on multi-modal salient object detection (SOD), which mimics the human attention mechanism by segmenting salient objects with the guidance of audio-visual cues. To support this task, we collect the first large-scale dataset, named ASOD60K, which contains 4K-resolution video frames annotated with a six-level hierarchy, distinguishing itself by its richness, diversity and quality. Specifically, each sequence is marked with both its super-/sub-class, with objects of each sub-class being further annotated with human eye fixations, bounding boxes, object-/instance-level masks, and associated attributes (e.g., geometrical distortion). These coarse-to-fine annotations enable detailed analysis for PV-SOD modeling, e.g., determining the major challenges for existing SOD models, and predicting scanpaths to study the long-term eye fixation behaviors of humans. We systematically benchmark 11 representative approaches on ASOD60K and derive several interesting findings. We hope this study can serve as a good starting point for advancing SOD research towards panoramic videos.

🏃 🏃 🏃 KEEP UPDATING.


Related Datasets


Figure 2: Summary of widely used salient object detection (SOD) datasets and the proposed panoramic video SOD (PV-SOD) dataset. #Img: The number of images/frames. #GT: The number of ground-truth masks. Pub. = Publication. Obj.-Level = Object-Level. Ins.-Level = Instance-Level. Fix.GT = Fixation-guided ground truths. † denotes equirectangular (ER) images.


Dataset Annotations and Attributes


Figure 3: Examples of challenging attributes on equirectangular (ER) images from our ASOD60K, with instance-level GT and fixations as annotation guidance. f(k), f(l) and f(m) denote random frames of a given video.
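One of these attributes, geometrical distortion (GD), stems directly from the equirectangular projection: each image row samples a circle of latitude, so content is horizontally stretched by 1/cos(latitude) relative to the equator and blows up toward the poles. A small sketch of this stretch factor (the row-to-latitude convention is an assumption):

```python
import math

# Sketch: horizontal stretch of an equirectangular image row relative
# to the equator, 1/cos(latitude) -- one source of the Geometrical
# Distortion (GD) attribute. Assumes row 0 sits at the north-pole edge
# and rows are sampled at their centres.
def horizontal_stretch(row, height):
    lat = (0.5 - (row + 0.5) / height) * math.pi  # latitude in radians
    return 1.0 / max(math.cos(lat), 1e-6)         # clamp near the poles
```

For a 2-row image both rows sit at ±45° latitude, giving a stretch of √2; for realistic frame heights the stretch is near 1 at the equator and grows rapidly toward the top and bottom of the frame.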


Figure 4: More annotations. Passed and rejected examples of annotation quality control.


Figure 5: Attribute descriptions and statistics. (a)/(b) represent the correlation and frequency of ASOD60K’s attributes, respectively.

Dataset Statistics


Figure 6: Statistics of the proposed ASOD60K. (a) Super-/sub-category information. (b) Instance density of each sub-class. (c) Main components of ASOD60K scenes.


Benchmark

Overall Quantitative Results


Figure 7: Performance comparison of 7/3 state-of-the-art conventional I-SOD/V-SOD methods and one PI-SOD method on ASOD60K. ↑/↓ denotes that a larger/smaller value is better. The best result in each column is bolded.

Attribute-Specific Quantitative Results


Figure 8: Performance comparison of 7/3/1 state-of-the-art I-SOD/V-SOD/PI-SOD methods on each attribute.

Reference

No. Year Pub. Title Links
01 2019 IEEE CVPR Cascaded Partial Decoder for Fast and Accurate Salient Object Detection Paper/Project
02 2019 IEEE ICCV Stacked Cross Refinement Network for Edge-Aware Salient Object Detection Paper/Project
03 2020 AAAI F3Net: Fusion, Feedback and Focus for Salient Object Detection Paper/Project
04 2020 IEEE CVPR Multi-scale Interactive Network for Salient Object Detection Paper/Project
05 2020 IEEE CVPR Label Decoupling Framework for Salient Object Detection Paper/Project
06 2020 ECCV Highly Efficient Salient Object Detection with 100K Parameters Paper/Project
07 2020 ECCV Suppress and Balance: A Simple Gated Network for Salient Object Detection Paper/Project
08 2019 IEEE CVPR See More, Know More: Unsupervised Video Object Segmentation with Co-Attention Siamese Networks Paper/Project
09 2019 IEEE ICCV Semi-Supervised Video Salient Object Detection Using Pseudo-Labels Paper/Project
10 2020 AAAI Pyramid Constrained Self-Attention Network for Fast Video Salient Object Detection Paper/Project
11 2020 IEEE SPL FANet: Features Adaptation Network for 360° Omnidirectional Salient Object Detection Paper/Project

Evaluation Toolbox

All quantitative results were computed with the one-key Python evaluation toolbox: https://github.com/zzhanghub/eval-co-sod .


Downloads

The whole object-/instance-level ground truth with the default split can be downloaded from Baidu Drive (fetch code: k3h8) or Google Drive.

The videos with default split can be downloaded from Google Drive or OneDrive.

The head movement and eye fixation data can be downloaded from Google Drive.

To generate video frames, please refer to video_to_frames.py.
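For orientation, a minimal sketch of frame extraction with OpenCV is shown below; the provided video_to_frames.py is authoritative and may differ in naming scheme and options (the filename pattern here is an assumption):

```python
import os

def frame_name(video_id, index):
    """Hypothetical zero-padded frame filename, e.g. 'videoID_000042.png'."""
    return f"{video_id}_{index:06d}.png"

def extract_frames(video_path, out_dir):
    """Write every frame of `video_path` into `out_dir`; returns the count.

    Requires OpenCV (`pip install opencv-python`).
    """
    import cv2  # imported here so frame_name() stays dependency-free
    os.makedirs(out_dir, exist_ok=True)
    video_id = os.path.splitext(os.path.basename(video_path))[0]
    cap = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of stream or read error
            break
        cv2.imwrite(os.path.join(out_dir, frame_name(video_id, index)), frame)
        index += 1
    cap.release()
    return index
```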

To get access to raw videos on YouTube, please refer to video_seq_link.

To check basic information regarding the raw videos, please refer to video_information.txt (continuously updated).


Contact

Please feel free to drop an e-mail to [email protected] for questions or further discussion.

If you have any questions about the head movement and eye fixation data, please contact [email protected].


Citation

@article{zhang2021asod60k,
  title={ASOD60K: Audio-Induced Salient Object Detection in Panoramic Videos},
  author={Zhang, Yi and Chao, Fang-Yi and Ji, Ge-Peng and Fan, Deng-Ping and Zhang, Lu and Shao, Ling},
  journal={arXiv preprint arXiv:2107.11629},
  year={2021}
}

Contributors

fannychao, gewelsji, jun-pu
