depth-vrd's Introduction

1 Ludwig Maximilian University, Munich, Germany, 2 Sapienza University of Rome, Italy
3 Siemens AG, Munich, Germany

Abstract

Visual relation detection methods rely on object information extracted from RGB images, such as 2D bounding boxes, feature maps, and predicted class probabilities. We argue that depth maps can additionally provide valuable information on object relations, e.g. helping to detect not only spatial relations, such as standing behind, but also non-spatial relations, such as holding. In this work, we study the effect of using different object features, with a focus on depth maps. To enable this study, we release a new synthetic dataset of depth maps, VG-Depth, as an extension to Visual Genome (VG). We also note that, given the highly imbalanced distribution of relations in VG, typical evaluation metrics for visual relation detection cannot reveal improvements on under-represented relations. To address this problem, we propose using an additional metric, Macro Recall@K, and demonstrate its merits on VG. Finally, our experiments confirm that by effectively utilizing depth maps within a simple yet competitive framework, the performance of visual relation detection can be improved by a margin of up to 8%.

Model

Highlights

  • We perform an extensive study on the effect of using different sources of object information in visual relation detection. We show in our empirical evaluations on the VG dataset that our model can outperform competing methods, even those using external language sources or contextualization, by a margin of up to 8 percentage points.

  • We release a new synthetic dataset, VG-Depth, to compensate for the lack of depth maps in Visual Genome.

  • We propose Macro Recall@K as a complementary metric for evaluating visual relation detection performance on highly imbalanced datasets such as Visual Genome (see the sketch after this list).
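
As a rough illustration, the sketch below shows how such a macro-averaged recall can be computed: Recall@K is first computed per predicate class and then averaged over classes, so every predicate contributes equally regardless of its frequency. This is only an illustration under our own assumptions (the function name and data layout are hypothetical), not the exact evaluation code of this repository.

from collections import defaultdict

def macro_recall_at_k(gt_triplets, pred_triplets, k=50):
    # gt_triplets / pred_triplets: per-image lists of (subject, predicate, object) tuples;
    # predictions are assumed to be sorted by confidence (hypothetical data layout).
    hits = defaultdict(int)    # per-predicate count of recalled ground-truth triplets
    totals = defaultdict(int)  # per-predicate count of ground-truth triplets
    for gt, pred in zip(gt_triplets, pred_triplets):
        top_k = set(pred[:k])  # keep only the K most confident predictions
        for triplet in gt:
            predicate = triplet[1]
            totals[predicate] += 1
            if triplet in top_k:
                hits[predicate] += 1
    # Macro average: every predicate weighs the same, so gains on rare relations
    # are no longer drowned out by frequent ones.
    recalls = [hits[p] / totals[p] for p in totals]
    return sum(recalls) / len(recalls)

In contrast, the conventional (micro) Recall@K pools all ground-truth triplets together, so frequent predicates dominate the score.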

VG-Depth Dataset

We release a new dataset called VG-Depth as an extension to Visual Genome. This dataset contains depth maps synthetically generated from Visual Genome images and can be downloaded from the following link: VG-Depth.
Please visit the VisualGenome-to-Depth repository if you would like to generate the depth maps yourself.
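
As a rough sketch of how an image and its depth map can be loaded together, assuming the depth maps are stored as single-channel image files named by image id (the directory names, file naming, and depth encoding below are assumptions, not the dataset's documented layout):

import numpy as np
from PIL import Image

def load_rgb_and_depth(image_id, rgb_dir="VG_100K", depth_dir="vg_depth"):
    # Hypothetical layout: one RGB JPEG and one depth PNG per Visual Genome image id.
    rgb = np.array(Image.open(f"{rgb_dir}/{image_id}.jpg").convert("RGB"))
    depth = np.array(Image.open(f"{depth_dir}/{image_id}.png"))  # single-channel depth map
    return rgb, depth

rgb, depth = load_rgb_and_depth(2345678)
print(rgb.shape, depth.shape)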

Here are some examples of the Visual Genome images and their corresponding depth maps provided in our dataset:

Results

Qualitative Results

Some qualitative results from our model's predictions: green arrows indicate successfully detected predicates (true positives), orange arrows indicate false negatives, and gray arrows indicate predicted links that are not annotated in the ground truth.

Code

Requirements

The main requirements of our code are as follows (a quick sanity check follows the list):

  • Python >= 3.6 (for Python 3.7 please check the "Python3.7-beta" branch)
  • PyTorch >= 1.1
  • TorchVision >= 0.2.0
  • TensorboardX
  • CUDA Toolkit 10.0
  • Pandas
  • Overrides
  • Gdown
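
As a quick sanity check of the Python-side requirements (a small convenience snippet, not part of the repository):

import torch
import torchvision

print("PyTorch:", torch.__version__)                  # expected >= 1.1
print("TorchVision:", torchvision.__version__)        # expected >= 0.2.0
print("CUDA available:", torch.cuda.is_available())
print("CUDA version PyTorch was built with:", torch.version.cuda)  # should match the installed toolkit (10.0)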

Setup

Please make sure you have CUDA Toolkit 10.0 installed; you can then set up the environment by running the following script:

./setup_env.sh

This script will perform the following operations:

  1. Install the required libraries.
  2. Download Visual-Genome.
  3. Download the Depth version of VG.
  4. Download the necessary checkpoints.
  5. Compile the CUDA libraries.
  6. Prepare the environment.

If you have already installed some of the libraries or downloaded the datasets, you can run the following scripts individually:

./data/fetch_dataset.sh             # Download Visual-Genome
./data/fetch_depth_1024.sh          # Download VG-Depth
./checkpoints/fetch_checkpoints.sh  # Download the Checkpoints

How to Run

Set the dataset path in the config.py file, and adjust your PYTHONPATH (e.g. export PYTHONPATH=/home/sina/Depth-VRD).

To train or evaluate the networks, select the configuration index and run the corresponding script below (e.g. to evaluate the "Ours-v,d" model, run ./scripts/shz_models/eval_scripts.sh 7):

  • To train the depth model separately (without the fusion layer):
./scripts/shz_models/train_depth.sh <configuration index>
  • To train each feature separately (e.g. class features):
./scripts/shz_models/train_individual.sh <configuration index>
  • To train the fusion models (a sketch of the fusion architecture is given at the end of this section):
./scripts/shz_models/train_fusion.sh <configuration index>
  • To evaluate the models:
./scripts/shz_models/eval_scripts.sh <configuration index>

The training scripts run each configuration with eight different random seeds, for 25 epochs each.
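
For intuition, the sketch below shows the kind of late-fusion relation classifier these configurations correspond to: each source of object-pair information (e.g. visual, class, and depth features) is embedded by its own small branch, and the embeddings are fused before predicting the predicate. Module names, dimensions, and the fusion operation are our own assumptions for illustration, not the repository's exact architecture.

import torch
import torch.nn as nn

class FusionRelationClassifier(nn.Module):
    # Illustrative late-fusion head: one small MLP per feature stream, fused by concatenation.
    def __init__(self, visual_dim=4096, class_dim=151, depth_dim=128,
                 hidden_dim=256, num_predicates=51):
        super().__init__()
        # One embedding branch per source of object-pair information (dimensions are assumptions).
        self.visual_branch = nn.Sequential(nn.Linear(visual_dim, hidden_dim), nn.ReLU())
        self.class_branch = nn.Sequential(nn.Linear(class_dim, hidden_dim), nn.ReLU())
        self.depth_branch = nn.Sequential(nn.Linear(depth_dim, hidden_dim), nn.ReLU())
        # Fusion layer: concatenate the branch embeddings and classify the predicate.
        self.fusion = nn.Sequential(
            nn.Linear(3 * hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_predicates),
        )

    def forward(self, visual_feats, class_feats, depth_feats):
        fused = torch.cat([
            self.visual_branch(visual_feats),
            self.class_branch(class_feats),
            self.depth_branch(depth_feats),
        ], dim=-1)
        return self.fusion(fused)  # unnormalized predicate scores

In this view, training a single stream "separately (without the fusion layer)" corresponds to attaching the classifier head directly to that stream's embedding.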

Checkpoints

You can also separately download the full model (LCVD) here. Please note that this checkpoint produces results that are slightly different from the ones reported in our paper. The reason is that, for a fair comparison, we reported the mean values from evaluations over several models.

Contributors

This repository is created and maintained by Sahand Sharifzadeh, Sina Moayed Baharlou, and Max Berrendorf.

Acknowledgements

The skeleton of our code is built on top of the nicely organized Neural-Motifs framework (including the RGB data loading pipeline and part of the evaluation code). We have upgraded these parts to be compatible with PyTorch 1. To enable a fair comparison of models, our object detection backbone also uses the same Faster R-CNN weights as that work.

Bibtex

@INPROCEEDINGS{9412945,
  author={Sharifzadeh, Sahand and Baharlou, Sina Moayed and Berrendorf, Max and Koner, Rajat and Tresp, Volker},
  booktitle={2020 25th International Conference on Pattern Recognition (ICPR)},
  title={Improving Visual Relation Detection using Depth Maps},
  year={2021},
  volume={},
  number={},
  pages={3597-3604},
  doi={10.1109/ICPR48806.2021.9412945}
}

depth-vrd's Issues

How to visualize results like in the paper

I cannot find the visualization code in this repository. Could you upload the code that creates the figures in the paper? Thank you very much.

setup_env.sh build error: bbox

Hello, I am trying to evaluate Depth-VRD; however, when running setup_env.sh I get the following error at step (5/6) -- Compiling libraries...:

building 'bbox' extension
creating build
creating build/temp.linux-x86_64-3.7
gcc -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/opt/conda/lib/python3.7/site-packages/numpy/core/include -I/opt/conda/include/python3.7m -c bbox.c -o build/temp.linux-x86_64-3.7/bbox.o
In file included from /opt/conda/lib/python3.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:1832:0,
                 from /opt/conda/lib/python3.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:12,
                 from /opt/conda/lib/python3.7/site-packages/numpy/core/include/numpy/arrayobject.h:4,
                 from bbox.c:444:
/opt/conda/lib/python3.7/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: #warning "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
 #warning "Using deprecated NumPy API, disable it with " \
  ^~~~~~~
bbox.c: In function ‘__Pyx__ExceptionSave’:
bbox.c:6996:19: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
     *type = tstate->exc_type;
                   ^~
bbox.c:6997:20: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
     *value = tstate->exc_value;
                    ^~
bbox.c:6998:17: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
     *tb = tstate->exc_traceback;
                 ^~

A fix for this error would be much appreciated :)
