RVL-BERT

This repository accompanies our IEEE Access paper "Visual Relationship Detection with Visual-Linguistic Knowledge from Multimodal Representations". It contains the validation code and the trained models for the SpatialSense and VRD datasets.

Installation

This project was built with Python 3.6, PyTorch 1.1.0, and CUDA 9.0, and is largely based on VL-BERT.

Please follow the original VL-BERT instructions to set up a conda environment.

Dataset

SpatialSense

Download the SpatialSense dataset here.
Put the files under $RVL_BERT_ROOT/data/spasen and extract images.tar.gz as images/ there. Make sure there are two folders (flickr/ and nyu/) under $RVL_BERT_ROOT/data/spasen/images/.
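
The extraction step can also be scripted. The following is a minimal sketch, assuming images.tar.gz already sits in $RVL_BERT_ROOT/data/spasen and that the archive unpacks into a top-level images/ folder (this is not verified against the actual archive). The function name extract_spasen_images is hypothetical and not part of this repository.

```python
import os
import tarfile
from pathlib import Path


def extract_spasen_images(root):
    """Unpack images.tar.gz under <root>/data/spasen if needed.

    Returns the names of expected image subfolders that are still missing.
    Assumption: the archive contains a top-level images/ directory.
    """
    spasen_dir = Path(root) / "data" / "spasen"
    archive = spasen_dir / "images.tar.gz"
    images_dir = spasen_dir / "images"

    # Only extract when the archive exists and images/ is not there yet.
    if archive.is_file() and not images_dir.is_dir():
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(spasen_dir)

    # The README expects flickr/ and nyu/ below images/.
    return [name for name in ("flickr", "nyu") if not (images_dir / name).is_dir()]


if __name__ == "__main__":
    # Falls back to the current directory when RVL_BERT_ROOT is not set.
    missing = extract_spasen_images(os.environ.get("RVL_BERT_ROOT", "."))
    print("missing folders:", missing if missing else "none")
```

An empty list means both flickr/ and nyu/ were found under data/spasen/images/.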

VRD

Download the VRD dataset: images (backup: download sg_dataset.zip from Baidu) and annotations.
Put the sg_train_images/ and sg_test_images/ folders under $RVL_BERT_ROOT/data/vrd/images.
Put all .json files under $RVL_BERT_ROOT/data/vrd/.
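
Before running validation, the VRD layout described above can be checked with a short script. This is a sketch only: the function name missing_vrd_items is hypothetical, and because the annotation file names are not listed here, it only checks that at least one .json file is present.

```python
import os
from pathlib import Path


def missing_vrd_items(root):
    """Return the VRD items that are missing under <root>/data/vrd."""
    vrd_dir = Path(root) / "data" / "vrd"
    missing = []

    # Image folders expected under data/vrd/images/.
    for folder in ("sg_train_images", "sg_test_images"):
        if not (vrd_dir / "images" / folder).is_dir():
            missing.append(f"images/{folder}/")

    # Annotation files: only checks that some .json file exists.
    if not any(vrd_dir.glob("*.json")):
        missing.append("*.json")

    return missing


if __name__ == "__main__":
    # Falls back to the current directory when RVL_BERT_ROOT is not set.
    problems = missing_vrd_items(os.environ.get("RVL_BERT_ROOT", "."))
    print("missing:", problems if problems else "none")
```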

Checkpoints & Pretrained Weights

Common

Download the pretrained weights here and put the pretrained_model/ folder under $RVL_BERT_ROOT/model/.

SpatialSense

Download the trained checkpoint here and put the .model file under $RVL_BERT_ROOT/checkpoints/spasen/.

VRD

Download the trained checkpoints and put the .model files under $RVL_BERT_ROOT/checkpoints/vrd/.

Validation

Run the following commands to reproduce the experimental results. A single GPU (NVIDIA Quadro RTX 6000, 24 GB memory) is used by default.

SpatialSense

Full model

python spasen/test.py --cfg cfgs/spasen/full-model.yaml --ckpt checkpoints/spasen/full-model-e44.model --bs 8 --gpus 0 --model-dir ./ --result-path results/ --result-name spasen_full_model --split test --log-dir logs

VRD

Basic model

python vrd/test.py --cfg cfgs/vrd/basic.yaml --ckpt checkpoints/vrd/basic-e59.model --bs 1 --gpus 0 --model-dir ./ --result-path results/ --result-name vrd_basic --split test --log-dir logs/

Basic model + Visual-Linguistic Commonsense Knowledge

python vrd/test.py --cfg cfgs/vrd/basic_vl.yaml --ckpt checkpoints/vrd/basic-vl-e59.model --bs 1 --gpus 0 --model-dir ./ --result-path results/ --result-name vrd_basic_vl --split test --log-dir logs/

Basic model + Visual-Linguistic Commonsense Knowledge + Spatial Module

python vrd/test.py --cfg cfgs/vrd/basic_vl_s.yaml --ckpt checkpoints/vrd/basic-vl-s-e59.model --bs 1 --gpus 0 --model-dir ./ --result-path results/ --result-name vrd_basic_vl_s --split test --log-dir logs/

Full model

python vrd/test.py --cfg cfgs/vrd/basic_vl_s_m.yaml --ckpt checkpoints/vrd/basic-vl-s-m-e59.model --bs 1 --gpus 0 --model-dir ./ --result-path results/ --result-name vrd_basic_vl_s_m --split test --log-dir logs/
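
To run all four VRD variants in one go, the commands above can be wrapped in a small driver. This is a sketch, not part of the repository: the names VRD_RUNS, build_command, and run_all are hypothetical, and the result names are illustrative choices that give each variant its own name. The config and checkpoint paths and the flags are copied from the commands above. The script is meant to be launched from $RVL_BERT_ROOT.

```python
import subprocess
import sys

# (config, checkpoint, result name) for each VRD variant listed above.
# The result names are illustrative; any distinct names should work.
VRD_RUNS = [
    ("cfgs/vrd/basic.yaml", "checkpoints/vrd/basic-e59.model", "vrd_basic"),
    ("cfgs/vrd/basic_vl.yaml", "checkpoints/vrd/basic-vl-e59.model", "vrd_basic_vl"),
    ("cfgs/vrd/basic_vl_s.yaml", "checkpoints/vrd/basic-vl-s-e59.model", "vrd_basic_vl_s"),
    ("cfgs/vrd/basic_vl_s_m.yaml", "checkpoints/vrd/basic-vl-s-m-e59.model", "vrd_basic_vl_s_m"),
]


def build_command(cfg, ckpt, result_name, gpus="0"):
    """Build the argument list for one vrd/test.py run."""
    return [
        sys.executable, "vrd/test.py",  # same interpreter as this script
        "--cfg", cfg,
        "--ckpt", ckpt,
        "--bs", "1",
        "--gpus", gpus,
        "--model-dir", "./",
        "--result-path", "results/",
        "--result-name", result_name,
        "--split", "test",
        "--log-dir", "logs/",
    ]


def run_all(dry_run=True):
    """Print each command (dry run) or execute the runs one after another."""
    for cfg, ckpt, name in VRD_RUNS:
        cmd = build_command(cfg, ckpt, name)
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the loop as soon as one run fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

With dry_run=True the script only prints the commands, which makes it easy to compare them against the ones listed above before switching to run_all(dry_run=False).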

Credit

This repository is mainly based on VL-BERT.

Citation

Please cite our paper if you find the paper or our code helpful for your research!

@ARTICLE{9387302,
  author={M. -J. {Chiou} and R. {Zimmermann} and J. {Feng}},
  journal={IEEE Access}, 
  title={Visual Relationship Detection With Visual-Linguistic Knowledge From Multimodal Representations}, 
  year={2021},
  volume={9},
  number={},
  pages={50441-50451},
  doi={10.1109/ACCESS.2021.3069041}}
