RVL-BERT

This repository accompanies our IEEE Access paper "Visual Relationship Detection with Visual-Linguistic Knowledge from Multimodal Representations". It contains the validation code and the trained models for the SpatialSense and VRD datasets.

Image of RVL-BERT architecture

Installation

This project was built with Python 3.6, PyTorch 1.1.0, and CUDA 9.0, and is largely based on VL-BERT.

Please follow the original VL-BERT instructions to set up a conda environment.

Dataset

SpatialSense

  1. Download the SpatialSense dataset here.
  2. Put the files under $RVL_BERT_ROOT/data/spasen and extract images.tar.gz as images/ there. Ensure there are two folders (flickr/ and nyu/) under $RVL_BERT_ROOT/data/spasen/images/.

VRD

  1. Download the VRD dataset: the images (backup: download sg_dataset.zip from Baidu) and the annotations.
  2. Put the sg_train_images/ and sg_test_images/ folders under $RVL_BERT_ROOT/data/vrd/images.
  3. Put all .json files under $RVL_BERT_ROOT/data/vrd/.
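Before running validation, the folder layout described in the two dataset sections can be checked with a short script. This is a minimal sketch and not part of the repository: the helper name `missing_dataset_paths` and the `EXPECTED_DIRS` list are hypothetical. It only tests that the folders named above exist and that at least one .json file sits under data/vrd/. It does not validate the contents of any file.

```python
from pathlib import Path

# Folders named in the dataset instructions, relative to $RVL_BERT_ROOT.
EXPECTED_DIRS = [
    "data/spasen/images/flickr",
    "data/spasen/images/nyu",
    "data/vrd/images/sg_train_images",
    "data/vrd/images/sg_test_images",
]


def missing_dataset_paths(root):
    """Return the expected paths that are absent under `root` (hypothetical helper)."""
    root = Path(root)
    problems = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    # The VRD annotations are .json files placed directly under data/vrd/.
    vrd_dir = root / "data" / "vrd"
    if not (vrd_dir.is_dir() and any(vrd_dir.glob("*.json"))):
        problems.append("data/vrd/*.json")
    return problems
```

Calling `missing_dataset_paths` with the value of $RVL_BERT_ROOT returns an empty list when everything is in place, and otherwise the relative paths that still need to be created or populated.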

Checkpoints & Pretrained Weights

Common

Download the pretrained weights here and put the pretrained_model/ folder under $RVL_BERT_ROOT/model/.

SpatialSense

Download the trained checkpoint here and put the .model file under $RVL_BERT_ROOT/checkpoints/spasen/.

VRD

Download the trained checkpoints and put the .model files under $RVL_BERT_ROOT/checkpoints/vrd/.

Validation

Run the following commands from $RVL_BERT_ROOT to reproduce the experimental results. A single GPU (NVIDIA Quadro RTX 6000, 24 GB memory) is used by default.

SpatialSense

  • Full model
python spasen/test.py --cfg cfgs/spasen/full-model.yaml --ckpt checkpoints/spasen/full-model-e44.model --bs 8 --gpus 0 --model-dir ./ --result-path results/ --result-name spasen_full_model --split test --log-dir logs

VRD

  • Basic model
python vrd/test.py --cfg cfgs/vrd/basic.yaml --ckpt checkpoints/vrd/basic-e59.model --bs 1 --gpus 0 --model-dir ./ --result-path results/ --result-name vrd_basic --split test --log-dir logs/
  • Basic model + Visual-Linguistic Commonsense Knowledge
python vrd/test.py --cfg cfgs/vrd/basic_vl.yaml --ckpt checkpoints/vrd/basic-vl-e59.model --bs 1 --gpus 0 --model-dir ./ --result-path results/ --result-name vrd_basic_vl --split test --log-dir logs/
  • Basic model + Visual-Linguistic Commonsense Knowledge + Spatial Module
python vrd/test.py --cfg cfgs/vrd/basic_vl_s.yaml --ckpt checkpoints/vrd/basic-vl-s-e59.model --bs 1 --gpus 0 --model-dir ./ --result-path results/ --result-name vrd_basic_vl_s --split test --log-dir logs/
  • Full model
python vrd/test.py --cfg cfgs/vrd/basic_vl_s_m.yaml --ckpt checkpoints/vrd/basic-vl-s-m-e59.model --bs 1 --gpus 0 --model-dir ./ --result-path results/ --result-name vrd_basic_vl_s_m --split test --log-dir logs/
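The four VRD runs differ only in the config file, the checkpoint, and the result name, so they can be generated from a small table. The following is a minimal sketch, assuming it is run from $RVL_BERT_ROOT. The `VRD_RUNS` table and the helpers `build_vrd_command` and `run_all` are hypothetical conveniences, not part of the repository. The flags are copied from the commands above, with two assumptions: each result name is derived from its config name so that every run gets a distinct name, and `sys.executable` stands in for `python` so that the active conda environment's interpreter is used.

```python
import subprocess
import sys
from pathlib import Path

# (config stem, checkpoint stem) pairs taken from the VRD commands above.
VRD_RUNS = [
    ("basic", "basic-e59"),
    ("basic_vl", "basic-vl-e59"),
    ("basic_vl_s", "basic-vl-s-e59"),
    ("basic_vl_s_m", "basic-vl-s-m-e59"),
]


def build_vrd_command(cfg, ckpt, gpu="0"):
    """Build the argument list for one VRD validation run (hypothetical helper)."""
    return [
        sys.executable, "vrd/test.py",
        "--cfg", f"cfgs/vrd/{cfg}.yaml",
        "--ckpt", f"checkpoints/vrd/{ckpt}.model",
        "--bs", "1",
        "--gpus", gpu,
        "--model-dir", "./",
        "--result-path", "results/",
        # Assumption: name results after the config so each run is distinct.
        "--result-name", f"vrd_{cfg}",
        "--split", "test",
        "--log-dir", "logs/",
    ]


def run_all(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cfg, ckpt in VRD_RUNS:
        cmd = build_vrd_command(cfg, ckpt)
        print(" ".join(cmd))
        if dry_run:
            continue
        # Skip runs whose checkpoint has not been downloaded yet.
        if not Path(cmd[cmd.index("--ckpt") + 1]).is_file():
            print("  skipped: checkpoint not found")
            continue
        subprocess.run(cmd, check=True)
```

`run_all()` only prints the four commands for inspection. `run_all(dry_run=False)` executes them one after another and stops at the first run that exits with an error.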

Credit

This repository is mainly based on VL-BERT.

Citation

Please cite our paper if you find the paper or our code helpful for your research!

@ARTICLE{9387302,
  author={M. -J. {Chiou} and R. {Zimmermann} and J. {Feng}},
  journal={IEEE Access}, 
  title={Visual Relationship Detection With Visual-Linguistic Knowledge From Multimodal Representations}, 
  year={2021},
  volume={9},
  number={},
  pages={50441-50451},
  doi={10.1109/ACCESS.2021.3069041}}

rvl-bert's Issues

Link to VRD dataset

Hi! It seems like the link to download VRD images is invalid. Do you have a backup of those images somewhere?
