Code for the Recurrent-VLN-BERT paper:
A Recurrent Vision-and-Language BERT for Navigation
Yicong Hong, Qi Wu, Yuankai Qi, Cristian Rodriguez-Opazo, Stephen Gould
Install the Matterport3D Simulator. Please find the versions of packages in our environment here.
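If helpful, a typical build of the upstream simulator looks roughly like the sketch below; this is an assumption about the standard setup, not this repo's authoritative procedure, so follow the simulator's own instructions and the package versions linked above.

```bash
# Rough sketch of a typical Matterport3D Simulator build (an assumption, not the
# authoritative steps; see the simulator's README for dependencies and the exact
# version used with this repo).
git clone --recursive https://github.com/peteanderson80/Matterport3DSimulator.git
cd Matterport3DSimulator
mkdir build && cd build
cmake ..
make
```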
Install Pytorch-Transformers. In particular, we use this version (the same as OSCAR) in our experiments.
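A minimal sketch of the installation, assuming a plain pip install is acceptable; pin the package to the exact version linked above.

```bash
# Sketch: install the Pytorch-Transformers package with pip. Replace the bare
# install with the specific version linked above (the same release used by OSCAR).
pip install pytorch-transformers
```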
Please follow the instructions below to prepare the data in directories:
- MP3D navigability graphs: `connectivity`
  - Download the connectivity maps [23.8MB].
- R2R data: `data`
  - Download the R2R data [5.8MB].
- Augmented data: `data/prevalent`
  - Download the collected triplets in PREVALENT [1.5GB] (pre-processed for easy use).
- MP3D image features: `img_features`
  - Download the Scene features [4.2GB] (ResNet-152-Places365).
Please refer to vlnbert_init.py to set up the directories.
- Pre-trained OSCAR weights
  - Download the `base-no-labels` following this guide.
- Pre-trained PREVALENT weights
  - Download the `pytorch_model.bin` from here.
- Recurrent-VLN-BERT: `snap`
  - Download the trained network weights [2.5GB] for our OSCAR-based and PREVALENT-based models.
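After the downloads above, the working directory should look roughly like the sketch below; this layout is inferred from the download list, so verify the exact paths against vlnbert_init.py and the run scripts.

```bash
# Sketch of the expected layout (inferred from the download list; verify against
# vlnbert_init.py):
# .
# ├── connectivity/    # MP3D navigability graphs
# ├── data/            # R2R data
# │   └── prevalent/   # PREVALENT augmented triplets
# ├── img_features/    # ResNet-152-Places365 scene features
# └── snap/            # trained Recurrent-VLN-BERT weights
mkdir -p connectivity data/prevalent img_features snap
```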
Please read Peter Anderson's VLN paper for the R2R Navigation task.
To replicate the performance reported in our paper, load the trained network weights and run validation:
bash run/test_agent.bash
You can simply switch between the OSCAR-based and the PREVALENT-based VLN models by changing the arguments `vlnbert` (oscar or prevalent) and `load` (the path to the trained model).
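For illustration, switching models amounts to passing different values for those two arguments to the evaluation entry point; the script path and checkpoint locations below are assumptions, so edit run/test_agent.bash rather than running this verbatim.

```bash
# Hypothetical invocation: the entry-point path and checkpoint paths are assumed
# for illustration; only the vlnbert and load arguments come from the text above.
python r2r_src/train.py --vlnbert oscar     --load path/to/oscar_trained_weights      # OSCAR-based
python r2r_src/train.py --vlnbert prevalent --load path/to/prevalent_trained_weights  # PREVALENT-based
```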
To train the network from scratch, simply run:
bash run/train_agent.bash
The trained Navigator will be saved under `snap/`.
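To validate a model you trained yourself, point the `load` argument at the checkpoint written under `snap/` and rerun the evaluation script; the run name and checkpoint filename below are assumptions.

```bash
# The run name and checkpoint filename are assumptions for illustration; use the
# directory that your training run actually created under snap/.
# In run/test_agent.bash, set: load snap/<your_run_name>/state_dict/best_val_unseen
bash run/test_agent.bash
```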
If you use or discuss our Recurrent VLN-BERT, please cite our paper:
@article{hong2020recurrent,
  title={A Recurrent Vision-and-Language BERT for Navigation},
  author={Hong, Yicong and Wu, Qi and Qi, Yuankai and Rodriguez-Opazo, Cristian and Gould, Stephen},
  journal={arXiv preprint arXiv:2011.13922},
  year={2020}
}