
recurrent-vln-bert's Introduction

Recurrent VLN-BERT

Code of the CVPR 2021 Oral paper:
A Recurrent Vision-and-Language BERT for Navigation
Yicong Hong, Qi Wu, Yuankai Qi, Cristian Rodriguez-Opazo, Stephen Gould

[Paper & Appendices] [GitHub]

"Neo : Are you saying I have to choose whether Trinity lives or dies? The Oracle : No, you've already made the choice. Now you have to understand it." --- The Matrix Reloaded (2003).

Prerequisites

Installation

Install the Matterport3D Simulator. Note that this code uses the old version (v0.1) of the simulator; you can easily switch to the latest version, which supports batches of agents and is much more efficient.

Please find the versions of packages in our environment here.

Install Pytorch-Transformers. In particular, we use this version (same as OSCAR) in our experiments.

Data Preparation

Please follow the instructions below to prepare the data in directories:

Initial OSCAR and PREVALENT weights

Please refer to vlnbert_init.py to set up the directories; a hypothetical sketch follows the list below.

  • Pre-trained OSCAR weights
    • Download the base-no-labels following this guide.
  • Pre-trained PREVALENT weights
    • Download the pytorch_model.bin from here.
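As a rough illustration, the wiring in vlnbert_init.py might look like the sketch below; the variable names and paths here are hypothetical assumptions for this sketch, so check the actual file for the real values.

# Hypothetical sketch only -- the names and paths are assumptions,
# not the repo's actual values; see r2r_src/vlnbert/vlnbert_init.py.
OSCAR_WEIGHTS_DIR = 'Oscar/pretrained_models/base-no-labels/'        # assumed location
PREVALENT_WEIGHTS = 'Prevalent/pretrained_model/pytorch_model.bin'   # assumed location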

Trained Network Weights

R2R Navigation

Please read Peter Anderson's VLN paper for the R2R Navigation task.

Reproduce Testing Results

To replicate the performance reported in our paper, load the trained network weights and run validation:

bash run/test_agent.bash

You can switch between the OSCAR-based and the PREVALENT-based VLN models by changing the arguments vlnbert (oscar or prevalent) and load (path to the trained model).
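Judging from the Namespace dump quoted in the issues below, the underlying invocation takes flags along these lines; the sketch is an approximation of what run/test_agent.bash does, not its literal contents.

# Hedged sketch of switching models by flag; the flag names (vlnbert,
# load, train) mirror the Namespace dump quoted in the issues below,
# but the exact contents of run/test_agent.bash may differ.
import subprocess

subprocess.run([
    'python', 'r2r_src/train.py',
    '--vlnbert', 'prevalent',   # or 'oscar'
    '--load', 'snap/VLNBERT-PREVALENT-final/state_dict/best_val_unseen',
    '--train', 'validlistener', # validate with the loaded weights
], check=True)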

Training

Navigator

To train the network from scratch, simply run:

bash run/train_agent.bash

The trained Navigator will be saved under snap/.

Citation

If you use or discuss our Recurrent VLN-BERT, please cite our paper:

@InProceedings{Hong_2021_CVPR,
    author    = {Hong, Yicong and Wu, Qi and Qi, Yuankai and Rodriguez-Opazo, Cristian and Gould, Stephen},
    title     = {A Recurrent Vision-and-Language BERT for Navigation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {1643-1653}
}


recurrent-vln-bert's Issues

ModuleNotFoundError: No module named 'transformers.pytorch_transformers'

Hello, I was trying to run the model with bash run/test_agent.bash as instructed in your README, but I get the error:

Optimizer: Using AdamW
To use data.metrics please install scikit-learn. See https://scikit-learn.org/stable/index.html
Traceback (most recent call last):
  File "r2r_src/train.py", line 13, in <module>
    from agent import Seq2SeqAgent
  File "/Recurrent-VLN-BERT/r2r_src/agent.py", line 21, in <module>
    import model_OSCAR, model_PREVALENT
  File "/Recurrent-VLN-BERT/r2r_src/model_OSCAR.py", line 7, in <module>
    from vlnbert.vlnbert_init import get_vlnbert_models
  File "/Recurrent-VLN-BERT/r2r_src/vlnbert/vlnbert_init.py", line 3, in <module>
    from transformers.pytorch_transformers import (BertConfig, BertTokenizer)
ModuleNotFoundError: No module named 'transformers.pytorch_transformers'

I have transformers and pytorch-transformers installed, as well as the old version of pytorch-pretrained-bert, and I am unsure what is causing this. Any help? Thanks in advance.
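One hedged workaround, assuming the standalone pytorch-transformers 1.x package is installed: fall back to importing BertConfig and BertTokenizer from pytorch_transformers directly, since the transformers.pytorch_transformers subpackage only exists when the OSCAR copy of the library is on the Python path. This is a sketch, not a confirmed fix:

# Hedged workaround sketch for r2r_src/vlnbert/vlnbert_init.py, assuming
# pip install pytorch-transformers; the repo itself expects the OSCAR
# copy of the library, so treat this as an approximation.
try:
    # Layout expected by the repo: OSCAR's bundled copy of the library.
    from transformers.pytorch_transformers import BertConfig, BertTokenizer
except ModuleNotFoundError:
    # The standalone 1.x package exports the same classes.
    from pytorch_transformers import BertConfig, BertTokenizer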

Failed to build Matterport3D Simulator

Hi Yicong,

This is not directly related to your code, but I've spent hours trying to follow the Matterport3DSimulator repo to build it, and I encountered issues both with and without Docker.

With Docker, MatterSim can be built, but it is only available for the system Python; since I use Anaconda on the lab server, importing MatterSim fails in my Anaconda environment.

Without Docker, the build fails on line 59 of src/lib/NavGraph.cpp: CV_LOAD_IMAGE_ANYDEPTH is not defined in scope. I only downloaded matterport_skybox_images, and this might be the problem (however, the README in Matterport3DSimulator says matterport_skybox_images is all you need to get the simulator to build and work). I wonder what data you downloaded from the Matterport3D dataset?

Best,
Jason

The data file R2R_test.json wasn't used when testing?

Hi, Yicong! I have reproduced this codebase. While running run/test_agent.bash, I noticed that the data file R2R_test.json wasn't used by the test. So I set the parameter 'submit' to 1 and rewrote the file 'id_paths.json' to test, without any other changes, and I get the following results.

Optimizer: Using AdamW
Namespace(IMAGENET_FEATURES='img_features/ResNet-152-imagenet.tsv', angle_feat_size=128, aug=None, batchSize=16, description='VLNBERT-test-Prevalent', dropout=0.5, epsilon=0.1, featdropout=0.4, feature_size=2048, features='places365', feedback='sample', gamma=0.9, ignoreid=-100, iters=300000, load='snap/VLNBERT-PREVALENT-final/state_dict/best_val_unseen', loadOptim=False, log_dir='snap/VLNBERT-test-Prevalent', lr=1e-05, maxAction=15, maxInput=80, ml_weight=0.2, name='VLNBERT-test-Prevalent', normalize_loss='total', optim='adamW', optimizer=<class 'torch.optim.adamw.AdamW'>, submit=1, teacher='final', teacher_weight=1.0, test_only=0, train='validlistener', vlnbert='prevalent', weight_decay=0.0, zero_init=False)

Start loading the image feature ... (~50 seconds)
Finish Loading the image feature from img_features/ResNet-152-places365.tsv in 54.7334 seconds
The feature size is 2048
Loading navigation graphs for 61 scans
R2RBatch loaded with 14039 instructions, using splits: train
The feature size is 2048
Loading navigation graphs for 59 scans
R2RBatch loaded with 1501 instructions, using splits: val_train_seen
The feature size is 2048
Loading navigation graphs for 56 scans
R2RBatch loaded with 1021 instructions, using splits: val_seen
The feature size is 2048
Loading navigation graphs for 11 scans
R2RBatch loaded with 2349 instructions, using splits: val_unseen
The feature size is 2048
Loading navigation graphs for 18 scans
R2RBatch loaded with 4173 instructions, using splits: test

Initalizing the VLN-BERT model ...
Loaded the listener model at iter 114000 from snap/VLNBERT-PREVALENT-final/state_dict/best_val_unseen
result length 1501
Env name: val_train_seen, nav_error: 0.8354, oracle_error: 0.6634, steps: 5.1845, lengths: 10.0276, success_rate: 0.9394, oracle_rate: 0.9520, spl: 0.9124
result length 1021
Env name: val_seen, nav_error: 2.8968, oracle_error: 1.9405, steps: 5.5436, lengths: 11.1379, success_rate: 0.7228, oracle_rate: 0.7826, spl: 0.6775
result length 2349
Env name: val_unseen, nav_error: 3.9255, oracle_error: 2.5431, steps: 6.1243, lengths: 12.0028, success_rate: 0.6279, oracle_rate: 0.7024, spl: 0.5688
result length 4173
Env name: test, nav_error: 9.0420, oracle_error: 0.0000, steps: 6.1107, lengths: 12.3490, success_rate: 0.0357, oracle_rate: 1.0000, spl: 0.0000

I am really shocked by the results on the test data. Did I make a mistake somewhere?

Why don't you use a 'speaker' during training?

Hi! I don't see any code for a 'speaker', which is a useful way to augment data for R2R. I am wondering why you removed the speaker part from your code. Or have you run experiments showing that using a speaker doesn't work well with your method?
Thanks a lot!

Is it possible to have a branch for REVERIE?

Hello,

Thanks a lot for maintaining your open-source code!

As mentioned in #9, is it possible to have the models and code available for REVERIE? I would like to make a fair comparison with your approach.

R2R Test Unseen

Thank you so much for the great work. I reproduced Recurrent VLN-BERT on the R2R dataset according to the README, but I only get results for the validation sets. How can I get the results for Test Unseen?

Specify license for the code

Hello,

Thanks again for your codebase. It was very useful indeed and congratulations for your accepted paper. I was wondering whether you could please add a license to your codebase so that it's very clear how this code can be used by third parties.

Thanks,
Alessandro

How to download panoramic images?

Thank you very much for your excellent work!

Could you please share how to download panoramic images and obtain visualizations for navigation?

I really appreciate your help!

The vocab size

Hi, yicong,

Thanks for your great work!
I found that the vocab size of R2R is 991, but the vocab size of the PREVALENT aug data is 1101. Additionally, the PREVALENT instructions were generated by a speaker model trained on the R2R dataset. Do you have any idea about this?

Thanks,
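A minimal counting sketch along the lines below could reproduce the comparison; the file paths and the assumption that both JSON files store an 'instructions' field are hypothetical, so adjust them to the actual data layout.

# Minimal vocab-counting sketch; the paths and the 'instructions'
# field layout are assumptions based on the standard R2R data format.
import json
import re

def vocab(path):
    words = set()
    for item in json.load(open(path)):
        instrs = item['instructions']
        if isinstance(instrs, str):   # some aug files may store one string
            instrs = [instrs]
        for instr in instrs:
            words.update(re.findall(r'\w+', instr.lower()))
    return words

print(len(vocab('data/R2R_train.json')))      # ~991 per this issue
print(len(vocab('data/prevalent_aug.json')))  # ~1101 per this issue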

Version of the Matterport3D Simulator

Hi, Yicong

When I tried to configure the environment for Recurrent-VLN-BERT, I found that only the old version of the Matterport3D Simulator supports this code, because the new version has changed its API (e.g., 'sim.makeAction') for parallel navigation. Maybe you could note this caveat in the README.

Mismatch between weights?

Hi there,

Congratulations on your CVPR paper and on releasing your code. I was wondering whether you could clarify the structure of the checkpoints you released. I'm interested in the OSCAR version of your model and tried to load it. However, it looks like the following parameters cannot be found:

'img_projection.weight', 'img_projection.bias'

I inspected the VLNBert class in the file vlnbert_OSCAR.py and it looks like there is no module called img_projection; instead, there seems to be one in the vlnbert_PREVALENT.py file. In addition, even in the original OSCAR codebase I cannot find any mention of an img_projection layer (https://github.com/microsoft/Oscar/blob/master/oscar/modeling/modeling_bert.py). Could you please verify that the released checkpoints are correct and refer to the correct models?

Thanks,
Alessandro
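To narrow this down, one could dump the parameter names stored in the released checkpoint and search for img_projection; in the sketch below, the checkpoint path and its internal layout are assumptions.

# Inspection sketch; the checkpoint path and wrapper layout are
# assumptions -- adapt them to the actual released file.
import torch

ckpt = torch.load('snap/VLNBERT-OSCAR-final/state_dict/best_val_unseen',
                  map_location='cpu')

def iter_state_dicts(obj):
    # Yield every name->tensor mapping, whether the file is a raw
    # state_dict or a nested dict of per-module wrappers.
    if not isinstance(obj, dict):
        return
    if obj and all(torch.is_tensor(v) for v in obj.values()):
        yield obj
        return
    for v in obj.values():
        yield from iter_state_dicts(v)

for sd in iter_state_dicts(ckpt):
    for name, tensor in sd.items():
        if 'img_projection' in name:
            print(name, tuple(tensor.shape))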

Details about the no init. OSCAR model

Hi Yicong, I wonder how you initialized the no-init. OSCAR model to get the results reported in the paper. Did you initialize all the parameters randomly, or did you use some pretrained weights, e.g., initializing the language part with BERT pretrained weights?

A Request for Code of REVERIE

Hi Yicong, I am interested in citing your work on REVERIE and would appreciate it if you could share the code and features. I have already sent you an email regarding this matter. Thank you very much for your assistance!

How to reduce the time consumed during training?

Hi @YicongHong,
Thank you so much for the great work. I reproduced Recurrent VLN-BERT on the R2R and REVERIE datasets according to the README. During training, it takes about 2,500 minutes on a single GPU to run 200,000 iterations, and even with dual GPUs this time does not drop at all. I'm very confused by this. Does training also take this long for you? Why does adding GPUs not increase the speed? What limits the training speed, and is there any way to improve it besides reducing the number of iterations?

Could you provide the object features for the REVERIE task?

Hi Yicong, I'm very impressed by the Recurrent VLN-BERT work and want to reproduce it for the REVERIE task. But I noticed that the current repo seems to provide related files only for the R2R task, e.g., the view features. So I'm opening this issue to ask whether you could release the object features (from Faster R-CNN) used in the REVERIE task. Thanks in advance!

A request for the REVERIE code

Hi Yicong. I want to cite your work on REVERIE. Could you please provide the code and features? I have sent you an email. Thanks a lot!

REVERIE

How do you train on the REVERIE dataset?

Why split instructions?

Hi Yicong,

Thanks for open-sourcing your code!

I wonder why you split instructions in /r2r_src/env.py, lines 129 to 142:

# Split multiple instructions into separate entries
for j, instr in enumerate(item['instructions']):
    try:
        new_item = dict(item)
        new_item['instr_id'] = '%s_%d' % (item['path_id'], j)
        new_item['instructions'] = instr

        ''' BERT tokenizer '''
        instr_tokens = tokenizer.tokenize(instr)
        padded_instr_tokens, num_words = pad_instr_tokens(instr_tokens, args.maxInput)
        new_item['instr_encoding'] = tokenizer.convert_tokens_to_ids(padded_instr_tokens)

        if new_item['instr_encoding'] is not None:  # Filter the wrong data
            self.data.append(new_item)
            scans.append(item['scan'])
    except:
        continue

This is done for the original path-instruction data but not for prevalent_aug.json, and I wonder why. I understand that the instructions in the original data are a bit long, but if you split them into separate VLN jobs while the desired path is always the complete path, how can an agent (or a human) possibly do that?

Best,
Jason
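For context, each original R2R entry carries several instructions that all describe the same complete path, so the split turns one entry into one sample per instruction while keeping the full path for each. A toy illustration of that id scheme (the data values are invented):

# Toy illustration of the split above; the data values are invented.
item = {
    'path_id': 4332,
    'scan': 'some_scan_id',
    'instructions': ['Walk past the sofa and stop by the door.',
                     'Go straight through the living room to the doorway.'],
}

for j, instr in enumerate(item['instructions']):
    new_item = dict(item)
    new_item['instr_id'] = '%s_%d' % (item['path_id'], j)  # '4332_0', '4332_1'
    new_item['instructions'] = instr  # one instruction string per sample
    # The associated path stays the complete path for every sample.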

Unable to test code

Hello Yicong,

Can you please add a section to the README about using the Matterport3DSimulator Docker image with your code? The documentation is missing details on where to put the ResNet zip, the PREVALENT JSON, and the PyTorch model. It is unclear how Matterport3DSimulator works with your code.

Thanks
