ddl,cwhao98

Learning Disentanglement with Decoupled Labels for Vision-Language Navigation (ECCV'22)

Introduction

We manually extend the Room-to-Room (R2R) benchmark dataset with landmark- and action-aware labels to provide fine-grained information for each viewpoint. The figure below illustrates how the decoupled labels provide intermediate supervision during navigation. The superscripts in the instruction denote the landmark and action labels for each viewpoint. The decoupled labels not only contain disentangled information but also help align the vision and language modalities.

LAR2R

The annotations are stored in the LAR2R/ directory. Note that the JSON files contain only the annotated indices, not the original instructions. The following example shows the file structure.

{'9f0079fa767e402cb515c7751a13e265': {'0': {'action': [0, 1], 'landmark': [2, 3, 4]},
                                      '1': {'action': [0, 1], 'landmark': [2, 3]},
                                      '2': {'action': [0, 1], 'landmark': [2, 3]}},
 '3abee6c9f9d144cead7d659a476ecb07': {'0': {'action': [6]},
                                      '1': {'action': [5]},
                                      '2': {'action': [16]}},
 '17e450ed7bd2429b81d50ebe770937aa': {'0': {'landmark': [7, 8, 9, 10, 11, 12]},
                                      '1': {'landmark': [6, 7, 8, 9, 10, 11]},
                                      '2': {'landmark': [17, 18, 19, 20, 21, 22, 23, 24, 25, 26]}},
 'id': 9}

Each JSON file is a dict whose first-level key is path_id. The example above shows the entry for one trajectory. 9f0079fa767e402cb515c7751a13e265 is a viewpoint name. The keys 0, 1, and 2 under it are instruction_ids, and each maps to the annotated action and landmark indices for that viewpoint. Note that the indices are 0-based. We also provide a simple demo to aid understanding.
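As a rough illustration of the structure described above, the sketch below collects the annotated indices of every viewpoint for one instruction and maps them back to words. It is not the repository's demo: the helper names labels_for_instruction and words_at are hypothetical, the instruction text is made up, and treating the indices as positions in a whitespace-split instruction is an assumption. In practice the real instruction would have to be looked up in the R2R data, since the annotation files do not contain it.

```python
# Entry for one trajectory, copied from the example above.
entry = {
    "9f0079fa767e402cb515c7751a13e265": {
        "0": {"action": [0, 1], "landmark": [2, 3, 4]},
        "1": {"action": [0, 1], "landmark": [2, 3]},
        "2": {"action": [0, 1], "landmark": [2, 3]},
    },
    "3abee6c9f9d144cead7d659a476ecb07": {
        "0": {"action": [6]},
        "1": {"action": [5]},
        "2": {"action": [16]},
    },
    "17e450ed7bd2429b81d50ebe770937aa": {
        "0": {"landmark": [7, 8, 9, 10, 11, 12]},
        "1": {"landmark": [6, 7, 8, 9, 10, 11]},
        "2": {"landmark": [17, 18, 19, 20, 21, 22, 23, 24, 25, 26]},
    },
    "id": 9,
}


def labels_for_instruction(entry, instr_idx):
    """Return {viewpoint: {'action': [...], 'landmark': [...]}} for one instruction."""
    result = {}
    for viewpoint, per_instruction in entry.items():
        if not isinstance(per_instruction, dict):
            continue  # skip non-viewpoint fields such as 'id'
        labels = per_instruction.get(str(instr_idx), {})
        result[viewpoint] = {
            # A viewpoint may carry only one kind of label, so default to [].
            "action": labels.get("action", []),
            "landmark": labels.get("landmark", []),
        }
    return result


def words_at(tokens, indices):
    """Map 0-based indices to tokens, ignoring any index out of range."""
    return [tokens[i] for i in indices if 0 <= i < len(tokens)]


# Made-up instruction for illustration only; whitespace tokenization is an assumption.
tokens = ("Walk past the dining table then enter through "
          "the big wooden double door frame").split()

for viewpoint, labels in labels_for_instruction(entry, 0).items():
    print(viewpoint[:8],
          "action:", words_at(tokens, labels["action"]),
          "landmark:", words_at(tokens, labels["landmark"]))
```

Returning empty lists for missing keys keeps downstream code from having to special-case viewpoints that have only action labels or only landmark labels.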

Installation

Please install the old version (v0.1) of the Matterport3D simulator.
Please follow the HAMT instructions to install the requirements and download the data, then put the data in the datasets directory.

Training

To train the decoupled label speaker:

cd r2r_src
bash scripts/run_dls.sh

To train the navigator:

bash scripts/run_r2r.sh

Acknowledgement

This repository is partly built upon HAMT and Recurrent-VLN-BERT. Thanks to their authors for the great work!

ddl's Issues

Missing labels for some viewpoints

Hi!
Thank you for open-sourcing the code and data. This work is very interesting and significant.

When using the provided annotations, I found that some landmark and action words do not have their corresponding viewpoints. For example, the following picture is the visualization of the case whose instr_id is 2804_2. This picture only shows the information of the annotated action words. If the viewpoint does not contain its corresponding action words, I just let the first token "[CLS]" be the action word of this viewpoint. Thus, in this case, only the first viewpoint contains its action words. The verbs like "turn right" and "walk up" do not have their corresponding viewpoints.

Is this normal? If it is abnormal, could you please check the data?

Thanks again!

Would you please share the checkpoint and its corresponding hyperparameters with me?

Hello, thank you for your excellent work. I am currently trying to run your model in my local environment, but I am having difficulty reproducing the R2R test results reported in your paper. While my Val Unseen performance is similar to what you reported, my performance on the test split is notably lower. Would it be possible for you to share the checkpoint of your best model and its corresponding hyperparameters? Thank you in advance!
