
Cross-modal Map Learning for Vision and Language Navigation

G. Georgakis, K. Schmeckpeper, K. Wanchoo, S. Dan, E. Miltsakaki, D. Roth, K. Daniilidis

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022

Dependencies

pip install -r requirements.txt

Habitat-lab and habitat-sim need to be installed before using our code. We build our method on the latest stable version of both (v0.1.7), so run git checkout tags/v0.1.7 in each repository before installation. Follow the instructions in their corresponding repositories to install them on your system. Note that our code expects habitat-sim to be installed with the --with-cuda flag.
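For reference, a minimal installation sketch (the authoritative steps are in each repository's README; add habitat-sim's --headless flag if you are on a machine without a display):

git clone https://github.com/facebookresearch/habitat-sim.git
cd habitat-sim
git checkout tags/v0.1.7
python setup.py install --with-cuda
cd ..
git clone https://github.com/facebookresearch/habitat-lab.git
cd habitat-lab
git checkout tags/v0.1.7
pip install -e .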

Trained Models

We provide our trained models for reproducing the navigation results shown in the paper here. In addition, we provide the semantic segmentation model here. The DD-PPO model (gibson-4plus-mp3d-train-val-test-resnet50.pth) that we used for the controller can be found here.

Data

We use the Vision and Language Navigation in Continuous Environments (VLN-CE) dataset. Episodes can be found here. VLN-CE is based on the Matterport3D (MP3D) dataset (the habitat subset, not the entire Matterport3D). Follow the instructions in the habitat-lab repository for downloading the data and the dataset folder structure. In addition, we provide the following:

  • MP3D Scene Pclouds: An .npz file for each scene, generated by us, that contains the 3D point cloud with semantic category labels (40 MP3D categories). We did this for convenience because the semantic.ply files provided with the dataset contain instance labels rather than category labels. The folder containing the .npz files should be placed under /data/scene_datasets/mp3d (see the loading sketch below).
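To sanity-check one of the provided point clouds, the following is a minimal Python sketch: it only lists the arrays stored in the archive together with their shapes, since the exact key names are not documented here (the filename pattern is an assumption; substitute a real scene id and path):

import numpy as np

# Load one scene's point cloud archive (the path/filename pattern is an assumption).
pcloud = np.load("data/scene_datasets/mp3d/1pXnuDYAj8r.npz")

# An .npz archive behaves like a dict of named arrays; print what it holds.
for key in pcloud.files:
    arr = pcloud[key]
    print(f"{key}: shape={arr.shape}, dtype={arr.dtype}")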

Instructions

Here we provide instructions on how to use our code. All options can be found in train_options.py. The episodes from VLN-CE should be placed under --root_path, and the DD-PPO model under root_path/local_policy_models (see the layout sketch below).
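Assuming the standard habitat-lab data layout, the expected structure is roughly the following (a sketch; the exact episode subfolders follow the VLN-CE/habitat-lab conventions):

root_path/
  data/
    datasets/          <- VLN-CE episodes
    scene_datasets/
      mp3d/            <- MP3D scenes and the provided .npz point clouds
  local_policy_models/
    gibson-4plus-mp3d-train-val-test-resnet50.pth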

Testing on VLN-CE

To run an evaluation of CM2-GT on a single scene from val-seen:

python main.py --name test_cm2-gt_val-seen --root_path /path/to/habitat-lab/folder/ --scenes_dir /habitat-lab/data/scene_datasets/ --model_exp_dir /path/to/cm2-gt/model/folder/ --log_dir logs/ --scenes_list 1pXnuDYAj8r --gpu_capacity 1 --split val_seen --use_first_waypoint --vln

To run an evaluation of CM2 on a single scene from val-seen:

python main.py --name test_cm2_val-seen --root_path /path/to/habitat-lab/folder/ --scenes_dir /habitat-lab/data/scene_datasets/ --model_exp_dir /path/to/cm2/model/folder/ --log_dir logs/ --img_segm_model_dir /path/to/img/segm/model/folder/ --scenes_list 1pXnuDYAj8r --gpu_capacity 1 --split val_seen --use_first_waypoint --goal_conf_thresh 0.2 --vln_no_map

To enable visualizations during testing, use --save_nav_images.
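To evaluate on several scenes, one simple option is to invoke the script once per scene with a shell loop (a sketch for CM2-GT; the extra scene ids are placeholders to replace with the val-seen scenes you want, and the per-scene --name just keeps the logs separate):

for scene in 1pXnuDYAj8r SCENE_ID_2 SCENE_ID_3; do
  python main.py --name test_cm2-gt_val-seen_${scene} --root_path /path/to/habitat-lab/folder/ --scenes_dir /habitat-lab/data/scene_datasets/ --model_exp_dir /path/to/cm2-gt/model/folder/ --log_dir logs/ --scenes_list ${scene} --gpu_capacity 1 --split val_seen --use_first_waypoint --vln
done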

Generating training data

To generate the data for a single scene from the train split to train the CM2-GT model:

python store_episodes_vln.py --root_path /path/to/habitat-lab/folder/ --scenes_dir /habitat-lab/data/scene_datasets/ --episodes_save_dir /path/to/cm2-gt/episodes/save/dir/ --scenes_list 1pXnuDYAj8r --gpu_capacity 1

To generate the data for a single scene from the train split to train the CM2 model:

python store_episodes_vln_no_map.py --root_path /path/to/habitat-lab/folder/ --scenes_dir /habitat-lab/data/scene_datasets/ --episodes_save_dir /path/to/cm2/episodes/save/dir/ --scenes_list 1pXnuDYAj8r --gpu_capacity 1 --img_segm_model_dir /path/to/img/segm/model/folder/

Training

To train a new CM2-GT model:

python main.py --name train_cm2-gt --stored_episodes_dir /path/to/cm2-gt/episodes/save/dir/ --log_dir logs/ --is_train --summary_steps 500 --image_summary_steps 1000 --test_steps 20000 --checkpoint_steps 50000 --pad_text_feat --batch_size 40 --vln --finetune_bert_last_layer --use_first_waypoint --sample_1

To train a new CM2 model:

python main.py --name train_cm2 --stored_episodes_dir /path/to/cm2/episodes/save/dir/ --log_dir logs/ --is_train --summary_steps 500 --image_summary_steps 1000 --test_steps 20000 --checkpoint_steps 50000 --pad_text_feat --batch_size 10 --finetune_bert_last_layer --use_first_waypoint --vln_no_map --sample_1
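The --summary_steps and --image_summary_steps options suggest that periodic training summaries are written under --log_dir. Assuming these are TensorBoard event files (an assumption on our part), training can be monitored with:

tensorboard --logdir logs/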
