
orist's Introduction

The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation

This is the repository of ORIST (ICCV 2021).

Some code in this repo is copied or modified from open-source implementations made available by PyTorch, HuggingFace, OpenNMT, Nvidia, and UNITER. The object features are extracted using BUTD, with the expanded object bounding boxes of REVERIE.

Features of the Code

  • Distributed data parallel training (PyTorch).
  • Some code optimizations for faster training.
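
For readers unfamiliar with distributed data parallel training in PyTorch, the following is a minimal sketch of the general pattern, not this repository's training code. It assumes a CPU-only run with the `gloo` backend and a single process; the toy linear model, the random data, the function name `train_one_step`, and the rendezvous address and port are all illustrative placeholders.

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def train_one_step(rank=0, world_size=1):
    """Run one DDP training step on CPU and return the loss as a float."""
    # Rendezvous settings (placeholders); a launcher such as torchrun
    # normally sets these for every process.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    try:
        torch.manual_seed(0)
        # Toy model standing in for the real network (hypothetical).
        model = DDP(nn.Linear(8, 2))
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
        inputs, targets = torch.randn(4, 8), torch.randn(4, 2)
        loss = nn.functional.mse_loss(model(inputs), targets)
        optimizer.zero_grad()
        loss.backward()  # DDP all-reduces gradients across processes here
        optimizer.step()
        return loss.item()
    finally:
        dist.destroy_process_group()


if __name__ == "__main__":
    print("loss:", train_one_step())
```

With more than one process, each would call the same function with its own `rank`, and each would typically read a different shard of the data.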

Requirements

  • Install Docker with GPU support (many tutorials are available online).
  • Pull the docker image:
docker pull qykshr/ubuntu:orist 

Quick Start

  1. Download the processed data and pretrained models:

  2. Build the Matterport3D simulator:

    Build the OSMesa version using CMake:

    mkdir build && cd build
    cmake -DOSMESA_RENDERING=ON ..
    make
    cd ../

    For other versions, refer to here

  3. Run inference:

    sh eval_scripts/xxx.sh

  4. Run training:

    sh run_scripts/xxx.sh
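
Before running the scripts in steps 3 and 4, it can help to confirm that the simulator built in step 2 is visible to Python. The sketch below assumes the build output lives in a directory named `build` (as in the CMake commands above) and that the Python module is called `MatterSim`, as in the upstream Matterport3D simulator; the helper name `simulator_available` is hypothetical.

```python
import importlib.util
import sys
from pathlib import Path


def simulator_available(build_dir="build"):
    """Return True if a MatterSim module can be located with build_dir on sys.path."""
    build_path = str(Path(build_dir).resolve())
    if build_path not in sys.path:
        # Make the compiled module discoverable without installing it.
        sys.path.insert(0, build_path)
    # find_spec only locates the module; it does not import or run it.
    return importlib.util.find_spec("MatterSim") is not None


if __name__ == "__main__":
    print("MatterSim importable:", simulator_available())
```

If this reports `False`, the build directory or the Python version used for the build is a likely place to look first.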

Citation

If this code or data is useful for your research, please consider citing:

@inproceedings{orist,
  author    = {Yuankai Qi and
               Zizheng Pan and
               Yicong Hong and
               Ming{-}Hsuan Yang and
               Anton van den Hengel and
               Qi Wu},
  title     = {The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation},
  booktitle = {ICCV},
  pages     = {1655--1664},
  year      = {2021}
}

@inproceedings{reverie,
  author    = {Yuankai Qi and
               Qi Wu and
               Peter Anderson and
               Xin Wang and
               William Yang Wang and
               Chunhua Shen and
               Anton van den Hengel},
  title     = {{REVERIE:} Remote Embodied Visual Referring Expression in Real Indoor
               Environments},
  booktitle = {CVPR},
  pages     = {9979--9988},
  year      = {2020}
}

orist's Issues

Object feature format

Hi,

I am interested in using your pre-computed object features, and would like to ask for an explanation of the format they come in, and how to use it in code, if possible.

Thanks in advance,
Benjamin
