Code Monkey home page Code Monkey logo

3d_diffuser_actor's Introduction

PWC PWC

3D Diffuser Actor: Policy Diffusion with 3D Scene Representations

By Tsung-Wei Ke*, Nikolaos Gkanatsios* and Katerina Fragkiadaki

Official implementation of "3D Diffuser Actor: Policy Diffusion with 3D Scene Representations".

This code base also includes our re-implementation of "Act3D: 3D Feature Field Transformers for Multi-Task Robotic Manipulation". We provide trained model weights for both methods.

teaser

We marry diffusion policies and 3D scene representations for robot manipulation. Diffusion policies learn the action distribution conditioned on the robot and environment state using conditional diffusion models. They have recently shown to outperform both deterministic and alternative state-conditioned action distribution learning methods. 3D robot policies use 3D scene feature representations aggregated from a single or multiple camera views using sensed depth. They have shown to generalize better than their 2D counterparts across camera viewpoints. We unify these two lines of work and present 3D Diffuser Actor, a neural policy architecture that, given a language instruction, builds a 3D representation of the visual scene and conditions on it to iteratively denoise 3D rotations and translations for the robot’s end-effector. At each denoising iteration, our model represents end-effector pose estimates as 3D scene tokens and predicts the 3D translation and rotation error for each of them, by featurizing them using 3D relative attention to other 3D visual and language tokens. 3D Diffuser Actor sets a new state-of-the-art on RLBench with an absolute performance gain of 16.3% over the current SOTA on a multi-view setup and an absolute gain of 13.1% on a single-view setup. On the CALVIN benchmark, it outperforms the current SOTA in the setting of zero-shot unseen scene generalization by being able to successfully run 0.2 more tasks, a 7% relative increase. It also works in the real world from a handful of demonstrations. We ablate our model’s architectural design choices, such as 3D scene featurization and 3D relative attentions, and show they all help generalization. Our results suggest that 3D scene representations and powerful generative modeling are keys to efficient robot learning from demonstrations.

Model overview and stand-alone usage

To facilitate fast development on top of our model, we provide here an overview of our implementation of 3D Diffuser Actor.

The model can be indenpendently installed and used as stand-alone package.

> pip install -e .
# import the model
> from diffuser_actor import DiffuserActor, Act3D
> model = DiffuserActor(...)

Installation

Create a conda environment with the following command:

# initiate conda env
> conda update conda
> conda env create -f environment.yaml
> conda activate 3d_diffuser_actor

# install diffuser
> pip install diffusers["torch"]

# install dgl (https://www.dgl.ai/pages/start.html)
> pip install dgl -f https://data.dgl.ai/wheels/cu116/dgl-1.1.3%2Bcu116-cp38-cp38-manylinux1_x86_64.whl

# install flash attention (https://github.com/Dao-AILab/flash-attention#installation-and-features)
> pip install packaging
> pip install ninja
> pip install flash-attn --no-build-isolation

Install CALVIN locally

Remember to use the latest calvin_env module, which fixes bugs of turn_off_led. See this post for detail.

> git clone --recurse-submodules https://github.com/mees/calvin.git
> export CALVIN_ROOT=$(pwd)/calvin
> cd calvin
> cd calvin_env; git checkout main
> cd ..
> ./install.sh; cd ..

Install RLBench locally

# Install open3D
> pip install open3d

# Install PyRep (https://github.com/stepjam/PyRep?tab=readme-ov-file#install)
> mkdir PyRep; 
> cd PyRep/
> wget https://www.coppeliarobotics.com/files/V4_1_0/CoppeliaSim_Edu_V4_1_0_Ubuntu20_04.tar.xz
> tar -xf CoppeliaSim_Edu_V4_1_0_Ubuntu20_04.tar.xz;
> echo "export COPPELIASIM_ROOT=$(pwd)/PyRep/CoppeliaSim_Edu_V4_1_0_Ubuntu20_04" >> $HOME/.bashrc; 
> echo "export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:\$COPPELIASIM_ROOT" >> $HOME/.bashrc;
> echo "export QT_QPA_PLATFORM_PLUGIN_PATH=\$COPPELIASIM_ROOT" >> $HOME/.bashrc;
> source $HOME/.bashrc;
>pip install -r requirements.txt; pip install -e .; cd ..

# Install RLBench (Note: there are different forks of RLBench)
# PerAct setup
> git clone https://github.com/MohitShridhar/RLBench.git
> cd RLBench; git checkout -b peract --track origin/peract; pip install -r requirements.txt; pip install -e .; cd ..;

Remember to modify the success condition of close_jar task in RLBench, as the original condition is incorrect. See this pull request for more detail.

Data Preparation

See Preparing RLBench dataset and Preparing CALVIN dataset.

(Optional) Encode language instructions

We provide our scripts for encoding language instructions with CLIP Text Encoder on CALVIN. Otherwise, you can find the encoded instructions on CALVIN and RLBench (Link).

> python data_preprocessing/preprocess_calvin_instructions.py --output instructions/calvin_task_ABC_D/validation.pkl --model_max_length 16 --annotation_path ./calvin/dataset/task_ABC_D/validation/lang_annotations/auto_lang_ann.npy

> python data_preprocessing/preprocess_calvin_instructions.py --output instructions/calvin_task_ABC_D/training.pkl --model_max_length 16 --annotation_path ./calvin/dataset/task_ABC_D/training/lang_annotations/auto_lang_ann.npy

Model Zoo

We host the model weights on hugging face.

RLBench (PerAct) RLBench (GNFactor) CALVIN
3D Diffuser Actor Weights Weights Weights
Act3D Weights Weights N/A
input image     input image

Evaluate the pre-trained weights

First, donwload the weights and put under train_logs/

Important note: Our released model weights of 3D Diffuser Actor assume input quaternions are in wxyz format. Yet, we didn't notice that CALVIN and RLBench simulation use different quaternion formats (wxyz and xyzw). We have updated our code base with an additional argument quaternion_format to switch between these two formats. We have verified the change by re-training and testing 3D Diffuser Actor on GNFactor with xyzw quaternions. The model achieves similar performance as the released checkpoint. Please see this post for more detail.

For users to train 3D Diffuser Actor from scratch, we update the training scripts with the correct xyzw quaternion format. For users to test our released model, we keep the wxyz quaternion format in the testing scripts (Peract, GNFactor).

Getting started

See Getting started with RLBench and Getting started with CALVIN.

Citation

If you find this code useful for your research, please consider citing our paper "3D Diffuser Actor: Policy Diffusion with 3D Scene Representations".

@article{3d_diffuser_actor,
  author = {Ke, Tsung-Wei and Gkanatsios, Nikolaos and Fragkiadaki, Katerina},
  title = {3D Diffuser Actor: Policy Diffusion with 3D Scene Representations},
  journal = {Arxiv},
  year = {2024}
}

License

This code base is released under the MIT License (refer to the LICENSE file for details).

Acknowledgement

Parts of this codebase have been adapted from Act3D and CALVIN.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.