Code Monkey home page Code Monkey logo

foundation_actor_critic's Introduction

Large Language Models as Generalizable Policies for Embodied Tasks

This software project accompanies the research paper, Large Language Models as Generalizable Policies for Embodied Tasks. See llm-rl.github.io for more information

Abstract: We show that large language models (LLMs) can be adapted to be generalizable policies for embodied visual tasks. Our approach, called Large LAnguage model Reinforcement Learning Policy (LLaRP), adapts a pre-trained frozen LLM to take as input text instructions and visual egocentric observations and output actions directly in the environment. Using reinforcement learning, we train LLaRP to see and act solely through environmental interactions. We show that LLaRP is robust to complex paraphrasings of task instructions and can generalize to new tasks that require novel optimal behavior. In particular, on 1,000 unseen tasks it achieves 42% success rate, 1.7x the success rate of other common learned baselines or zero-shot applications of LLMs. Finally, to aid the community in studying language conditioned, massively multi-task, embodied AI problems we release a novel benchmark, Language Rearrangement, consisting of 150,000 training and 1,000 testing tasks for language-conditioned rearrangement.

Getting Started

Installation

  • Setup Python environment:
    • conda create -n llarp -y python=3.9
    • conda activate llarp
  • Install Habitat-Sim conda install -y habitat-sim==0.3.0 withbullet headless -c conda-forge -c aihabitat
  • Install Habitat-Lab:
    • git clone -b 'v0.3.0' --depth 1 https://github.com/facebookresearch/habitat-lab.git ~/habitat-lab
    • cd ~/habitat-lab
    • pip install -e habitat-lab
    • pip install -e habitat-baselines
  • Install VC-1:
    • git clone -b 76fe35e87b1937168f1ec4b236e863451883eaf3 https://github.com/facebookresearch/eai-vc.git ~/eai-vc
    • git submodule update --init --recursive
    • pip install -e ./vc_models
  • Install this repository, first clone this repo and then run pip install -e . in the cloned directory.
  • Download YCB and ReplicaCAD dataset for the Language Rearrangement task. Run the following in this code base's directory.
    • conda install -y -c conda-forge git-lfs
    • python -m habitat_sim.utils.datasets_download --uids rearrange_task_assets
  • Download LLaMA-1 weights. Instructions on how to do this are here. Place the model weights in data/hf_llama_7B/ in the llarp directory. To verify the download is correct, ls data/hf_llama_7B should return pytorch_model-00001-of-00001.bin, config.json, etc.

Commands

Training Commands:

  • Train LLaRP on Language Rearrangement task: python llarp/run.py --config-name=baseline/llarp.yaml
  • The trainer is built on Habitat-Baselines v0.3.0. Most Habitat-Baselines options also apply to LLaRP training. See this README for more information about how to use Habitat Baselines.
  • To run training on multiple GPUs, use torchrun.
  • More configurable options are under llarp/config/baseline/policy/llarp_policy.yaml.

Evaluation Commands: First, get the checkpoint generated by the training command. Then run: python llarp/run.py --config-name=baseline/llarp.yaml habitat_baselines.evaluate=True habitat_baselines.rl.policy.main_agent.hierarchical_policy.high_level_policy.is_eval_mode=True habitat_baselines.eval_ckpt_path_dir=path/to/checkpoint.pth habitat.dataset.data_path=datasets/DATASET.pickle where DATASET refers to one of the Language Rearrangement evaluation splits under data/datasets: train, new_scenes, rephrasing, referring_expressions, spatial_relationships, context, irrelevant_text, multiple_rearrangements, novel_objects, multiple_objects, or conditional_instructions.

Dataset Generation Commands: Since the episodes are procedurally generated, it is possible to generate more episodes beyond the 150,000 in the training dataset.

  • Generate dataset: python llarp/dataset/create_episodes.py --run --config rl_llm/dataset/configs/dataset.yaml --num-episodes 100 --out data/datasets/dataset_path.pickle --limit-scene-set scene_train_split --instruct-path llarp/dataset/configs/instructions.yaml --tag v2_train --verbose --n-procs 112 --seed 7 --procs-per-gpu 14
  • Validate generated episodes are solvable and remove unsolvable episodes: python rl_llm/dataset/dataset_validator.py --cfg rl_llm/config/task/lang_cond.yaml --n-procs 56 habitat.dataset.data_path=/mnt/task_runtime/projects/rl_llm/data/datasets/dataset_path_validated.pickle task_obs=all

Other:

  • Run tests: pytest test. This checks that LLaRP weights are updated exactly as expected after several training iterations under a variety of training settings

Documentation

Code Structure (under llarp/ directory):

  • config/: Hydra configuration YAML files.
    • baseline/: Config files for policies and trainers.
    • task/: Config files for the Language Rearrangement task.
  • dataset/: Utilities for generating the Language Rearrangement dataset files.
    • configs/: Config files for Language Rearrangement dataset generation. In this directory, instructions.yaml contains the language instruction templates and dataset.yaml defines the possible objects and receptacles. Refer to Appendix Section B of the paper for details on the instruction templates.
    • create_episodes.py is the entry point for the episode generation.
    • dataset_validator.py processes an already created dataset and ensures all the episodes are solvable. It deletes unsolvable episodes.
  • policies/: Defines the LLaRP policy module.
    • cores/decoder.py: This contains the bulk of the policy code for sampling from and updating the policy.
    • action_decoders.py: Contains the action decoder head.
    • llm_policy.py: Entry point for the LLaRP policy. This integrates the LLaRP policy with the Habitat Baselines trainer.
    • transformer_storage.py: Modified rollout buffer for transformers in PPO.
    • vis_bridge.py: The observation encoder module.
    • visual_encoders.py: The visual encoder (VC-1).
  • task/: Code to setup the Language Rearrangement task in Habitat-Lab.
  • trainer/: Contains the core RL loop, PPO loss calculation, evaluation, environment creation, and distributed training utilities. Language Rearrangement episode datasets are included in this repository under the datasets/ directory.

License

The code is provided under Apple Sample Code license.

Citation

@article{szot2023large,
  title={Large Language Models as Generalizable Policies for Embodied Tasks},
  author={Szot, Andrew and Schwarzer, Max and Agrawal, Harsh and Mazoure, Bogdan and Talbott, Walter and Metcalf, Katherine and Mackraz, Natalie and Hjelm, Devon and Toshev, Alexander},
  journal={arXiv preprint arXiv:2310.17722},
  year={2023}
}

foundation_actor_critic's People

Contributors

aszot avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.