Policy Dissection

[NeurIPS 2022] Official implementation of the paper: Human-AI Shared Control via Policy Dissection

Webpage | Code | Video | Paper

In this repo, we provide the implementation of Policy Dissection and some interactive neural controllers enabled by this method.

Supported Environments:

Installation

Basic Installation

# Clone the code to local
git clone https://github.com/metadriverse/policydissect.git
cd policydissect

# Create virtual environment
conda create -n policydissect python=3.7
conda activate policydissect

# Install PyTorch (CUDA 11.3 build)
pip3 install torch==1.10.0+cu113 torchvision==0.11.1+cu113 torchaudio==0.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html

# Install basic dependencies
pip install -e .

IsaacGym Installation (Optional)

To play with agents trained in IsaacGym, follow the instructions below to install IsaacGym.

Please review the file isaacgym/docs/install.html for more information on installation. See the Troubleshooting section for debugging.

Mujoco Installation (Optional)

For playing with the Mujoco-Ant and Mujoco-Walker, please

Play with AI

MetaDrive

To collaborate with the AI driver in the MetaDrive environment, run:

# MetaDrive
# Keymap:
# - KEY_W: lane following
# - KEY_A: left lane changing
# - KEY_S: braking
# - KEY_D: right lane changing
# - KEY_R: reset
python play/play_metadrive.py

Pybullet Quadrupedal Robot

The quadrupedal robot is trained with the code provided by https://github.com/Mehooz/vision4leg.git. To play with the legged robot, run:

# Pybullet Quadrupedal Robot
# Keymap:
# - KEY_W: forward
# - KEY_A: moving left
# - KEY_S: stop
# - KEY_D: moving right
# - KEY_R: reset
python play/play_pybullet_a1.py
python play/play_pybullet_a1.py --hard
python play/play_pybullet_a1.py --hard --seed 1001

You can also collaborate with the AI in a hard environment with obstacles and challenging terrain by adding the --hard flag. To switch to a different environment, add --seed followed by an integer seed.

Tip: Avoid running fast!

IsaacGym Cassie

The Cassie robot is trained with the code provided by https://github.com/leggedrobotics/legged_gym with a fixed forward command [1, 0, 0], and thus can only move forward. By applying Policy Dissection, primitives related to yaw rate, forward speed, height control and torque force can be identified. Activating these primitives enables various skills such as crouching, forward jumping and back-flipping. Run the following command to play with the robot. Add the flag --parkour to launch a challenging parkour environment.

# Keymap:
# - KEY_W: Forward
# - KEY_A: Left
# - KEY_S: Stop
# - KEY_C: Crouch
# - KEY_X: Tiptoe
# - KEY_Q: Jump
# - KEY_D: Right
# - KEY_SPACE: Back Flip
# - KEY_R: Reset
python play/play_cassie.py
python play/play_cassie.py --parkour

Tip: Switch to the Tiptoe state before pressing KEY_Q to increase the jump distance.

Note: Do not drag the windows or close the pygame window while the program is running.

Gym Environments

We also discover motor primitives in three gym environments: Box2d-BipedalWalker, Mujoco-Ant and Mujoco-Walker. You can try them via:

# BipedalWalker
# Keymap:
# - KEY_W: jump
# - KEY_A: front-flip
# - KEY_S: restore running after jumping
# - KEY_R: reset
python play/play_gym_bipedalwalker.py

# Mujoco-Ant
# Keymap:
# - KEY_W: move up
# - KEY_A: move left
# - KEY_S: move down
# - KEY_D: move right
# - KEY_Q: rotation
# - KEY_R: reset
python play/play_mujoco_ant.py
    
# Mujoco-Walker
# Keymap:
# - KEY_R: reset
# - KEY_A: stop
# - KEY_W: freeze red knee
# - KEY_D: restore running
python play/play_mujoco_walker.py

Comparison with explicit goal-conditioned control

To measure how coarse the control enabled by Policy Dissection is, we train a goal-conditioned controller for the quadrupedal ANYmal robot with the code provided by https://github.com/leggedrobotics/legged_gym. On top of this controller, we build a primitive-activation conditional control system in which a PID controller determines the unit output according to the tracking error. As a result, it can track the target yaw command and achieve control precision similar to that obtained by explicitly specifying the goal in the network input. Video is available here.

The experiment script can be found at play/run_tracking_experiment.py. By default, yaw tracking uses explicit goal-conditioned control; running python play/run_tracking_experiment.py --primitive_activation switches to primitive-activation conditional control.
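
The idea of a PID controller turning the yaw tracking error into a unit output can be sketched as follows. This is a minimal illustration, not the code in play/run_tracking_experiment.py: the class name PIDController, the gains, and the toy yaw model (yaw rate equal to the unit output) are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class PIDController:
    """Textbook PID controller; the gains used below are illustrative only."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, error: float, dt: float) -> float:
        """Return the control output for the current tracking error."""
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def track_yaw(target_yaw: float, steps: int = 200, dt: float = 0.02) -> float:
    """Drive a toy yaw model toward target_yaw and return the final yaw.

    The PID output stands in for the value written to an identified unit.
    The plant (yaw rate equal to the unit output) is a hypothetical stand-in
    for the real robot and simulator.
    """
    pid = PIDController(kp=2.0, ki=0.1, kd=0.05)
    yaw = 0.0
    for _ in range(steps):
        error = target_yaw - yaw          # tracking error
        unit_output = pid.step(error, dt)  # value that would be fed to the unit
        yaw += unit_output * dt            # toy plant update
    return yaw
```

Calling track_yaw(0.5) runs the loop for four simulated seconds and returns a yaw close to the 0.5 rad target. In the real system the plant is the robot itself, so the gains would need to be tuned against the simulator.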

Policy Dissection Examples

In the example folder, we provide two examples showing how to dissect a policy. The results can be viewed by opening read_result.ipynb in Jupyter Notebook. The identified units are also used as motor primitives for evoking behaviors of the ANYmal and MetaDrive agents. See the previous sections for how to play with them.
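
To make the idea of evoking a behavior through an identified unit more concrete, the sketch below overwrites the activation of a chosen hidden unit during the forward pass of a small tanh MLP. It is a hedged illustration only: the function names, the toy weights, and the (layer, unit) to value mapping are hypothetical, and the code does not reproduce the repository's inference code or the units reported in the examples.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, bias)
Overrides = Dict[Tuple[int, int], float]       # (layer, unit) -> forced value


def dense(weights: List[List[float]], bias: Sequence[float],
          x: Sequence[float]) -> List[float]:
    """Compute weights @ x + bias for one fully connected layer."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def forward(obs: Sequence[float], layers: Sequence[Layer],
            overrides: Optional[Overrides] = None) -> List[float]:
    """Run a tanh MLP, forcing selected hidden units to fixed values."""
    overrides = overrides or {}
    x = list(obs)
    last = len(layers) - 1
    for i, (weights, bias) in enumerate(layers):
        x = dense(weights, bias, x)
        if i < last:  # hidden layer: apply nonlinearity, then the overrides
            x = [math.tanh(v) for v in x]
            for (layer, unit), value in overrides.items():
                if layer == i:
                    x[unit] = value
    return x


# Toy network (hypothetical weights): two hidden units, one output that
# simply reads hidden unit 1.
TOY_LAYERS: List[Layer] = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    ([[0.0, 1.0]], [0.0]),
]

# Hypothetical keymap: pressing a key activates one unit with a fixed value.
KEYMAP: Dict[str, Overrides] = {"KEY_A": {(0, 1): 1.0}}

action_default = forward([0.5, 0.0], TOY_LAYERS)
action_key_a = forward([0.5, 0.0], TOY_LAYERS, KEYMAP["KEY_A"])
```

Here the same observation yields a different action once the unit tied to KEY_A is forced on, which is the mechanism the keymaps in the play scripts rely on conceptually.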

Troubleshooting

Installing IsaacGym

If you encounter ImportError: libpython3.7m.so.1.0: cannot open shared object file: No such file or directory, run this:

export LD_LIBRARY_PATH=/path/to/libpython/directory
# If you are using Conda, the path should be /path/to/conda/envs/your_env/lib.
# For example:
export LD_LIBRARY_PATH=/home/USERNAME/anaconda3/envs/policydissect/lib

If you encounter CalledProcessError: Command '['which', 'c++']' returned non-zero exit status 1., try this:

sudo apt-get install build-essential

If you encounter AttributeError: module 'distutils' has no attribute 'version' from tensorboard, try this:

pip install -U setuptools==50.0.0

Installing Mujoco

If you encounter: fatal error: GL/osmesa.h: No such file or directory:

sudo apt-get install libosmesa6-dev

If you encounter: error: [Errno 2] No such file or directory: 'patchelf': 'patchelf':

sudo apt-get install patchelf

If you encounter: ERROR: GLEW initalization error: Missing GL version:

sudo apt-get install -y libglew-dev

Reference

@inproceedings{
    li2022humanai,
    title={Human-{AI} Shared Control via Policy Dissection},
    author={Quanyi Li and Zhenghao Peng and Haibin Wu and Lan Feng and Bolei Zhou},
    booktitle={Thirty-Sixth Conference on Neural Information Processing Systems},
    year={2022},
    url={https://openreview.net/forum?id=LCOv-GVVDkp}
}

policydissect's Issues

:display(warning): FrameBufferProperties available less than requested.

After upgrading the system from 16.04 to 20.04, the rendered image is very sluggish and completely unusable. When running the program, the following prompt appears:
:display(warning): FrameBufferProperties available less than requested.
requested: depth_bits=1 color_bits=3 red_bits=1 green_bits=1 blue_bits=1 alpha_bits=1 multisamples=8 back_buffers=1 force_hardware
got: depth_bits=32 color_bits=24 red_bits=8 green_bits=8 blue_bits=8 alpha_bits=8 multisamples=4 back_buffers=1 force_hardware force_software

How can I solve this problem? Thanks.

setup.py tiny issue

image

I would suggest adding this line:

image

Also, pinning exact versions of metadrive, torch, tensorflow, gym and numpy is not very user-friendly.

metadata-generation-failed by installation

Hi, I was trying to install the package on my local machine following the basic installation guide in the readme. After executing pip install -e . I encountered the following error. Do you have any insight into how to fix this?

Obtaining file:///home/hongyi/workspace/play_ground/policydissect
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [19 lines of output]
error: Multiple top-level packages discovered in a flat-layout: ['play', 'policydissect'].

  To avoid accidental inclusion of unwanted files or directories,
  setuptools will not proceed with this build.
  
  If you are trying to create a single distribution with multiple packages
  on purpose, you should not rely on automatic discovery.
  Instead, consider the following options:
  
  1. set up custom discovery (`find` directive with `include` or `exclude`)
  2. use a `src-layout`
  3. explicitly set `py_modules` or `packages` with a list of names
  
  To find more information, look for "package discovery" on setuptools docs.
  linux-x86_64
  numpy is enabled.
  numpy_include_dirs = /home/hongyi/anaconda3/envs/policydissect/lib/python3.7/site-packages/numpy/core/include
  linux
  ['policydissect', 'policydissect.metadrive', 'policydissect.weights', 'policydissect.quadrupedal', 'policydissect.utils', 'policydissect.gym', 'policydissect.legged_gym', 'policydissect.quadrupedal.torchrl', 'policydissect.quadrupedal.vision4leg', 'policydissect.quadrupedal.starter', 'policydissect.quadrupedal.torchrl.collector', 'policydissect.quadrupedal.torchrl.replay_buffers', 'policydissect.quadrupedal.torchrl.utils', 'policydissect.quadrupedal.torchrl.policies', 'policydissect.quadrupedal.torchrl.algo', 'policydissect.quadrupedal.torchrl.env', 'policydissect.quadrupedal.torchrl.networks', 'policydissect.quadrupedal.torchrl.collector.para', 'policydissect.quadrupedal.torchrl.replay_buffers.shared', 'policydissect.quadrupedal.torchrl.algo.on_policy', 'policydissect.quadrupedal.torchrl.algo.off_policy', 'policydissect.quadrupedal.vision4leg.envs', 'policydissect.quadrupedal.vision4leg.utilities', 'policydissect.quadrupedal.vision4leg.robots', 'policydissect.quadrupedal.vision4leg.assets', 'policydissect.quadrupedal.vision4leg.envs.utilities', 'policydissect.quadrupedal.vision4leg.envs.env_wrappers', 'policydissect.quadrupedal.vision4leg.envs.gym_envs', 'policydissect.quadrupedal.vision4leg.envs.sensors', 'policydissect.quadrupedal.vision4leg.assets.a1', 'policydissect.legged_gym.envs', 'policydissect.legged_gym.utils', 'policydissect.legged_gym.training_script', 'policydissect.legged_gym.rsl_rl', 'policydissect.legged_gym.rsl_rl.algorithms', 'policydissect.legged_gym.rsl_rl.modules', 'policydissect.legged_gym.rsl_rl.utils', 'policydissect.legged_gym.rsl_rl.env', 'policydissect.legged_gym.rsl_rl.storage', 'policydissect.legged_gym.rsl_rl.runners']
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

How can I quickly reference your policy in a new scenario?

How can I quickly implement your policy in a new scenario?
I am looking to apply your policy in a new scenario, but I noticed that your code loads a pre-trained reinforcement learning model. How can I rapidly train a model that matches your policy in a new environment? For instance, how can I train a model that can be directly used with your ppo_inference_tf function?
