CEIP

~~TODO: parallelized version of flow training.~~

Update: The parallelized version is now implemented on the parallel-training branch. (Jun. 21, 2023)

This repository is the code of CEIP and PARROT method for NeurIPS 2022 proceedings "CEIP: Combining Explicit and Implicit Priors for Reinforcement Learning with Demonstrations". The project page is https://289371298.github.io/jekyll/update/2022/10/25/CEIP.html.

The dataset can be downloaded at https://www.dropbox.com/s/7wg61g7qjmeg0bv/data.zip?dl=0 . The "data" folder is already structured and should be placed under the CEIP directory.

File Structure

CEIP/data: the data for the experiments, which are downloaded through the link above.

CEIP/dataset: the dataloaders for each experiment.

CEIP/demo: the entry of flow training. It contains two subfolders, CEIP/ours and CEIP/PARROT; each of the two subfolders contains four sub-subfolders, one for each experiment (fetchreach, kitchen_FIST, kitchen_SKiLD, office).

CEIP/hyperparams: the hyperparameters for each experiment.

CEIP/log: the logged results for RL runs.

CEIP/RL: the code for RL runs.

CEIP/robotics: substituting part of the code in gym for fetchreach; see dependency section below for details.

Dependency

As multiple sets of environment is needed, virtual environment like anaconda are strongly recommended.

Fetchreach

For fetchreach experiment, we use a special set of environment with python 3.8.5. The requirements are:

torch==1.10.0
gym==0.21.0
numpy==1.21.4
h5py==2.10.0
stable-baselines3==1.3.0
imageio
wandb
tqdm
matplotlib

We re-write the code of gym robotics environment, which is uploaded in CEIP/robotics. Before running RL for fetchreach, you need to substitute the gym/envs/robotics folder of gym package with our CEIP/robotics folder.

Others

For kitchen-SKiLD, kitchen-FIST and office, we use another set of environment with python 3.8.5. The requirements are:

mujoco-py==2.0.2.13
stable-baselines3==1.3.0
torch==1.10.0
gym>=0.15.4
h5py==2.10.0
imageio
wandb
tqdm
matplotlib

For kitchen environment, a fork of d4rl by the author of SKiLD[1] is required. The URL for the repo is https://github.com/kpertsch/d4rl .

For office environment, a fork of roboverse by the author of SKiLD is required The URL for the repo is https://github.com/VentusYue/roboverse .

Note that mujoco200 is required for these environments. Also, do not install the original version of d4rl and roboverse for substitution.

Running Code

Clone the repository.
Install the dependencies as stated in the dependency section, and download the dataset.
In the CEIP folder, type the following command:

export PYTHONPATH='./':$PYTHONPATH 
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:(path to mujoco)/.mujoco/mujoco200/bin

go into the sub-subfolder for demos (for example: demo/ours/fetchreach), type

mkdir code_backup
mkdir models

open the demo file to run; in the line

wandb.login(key=XXXXXXX)
wandb.init(entity=XXXXXXX, project= ...

change XXXXXXX to your key and username for wandb. See wandb official website https://docs.wandb.ai/ for this. We use XXXXXXX for anonimity.

run the code for training flow (see the next section for detailed commands).
get back to the main CEIP directory; run corresponding RL code (for example: python RL/RL_fetch.py --seed 19 --modelseed 1000009 --direction 4.5 --method ours)
The results of RL are stored in CEIP/log directory; log into your wandb account to check flow training results.

Detailed Steps for Reproducing Results

Fetchreach

Ours

Training flows (in demo/ours/fetchreach):

ours_noTS_noEX: set line 50 of demo_ours.py to "setenv(args.env_name, 10, 4, 8)", then run python demo_ours.py --seed 1000009.

ours_TS_noEX: set line 50 of demo_ours.py to "setenv(args.env_name, 10, 4, 9)", then run python demo_ours.py --seed 1000009.

ours_noTS_EX: set line 50 of demo_ours_withdatabase.py to "setenv(args.env_name, 10, 4, 8)", then run python demo_ours_withdatabase.py --seed 1000009.

ours_TS_EX: set line 50 of demo_ours_withdatabase.py to "setenv(args.env_name, 10, 4, 9)", then run python demo_ours_withdatabase.py --seed 1000009.

ours_4way (below are part of our ablation test; see paper for their meaning): python demo_ours_fourwayrelated.py --seed 1000009

ours_2way: python demo_ours_related.py --seed 1000009

run RL (in main directory):

ours_noTS_noEX: python RL/RL_fetch.py --seed 19 --modelseed 1000009 --direction 4.5 --trainsize 1600 --transfersize 160 --method ours (change 4.5 to 5.5, 6.5, 7.5 for other directions)

ours_TS_noEX: python RL/RL_fetch.py --seed 19 --modelseed 1000009 --direction 4.5 --trainsize 1600 --transfersize 160 --method ours_withTS

ours_noTS_EX: python RL/RL_fetch_withdatabase.py --seed 19 --modelseed 1000009 --direction 4.5 --transize 1600 --transfersize 160 --method ours_withdatabase --pushforward no

ours_TS_EX: python RL/RL_fetch_withdatabase.py --seed 19 --modelseed 1000009 --direction 4.5 --transize 1600 --transfersize 160 --method ours_withTS_withdatabase --pushforward no

ours_noTS_EX_forward: python RL/RL_fetch_withdatabase.py --seed 19 --modelseed 1000009 --direction 4.5 --transize 1600 --transfersize 160 --method ours_withdatabase --pushforward yes

ours_TS_EX_forward: python RL/RL_fetch_withdatabase.py --seed 19 --modelseed 1000009 --direction 4.5 --transize 1600 --transfersize 160 --method ours_withTS_withdatabase --pushforward yes

ours_4way: python RL/RL_fetch.py --seed 19 --modelseed 1000009 --direction 4.5 --trainsize 1600 --transfersize 160 --method ours_fourwayrelated

ours_2way: python RL/RL_fetch.py --seed 19 --modelseed 1000009 --direction 4.5 --trainsize 1600 --transfersize 160 --method ours_related

PARROT

Training flows (in demo/PARROT/fetchreach):

PARROT-TATS: python demo_BCFlow_all.py --seed 1000009

PARROT-TA: python demo_BCFLOW_allwithoutDdemo.py --seed 1000009

PARROT_4way+TS: python demo_BCFLOW_fourwayrelatedwithDdemo.py --seed 1000009

PARROT_2way: python demo_BCFLOW_relatedwithoutDdemo.py --seed 1000009

PARROT_2way+TS: python demo_BCFLOW_relatedwithDdemo.py --seed 1000009

PARROT_TS_noEX: python demo_Ddemoonly.py --seed 1000009

PARROT_TS_EX: python demo_Ddemoonly_withdatabase.py --seed 1000009

run RL (in the main directory):

PARROT-TATS: python RL/RL_fetch.py --seed 19 --modelseed 1000009 --direction 4.5 --trainsize 1600 --transfersize 160 --method alone_all

PARROT-TA: python RL/RL_fetch.py --seed 19 --modelseed 1000009 --direction 4.5 --trainsize 1600 --transfersize 160 --method alone_allwithoutDdemo

PARROT_4way+TS: python RL/RL_fetch.py --seed 19 --modelseed 1000009 --direction 4.5 --trainsize 1600 --transfersize 160 --method alone_fourwayrelatedwithDdemo

PARROT_2way: python RL/RL_fetch.py --seed 19 --modelseed 1000009 --direction 4.5 --trainsize 1600 --transfersize 160 --method alone_relatedwithoutDdemo

PARROT_2way+TS: python RL/RL_fetch.py --seed 19 --modelseed 1000009 --direction 4.5 --trainsize 1600 --transfersize 160 --method alone_relatedwithDdemo

PARROT_TS_noEX: python RL/RL_fetch.py --seed 19 --modelseed 1000009 --direction 4.5 --trainsize 1600 --transfersize 160 --method alone_Ddemoonly

PARROT_TS_EX: python RL/RL_fetch_withdatabase.py --seed 19 --modelseed 1000009 --direction 4.5 --trainsize 1600 --transfersize 160 --method alone_withdatabase --pushforward no

PARROT_TS_EX_forward: python RL/RL_fetch_withdatabase.py --seed 19 --modelseed 1000009 --direction 4.5 --trainsize 1600 --transfersize 160 --method alone_withdatabase --pushforward yes

kitchen-SKiLD

ours

Training flows (in demo/ours/kitchen_SKiLD):

ours-noTS-noEX: set line 70 of demo_ours_forall.py to "setenv(args.env_name, 60, 9, 24)"; set line 60 of demo_ours_forall.py to "hps_model_setter("type", "1layer_single")"; run python demo_ours_forall.py --seed 1000009

ours-TS-noEX: set line 70 of demo_ours_forall.py to "setenv(args.env_name, 60, 9, 25)"; set line 60 of demo_ours_forall.py to "hps_model_setter("type", "1layer_debug")"; run python demo_ours_forall.py --seed 1000009

ours-noTS-EX: set line 70 of demo_ours_forall.py to "setenv(args.env_name, 60, 9, 24)"; set line 60 of demo_ours_forall.py to "hps_model_setter("type", "1layer_single")"; run python demo_ours_forall.py --seed 1000009

ours-TS-EX: set line 70 of demo_ours_forall.py to "setenv(args.env_name, 60, 9, 25)"; set line 60 of demo_ours_forall.py to "hps_model_setter("type", "1layer_debug")"; run python demo_ours_forall.py --seed 1000009

Run RL (in main directory):

ours-noTS-noEX: python RL/RL_PPO_formal_nodatabase_kitchen_SKiLD.py --seed 19 --modelseed 1000009 --TL 300 --pushforward no (300 for SKiLD-A, 301 for SKiLD-B)

ours-TS-noEX: python RL/RL_PPO_formal_nodatabase_kitchen_SKiLD.py --seed 19 --modelseed 1000009 --TL 400 --pushforward no (400 for SKiLD-A, 401 for SKiLD-B)

ours-noTS-EX: python RL/RL_PPO_formal_kitchen_SKiLD.py --seed 19 --modelseed 1000009 --TL 100 -- pushforward no (100 for SKiLD-A, 101 for SKiLD-B)

ours-TS-EX: python RL/RL_PPO_formal_kitchen_SKiLD.py --seed 19 --modelseed 1000009 --TL 200 -- pushforward no (200 for SKiLD-A, 201 for SKiLD-B)

ours-noTS-EX-forward: python RL/RL_PPO_formal_kitchen_SKiLD.py --seed 19 --modelseed 1000009 --TL 100 --pushforward yes

ours-TS-EX-forward: python RL/RL_PPO_formal_kitchen_SKiLD.py --seed 19 --modelseed 1000009 --TL 200 -- pushforward yes

PARROT

Training flows (in demo/PARROT/kitchen_SKiLD):

PARROT-TA, PARROT-TS-noEX: python demo_ours_forall_nodatabase.py --seed 1000009

PARROT-TS-EX: python demo_ours_forall_withdatabase.py --seed 1000009

Run RL (in main directory):

PARROT-TA: python RL/RL_PPO_formal_nodatabase_kitchen_SKiLD.py --seed 19 --modelseed 1000009 --TL 600 (600 for SKiLD-A, 601 for SKiLD-B)

PARROT-TS-noEX: python RL/RL_PPO_formal_nodatabase_kitchen_SKiLD.py --seed 19 --modelseed 1000009 --TL 500 (500 for SKiLD-A, 501 for SKiLD-B)

PARROT-TS-EX: python RL/RL_PPO_formal_kitchen_SKiLD.py --seed 19 --modelseed 1000009 --TL 230 --pushforward no (230 for SKiLD-A, 231 for SKiLD-B)

PARROT-TS-EX-forward: python RL/RL_PPO_formal_kitchen_SKiLD.py --seed 19 --modelseed 1000009 --TL 500 --pushforward yes

Others

Kitchen-FIST and office can be run in a similar way to kitchen-SKiLD; see the code for detailed command.

Reference

[1] K. Pertsch, Y. Lee, Y. Wu, and J. J. Lim. Demonstration-guided reinforcement learning with learned skills. In CoRL, 2021.

kaiyan289 / ceip Goto Github PK

ceip's Introduction

CEIP

File Structure

Dependency

Fetchreach

Others

Running Code

Detailed Steps for Reproducing Results

Fetchreach

Ours

PARROT

kitchen-SKiLD

ours

PARROT

Others

Reference

ceip's People

Contributors

Stargazers

Watchers

Forkers

Recommend Projects

Recommend Topics

Recommend Org