This repository is the official implementation of State Planning Policy Reinforcement Learning (SPP-RL).
Demo video.
The code was run on Ubuntu 18.04 in an Anaconda environment; on other setups, extra dependencies may be required. To install the requirements, run:

```
pip install -r rltoolkit/requirements.txt
```
The requirements include mujoco-py, which works only with MuJoCo installed together with a valid license (see the Install MuJoCo section of the mujoco-py documentation).
Then install rltoolkit with:

```
pip install -e rltoolkit/
```
To train the models from the paper, use the scripts in the `train` folder. For example, to train SPP-SAC on Hopper, simply run:

```
python train/spp_sac_hopper.py
```
After running the script, a folder with logs will appear. It contains TensorBoard logs of your runs and a `basic_logs` folder. In `basic_logs` you can find two pickle files per experiment: one with the model and one with the pickled returns history.
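The pickled returns history can be inspected with the standard `pickle` module. A minimal sketch, assuming the history is a plain list of per-episode returns (the actual file names in `basic_logs` are run-dependent; the path below is hypothetical):

```python
import pickle

# Hypothetical path -- substitute the actual file from your basic_logs folder.
demo_path = "returns_history_demo.pkl"

# For illustration only: write a dummy returns history (per-episode returns).
with open(demo_path, "wb") as f:
    pickle.dump([120.5, 340.2, 610.8], f)

# Load the history back and summarize it.
with open(demo_path, "rb") as f:
    returns = pickle.load(f)

print(f"episodes: {len(returns)}, best return: {max(returns):.1f}")
```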
The hyperparameters used in our experiments are listed both in the paper appendix and in the `train` folder scripts. Take note of the `N_CORES` parameter within the training scripts, which should be set according to the number of available CPU cores.
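One simple way to choose `N_CORES` is to query the machine at runtime rather than hard-coding it. A sketch (the training scripts themselves may set the value differently; this is just an assumption about a sensible default):

```python
import os

# Default N_CORES to the number of CPU cores visible to this process.
# os.cpu_count() can return None on exotic platforms, so fall back to 1.
N_CORES = os.cpu_count() or 1

print(f"Using {N_CORES} cores")
```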
Model evaluation code is available in the Jupyter notebook `notebooks/load_and_test.ipynb`. There you can load pre-trained models, evaluate their rewards, and render them in the environment.
Pre-trained models are available in the `models` directory; the `load_and_test.ipynb` notebook shows how to load them.
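The evaluation in the notebook boils down to a standard policy-rollout loop. A self-contained sketch of that loop, with a stand-in environment and policy (in the notebook these would be a Gym MuJoCo environment and a loaded SPP model; `DummyEnv` and `policy` here are purely illustrative):

```python
class DummyEnv:
    """Minimal stand-in with a gym-like reset/step interface."""
    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0  # observation

    def step(self, action):
        self.t += 1
        reward = 1.0                 # constant reward for illustration
        done = self.t >= self.horizon
        return 0.0, reward, done, {}

def policy(obs):
    return 0.0  # placeholder for a loaded model's action

def evaluate(env, policy, episodes=3):
    """Average undiscounted return of `policy` over `episodes` rollouts."""
    total = 0.0
    for _ in range(episodes):
        obs, done = env.reset(), False
        while not done:
            obs, reward, done, _ = env.step(policy(obs))
            total += reward
    return total / episodes

print(evaluate(DummyEnv(), policy))  # 5 steps x reward 1.0 -> 5.0
```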
Our models achieve the following performance on OpenAI Gym MuJoCo environments:
HalfCheetah results:
Hopper results:
Walker2d results:
Ant results: