OSRL


OSRL (Offline Safe Reinforcement Learning) offers a collection of elegant and extensible implementations of state-of-the-art offline safe reinforcement learning (RL) algorithms. Aimed at propelling research in offline safe RL, OSRL serves as a solid foundation to implement, benchmark, and iterate on safe RL solutions. This repository is heavily inspired by the CORL library for offline RL; check them out too!

The OSRL package is a crucial component of our larger benchmarking suite for offline safe learning, which also includes DSRL and FSRL, and is built to facilitate the development of robust and reliable offline safe RL solutions.

To learn more, please visit our project website. If you find this code useful, please cite:

@article{liu2023datasets,
  title={Datasets and Benchmarks for Offline Safe Reinforcement Learning},
  author={Liu, Zuxin and Guo, Zijian and Lin, Haohong and Yao, Yihang and Zhu, Jiacheng and Cen, Zhepeng and Hu, Hanjiang and Yu, Wenhao and Zhang, Tingnan and Tan, Jie and others},
  journal={arXiv preprint arXiv:2306.09303},
  year={2023}
}

Structure

The structure of this repo is as follows:

├── examples
│   ├── configs  # the training configs of each algorithm
│   ├── eval     # the evaluation scripts
│   └── train    # the training scripts
├── osrl
│   ├── algorithms  # offline safe RL algorithms
│   └── common      # base networks and utils

The implemented offline safe RL and imitation learning algorithms include:

| Algorithm | Type | Description |
|---|---|---|
| BCQ-Lag | Q-learning | BCQ with PID Lagrangian |
| BEAR-Lag | Q-learning | BEAR with PID Lagrangian |
| CPQ | Q-learning | Constraints Penalized Q-learning (CPQ) |
| COptiDICE | Distribution Correction Estimation | Offline Constrained Policy Optimization via stationary DIstribution Correction Estimation |
| CDT | Sequential Modeling | Constrained Decision Transformer |
| BC-All | Imitation Learning | Behavior Cloning with all datasets |
| BC-Safe | Imitation Learning | Behavior Cloning with safe trajectories |
| BC-Frontier | Imitation Learning | Behavior Cloning with high-reward trajectories |
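The BC variants above differ only in which trajectories they clone. As a minimal illustration of the BC-Safe idea (names like `Trajectory` and `cost_limit` are illustrative, not OSRL's actual API), filtering a dataset down to constraint-satisfying episodes might look like:

```python
# Hedged sketch: selecting "safe" trajectories for BC-Safe-style
# behavior cloning. Illustrative types, not OSRL's real data structures.
from dataclasses import dataclass
from typing import List

@dataclass
class Trajectory:
    returns: float  # cumulative reward of the episode
    cost: float     # cumulative constraint cost of the episode

def select_safe(trajs: List[Trajectory], cost_limit: float) -> List[Trajectory]:
    """Keep only trajectories whose total cost is within the limit."""
    return [t for t in trajs if t.cost <= cost_limit]

trajs = [Trajectory(10.0, 0.0), Trajectory(25.0, 40.0), Trajectory(18.0, 5.0)]
safe = select_safe(trajs, cost_limit=10.0)
print([t.returns for t in safe])  # [10.0, 18.0] -- the low-cost episodes remain
```

BC-All would skip the filter entirely, and BC-Frontier would instead keep the highest-return trajectories.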

Installation

OSRL is currently hosted on PyPI; you can install it with:

pip install osrl-lib

You can also clone the repo and install it in editable mode:

git clone https://github.com/liuzuxin/OSRL.git
cd OSRL
pip install -e .

If you want to use the CDT algorithm, please also manually install the OApackage:

pip install OApackage==2.7.6
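OApackage is only needed by CDT; as far as we understand, it is used for Pareto-frontier computations over trajectory (reward, cost) returns. A minimal pure-Python sketch of that underlying idea (maximize reward, minimize cost), independent of OApackage's actual API:

```python
from typing import List, Tuple

def pareto_front(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return the (reward, cost) points not dominated by any other point.

    A point (r1, c1) dominates (r2, c2) if r1 >= r2 and c1 <= c2,
    i.e. at least as much reward for at most as much cost.
    """
    front = []
    for p in points:
        dominated = any(
            q[0] >= p[0] and q[1] <= p[1] and q != p for q in points
        )
        if not dominated:
            front.append(p)
    return front

pts = [(10.0, 2.0), (8.0, 1.0), (9.0, 3.0), (12.0, 5.0)]
print(pareto_front(pts))  # [(10.0, 2.0), (8.0, 1.0), (12.0, 5.0)]
```

Here (9.0, 3.0) is dropped because (10.0, 2.0) achieves more reward at lower cost.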

How to use OSRL

Example usage is in the examples folder, where you can find the training and evaluation scripts for all the algorithms. All the parameters and their default configs for each algorithm are available in the examples/configs folder. OSRL uses the WandbLogger from FSRL and the Pyrallis configuration system. The offline datasets and environments are provided by DSRL, so make sure you install both of them first.

Training

For example, to train with the BCQ-Lag method, run the training script and override the default parameters as needed:

python examples/train/train_bcql.py --task OfflineCarCircle-v0 --param1 args1 ...

By default, the config file and the training logs are written to the logs/ folder, and the training plots can be viewed online via Wandb.

You can also launch a sequence of experiments, sequentially or in parallel, via the EasyRunner package; see examples/train_all_tasks.py for details.

Evaluation

To evaluate a trained agent, for example a BCQ-Lag agent, simply run:

python examples/eval/eval_bcql.py --path path_to_model --eval_episodes 20

It will load the config file from path_to_model/config.yaml and the model file from path_to_model/checkpoints/model.pt, run 20 episodes, and print the average normalized reward and cost. The pretrained checkpoints for all datasets are available here for reference.
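The normalization reported here follows the DSRL convention, as we understand it: returns are scaled against per-dataset reference values (D4RL style), and costs against the target cost threshold. A hedged sketch with illustrative reference numbers (the real per-task references live in DSRL, not here):

```python
def normalized_reward(ret: float, r_min: float, r_max: float) -> float:
    """D4RL-style normalization: 0 at the reference minimum, 1 at the maximum."""
    return (ret - r_min) / (r_max - r_min)

def normalized_cost(cost_return: float, cost_limit: float) -> float:
    """Values <= 1 indicate the cost constraint is satisfied on average."""
    return cost_return / cost_limit

# Illustrative numbers only, not actual task references.
print(normalized_reward(350.0, 0.0, 500.0))  # 0.7
print(normalized_cost(5.0, 10.0))            # 0.5
```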

Acknowledgement

The framework design and most baseline implementations of OSRL are heavily inspired by the CORL project, which is a great library for offline RL, and the cleanrl project, which targets online RL. So do check them out if you are interested!

Contributing

If you have any suggestions or find any bugs, please feel free to submit an issue or a pull request. We welcome contributions from the community!

Contributors

ja4822, liuzuxin, howuhh, bobhuangc, henrylhh
