cleanrl's Introduction

CleanRL (Clean Implementation of RL Algorithms)

Mailing List : cleanrl Meeting Recordings : cleanrl

CleanRL is a Deep Reinforcement Learning library that provides high-quality single-file implementations with research-friendly features. The implementations are clean and simple, yet they scale to thousands of experiments using AWS Batch. The highlight features of CleanRL are:

  • 📜 Single-file implementation
    • Every detail of an algorithm is put into the algorithm's own file, so it is easier to fully understand the algorithm and to do research with it.
  • 📊 Benchmarked Implementation (7+ algorithms and 34+ games at https://benchmark.cleanrl.dev)
  • 📈 Tensorboard Logging
  • 🪛 Local Reproducibility via Seeding
  • 🎮 Gameplay Video Capturing
  • 🧫 Experiment Management with Weights and Biases
  • 💸 Cloud Integration with docker and AWS
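As an illustration of the seeding feature listed above, here is a minimal sketch of how a training script might seed its random number generators. The helper name `seed_everything` is hypothetical and is not taken from the CleanRL source; NumPy and PyTorch are seeded only if they happen to be installed.

```python
import os
import random


def seed_everything(seed: int) -> None:
    """Seed the random number generators a training script typically uses.

    Hypothetical helper for illustration; not part of CleanRL.
    """
    # Exported for subprocesses; it does not change hashing in the
    # interpreter that is already running.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np  # optional dependency

        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch  # optional dependency

        torch.manual_seed(seed)
    except ImportError:
        pass


# Seeding twice with the same value reproduces the same random draws.
seed_everything(1)
first = [random.random() for _ in range(3)]
seed_everything(1)
second = [random.random() for _ in range(3)]
assert first == second
```

Seeding alone may not be enough for bit-for-bit identical runs; the environment and any GPU kernels can add their own nondeterminism.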

Good luck have fun 🚀

Algorithms Implemented

Open RL Benchmark

Open RL Benchmark (https://benchmark.cleanrl.dev) is our project to create a comprehensive benchmark of popular DRL algorithms in a variety of games, where everything about the benchmark is open. That is, you can check the following information for each experiment:

  • hyper-parameters (check it at the Overview tab of a run)
  • training metrics (e.g. episode reward, training losses. Check it at the Charts tab of a run)
  • videos of the agents playing the game (check it at the Charts tab of a run)
  • system metrics (e.g. CPU utilization, memory utilization. Check it at the Systems tab of a run)
  • stdout, stderr of the script (check it at the Logs tab of a run)
  • all dependencies (check requirements.txt at the Files tab of a run)
  • source code (this is especially helpful since we have single-file implementations, so we know exactly which code is responsible for the run; check it at the Code tab of a run)
  • the exact commands to reproduce it (check it at the Overview tab of a run; currently not working because public access is blocked by wandb/wandb#1177)

We hope it will bring a new level of transparency, openness, and reproducibility. Our plan is to benchmark as many algorithms and games as possible. If you are interested, please join us and contribute more algorithms and games. To get started, check out our contribution guide and our roadmap for the Open RL Benchmark.

We currently support 34+ games, and our implementations perform competitively against published results. See the tables below for selected examples.

| Environment | c51_atari_visual.py | dqn_atari_visual.py | ppo_atari_visual.py |
| --- | --- | --- | --- |
| BeamRiderNoFrameskip-v4 | 9128.00 ± 0.00 | 6156.13 ± 461.47 | 1881.11 ± 166.89 |
| QbertNoFrameskip-v4 | 13814.24 ± 3357.99 | 15241.67 ± 0.00 | 18755.36 ± 205.36 |
| SpaceInvadersNoFrameskip-v4 | 2140.00 ± 0.00 | 1616.11 ± 226.67 | 871.56 ± 133.44 |
| PongNoFrameskip-v4 | 16.33 ± 0.00 | 19.33 ± 0.33 | 20.89 ± 0.00 |
| BreakoutNoFrameskip-v4 | 404.11 ± 0.00 | 354.78 ± 9.22 | 413.73 ± 15.39 |

| Environment | ddpg_continuous_action.py | td3_continuous_action.py | ppo_continuous_action.py |
| --- | --- | --- | --- |
| Ant-v2 | 503.32 ± 18.70 | 5368.18 ± 771.11 | 3368.17 ± 759.13 |
| Humanoid-v2 | 942.16 ± 436.22 | 6334.40 ± 140.05 | 918.19 ± 102.71 |
| Walker2DBulletEnv-v0 | 708.51 ± 240.64 | 2168.87 ± 65.78 | 906.10 ± 51.96 |
| HalfCheetahBulletEnv-v0 | 2821.87 ± 266.03 | 2542.99 ± 318.23 | 2189.66 ± 141.61 |
| HopperBulletEnv-v0 | 1540.77 ± 821.54 | 2302.09 ± 24.46 | 2300.96 ± 47.46 |
| BipedalWalker-v3 | 140.20 ± 52.05 | 164.06 ± 147.22 | 219.96 ± 47.49 |
| LunarLanderContinuous-v2 | 210.01 ± 0.00 | 290.73 ± 4.44 | 161.28 ± 37.48 |
| Pendulum-v0 | -186.83 ± 12.35 | -246.53 ± 6.73 | -1280.11 ± 39.22 |
| MountainCarContinuous-v0 | -0.98 ± 0.02 | -1.11 ± 0.10 | 93.84 ± 0.00 |
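The text above does not state how the "±" spread is computed. As a rough sketch, assuming each cell reports the mean and standard deviation of episode returns across several runs, a cell could be formatted as follows. The function name `summarize` and the input numbers are made up for illustration and are not benchmark data.

```python
from statistics import mean, pstdev


def summarize(returns):
    """Format episode returns from several runs as 'mean ± std'.

    Hypothetical helper; the population standard deviation is an assumption.
    """
    # pstdev is the population standard deviation, so a single run gives 0.00.
    return f"{mean(returns):.2f} ± {pstdev(returns):.2f}"


# Made-up returns from three hypothetical runs of one algorithm on one game.
print(summarize([20.0, 21.0, 19.0]))  # prints "20.00 ± 0.82"
```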

Get started

To run experiments locally, give the following a try:

$ git clone https://github.com/vwxyzjn/cleanrl.git && cd cleanrl
$ pip install -e .
$ cd cleanrl
$ python ppo.py \
    --seed 1 \
    --gym-id CartPole-v0 \
    --total-timesteps 50000
# open another terminal and enter `cd cleanrl/cleanrl`
$ tensorboard --logdir runs
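The flags used in the command above (`--seed`, `--gym-id`, `--total-timesteps`) could be exposed by a single-file script roughly as sketched below. This is not the actual `ppo.py` code; the default values and help strings are assumptions chosen to mirror the example command.

```python
import argparse


def parse_args(argv=None):
    """Parse the flags shown in the example command (illustrative sketch)."""
    parser = argparse.ArgumentParser(description="Hypothetical single-file RL script")
    # Defaults below are assumptions, not values confirmed by the source.
    parser.add_argument("--seed", type=int, default=1,
                        help="seed of the experiment")
    parser.add_argument("--gym-id", type=str, default="CartPole-v0",
                        help="id of the gym environment")
    parser.add_argument("--total-timesteps", type=int, default=50000,
                        help="total number of environment steps to train for")
    return parser.parse_args(argv)


args = parse_args(["--seed", "1", "--gym-id", "CartPole-v0",
                   "--total-timesteps", "50000"])
# argparse turns dashes in flag names into underscores on the namespace.
print(args.seed, args.gym_id, args.total_timesteps)
```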

To use the wandb integration, sign up for an account at https://wandb.com and copy the API key. Then run

$ cd cleanrl
$ pip install wandb
$ wandb login ${WANDB_API_KEY}
$ python ppo.py \
    --seed 1 \
    --gym-id CartPole-v0 \
    --total-timesteps 50000 \
    --prod-mode \
    --wandb-project-name cleanrltest 
# Then go to https://app.wandb.ai/${WANDB_USERNAME}/cleanrltest/
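One plausible way a script could act on `--prod-mode` and `--wandb-project-name` is sketched below. The function `init_tracking` is hypothetical and not taken from the CleanRL source; it assumes that experiment tracking is switched on only in production mode and that TensorBoard logs are synced to Weights and Biases.

```python
def init_tracking(prod_mode, project_name, config):
    """Start a Weights and Biases run only when prod_mode is set.

    Hypothetical helper for illustration; returns None for local runs.
    """
    if not prod_mode:
        # Local runs write TensorBoard files only.
        return None
    import wandb  # imported lazily so local runs do not need the package

    # sync_tensorboard mirrors TensorBoard scalars to the wandb run.
    return wandb.init(project=project_name, config=config, sync_tensorboard=True)


# Without prod mode, nothing is uploaded and wandb is never imported.
run = init_tracking(False, "cleanrltest", {"seed": 1, "gym_id": "CartPole-v0"})
assert run is None
```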

Check out the demo site at https://app.wandb.ai/costa-huang/cleanrltest

Support and get involved

We have a Slack Community for support. Feel free to ask questions. GitHub issues and PRs are also welcome.

In addition, we have a monthly development cycle to implement new RL algorithms. Feel free to participate or ask questions there, too. You can sign up for our mailing list at our Google Groups to receive the event RSVP every week, which contains the Hangouts video call address. Our past video recordings are available on YouTube.

Contribution

We have a short contribution guide here: https://github.com/vwxyzjn/cleanrl/blob/master/CONTRIBUTING.md. Consider adding new algorithms or testing new games on the Open RL Benchmark (https://benchmark.cleanrl.dev).

Big thanks to all the contributors of CleanRL!

Citing our project

Please consider using the following BibTeX entry:

@misc{cleanrl,
  author = {Shengyi Huang and Rousslan Dossa and Chang Ye},
  title = {CleanRL: High-quality Single-file Implementation of Deep Reinforcement Learning algorithms},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/vwxyzjn/cleanrl/}},
}

References

I have been heavily inspired by many repos and blog posts. Below is an incomplete list of them.

The following ones helped me a lot with the continuous action space handling:
