Code Monkey home page Code Monkey logo

deeprl's Introduction

DeepRL

Modularized implementation of popular deep RL algorithms by PyTorch. Easy switch between toy tasks and challenging games.

Implemented algorithms:

  • (Double/Dueling) Deep Q-Learning (DQN)
  • Categorical DQN (C51, Distributional DQN with KL Distance)
  • Quantile Regression DQN
  • (Continuous/Discrete) Synchronous Advantage Actor Critic (A2C)
  • Synchronous N-Step Q-Learning
  • Deep Deterministic Policy Gradient (DDPG, low-dim-state)
  • (Continuous/Discrete) Synchronous Proximal Policy Optimization (PPO, pixel & low-dim-state)
  • The Option-Critic Architecture (OC)

Asynchronous algorithms (e.g., A3C) can be found in v0.1. Action Conditional Video Prediction can be found in v0.4.

Dependency

  • MacOS 10.12 or Ubuntu 16.04
  • PyTorch v0.4.0
  • Python 3.6, 3.5
  • OpenAI Baselines (commit 8e56dd)
  • Core dependencies: pip install -e .

Remarks

  • There is a super fast DQN implementation with an async actor for data generation and an async replay buffer to transfer data to GPU. Enable this implementation by setting config.async_actor = True and using AsyncReplay. However, with atari games this fast implementation may not work in macOS. Use Ubuntu or Docker instead.
  • Python 2 is not officially supported after v0.3. However, I do expect most of the code will still work well in Python 2.
  • Although there is a setup.py, which means you can install the repo as a library, this repo is never designed to be a high-level library like Keras. Use it as your codebase instead.
  • Code for my papers can be found in corresponding branches, which may be good examples for extending this codebase.

Usage

examples.py contains examples for all the implemented algorithms

Dockerfile contains a perfect environment, highly recommended

Please use this bibtex if you want to cite this repo

@misc{deeprl,
  author = {Shangtong, Zhang},
  title = {Modularized Implementation of Deep RL Algorithms in PyTorch},
  year = {2018},
  publisher = {GitHub},
  journal = {GitHub Repository},
  howpublished = {\url{https://github.com/ShangtongZhang/DeepRL}},
}

Curves

BreakoutNoFrameskip-v4

Loading...

  • This is my synchronous option-critic implementation, not the original one.
  • The curves are not directly comparable, as many hyper-parameters are different.

Mujoco

  • DDPG evaluation performance. Loading...

  • PPO online performance. Loading...

References

deeprl's People

Contributors

shangtongzhang avatar wassname avatar nadavbh12 avatar

Watchers

Marek Kolodziej avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.