
ReinforcementLearningZoo.jl's Introduction


⚠️ This package was moved into ReinforcementLearning.jl (2021-05-06)

This project aims to provide implementations of the most common reinforcement learning algorithms.

Algorithms Implemented

  • VPG (Vanilla Policy Gradient, with a baseline)
  • DQN
  • Prioritized DQN
  • Rainbow
  • IQN
  • A2C/A2C with GAE/MAC
  • PPO
  • DDPG
  • TD3
  • SAC
  • CFR/OS-MCCFR/ES-MCCFR/DeepCFR
  • Minimax
  • Behavior Cloning

If you are looking for tabular reinforcement learning algorithms, you may refer to ReinforcementLearningAnIntroduction.jl.

Built-in Experiments

Some built-in experiments are exported so that new users can easily run benchmarks with a single line. Experienced users are encouraged to check the source code of those experiments and adapt it as needed.

List of built-in experiments

  • E`JuliaRL_BasicDQN_CartPole`
  • E`JuliaRL_DQN_CartPole`
  • E`JuliaRL_PrioritizedDQN_CartPole`
  • E`JuliaRL_Rainbow_CartPole`
  • E`JuliaRL_IQN_CartPole`
  • E`JuliaRL_A2C_CartPole`
  • E`JuliaRL_A2CGAE_CartPole` (Thanks to @sriram13m)
  • E`JuliaRL_MAC_CartPole` (Thanks to @RajGhugare19)
  • E`JuliaRL_PPO_CartPole`
  • E`JuliaRL_VPG_CartPole` (Thanks to @norci)
  • E`JuliaRL_VPG_Pendulum` (continuous action space)
  • E`JuliaRL_VPG_PendulumD` (discrete action space)
  • E`JuliaRL_DDPG_Pendulum`
  • E`JuliaRL_TD3_Pendulum` (Thanks to @rbange)
  • E`JuliaRL_SAC_Pendulum` (Thanks to @rbange)
  • E`JuliaRL_PPO_Pendulum`
  • E`JuliaRL_BasicDQN_MountainCar` (Thanks to @felixchalumeau)
  • E`JuliaRL_DQN_MountainCar` (Thanks to @felixchalumeau)
  • E`JuliaRL_Minimax_OpenSpiel(tic_tac_toe)`
  • E`JuliaRL_TabularCFR_OpenSpiel(kuhn_poker)`
  • E`JuliaRL_DeepCFR_OpenSpiel(leduc_poker)`
  • E`JuliaRL_DQN_SnakeGame`
  • E`JuliaRL_BC_CartPole`
  • E`JuliaRL_BasicDQN_EmptyRoom`
  • E`Dopamine_DQN_Atari(pong)`
  • E`Dopamine_Rainbow_Atari(pong)`
  • E`Dopamine_IQN_Atari(pong)`
  • E`rlpyt_A2C_Atari(pong)`
  • E`rlpyt_PPO_Atari(pong)`

Run Experiments

julia> ] add ReinforcementLearning

julia> using ReinforcementLearning

julia> run(E`JuliaRL_BasicDQN_CartPole`)

julia> ] add ArcadeLearningEnvironment

julia> using ArcadeLearningEnvironment

julia> run(E`rlpyt_PPO_Atari(pong)`)  # the Atari environment is provided in ArcadeLearningEnvironment, so we need to install it first

Notes:

  • Experiments on CartPole usually run faster with CPU only due to the overhead of sending data between CPU and GPU.
  • It shouldn't surprise you that our experiments on CartPole are much faster than those written in Python. The secret is that our environment is written in Julia!
  • Remember to set JULIA_NUM_THREADS to enable multi-threading when using algorithms like A2C and PPO (see the snippet after this list).
  • Experiments on Atari (respectively OpenSpiel, SnakeGame, GridWorlds) are only available after you have installed ArcadeLearningEnvironment.jl (respectively OpenSpiel.jl, SnakeGame.jl, GridWorlds.jl) and run using ArcadeLearningEnvironment (respectively using OpenSpiel, using SnakeGame, import GridWorlds).
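
For example, a minimal sketch of starting Julia with 4 threads (pick the thread count that suits your machine):

$ JULIA_NUM_THREADS=4 julia

julia> Threads.nthreads()
4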

Speed

  • Different configurations can affect performance significantly. According to our tests, these implementations are generally comparable in speed to equivalent PyTorch or TensorFlow implementations with the same configuration, and sometimes significantly faster.

The following data were collected from experiments on an Intel(R) Xeon(R) W-2123 CPU @ 3.60GHz with an RTX 2080 Ti GPU.

Experiment                      | FPS  | Notes
E`Dopamine_DQN_Atari(pong)`     | ~210 | Use the same config of dqn.gin in google/dopamine
E`Dopamine_Rainbow_Atari(pong)` | ~171 | Use the same config of rainbow.gin in google/dopamine
E`Dopamine_IQN_Atari(pong)`     | ~162 | Use the same config of implicit_quantile.gin in google/dopamine
E`rlpyt_A2C_Atari(pong)`        | ~768 | Use the same default parameters of A2C in rlpyt with 4 threads
E`rlpyt_PPO_Atari(pong)`        | ~711 | Use the same default parameters of PPO in rlpyt with 4 threads

ReinforcementLearningZoo.jl's Issues

A possible error

I found a possible error:
In rainbow.jl, when calculating the error on lines 177 to 179, the logits from the online network are not passed through the softmax operation.
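
For context, a hedged sketch (not the package's actual rainbow.jl code) of the normalization the report refers to: a categorical/distributional loss needs the online network's logits passed through softmax or log-softmax before the cross entropy is computed.

using Flux: softmax, logsoftmax

logits = randn(Float32, 51)                  # raw online-network output (51 atoms, one action; shapes are illustrative)
target_probs = softmax(randn(Float32, 51))   # projected target distribution

# cross entropy against probabilities requires log-softmax on the logits first
loss = -sum(target_probs .* logsoftmax(logits))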

How to run an experiment

Sorry if this is trivial, but how can I run an experiment?
I tried to:

  1. Clone this repo
  2. Start Julia inside this repo and activate the project
  3. Run an experiment with:
run(E`JuliaRL_BasicDQN_CartPole`)

But I get:

ERROR: LoadError: UndefVarError: @E_cmd not defined

What am I missing?

Improve the PPOLearner to support continuous action spaces

Necessary changes:

  1. Add a dist field to the PPOLearner (just like @norci did in VPG).

  2. The following method needs to be extended to recognize environments with a continuous action space. Currently the PPOLearner is assumed to return a (batch of) logits. I'd suggest renaming PPOLearner to PPOPolicy and returning an action directly.

    function (learner::PPOLearner)(env::MultiThreadEnv)

  3. A GaussianNetwork is also needed (see the sketch after this list).

  4. The entropy loss calculation in update! is hard-coded. It would be better to split it into a separate function that supports continuous distributions (or reuse the one in StatsBase or Distributions, but use them with caution: I had some problems using them with Zygote before).
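
A minimal sketch of such a Gaussian policy head (all names below are illustrative, not the package's eventual GaussianNetwork API):

using Flux, Random

# Illustrative Gaussian policy head: maps state features to a sampled continuous
# action and its log-probability via the reparameterization trick.
struct GaussianHead{M,S}
    μ::M      # network producing the mean
    logσ::S   # network producing the log standard deviation
end

function (head::GaussianHead)(state; rng = Random.GLOBAL_RNG)
    μ = head.μ(state)
    σ = exp.(head.logσ(state))
    ϵ = randn(rng, Float32, size(μ))
    action = μ .+ σ .* ϵ                     # reparameterized sample
    logp = sum(-0.5f0 .* ϵ .^ 2 .- log.(σ) .- 0.5f0 * log(2f0 * π); dims = 1)
    return action, logp
end

head = GaussianHead(Dense(4, 1), Dense(4, 1))
action, logp = head(rand(Float32, 4, 8))     # a batch of 8 four-dimensional states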

TD3 Implementation

Hello,

TD3 is mentioned in the README, but I can't seem to find it anywhere in the source. Is it implemented?

Along the same lines, I tried the JuliaRL_TD3_Pendulum experiment from the docs and it looks like it is missing:

julia> using ReinforcementLearning

julia> run(E`JuliaRL_TD3_Pendulum`)
ERROR: LoadError: MethodError: no method matching Experiment(::Val{:JuliaRL}, ::Val{:TD3}, ::Val{:Pendulum}, ::Nothing)
Closest candidates are:
  Experiment(::Any, ::Any, ::Any, ::Any, ::String) at /Users/rlee18/.julia/packages/ReinforcementLearningCore/xwt8K/src/core/experiment.jl:7
  Experiment(::Any, ::Any, ::Any, ::Any, ::Any) at /Users/rlee18/.julia/packages/ReinforcementLearningCore/xwt8K/src/core/experiment.jl:7
  Experiment(::Val{:JuliaRL}, ::Val{:DDPG}, ::Val{:Pendulum}, ::Nothing; save_dir, seed) at /Users/rlee18/.julia/packages/ReinforcementLearningZoo/uxBP8/src/experiments/rl_envs.jl:677
  ...
Stacktrace:
 [1] Experiment(::String) at /Users/rlee18/.julia/packages/ReinforcementLearningCore/xwt8K/src/core/experiment.jl:32
 [2] @E_cmd(::LineNumberNode, ::Module, ::Any) at /Users/rlee18/.julia/packages/ReinforcementLearningCore/xwt8K/src/core/experiment.jl:25
in expression starting at REPL[2]:1

Thanks!

Optimizer setting in PPO experiments

Hi! I'm using the PPO implementation for my custom environment with a continuous action space. I built my custom experiments based on the PPO Pendulum experiment template, where the actor and critic are defined explicitly with optimizer=ADAM(3e-4). After playing with it for a while, I realized that I have to use the optimizer defined as part of the ActorCritic type if I want to change the learning rate, etc. It looks like the optimizers defined for the actor and critic are not used, so it would be less confusing if the optimizer were specified only in the ActorCritic call in the template.
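
A hedged illustration of the point above; the constructor keywords follow the Pendulum experiment template of the time and may differ in your installed version:

using ReinforcementLearning, Flux

# Hedged sketch: only the optimizer passed to ActorCritic appears to be used during
# updates, so set the learning rate there rather than on the individual networks.
approximator = ActorCritic(
    actor = GaussianNetwork(
        pre = Dense(3, 64, relu),
        μ = Dense(64, 1),
        logσ = Dense(64, 1),
    ),
    critic = Chain(Dense(3, 64, relu), Dense(64, 1)),
    optimizer = ADAM(3e-4),   # the learning rate that actually takes effect
)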

JuliaRL_A2C_CartPole experiment: the TotalRewardPerEpisode hook does not work

This is my script for running:

id = "JuliaRL_A2C_CartPole"
e = Experiment(id)
agent = e.policy
env = e.env
stop_condition = e.stop_condition
hook = TotalRewardPerEpisode()
run(agent, env, stop_condition, hook)
rewards = hook.rewards

and it gives me this error:

LoadError: MethodError: no method matching +(::Float64, ::Vector{Float32})
For element-wise addition, use broadcasting with dot syntax: scalar .+ array

And I found that the error is from line 139 in hook.jl:

function (hook::TotalRewardPerEpisode)(::PostActStage, agent, env)
    hook.reward += reward(env)
end

reward(env) returns a vector:
Float32[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

I didn't change any code; I'm just using the newest version of the package installed via ] add ReinforcementLearning#master.
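
For reference, a hedged sketch of a hook that works with a vectorized environment by summing the per-environment rewards (the hook name is mine; AbstractHook, PostActStage, and reward come from the package):

using ReinforcementLearning

# Illustrative hook for vectorized (MultiThreadEnv) environments: reward(env) returns
# one entry per sub-environment, so accumulate the sum instead of adding a vector to a scalar.
mutable struct SummedRewardPerStep <: AbstractHook
    total::Float64
end
SummedRewardPerStep() = SummedRewardPerStep(0.0)

function (hook::SummedRewardPerStep)(::PostActStage, agent, env)
    hook.total += sum(reward(env))
end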

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

install error?

The package installed fine on my desktop computer, but I wanted to try using a GPU on nextjournal.com,

so on an instance with Julia 1.3.1 I try to install it:

Pkg.add("ReinforcementLearning")

(I also add Flux just in case, due to the error below.)

Then, when I run using ReinforcementLearning, I get:

ERROR: LoadError: LoadError: UndefVarError: TrackedArray not defined
Stacktrace:
 [1] getproperty(::Module, ::Symbol) at ./Base.jl:13
 [2] top-level scope at /root/.julia/packages/ReinforcementLearning/qSdCS/src/learner/dqn.jl:70
 [3] include at ./boot.jl:328 [inlined]
 [4] include_relative(::Module, ::String) at ./loading.jl:1105
 [5] include at ./Base.jl:31 [inlined]
 [6] include(::String) at /root/.julia/packages/ReinforcementLearning/qSdCS/src/ReinforcementLearning.jl:1
 [7] top-level scope at /root/.julia/packages/ReinforcementLearning/qSdCS/src/ReinforcementLearning.jl:38
 [8] include at ./boot.jl:328 [inlined]
 [9] include_relative(::Module, ::String) at ./loading.jl:1105
 [10] include(::Module, ::String) at ./Base.jl:31
 [11] top-level scope at none:2
 [12] eval at ./boot.jl:330 [inlined]
 [13] eval(::Expr) at ./client.jl:425
 [14] top-level scope at ./none:3
in expression starting at /root/.julia/packages/ReinforcementLearning/qSdCS/src/learner/dqn.jl:70
in expression starting at /root/.julia/packages/ReinforcementLearning/qSdCS/src/ReinforcementLearning.jl:38


P.S. I am very excited about this package! It's so clean and nicely structured!

JuliaRL_PPO_Pendulum action_space not defined error

julia> using ReinforcementLearning

julia> run(E`JuliaRL_PPO_Pendulum`)
ERROR: LoadError: UndefVarError: action_space not defined
Stacktrace:
 [1] Experiment(::Val{:JuliaRL}, ::Val{:PPO}, ::Val{:Pendulum}, ::Nothing; save_dir::Nothing, seed::Int64)
   @ ReinforcementLearningZoo ~/.julia/packages/ReinforcementLearningZoo/ma4P7/src/experiments/rl_envs/JuliaRL_PPO_Pendulum.jl:17
 [2] Experiment(::Val{:JuliaRL}, ::Val{:PPO}, ::Val{:Pendulum}, ::Nothing)
   @ ReinforcementLearningZoo ~/.julia/packages/ReinforcementLearningZoo/ma4P7/src/experiments/rl_envs/JuliaRL_PPO_Pendulum.jl:9
 [3] Experiment(s::String)
   @ ReinforcementLearningCore ~/.julia/packages/ReinforcementLearningCore/NWrFY/src/core/experiment.jl:35
 [4] var"@E_cmd"(__source__::LineNumberNode, __module__::Module, s::Any)
   @ ReinforcementLearningCore ~/.julia/packages/ReinforcementLearningCore/NWrFY/src/core/experiment.jl:25
in expression starting at REPL[2]:1

Running Pendulum with some other algorithm, or running PPO with some other environment, seems to work. I can't really see where the problem is, but I thought you would like to know something is amiss.

Need RLCore's @E_cmd to run experiments as given in the README

According to the README, we only need to load ReinforcementLearningZoo and ReinforcementLearningEnvironments. But I think we also need to load ReinforcementLearningCore to be able to run single-line experiments, since the macro @E_cmd is defined in ReinforcementLearningCore.

sid dev-RLZoo-GridWorlds $ julia --project=.
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.5.0 (2020-08-01)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using ReinforcementLearningZoo

julia> using ReinforcementLearningEnvironments

julia> run(E`JuliaRL_BasicDQN_CartPole`)
ERROR: LoadError: UndefVarError: @E_cmd not defined
in expression starting at REPL[3]:1

julia> 
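
The workaround implied above is to also load ReinforcementLearningCore (or simply run using ReinforcementLearning, which re-exports the E`...` string macro, as shown in the README):

julia> using ReinforcementLearningCore, ReinforcementLearningZoo, ReinforcementLearningEnvironments

julia> run(E`JuliaRL_BasicDQN_CartPole`)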
