JuliaReinforcementLearning / ReinforcementLearningZoo.jl
Home Page: https://juliareinforcementlearning.org/
License: MIT License
This issue is used to trigger TagBot; feel free to unsubscribe.
If you haven't already, you should update your TagBot.yml
to include issue comment triggers.
Please see this post on Discourse for instructions and more details.
If you'd like for me to do this for you, comment TagBot fix
on this issue.
I'll open a PR within a few hours; please be patient!
Sorry if this is trivial, but how can I run an experiment?
I tried to:
run(E`JuliaRL_BasicDQN_CartPole`)
But I get:
ERROR: LoadError: UndefVarError: @E_cmd not defined
What am I missing?
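(A guess at what is missing, as a sketch only: in a later issue below it is noted that the E`...` string macro, @E_cmd, lives in ReinforcementLearningCore, and in the PPO report further down the macro resolves after loading the umbrella package. So something like the following session may be what's needed:)
julia> using ReinforcementLearning          # re-exports the experiment machinery, including the E`...` macro
julia> run(E`JuliaRL_BasicDQN_CartPole`)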
Hello,
TD3 is mentioned in the README, but I can't seem to find it anywhere in the source. Is it implemented?
Along the same lines, I tried the JuliaRL_TD3_Pendulum
experiment from the docs and it looks like it is missing:
julia> using ReinforcementLearning
julia> run(E`JuliaRL_TD3_Pendulum`)
ERROR: LoadError: MethodError: no method matching Experiment(::Val{:JuliaRL}, ::Val{:TD3}, ::Val{:Pendulum}, ::Nothing)
Closest candidates are:
Experiment(::Any, ::Any, ::Any, ::Any, ::String) at /Users/rlee18/.julia/packages/ReinforcementLearningCore/xwt8K/src/core/experiment.jl:7
Experiment(::Any, ::Any, ::Any, ::Any, ::Any) at /Users/rlee18/.julia/packages/ReinforcementLearningCore/xwt8K/src/core/experiment.jl:7
Experiment(::Val{:JuliaRL}, ::Val{:DDPG}, ::Val{:Pendulum}, ::Nothing; save_dir, seed) at /Users/rlee18/.julia/packages/ReinforcementLearningZoo/uxBP8/src/experiments/rl_envs.jl:677
...
Stacktrace:
[1] Experiment(::String) at /Users/rlee18/.julia/packages/ReinforcementLearningCore/xwt8K/src/core/experiment.jl:32
[2] @E_cmd(::LineNumberNode, ::Module, ::Any) at /Users/rlee18/.julia/packages/ReinforcementLearningCore/xwt8K/src/core/experiment.jl:25
in expression starting at REPL[2]:1
Thanks!
julia> using ReinforcementLearning
julia> run(E`JuliaRL_PPO_Pendulum`)
ERROR: LoadError: UndefVarError: action_space not defined
Stacktrace:
[1] Experiment(::Val{:JuliaRL}, ::Val{:PPO}, ::Val{:Pendulum}, ::Nothing; save_dir::Nothing, seed::Int64)
@ ReinforcementLearningZoo ~/.julia/packages/ReinforcementLearningZoo/ma4P7/src/experiments/rl_envs/JuliaRL_PPO_Pendulum.jl:17
[2] Experiment(::Val{:JuliaRL}, ::Val{:PPO}, ::Val{:Pendulum}, ::Nothing)
@ ReinforcementLearningZoo ~/.julia/packages/ReinforcementLearningZoo/ma4P7/src/experiments/rl_envs/JuliaRL_PPO_Pendulum.jl:9
[3] Experiment(s::String)
@ ReinforcementLearningCore ~/.julia/packages/ReinforcementLearningCore/NWrFY/src/core/experiment.jl:35
[4] var"@E_cmd"(__source__::LineNumberNode, __module__::Module, s::Any)
@ ReinforcementLearningCore ~/.julia/packages/ReinforcementLearningCore/NWrFY/src/core/experiment.jl:25
in expression starting at REPL[2]:1
Running Pendulum with some other algorithm, or running PPO with some other environment, seems to work. I can't really see where the problem is, but I thought you would like to know something is amiss.
This is my script for running:
id = "JuliaRL_A2C_CartPole"
e = Experiment(id)
agent = e.policy
env = e.env
stop_condition = e.stop_condition
hook = TotalRewardPerEpisode()
run(agent, env, stop_condition, hook)
rewards = hook.rewards
and it gives me this error:
LoadError: MethodError: no method matching +(::Float64, ::Vector{Float32})
For element-wise addition, use broadcasting with dot syntax: scalar .+ array
I found that the error comes from line 139 of hook.jl:
function (hook::TotalRewardPerEpisode)(::PostActStage, agent, env)
hook.reward += reward(env)
end
Here, reward(env) returns a vector:
Float32[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
I didn't change any code; I'm just using the newest version of the package, installed via add ReinforcementLearning#master.
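A possible workaround sketch (not the package's built-in hook; this assumes the environment here is a vectorized one, so reward(env) returns one reward per sub-environment):

# Hypothetical hook that keeps one running total per sub-environment.
mutable struct TotalRewardPerEpisodeVec
    rewards::Vector{Float32}
end
TotalRewardPerEpisodeVec(n::Int) = TotalRewardPerEpisodeVec(zeros(Float32, n))

function (hook::TotalRewardPerEpisodeVec)(::PostActStage, agent, env)
    # Broadcasting (.+=) avoids the +(::Float64, ::Vector{Float32}) MethodError above.
    hook.rewards .+= reward(env)
end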
Hi! I'm using the PPO implementation for my custom environment with a continuous action space. I built my custom experiments based on the PPO pendulum experiment template, where the actor and critic are defined explicitly with optimizer=ADAM(3e-4). After playing with it for a while, I realized that I have to use the optimizer defined as part of the ActorCritic type if I want to change the learning rate, etc. It looks like the optimizers defined for the actor and critic are not used, so it would be less confusing if the optimizer were specified only in the ActorCritic call in the template.
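In other words, something like the following (a sketch based on the pendulum template; actor_model and critic_model stand for the Flux chains from that template, and the exact keyword layout is assumed, not copied from the current source):

approximator = ActorCritic(
    actor  = NeuralNetworkApproximator(model = actor_model),   # no per-network optimizer here
    critic = NeuralNetworkApproximator(model = critic_model),  # no per-network optimizer here
    optimizer = ADAM(3e-4),  # the one optimizer that is actually applied during update!
)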
I installed the package fine on my desktop computer, but I wanted to try using a GPU via nextjournal.com,
so on an instance with Julia 1.3.1 I install it with:
Pkg.add("ReinforcementLearning")
(I also add Flux, just in case, due to the error below.)
Then, when running using ReinforcementLearning, I get:
ERROR: LoadError: LoadError: UndefVarError: TrackedArray not defined
Stacktrace:
[1] getproperty(::Module, ::Symbol) at ./Base.jl:13
[2] top-level scope at /root/.julia/packages/ReinforcementLearning/qSdCS/src/learner/dqn.jl:70
[3] include at ./boot.jl:328 [inlined]
[4] include_relative(::Module, ::String) at ./loading.jl:1105
[5] include at ./Base.jl:31 [inlined]
[6] include(::String) at /root/.julia/packages/ReinforcementLearning/qSdCS/src/ReinforcementLearning.jl:1
[7] top-level scope at /root/.julia/packages/ReinforcementLearning/qSdCS/src/ReinforcementLearning.jl:38
[8] include at ./boot.jl:328 [inlined]
[9] include_relative(::Module, ::String) at ./loading.jl:1105
[10] include(::Module, ::String) at ./Base.jl:31
[11] top-level scope at none:2
[12] eval at ./boot.jl:330 [inlined]
[13] eval(::Expr) at ./client.jl:425
[14] top-level scope at ./none:3
in expression starting at /root/.julia/packages/ReinforcementLearning/qSdCS/src/learner/dqn.jl:70
in expression starting at /root/.julia/packages/ReinforcementLearning/qSdCS/src/ReinforcementLearning.jl:38
P.S. I am very excited about this package! It's so clean and nicely structured!
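(For what it's worth, an UndefVarError for TrackedArray usually means an older ReinforcementLearning release was resolved against a Flux version that no longer ships Tracker; TrackedArray existed in Flux 0.9 and earlier. A rough way to check, using only standard Pkg commands:)

using Pkg
Pkg.status()   # see which versions of ReinforcementLearning and Flux were resolved
Pkg.update()   # a newer release, or a fresh environment with compatible bounds, may fix it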
I found a possible error:
In rainbow.jl, when calculating the error at lines 177 to 179, the logits from the online network are not passed through softmax.
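For illustration, the kind of change being suggested looks roughly like this (a sketch, not the actual rainbow.jl code; logits and target_dist are made-up names):

using Flux: logsoftmax
using Statistics: mean

# logits: n_atoms × batch_size output of the online network
# target_dist: projected target distribution of the same shape
logp = logsoftmax(logits; dims = 1)                  # normalize the logits first
batch_losses = -vec(sum(target_dist .* logp; dims = 1))
loss = mean(batch_losses)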
I think this package can be a superset of google/dopamine after implementing IQN (which should be ready in the next week). And maybe stable-baselines as the next step?
A pretrained agent is supposed to be loaded with Agent(artifact"EnvType_envname_Method_version_role").
legal_actions
According to the README, we only need to load ReinforcementLearningZoo and ReinforcementLearningEnvironments. But I think we also need to load ReinforcementLearningCore to be able to run single-line experiments, since the macro @E_cmd is defined in ReinforcementLearningCore:
sid dev-RLZoo-GridWorlds $ julia --project=.
(Julia startup banner: Version 1.5.0 (2020-08-01), official release)
julia> using ReinforcementLearningZoo
julia> using ReinforcementLearningEnvironments
julia> run(E`JuliaRL_BasicDQN_CartPole`)
ERROR: LoadError: UndefVarError: @E_cmd not defined
in expression starting at REPL[3]:1
julia>
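A sketch of the workaround described above (assuming the E`...` string macro, @E_cmd, is indeed defined and exported by ReinforcementLearningCore):

julia> using ReinforcementLearningCore     # provides the E`...` string macro
julia> using ReinforcementLearningZoo
julia> using ReinforcementLearningEnvironments
julia> run(E`JuliaRL_BasicDQN_CartPole`)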
Necessary changes:
Add a dist field to the PPOLearner (just like @norci did in VPG).
The following method needs to be extended to recognize environments with a continuous action space. Currently the PPOLearner is assumed to return a (batch of) logits. I'd suggest renaming PPOLearner to PPOPolicy and returning an action directly.
A GaussianNetwork is also needed (a rough sketch follows below).
Calculating the entropy loss in update! is hard-coded. It would be better to split it into a separate function to support continuous distributions (or to reuse the one in StatsBase or Distributions; but use them with caution! I had some problems using them with Zygote before).
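A rough sketch of what such a Gaussian policy head and a hand-written, Zygote-friendly entropy term could look like (names and layout are assumptions, not the package's actual implementation):

using Flux

# Hypothetical Gaussian policy head: maps a state to the mean and standard
# deviation of a diagonal Gaussian over continuous actions.
struct GaussianActor{P,M,S}
    pre::P     # shared feature extractor, e.g. a Chain of Dense layers
    μ::M       # mean head
    logσ::S    # log standard-deviation head
end
Flux.@functor GaussianActor

function (actor::GaussianActor)(state)
    h = actor.pre(state)
    return actor.μ(h), exp.(actor.logσ(h))
end

# Entropy of a diagonal Gaussian, 0.5 * (1 + log(2πσ²)) summed over dimensions,
# written out by hand rather than via Distributions.entropy so it stays
# differentiable under Zygote.
gaussian_entropy(σ) = sum(0.5f0 .* (1f0 .+ log.(Float32(2π) .* σ .^ 2)))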