The decision-making-under-uncertainty from juliaacademy

Subtype for State to improve compatiblity with Crux

Please see the commit below for suggested change and an explanation. My fork of this repository does not have a clean copy of the original, so I don't know of any convenient way to submit a PR myself. It's a very small change, so it should be easy to make regardless.

3acaec2

Thanks for making these tutorials! I used them to learn how to translate some research code from Python to Julia.

Simple routine example from economics

It would be great if you could use this machinery to solve a simple/routine example of an MDP from economics.
See discussion here.

I will copy/paste the example I created.
I tried to simplify my notation to follow the standard MDP notation: state/action/transition/reward/discount

# Parameters for the Neoclassical Growth Model (NGM)
α = 0.65; 
f(s;α=α) = (s)^α
a0(s)    = f(s)
min_s = eps(); max_s = 2.0; n_s = 100;
min_a = eps(); max_a = a0(max_s); n_a = 100; 

"states:"
states = range(min_s, max_s, length=n_s)
ceil_state(s) = states[searchsortedfirst(states, s)] # for discrete VFI

"actions:" 
valid_actions()  = range(min_a, max_a, length=n_a)        # possible actions any state.
valid_actions(s) = filter(<(a0(s)), valid_actions())      # valid actions @ state=s 

"Transition:" 
μ(s,a)      = f(s) - a      

"reward:" 
r(s,a) = log(a)

"discount:"
β = 0.95

using QuickPOMDPs: QuickMDP           #QuickMDP()
using POMDPModelTools: Deterministic
m = QuickMDP(
    states     = states,
    actions    = valid_actions,
    transition = (s, a) -> Deterministic( ceil_state( μ(s,a) ) ),
    reward     = (s, a) -> r(s,a),
    discount   = β
)

# DiscreteValueIteration: Both work fast enough
using DiscreteValueIteration
s1 = DiscreteValueIteration.SparseValueIterationSolver()
s2 = DiscreteValueIteration.ValueIterationSolver()
@time sol1 = solve(s1, m)
@time sol2 = solve(s2, m)
value(sol1, states[2]), action(sol1, states[2])
value(sol2, states[2]), action(sol2, states[2])

#rewrite MDP w/o ceil_state() in transition.
m = QuickMDP(
    states     = states,
    actions    = valid_actions,
    transition = (s, a) -> Deterministic( μ(s,a) ),
    reward     = (s, a) -> r(s,a),
    discount   = β
)

# LocalApproximationValueIteration works, but currently slow!
using GridInterpolations
using LocalFunctionApproximation
using LocalApproximationValueIteration
grid = GridInterpolations.RectangleGrid(states, valid_actions(), ) #[0.0, 1.0]) 
interp = LocalFunctionApproximation.LocalGIFunctionApproximator(grid)
s4 = LocalApproximationValueIterationSolver(interp)
@time sol4 = solve(s4, m)
#190.477798 seconds (2.37 G allocations: 82.610 GiB, 6.87% gc time, 0.21% compilation time)
value(sol4, states[2]), action(sol4, states[2])

Now let's compare solutions from various MDP solvers w/ closed-forms:

using Plots
ω = α*β
A1 = α/(1.0-ω)
A0 = ((1-ω)*log(1-ω) + ω*log(ω))/((1-ω)*(1-β))

# Value
plot(legend=:bottomright, title="Value Functions");
plot!(states[2:end], i->A0 + A1 * log(i), lab="closed form") # way off 
plot!(states[2:end], i->value(sol1, i),   lab="sol1")
plot!(states[2:end], i->value(sol2, i),   lab="sol2")
plot!(states[2:end], i->value(sol4, i),   lab="sol4")
# Policy
plot(legend=:bottomright, title="Policy Functions");
plot!(states[2:end], i -> (1-ω)*(i^α), lab="closed form") # way off 
plot!(states[2:end], i -> action(sol1, i),   lab="sol1")
plot!(states[2:end], i -> action(sol2, i),   lab="sol2")
plot!(states[2:end], i -> action(sol4, i),   lab="sol4")
# Simulation
Tsim=150; s0=states[2]; 

simcf = []; push!(simcf, s0); 
for tt in 1:Tsim
    tt==1 ? s = s0 : nothing 
    a = (1-ω)*(s^α)
    sp = μ(s,a) 
    #sp = ceil_state(sp)
    push!(simcf, sp)
    s = sp
end 


sim1 = []; push!(sim1, s0); 
for tt in 1:Tsim
    tt==1 ? s = s0 : nothing 
    #a = valid_actions()[sol1.policy[searchsortedfirst(states, s)]]
    a = action(sol1, s)
    sp = μ(s,a) 
    sp = ceil_state(sp)
    push!(sim1, sp)
    s = sp
end 

sim4 = []; push!(sim4, s0); 
for tt in 1:Tsim
    tt==1 ? s = s0 : nothing 
    a = action(sol4, s)
    sp = μ(s,a) 
    push!(sim4, sp)
    s = sp
end 

plot(legend=:bottomright, title="Simulation");
plot!(simcf,   lab="closed form")
plot!(sim1,   lab="sol1")
plot!(sim4,   lab="sol4")

Creating a Project.toml and Manifest.toml to clone environment

Hello. I just though it might be nice to include a Project.toml and Manifest.toml in the root of the repo. That way any dependencies for the notebooks are resolved once the environment is instantiated. Just a suggestion.

Looking forward to the rest of the videos.

juliaacademy / decision-making-under-uncertainty Goto Github PK

decision-making-under-uncertainty's People

Contributors

Stargazers

Watchers

Forkers

decision-making-under-uncertainty's Issues

Subtype for State to improve compatiblity with Crux

Simple routine example from economics

Creating a Project.toml and Manifest.toml to clone environment

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent