mpatacchiola / dissecting-reinforcement-learning

Python code, PDFs and resources for the series of posts on Reinforcement Learning which I published on my personal blog

Home Page: https://mpatacchiola.github.io/blog/

License: MIT License

Python 100.00%
reinforcement-learning deep-reinforcement-learning markov-chain temporal-differencing-learning sarsa q-learning actor-critic multi-armed-bandit inverted-pendulum mountain-car

dissecting-reinforcement-learning's People

Contributors

lucanicoliyt88, mpatacchiola, ngacho

dissecting-reinforcement-learning's Issues

Adding optimal policy calculation to the value iteration algorithm

You could add a step that extracts the optimal policy after generate_graph in the value iteration algorithm:

https://mpatacchiola.github.io/blog/2016/12/09/dissecting-reinforcement-learning.html

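    generate_graph(graph_list)

    # Optimal policy extraction: for each state, pick the action with the
    # highest expected utility under the converged utility vector u.
    pi = np.zeros(tot_states)
    for s in range(tot_states):
        v = np.zeros(tot_states)
        v[s] = 1.0  # one-hot vector selecting state s
        pi[s] = return_expected_action(u, T, v)
    pi[5] = np.nan      # obstacle
    pi[3] = pi[7] = -1  # terminal states (no action)
    print(pi)

def return_expected_action(u, T, v):
    # u: utility vector, T: transition matrix T[s, s_next, action],
    # v: one-hot state vector. Returns the index of the best action.
    actions_array = np.zeros(4)
    for action in range(4):
        # Expected utility of doing the action in state s, according to T and u.
        actions_array[action] = np.sum(np.multiply(u, np.dot(v, T[:, :, action])))
    return np.argmax(actions_array)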
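
The per-state loop above can also be written as a single contraction. This is a minimal sketch, not code from the repository: it assumes T is indexed as T[s, s_next, action] and u is the converged utility vector, and the name greedy_policy is hypothetical.

```python
import numpy as np

def greedy_policy(T, u):
    """Return the greedy action for every state.

    Assumes T has shape (n_states, n_states, n_actions) with
    T[s, s_next, a] = P(s_next | s, a), and u has shape (n_states,).
    """
    # q[s, a] = sum over s_next of T[s, s_next, a] * u[s_next]
    q = np.einsum("ija,j->ia", T, u)
    return np.argmax(q, axis=1)

# Tiny hypothetical MDP: 2 states, action 0 leads to state 0, action 1 to state 1.
T_demo = np.zeros((2, 2, 2))
T_demo[:, 0, 0] = 1.0
T_demo[:, 1, 1] = 1.0
u_demo = np.array([0.0, 1.0])
policy_demo = greedy_policy(T_demo, u_demo)
```

The obstacle and terminal entries would still need to be overwritten afterwards, as in the snippet above.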

Part.1 Modified Policy Iteration with Simplified Bellman Equation and Linear Algebra Policy Evaluation Infinite Loop

Hello,

I am attempting to run the function "main_linalg()" in policy_iteration.py but the program fails to terminate.

The standard policy iteration program, which uses iterative policy evaluation, returns the correct policy:

[Screenshot]

After some investigation, I found that the program terminates if you replace

u = return_policy_evaluation_linalg(p, r, T, gamma)

with

u = return_policy_evaluation(p, u, r, T, gamma)

in the function main_linalg.

This change turns the implementation into a modified policy iteration algorithm that uses iterative policy evaluation.
With it, the program terminates after 4 to 5 iterations.
However, it returns a policy that differs from the expected one:

[Screenshot]

I made these changes because I initially assumed that the linear-algebra and iterative approaches should return the same utility values for each state. Do you know if this is truly the case?

I found another GitHub repository, https://github.com/SparkShen02/MDP-with-Value-Iteration-and-Policy-Iteration,
which implements the modified policy iteration algorithm with iterative policy evaluation.

[Screenshot]

Although your transition matrix generator uses padding to account for boundary collisions, I suspect that the linear algebra approach fails to detect collisions with the walls. This would cause the optimal action to keep switching between the correct action and one that runs into a wall.

I am not sure how to proceed. Please look into this for a possible fix. Thank you.
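
On the question of whether the two evaluation methods should agree: for a fixed policy and a discount factor below 1, the Bellman expectation equation has a unique solution, so solving it as a linear system and iterating to convergence should give the same utilities up to tolerance. The sketch below checks this on a hypothetical 3-state chain. The function names, the matrix T_pi (transitions under the fixed policy), and the reward vector are assumptions for illustration, not the repository's return_policy_evaluation functions.

```python
import numpy as np

def evaluate_policy_linalg(T_pi, r, gamma):
    """Solve (I - gamma * T_pi) u = r for the utility vector u."""
    n = len(r)
    return np.linalg.solve(np.eye(n) - gamma * T_pi, r)

def evaluate_policy_iterative(T_pi, r, gamma, tol=1e-10, max_iter=100000):
    """Repeat u <- r + gamma * T_pi u until the update is smaller than tol."""
    u = np.zeros(len(r))
    for _ in range(max_iter):
        u_new = r + gamma * T_pi @ u
        if np.max(np.abs(u_new - u)) < tol:
            return u_new
        u = u_new
    return u

# Hypothetical chain: T_pi[s, s_next] under a fixed policy, last state absorbing.
T_pi = np.array([[0.5, 0.5, 0.0],
                 [0.0, 0.5, 0.5],
                 [0.0, 0.0, 1.0]])
r = np.array([-0.04, -0.04, 1.0])
gamma = 0.9

u_linalg = evaluate_policy_linalg(T_pi, r, gamma)
u_iter = evaluate_policy_iterative(T_pi, r, gamma)
print(np.allclose(u_linalg, u_iter, atol=1e-6))
```

If the two disagree on the grid world, the difference is more likely to come from how terminal states or the policy-specific transition matrix are built than from the evaluation method itself; that is a guess, since the failing run is only shown in the screenshots.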

Problem in executing: "Montecarlo_control.py"

Dear Massimiliano,

I am trying to run your code "Montecarlo_control.py" from post number 2, and I get the following error:
[image]

It seems that if(checkup_matrix[row, col] == 0): receives a row index that is a float rather than an int, so NumPy cannot use it to index the matrix.

Luca
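
A possible workaround, assuming the cause is what the report describes (the traceback itself is only in the image): cast the coordinates to int before indexing. The array shape, the observation values, and the helper name is_unvisited below are hypothetical.

```python
import numpy as np

checkup_matrix = np.zeros((3, 4))
observation = np.array([1.0, 2.0])  # hypothetical float coordinates

def is_unvisited(matrix, observation):
    """Index the matrix with integer coordinates taken from the observation."""
    # NumPy rejects float indices, so convert explicitly.
    row, col = int(observation[0]), int(observation[1])
    return matrix[row, col] == 0

print(is_unvisited(checkup_matrix, observation))
```

Recent NumPy versions raise an IndexError for float indices, while very old versions only emitted a deprecation warning, which may be why the script ran when it was written.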

Alternative to Numpy

I would like to try your code on the pyboard and the OpenMV boards. Unfortunately, NumPy is too large to install on a microcontroller. Would it be possible to use a list, a bytearray, or an array.array to implement the functions you use from NumPy?
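
As an illustration of what such a port might look like, here is a minimal pure-Python sketch of the expected-action computation using nested lists only. It assumes the transition model is stored as T[s][s_next][a]; the function name expected_action and the tiny model are hypothetical, and this has not been tested on MicroPython hardware.

```python
def expected_action(u, T, s):
    """Return the action with the highest expected utility in state s.

    u is a list of utilities, T is a nested list with
    T[s][s_next][a] = P(s_next | s, a).
    """
    n_actions = len(T[s][0])
    best_action, best_value = 0, float("-inf")
    for a in range(n_actions):
        # Expected utility of action a: sum over successor states.
        value = sum(T[s][s2][a] * u[s2] for s2 in range(len(u)))
        if value > best_value:  # strict comparison keeps the first maximum
            best_action, best_value = a, value
    return best_action

# Hypothetical 2-state model: action 0 leads to state 0, action 1 to state 1.
T_demo = [[[1.0, 0.0], [0.0, 1.0]],
          [[1.0, 0.0], [0.0, 1.0]]]
u_demo = [0.0, 1.0]
best = expected_action(u_demo, T_demo, 0)
```

For larger grids, array.array('f') could replace the inner lists to save memory, at the cost of flattening the indices by hand.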

Part 3, TD(lambda): trace_matrix should be reset to zeroes at the beginning of each epoch

I believe that in part 3, TD(lambda), the trace_matrix should be reset to zeros at the beginning of each epoch. Otherwise the utility of a state may be updated even if the state is not part of the current trace.

Also, I believe that the decay of the trace_matrix should be moved to just before the line:
trace_matrix[observation[0], observation[1]] += 1
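
The two suggestions above can be sketched as follows. This is a minimal illustration with accumulating traces, not the code from the post: the function name td_lambda_episode, the episode format, and the default parameter values are assumptions.

```python
import numpy as np

def td_lambda_episode(utility_matrix, episode, alpha=0.1, gamma=0.999, lambda_=0.5):
    """Run one epoch of TD(lambda) and update utility_matrix in place.

    episode is a list of (observation, reward, new_observation) tuples,
    where observations are (row, col) tuples. utility_matrix must be float.
    """
    # Reset the traces so that states outside this episode are not updated.
    trace_matrix = np.zeros_like(utility_matrix)
    for observation, reward, new_observation in episode:
        delta = reward + gamma * utility_matrix[new_observation] \
                - utility_matrix[observation]
        # Decay first, then mark the current state as visited.
        trace_matrix *= gamma * lambda_
        trace_matrix[observation[0], observation[1]] += 1
        utility_matrix += alpha * delta * trace_matrix
    return utility_matrix

utility_matrix = np.zeros((1, 3))
td_lambda_episode(utility_matrix, [((0, 0), 0.0, (0, 1)), ((0, 1), 1.0, (0, 2))])
print(utility_matrix)
```

With the reset in place, a second episode that never visits (0, 0) leaves the utility of that state untouched.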

The cleaning robot example in chapter 1?

Hello, I really don't understand this example in chapter one:
[image]
Why does the robot, which begins at state (1,1) and takes a single action (up, down, left, or right), have 3 possible subsequent states like that?
Thank you.

11X11 grid

Hi @mpatacchiola, I have an 11x11 grid. How can I build the transition_matrix for it?
Can you please help me?
Is there any generic code that creates the transition_matrix from the number of rows and columns of the grid?
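
One way such a generator might look is sketched below. It is not the repository's generator: it assumes the action encoding 0=up, 1=right, 2=down, 3=left, a motion model with probability 0.8 for the intended direction and 0.1 for each perpendicular direction, and that moving into a wall leaves the robot in place. Obstacles and terminal states are not handled, and the name build_transition_matrix is hypothetical.

```python
import numpy as np

def build_transition_matrix(rows, cols, p_intended=0.8, p_side=0.1):
    """Return T with shape (S, S, 4), where T[s, s_next, a] = P(s_next | s, a).

    States are numbered row by row: s = row * cols + col.
    """
    moves = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # up, right, down, left
    n_states = rows * cols
    T = np.zeros((n_states, n_states, 4))
    for r in range(rows):
        for c in range(cols):
            s = r * cols + c
            for a in range(4):
                # Intended direction plus the two perpendicular slips.
                outcomes = [(a, p_intended),
                            ((a - 1) % 4, p_side),
                            ((a + 1) % 4, p_side)]
                for direction, prob in outcomes:
                    dr, dc = moves[direction]
                    nr, nc = r + dr, c + dc
                    if not (0 <= nr < rows and 0 <= nc < cols):
                        nr, nc = r, c  # hitting a wall: stay in place
                    T[s, nr * cols + nc, a] += prob
    return T

T = build_transition_matrix(11, 11)
print(T.shape)
```

Each T[s, :, a] sums to 1, which is a quick sanity check after adapting the model to a specific grid.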
