
Comments (6)

Svalorzen commented on July 3, 2024

Strange, it works here. Also, line 341 in the original file is

args = parser.parse_args()

And not the solve_mdp line. Are you sure you didn't change anything in the file?

from ai-toolbox.

Svalorzen commented on July 3, 2024

As a sanity check, you could try to add at line 264 (in the loop that constructs the probabilities) the following:

    for state in range(len(S)):
        coord = decodeState(state)
        T.append([[getTransitionProbability(coord, action,
                                            decodeState(next_state))
                   for next_state in range(len(S))] for action in A])
        print([sum(x) for x in T[-1]]) # <-------------------- ADD THIS LINE

The last line will perform the sums for you; all printed numbers should be 1.0, otherwise something is going wrong.


troyrock commented on July 3, 2024


Svalorzen commented on July 3, 2024

Ah, I see. Something is going wrong in the creation of the transition function. What the Python code in the example is doing is creating a 3-dimensional matrix, with dimensions SxAxS, where each entry T[s][a][s'] corresponds to the probability of transitioning from state s to state s', given action a.

What this means is that, for each T[s][a] to be a valid probability distribution, every entry must be a valid probability (between 0 and 1), and the entries of T[s][a], summed over all next states s', must add up to 1.0. The error occurs because these conditions do not hold, so the program aborts. The prints check that all these sums are indeed 1.0; in your case they are not, which explains the error.
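The two conditions above can be checked in one pass. This is only a minimal sketch, assuming T is a plain nested list indexed as T[s][a][s']; the helper name `check_transition_function` and the toy numbers are made up for illustration and are not part of the example file or of the library.

```python
import math


def check_transition_function(T, tol=1e-9):
    """Return a list of (state, action, problem) tuples; empty if T looks valid."""
    problems = []
    for s, per_action in enumerate(T):
        for a, row in enumerate(per_action):
            # Every entry must be a probability.
            if any(p < 0.0 or p > 1.0 for p in row):
                problems.append((s, a, "entry outside [0, 1]"))
            # Each T[s][a] must sum to 1.0, up to floating-point error.
            if not math.isclose(sum(row), 1.0, abs_tol=tol):
                problems.append((s, a, "row sums to %r" % sum(row)))
    return problems


# Toy model with 2 states and 2 actions (made-up numbers).
T_good = [[[0.9, 0.1], [0.5, 0.5]],
          [[0.0, 1.0], [0.3, 0.7]]]
T_bad = [[[0.9, 0.3], [0.5, 0.5]],   # first row sums to 1.2
         [[0.0, 1.0], [0.3, 0.7]]]

print(check_transition_function(T_good))  # prints []
print(check_transition_function(T_bad))   # reports state 0, action 0
```

Comparing with a tolerance rather than `== 1.0` avoids false alarms from rounding (0.3 + 0.7 is not exactly 1.0 in floating point).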

That said, I think I found the problem. It's a subtle change in how integer division works between Python 2 and Python 3: in Python 2, dividing two integers with / truncates to an integer, while in Python 3 it returns a float. I'm going to push a patch soon that makes the example compatible with both versions.
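To see the difference in isolation, here is a small sketch that decodes the same index with both operators. The value `SQUARE_SIZE = 5` and the two function names are assumptions chosen for illustration; only the `/` versus `//` contrast matters.

```python
SQUARE_SIZE = 5  # assumed grid side length, for illustration only


def decode_true_division(state):
    """Decode using '/', which returns a float in Python 3."""
    coord = []
    for _ in range(4):
        coord.append(state % SQUARE_SIZE)
        state = state / SQUARE_SIZE
    return tuple(coord)


def decode_floor_division(state):
    """Decode using '//', which stays an integer in Python 2 and Python 3."""
    coord = []
    for _ in range(4):
        coord.append(state % SQUARE_SIZE)
        state = state // SQUARE_SIZE
    return tuple(coord)


print(decode_floor_division(137))  # (2, 2, 0, 1)
print(decode_true_division(137))   # under Python 3, floats after the first element
```

With `/` under Python 3 the coordinates stop being integers, so the decoded positions no longer match real grid cells and the probabilities built from them no longer sum to 1.0.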

As a quick fix, you need to replace the decodeState(state) function in your example file with:

def decodeState(state):
    """
    Convert from state_index to coordinate.

    Parameters
    ----------
    state: int
        Index of the state.

    Returns
    -------
    coord: tuple of int
        Four element tuple containing the position of the tiger and antelope.
    """
    coord = []
    for _ in range(4):
        c = state % SQUARE_SIZE
        state = state // SQUARE_SIZE   # This is the changed line that forces an integer division in Python 3.
        coord.append(c)
    return tuple(coord)

Regarding your problem, there's no direct way to do this "cleanly" without modifying the internals of the library (depending on which algorithm you plan to use).

The best solution that will maintain compatibility with everything else is to simply use the same transitions for all actions in your particular states. So, in those states, the transition probabilities will be the same for every action. This makes it so that picking the action does not affect the environment, which is what you want. This will allow planning algorithms (for example, value iteration) to work correctly.
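As a rough sketch of that idea, assuming T is a nested list indexed as T[s][a][s'] like in the example: the helper `share_transitions` and the `passive_rows` mapping (state index to the single distribution over next states you want there) are hypothetical names, not something the library provides.

```python
def share_transitions(T, passive_rows):
    """Return a copy of T where every action in each listed state uses the same row."""
    # Copy so the original table is left untouched.
    result = [[list(row) for row in per_action] for per_action in T]
    for s, row in passive_rows.items():
        for a in range(len(result[s])):
            result[s][a] = list(row)
    return result


# Toy model with 2 states and 2 actions (made-up numbers).
T = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [1.0, 0.0]]]

# In state 1 the action should not matter: always move with these odds.
T_fixed = share_transitions(T, {1: [0.3, 0.7]})
print(T_fixed[1])  # [[0.3, 0.7], [0.3, 0.7]]
```

If rewards also depend on the action, you would probably want to make them identical across actions in those states in the same way.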

If you plan to use reinforcement learning (for example Q-learning), then you might also want to force your agent to pick a specific action (say, 0) in those states, as it will prevent unnecessary exploration in states where picking an action does not do anything.
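One way to do that is at the action-selection step. The following is a plain-Python sketch of epsilon-greedy selection with a forced action, written under the assumption that you keep your own Q-table as a nested list Q[s][a]; `choose_action` and `passive_states` are made-up names, and this does not use the library's own policy classes.

```python
import random


def choose_action(Q, state, passive_states, epsilon=0.1, rng=random):
    """Epsilon-greedy choice, except that passive states always get action 0."""
    if state in passive_states:
        return 0  # the action has no effect here, so skip exploration
    if rng.random() < epsilon:
        return rng.randrange(len(Q[state]))  # explore
    # Exploit: pick the action with the highest current estimate.
    return max(range(len(Q[state])), key=lambda a: Q[state][a])


# Toy Q-table with 2 states and 2 actions (made-up numbers).
Q = [[0.0, 2.0],
     [5.0, 1.0]]
passive_states = {1}

print(choose_action(Q, 1, passive_states, epsilon=1.0))  # 0, even with full exploration
print(choose_action(Q, 0, passive_states, epsilon=0.0))  # 1, the greedy action
```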

Let me know if what I wrote makes sense to you :)


troyrock commented on July 3, 2024


Svalorzen commented on July 3, 2024

Sure, that'd be cool! I love to see what people are doing with the library :)

If the example works now, feel free to close the issue; if you run into more trouble with your setup, just open another one, no problem. Good luck for now!
