Comments (7)

fferreres commented on May 13, 2024

Policy(state) returns an array of probabilities, which is not an action but a distribution over the possible actions. Without a usable ace, the player can't take any action past 21 because the game is finished. The agent can't make choices, nor can it change any value, get new cards, etc.
The value of getting 22 or more is already factored into the value of asking for another card in the previous steps: it is averaged into the expectation Q of the action of getting another card, so it is reflected in those actions. There is nothing to report beyond 21 because those states are terminal, and thus nothing can be improved; only the action that led to > 21 can be optimized.
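
To make the "already factored in" point concrete, here is a rough sketch of how much of the value of hitting comes from the chance of busting. It is not code from the repository; the names `DECK`, `bust_probability` and `expected_bust_reward` are made up for illustration, and it assumes an infinite deck, no usable ace (ace counted as 1), face cards worth 10, and a reward of -1 for going bust.

```python
from fractions import Fraction

# Assumption: infinite deck, ace counted as 1, face cards counted as 10.
DECK = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10]


def bust_probability(player_sum):
    """Probability that one more card takes a no-usable-ace hand above 21."""
    busts = sum(1 for card in DECK if player_sum + card > 21)
    return Fraction(busts, len(DECK))


def expected_bust_reward(player_sum):
    """Contribution of busting (assumed reward -1) to the value of hitting."""
    return -1 * bust_probability(player_sum)


for s in (11, 12, 16, 20):
    print(s, bust_probability(s), expected_bust_reward(s))
```

The negative reward shows up in the value of hitting at 12, 16, 20 and so on, not in a separate state above 21.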

from reinforcement-learning.

dennybritz commented on May 13, 2024

Basically what fferreres said. It's definitely possible that there's a problem with the solution, but right now I don't see it and it looks right to me. Closing this for now (feel free to re-open and elaborate if you still think it's wrong).

seanxwh commented on May 13, 2024

Thanks @fferreres and @dennybritz for taking the time to look and explain. However, I'm still wondering why the no-usable-ace cases don't show anything about the player going bust and getting a negative reward. Is it because the graphs are clipped?

dennybritz commented on May 13, 2024

The plot shows the value function for all states, but there is no state where the player is beyond 21 points because the game will have ended by then.
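
As an illustration only (not the repository's code), the sketch below replays a fixed sequence of hits and records the states the agent actually acted in. The helper `play_hits` is hypothetical, and it assumes no usable ace, ignores the dealer, and uses a reward of -1 for going bust.

```python
def play_hits(start_sum, cards):
    """Hit with each card in turn; return (visited_states, reward)."""
    visited = []
    total = start_sum
    for card in cards:
        visited.append(total)  # state in which the agent chose to hit
        total += card
        if total > 21:
            # Bust: the episode ends here, so no state above 21 is recorded.
            return visited, -1
    return visited, 0  # still in play, no reward yet


states, reward = play_hits(14, [5, 10])
print(states, reward)
```

The bust from 19 + 10 is credited to the states that were visited (14 and 19), which is why the plotted value function stops at 21.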

seanxwh commented on May 13, 2024

I see, thanks for helping :)

fferreres commented on May 13, 2024

I think I also remember (in this implementation of the environment) that when a player has a usable ace and goes over 21, the code itself reduces the value of the player's hand by 10 (e.g. 23 becomes 13) and sets the usable-ace state variable to False, so the results for ace and no ace are about the best next action conditioned on whether you still have a usable ace. This is also why a player with a usable ace CAN go over 21 but, at the same time, will never end up in a state above 21.
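
A minimal sketch of that behaviour, written from memory of how Gym-style Blackjack environments usually score a hand rather than copied from this one. The function names are illustrative, and a hand is assumed to be a list of card values with the ace stored as 1.

```python
def usable_ace(hand):
    """True if the hand holds an ace that can count as 11 without busting."""
    return 1 in hand and sum(hand) + 10 <= 21


def sum_hand(hand):
    """Hand total, counting one ace as 11 only while that keeps it <= 21."""
    if usable_ace(hand):
        return sum(hand) + 10
    return sum(hand)


hand = [1, 2]                           # ace + 2
print(sum_hand(hand), usable_ace(hand))
hand.append(10)                         # would be 23 with the ace as 11
print(sum_hand(hand), usable_ace(hand))
```

After the 10 is drawn the ace falls back to 1, so the hand is scored as 13 with no usable ace instead of a bust at 23.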

seanxwh commented on May 13, 2024

@fferreres ya, that's right. That is why I saw the negative drop in my simulation only for the no-usable-ace cases.
