Comments (7)
Policy(state) returns a probabilities array, which is not an action but a distribution over possible actions. The player can't take any action past 21 when it has no usable ace because the game is finished (done). The agent can't make choices, nor can it change any value, draw new cards, etc.
The value of getting 22 or more is already factored into the value of the action of asking for another card in previous steps: it is averaged into the expectation Q of the action to get another card. There is nothing to report beyond 21 because those states are terminal, and thus nothing can be improved there. Only the action that led to > 21 can be optimized.
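To make the averaging concrete, here is a minimal sketch (not the repo's code). It assumes an infinite deck, a bust reward of -1, and a no-usable-ace hand of 12 or more; `q_hit` and `next_state_value` are hypothetical names used only for illustration.

```python
from fractions import Fraction

# Assumption: infinite deck, so ranks 1 (ace) to 9 each have probability 1/13
# and ten-valued cards (10, J, Q, K) together have probability 4/13.
CARD_PROBS = {rank: Fraction(1, 13) for rank in range(1, 10)}
CARD_PROBS[10] = Fraction(4, 13)


def q_hit(player_sum, next_state_value):
    """Expected return of hitting from a no-usable-ace hand (player_sum >= 12).

    next_state_value(new_sum) is a hypothetical estimate of the value of the
    surviving successor state. Busting ends the episode with reward -1, so the
    "value of 22 or more" enters only through this average.
    """
    total = Fraction(0)
    for card, prob in CARD_PROBS.items():
        # With a sum of 12 or more, a drawn ace can only count as 1.
        new_sum = player_sum + card
        if new_sum > 21:
            total += prob * (-1)  # terminal: bust
        else:
            total += prob * next_state_value(new_sum)
    return total


# With successor values set to 0, only the bust penalty remains.
print(q_hit(20, lambda s: 0))  # -12/13
```

Only an ace keeps a hand of 20 alive, so twelve thirteenths of the probability mass carries the -1, and that is where the bust shows up: in Q(20, hit), not in a separate state for 22.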
from reinforcement-learning.
Basically what fferreres said. It's definitely possible that there's a problem with the solution, but right now I don't see it, and it looks right to me. Closing this for now (feel free to re-open and elaborate if you still think it's wrong).
Thanks @fferreres and @dennybritz for taking the time to look and explain. However, I'm still wondering why the no-usable-ace cases don't show anything about the player going bust and getting a negative reward. Is it because the graphs are clipped?
The plot shows the value function for all states, but there is no state where the player is beyond 21 points because the game will have ended by then.
I see, thanks for helping :)
I think I also remember that, in this implementation of the environment, when a player has a usable ace and goes over 21, the code itself reduces the value of the player's hand by 10 (e.g. 23 becomes 13) and sets the usable-ace state variable to False. So the usable-ace and no-usable-ace results are about the best next action conditioned on whether you still have a usable ace. This is also why a player with a usable ace CAN draw past 21 but will never actually end up in a state over 21.
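Roughly, the logic I have in mind looks like the sketch below. It is written from memory in the style of the Gym Blackjack helpers, not copied from the environment, and `hand_value` is a hypothetical wrapper added for illustration. Aces are stored as 1.

```python
def usable_ace(hand):
    """True if the hand holds an ace that can count as 11 without busting."""
    return 1 in hand and sum(hand) + 10 <= 21


def sum_hand(hand):
    """Hand total, counting one ace as 11 whenever that does not bust."""
    if usable_ace(hand):
        return sum(hand) + 10
    return sum(hand)


def hand_value(hand):
    """Hypothetical helper: the (player_sum, usable_ace) part of the state."""
    return sum_hand(hand), usable_ace(hand)


print(hand_value([1, 6]))      # (17, True): ace counted as 11
print(hand_value([1, 6, 6]))   # (13, False): 23 becomes 13, ace no longer usable
print(hand_value([10, 6, 6]))  # (22, False): no ace to fall back on, so bust
```

So a usable-ace hand never lands in a state above 21. It drops into the no-usable-ace table instead, and only there can the next hit actually bust.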
@fferreres ya, that is right. That is why I saw the negative drop in my simulation only for the no-usable-ace cases.