Comments (5)
The reward in Pong, as in all the Atari games, is the one returned by the gym environment: +1 when the agent scores and -1 when the opponent scores.
Normally, the cumulative discounted reward J, printed after each epoch, starts from -21 and slowly improves toward 21. I'm not sure where you see a reward of 0. Are you looking at the dataset returned by evaluate?
from mushroom-rl.
Yes, I am looking at the dataset returned by evaluate and averaged by get_stats. I expected things to work as you described. Here is a sample of the output. Thank you for your comment.
min_reward: 4.000000, max_reward: 4.000000, mean_reward: 4.000000, games_completed: 1
min_reward: 5.000000, max_reward: 5.000000, mean_reward: 5.000000, games_completed: 1
min_reward: 0.000000, max_reward: 0.000000, mean_reward: 0.000000, games_completed: 1
min_reward: -1.000000, max_reward: -1.000000, mean_reward: -1.000000, games_completed: 1
min_reward: 3.000000, max_reward: 3.000000, mean_reward: 3.000000, games_completed: 1
min_reward: 5.000000, max_reward: 5.000000, mean_reward: 5.000000, games_completed: 1
The dataset returned by evaluate contains every step, so it is natural to see many transitions with 0 reward.
In the results you posted, I see some odd behavior. The number of completed games is always 1, which also explains why the minimum, mean, and maximum rewards are the same. I suggest you check how you are doing the evaluation, e.g. check that the number of steps is sufficiently high.
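To make the step-versus-episode distinction concrete, here is a minimal sketch of grouping a step-level dataset into per-episode returns. It assumes each sample is a tuple `(state, action, reward, next_state, absorbing, last)`; that layout and the helper name `episode_returns` are assumptions for illustration, not the library's own code.

```python
def episode_returns(dataset):
    """Sum rewards per episode from a list of step-level samples.

    Assumed sample layout (an assumption, not confirmed by the thread):
    (state, action, reward, next_state, absorbing, last)
    """
    returns = []
    total = 0.0
    for _, _, reward, _, _, last in dataset:
        # Add the reward first so the final step of an episode is counted.
        total += reward
        if last:
            returns.append(total)
            total = 0.0
    # Steps after the final `last` flag belong to an unfinished episode
    # and are deliberately left out.
    return returns


# Toy data: two finished episodes followed by one unfinished step.
toy_dataset = [
    (None, 0, 0.0, None, False, False),
    (None, 0, 1.0, None, False, False),
    (None, 0, -1.0, None, True, True),
    (None, 0, 0.0, None, False, False),
    (None, 0, 1.0, None, True, True),
    (None, 0, 0.0, None, False, False),
]
toy_returns = episode_returns(toy_dataset)
print(len(toy_returns), toy_returns)
```

With too few evaluation steps, a list like this would hold a single entry, which matches the `games_completed: 1` lines above.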
My intention is to determine the win rate, so I have to examine the cumulative reward one game at a time and run multiple games. There is nothing weird there. I inserted a line of code into core.py to record what is happening point by point.
next_state, reward, absorbing, _ = self.mdp.step(action)
##Testing point by point
if reward != 0.0: print(reward, absorbing, flush=True)
##
self._episode_steps += 1
The results from a couple of sample games are here:
pygame 1.9.6
Hello from the pygame community. https://www.pygame.org/contribute.html
-1.0 False -1.0 False 1.0 False -1.0 False 1.0 False -1.0 False 1.0 False 1.0 False -1.0 False -1.0 False 1.0 False -1.0 False -1.0 False 1.0 False 1.0 False -1.0 False 1.0 False 1.0 False 1.0 False -1.0 False 1.0 False -1.0 False 1.0 False -1.0 False -1.0 False 1.0 False -1.0 False 1.0 False -1.0 False 1.0 False -1.0 False 1.0 False 1.0 False 1.0 False 1.0 False 1.0 False -1.0 False 1.0 True
min_reward: 3.000000, max_reward: 3.000000, mean_reward: 3.000000, games_completed: 1
-1.0 False -1.0 False 1.0 False -1.0 False 1.0 False -1.0 False 1.0 False -1.0 False 1.0 False -1.0 False 1.0 False -1.0 False -1.0 False -1.0 False -1.0 False 1.0 False -1.0 False -1.0 False -1.0 False 1.0 False -1.0 False 1.0 False 1.0 False 1.0 False -1.0 False 1.0 False 1.0 False 1.0 False -1.0 False 1.0 False 1.0 False 1.0 False -1.0 False 1.0 False 1.0 False -1.0 False 1.0 False -1.0 False 1.0 False -1.0 False -1.0 True
min_reward: 0.000000, max_reward: 0.000000, mean_reward: 0.000000, games_completed: 1
If you add up the plus and minus ones you can verify that the sum disagrees with mean_reward by + or - 1. I suspect that in processing the dataset the last point of the game, when absorbing is True (an edge case), is not handled properly but I'm not expert enough to track it down.
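The arithmetic can be checked against the first game logged above. This sketch hard-codes the 38 rewards printed for that game and compares the full sum with the sum that omits the final (absorbing) step. It uses no mushroom-rl code, and the variable names are illustrative only.

```python
# Rewards printed for the first sample game, in order; the last entry is
# the step where absorbing was True.
first_game_rewards = [
    -1, -1, 1, -1, 1, -1, 1, 1, -1, -1,
    1, -1, -1, 1, 1, -1, 1, 1, 1, -1,
    1, -1, 1, -1, -1, 1, -1, 1, -1, 1,
    -1, 1, 1, 1, 1, 1, -1, 1,
]

full_return = sum(first_game_rewards)               # every step counted
without_last = sum(first_game_rewards[:-1])         # terminal step dropped

print(full_return, without_last)  # prints: 4 3
```

The value that omits the terminal step, 3, is the one that matches the reported mean_reward of 3.000000, which is consistent with the last reward of the game being left out.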
Thanks for your feedback. It was actually a bug affecting the function compute_metrics used in the atari experiment. In some cases, as you say, the reward of the last step was not counted. We fixed the bug in the dev branch. We are currently working on an important new release with several new functionalities, e.g. online plotting of results and saving and loading of agents. We will soon merge the dev branch into master with all these new functionalities, including the bug fix.
Thanks again. I'll close this issue.
Best regards.