amiithinks / driving_gridworld
Pycolab-based driving gridworld for experimenting with algorithms for learning safe vehicle driving.
License: MIT License
Line 171 computes the value that should be returned by this method.
Consider the default setup (as outlined in exe/road.py), where allow_crashing is set to false. Suppose the car is in column 0 going at speed 2, and there's an obstacle at (0, 0). If I go LEFT here, the probability transition matrix treats this action as equivalent to CRUISE | NO_OP, but the reward function treats it as reducing the forward movement by 1, which isn't exactly CRUISE.
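One way to keep the transition matrix and the reward function consistent is to resolve the requested action into an "effective" action once and have both use it. The sketch below is hypothetical: the Action enum, effective_action, and forward_progress are illustrative names, and the rule that a lateral move costs one unit of forward movement is taken from the description above rather than from the simulator's code.

```python
from enum import Enum


class Action(Enum):
    # Hypothetical action set; the simulator's real action constants may differ.
    NO_OP = 0  # called CRUISE | NO_OP above
    LEFT = 1
    RIGHT = 2


def effective_action(action: Action, column: int, num_columns: int,
                     allow_crashing: bool) -> Action:
    """Map a requested action to the action the dynamics actually apply."""
    if not allow_crashing:
        if action is Action.LEFT and column == 0:
            return Action.NO_OP  # blocked at the left edge
        if action is Action.RIGHT and column == num_columns - 1:
            return Action.NO_OP  # blocked at the right edge
    return action


def forward_progress(action: Action, speed: int) -> int:
    # Assumption: a lateral move costs one unit of forward movement.
    if action in (Action.LEFT, Action.RIGHT):
        return max(speed - 1, 0)
    return speed
```

For example, forward_progress(effective_action(Action.LEFT, 0, 4, False), 2) gives 2, the same progress as CRUISE, so the reward and the transition agree on what a blocked LEFT does.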
By default, the set of allowed columns should be all the columns, but we want to be able to restrict particular Obstacles to appear only in particular lanes, e.g. the left lane, the ditch lanes, the right lane, etc.
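A minimal sketch of how such a restriction could look, assuming a hypothetical allowed_columns constructor argument where None means "every column of the road". The class here is a stand-in, not the simulator's Obstacle.

```python
import random
from typing import Optional, Sequence, Tuple


class Obstacle:
    """Hypothetical sketch: an obstacle restricted to a subset of road columns."""

    def __init__(self, prob_of_appearing: float,
                 allowed_columns: Optional[Sequence[int]] = None):
        self.prob_of_appearing = prob_of_appearing
        # None means "no restriction"; it is resolved against the road width later.
        self.allowed_columns = None if allowed_columns is None else tuple(allowed_columns)

    def candidate_columns(self, num_columns: int) -> Tuple[int, ...]:
        if self.allowed_columns is None:
            return tuple(range(num_columns))  # default: all columns
        # Ignore any allowed column that does not exist on this road.
        return tuple(c for c in self.allowed_columns if 0 <= c < num_columns)

    def sample_column(self, num_columns: int, rng: random.Random) -> int:
        # Uniform over the allowed columns; raises IndexError if none are valid.
        return rng.choice(self.candidate_columns(num_columns))
```

Resolving the default lazily keeps an unrestricted obstacle valid on roads of any width.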
Thinking about it more, the randomness from the state dynamics will only rarely affect the rewards, so Mike is right: we should add some randomness to the rewards themselves.
For Obstacles, you can add an Obstacle#noisy_reward_for_collision(speed) method and a stddev=0 constructor argument to Obstacle. Obstacle#noisy_reward_for_collision(speed) should return the base (expected) reward for colliding at the given speed plus white noise (a sample from a zero-mean normal) with standard deviation proportional to self.stddev. I think it makes sense to multiply the standard deviation by the speed, since the outcome of a collision is probably more variable at higher speeds.
You'll also have to add a stddev=0 argument to Road with which to add white noise to the reward for driving off-road. Again, I think it makes sense to multiply the standard deviation by the speed.
I think it makes sense not to add noise to the reward from Car for making progress toward its destination, or to the -1 baseline reward.
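The noisy collision reward described above can be sketched as follows. The collision_reward_per_speed parameter and the linear base reward are placeholders for whatever the simulator actually computes; only the stddev * speed noise scaling comes from the proposal. The same add_white_noise helper could serve Road's off-road reward.

```python
import random
from typing import Optional


def add_white_noise(base_reward: float, stddev: float, speed: int,
                    rng: random.Random) -> float:
    """Return base_reward plus zero-mean Gaussian noise with std. dev. stddev * speed."""
    if stddev < 0:
        raise ValueError("stddev must be non-negative")
    return base_reward + rng.gauss(0.0, stddev * speed)


class Obstacle:
    """Hypothetical stand-in for the simulator's Obstacle."""

    def __init__(self, collision_reward_per_speed: float = -1.0,
                 stddev: float = 0.0, rng: Optional[random.Random] = None):
        # Placeholder for however the expected collision reward is really computed.
        self._collision_reward_per_speed = collision_reward_per_speed
        self.stddev = stddev
        self._rng = rng if rng is not None else random.Random()

    def reward_for_collision(self, speed: int) -> float:
        # Base (expected) reward for colliding at this speed.
        return self._collision_reward_per_speed * speed

    def noisy_reward_for_collision(self, speed: int) -> float:
        # With the default stddev=0 this reduces to the deterministic reward.
        return add_white_noise(self.reward_for_collision(speed), self.stddev,
                               speed, self._rng)
```

Because the noise has zero mean, the expected reward is unchanged, so existing value computations based on the base reward remain correct in expectation.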
Right now, the probability that an obstacle appears on the next step is the obstacle's fixed prob_of_appearing, regardless of how fast the car is traveling. But that means the car will, in expectation, encounter fewer obstacles over the same distance if it's driving faster. This both makes the simulator less realistic and potentially allows agents to avoid obstacles by exploiting this simulator quirk.
Instead, prob_of_appearing should represent the probability of appearing in a single revealed row. So if the car is driving at speed 2, we should check whether the obstacle appears in either of the two revealed rows on the horizon, where the probability of appearing on each check is prob_of_appearing. These repeated checks should go from the largest row (closest to the car) to the smallest (farthest away), so that the probability of an obstacle appearing in each row is the same regardless of speed.
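A minimal sketch of the per-row check, assuming that the newly revealed rows at speed s are rows 0 to s - 1 (row 0 farthest from the car) and that an obstacle appears at most once, so checking stops at the first success. Under these assumptions, the closest revealed row gets probability p and the next gets (1 - p)p, exactly as if the car had revealed them one per step at speed 1. The function name is hypothetical.

```python
import random
from typing import Optional


def sample_appearance_row(prob_of_appearing: float, speed: int,
                          rng: random.Random) -> Optional[int]:
    """Return the revealed row the obstacle appears in, or None.

    Assumes rows 0..speed-1 are newly revealed, with row 0 farthest away.
    """
    # Check the closest revealed row first, then move toward the horizon.
    for row in reversed(range(speed)):
        if rng.random() < prob_of_appearing:
            return row  # at most one appearance per obstacle
    return None
```

With one check per revealed row, the expected number of appearances over a fixed distance no longer shrinks as speed grows.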
The interfaces to set and use this probability should resemble the probability of appearing.
Rather than capping the car's column, we should set the discount to 0 and end the game when the car moves left or right into the barrier beyond the ditch. The reward should be -1 / (1 - gamma) to simulate crashing and experiencing a -1 reward forever.
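The -1 / (1 - gamma) value is the geometric series sum over t >= 0 of gamma^t * (-1), so a single lump-sum reward with discount 0 replaces an infinite stream of -1 rewards. A small sketch of the crash step's outcome follows; the function name is hypothetical.

```python
def crash_outcome(gamma: float) -> tuple:
    """Reward, discount, and game-over flag for driving into the barrier."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must be in [0, 1)")
    # sum_{t>=0} gamma**t * (-1) == -1 / (1 - gamma)
    reward = -1.0 / (1.0 - gamma)
    discount = 0.0  # nothing after this step contributes to the return
    return reward, discount, True
```

For example, crash_outcome(0.5) gives (-2.0, 0.0, True).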
Just need to implement my own game engine with:
- its_showtime method that returns observation, reward, discount
- .game_over property
- .play(action) method that returns observation, reward, discount
- .the_plot property that is a dict
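A minimal engine exposing just these four members might look like the sketch below. It is hypothetical: the transition callback, the None reward on the first step, and treating a zero discount as game over are assumptions for illustration, not pycolab's documented behavior.

```python
from typing import Any, Callable, Dict, Optional, Tuple

Step = Tuple[Any, Optional[float], float]  # (observation, reward, discount)


class MinimalEngine:
    """Hypothetical pycolab-like engine with only the members listed above."""

    def __init__(self, initial_state: Any,
                 transition: Callable[[Any, Any], Tuple[Any, float, float]],
                 observe: Callable[[Any], Any] = lambda state: state):
        self._state = initial_state
        self._transition = transition  # (state, action) -> (state, reward, discount)
        self._observe = observe
        self._started = False
        self._game_over = False
        self._the_plot: Dict[str, Any] = {}

    @property
    def game_over(self) -> bool:
        return self._game_over

    @property
    def the_plot(self) -> Dict[str, Any]:
        return self._the_plot

    def its_showtime(self) -> Step:
        if self._started:
            raise RuntimeError("its_showtime() may only be called once")
        self._started = True
        # Assumed convention: no reward yet on the first step, discount 1.0.
        return self._observe(self._state), None, 1.0

    def play(self, action: Any) -> Step:
        if not self._started:
            raise RuntimeError("call its_showtime() before play()")
        if self._game_over:
            raise RuntimeError("the game is over")
        self._state, reward, discount = self._transition(self._state, action)
        # Assumption: a zero discount ends the episode, as in the barrier proposal.
        self._game_over = discount == 0.0
        return self._observe(self._state), reward, discount
```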
So as not to force Road to deal with a car obstacle as a special case, I would add a speed parameter and interface to Obstacle; then all obstacles can move. The default speed should be zero for Bump and Pedestrian, and one for the car obstacle.
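A sketch of a speed-aware Obstacle, assuming obstacles travel in the same direction as the car so that on screen they move toward the car by the difference in speeds (rows grow toward the car). CarObstacle is a hypothetical name for the car obstacle, and the dataclass layout is illustrative only.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Obstacle:
    row: int
    column: int
    speed: int = 0  # forward speed in rows per step; 0 means stationary

    def next(self, car_speed: int) -> "Obstacle":
        # Assumption: same direction as the car, so the on-screen shift is the
        # speed difference. A faster obstacle would move away from the car.
        return replace(self, row=self.row + car_speed - self.speed)


@dataclass(frozen=True)
class Bump(Obstacle):
    pass


@dataclass(frozen=True)
class Pedestrian(Obstacle):
    pass


@dataclass(frozen=True)
class CarObstacle(Obstacle):  # hypothetical name
    speed: int = 1
```

Since every obstacle now moves through the same next method, Road can update all of them uniformly without a car-specific branch.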
Road should take a reward function as an argument. This reward function should take the current road, the next road, and the list of obstacles involved in a collision with the car as arguments.
The default reward function should be the one that we're using now. All the reward logic in Obstacles and Road should be moved into this default reward function.
The standard deviation argument for Road should also be removed, since it can now simply be included in the reward function's inner logic.
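A sketch of the proposed interface, with the off-road noise moved into the reward function's closure. RoadState and Collision are hypothetical stand-ins for whatever Road and Obstacle expose, and the baseline, collision, and off-road terms use placeholder magnitudes rather than the simulator's actual reward values.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class RoadState:
    """Hypothetical stand-in for the parts of Road a reward function reads."""
    speed: int
    car_off_road: bool


@dataclass(frozen=True)
class Collision:
    """Hypothetical stand-in for an obstacle the car collided with."""
    reward_per_speed: float


RewardFunction = Callable[[RoadState, RoadState, Sequence[Collision]], float]


def make_default_reward(off_road_reward_per_speed: float = -1.0,
                        stddev: float = 0.0,
                        rng: Optional[random.Random] = None) -> RewardFunction:
    """Build a reward function; the noise parameter lives here, not in Road."""
    rng = rng if rng is not None else random.Random()

    def reward(road: RoadState, next_road: RoadState,
               collisions: Sequence[Collision]) -> float:
        speed = next_road.speed  # assumption: rewards scale with the next speed
        total = -1.0  # baseline reward, never noisy
        total += sum(c.reward_per_speed * speed for c in collisions)
        if next_road.car_off_road:
            total += off_road_reward_per_speed * speed
            total += rng.gauss(0.0, stddev * speed)  # white noise for off-road
        return total

    return reward


class Road:
    def __init__(self, reward_function: Optional[RewardFunction] = None):
        self._reward_function = reward_function or make_default_reward()
```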
If the speed limit is larger than this, then the physical plausibility of the simulator breaks, because the number of possible obstacle encounters across a fixed distance can depend on the car's speed and the range of its headlights (the number of rows).
Since the speed limit is the headlight range plus one, the car's speed will fit perfectly in a single column. We can use a green ^ character to denote a speed unit.
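A sketch of such a speed column, assuming the column height equals the speed limit (headlight range plus one) and that it fills from the bottom. The function name is hypothetical, and the green colouring is left to the renderer.

```python
def speedometer_column(speed: int, num_rows: int,
                       symbol: str = "^", blank: str = " ") -> list:
    """Render speed as num_rows characters, top to bottom, filled from the bottom."""
    if not 0 <= speed <= num_rows:
        raise ValueError("speed must be between 0 and the column height")
    return [blank] * (num_rows - speed) + [symbol] * speed
```

At the speed limit the column is completely filled, which is why a larger limit would no longer fit.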
At least the human interface behaves strangely when driving at the speed limit. Add tests to check this, and fix the simulator.
Move the code from lines 154 to 163 into this new method. You'll also have to create an updated Car, as is done on line 125, and loop over the obstacles in self._obstacles.
Override both the its_showtime and play methods to record a copy of each Road, reward, and discount factor before returning the Observation, reward, and discount from super.
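A sketch of the recording override. ToyEngine is a hypothetical base class included only so the example runs on its own; its road attribute stands in for the engine's Road. The deep copy matters if the road is ever mutated in place, since otherwise every history entry would alias the same object.

```python
import copy
from typing import Any, List, Tuple


class ToyEngine:
    """Hypothetical base engine; stands in for the real game engine."""

    def __init__(self):
        self.road = [0]  # mutable placeholder for the Road

    def its_showtime(self):
        return list(self.road), None, 1.0

    def play(self, action):
        self.road.append(action)
        return list(self.road), -1.0, 0.9


class RecordingEngine(ToyEngine):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.history: List[Tuple[Any, Any, float]] = []

    def _record(self, reward, discount):
        # Copy so later in-place changes to the road don't rewrite history.
        self.history.append((copy.deepcopy(self.road), reward, discount))

    def its_showtime(self):
        observation, reward, discount = super().its_showtime()
        self._record(reward, discount)
        return observation, reward, discount

    def play(self, action):
        observation, reward, discount = super().play(action)
        self._record(reward, discount)
        return observation, reward, discount
```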
Once the Road instance is turned into a simple data structure with the to_key method, the sequence of experience data can easily be saved with the pickle library:

```python
import pickle


def save(self, data, path):
    # Serialize the recorded experience (e.g. Road keys, rewards, discounts).
    # Written as a method, so self is the object doing the recording.
    with open(path, 'wb') as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
```

Loading the file can then simply be done with pickle.load(f).
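A matching load helper might look like this; the function name is illustrative. Note that unpickling can execute arbitrary code, so only files produced by your own runs should be loaded this way.

```python
import pickle


def load(path):
    # Counterpart to save(): read back whatever object was pickled.
    with open(path, 'rb') as f:
        return pickle.load(f)
```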