amiithinks / driving_gridworld

Pycolab-based driving gridworld for experimenting with algorithms for learning safe vehicle driving.

License: MIT License

Makefile 1.01% Python 98.99%

driving_gridworld's Issues

Add `Road#is_off_road`.

Line 171 computes the value that should be returned by this method.

Make `RoadPycolabEnv#to_road` return the actual current state as a `Road`.

Discrepancy between the default reward function `rewards/reward` and the default probability transition matrix in the no crashing case.

Consider the default setup (as outlined in exe/road.py), where allow_crashing is set to false. Consider a situation where the car is in column 0 going at speed 2, and there's an obstacle at (0, 0).

If I go LEFT here, the probability transition matrix considers this action to be equivalent to CRUISE | NO_OP, but the reward function considers this action to reduce the forward movement by 1, which isn't exactly CRUISE.

Add an allowed column appearance set to `Obstacles`

By default, the set should contain all the columns, but we want to be able to restrict particular Obstacles to appear only in particular lanes, e.g. the left lane, the ditch lanes, the right lane, etc.
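A minimal sketch of how such a set could be used when choosing where an obstacle appears. The function name `sample_obstacle_column` and the `allowed_columns` argument are hypothetical, not the repository's API, and uniform sampling over the allowed columns is an assumption.

```python
import random


def sample_obstacle_column(num_columns, allowed_columns=None, rng=random):
    """Pick the column in which an obstacle appears.

    `allowed_columns` is a hypothetical argument: None means every column.
    """
    if allowed_columns is None:
        candidates = list(range(num_columns))
    else:
        candidates = sorted(allowed_columns)
        # Reject empty sets and columns that do not exist on the road.
        if not candidates or any(c < 0 or c >= num_columns for c in candidates):
            raise ValueError("allowed_columns must be a non-empty subset of the columns")
    # Assumption: every allowed column is equally likely.
    return rng.choice(candidates)
```

For example, `sample_obstacle_column(4, allowed_columns={0, 3})` would restrict an obstacle to the two outermost columns of a four-column road.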

Add noise to rewards.

Thinking about it more, the randomness from state dynamics will only rarely impact the rewards, so Mike is right, we should add some randomness.

For Obstacles, you can add an Obstacle#noisy_reward_for_collision(speed) method and a stddev=0 constructor argument to Obstacle. Obstacle#noisy_reward_for_collision(speed) should return the base (expected) reward for colliding at the given speed plus white noise (a sample from a zero-mean normal) with standard deviation proportional to self.stddev. I think it makes sense to multiply the standard deviation by the speed, since driving faster probably does make the result of collisions more variable.

You'll also have to add a stddev=0 argument to Road with which to add white noise to the reward for driving off-road. I think it makes sense to again multiply the standard deviation by the speed.

I think it makes sense to not add noise to the reward from Car for making progress toward its destination or to the -1 baseline reward.
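The proposal for obstacles might look roughly like the sketch below. `NoisyObstacle` is a hypothetical stand-in rather than the repository's `Obstacle` class, and the linear-in-speed expected reward is an assumption made only so the example runs; the part taken from the description above is the zero-mean normal noise with standard deviation `stddev * speed`.

```python
import random


class NoisyObstacle:
    """Hypothetical stand-in for Obstacle; not the repository's class."""

    def __init__(self, base_reward_per_speed=-1.0, stddev=0.0):
        # Assumed reward model: the expected penalty is linear in speed.
        self.base_reward_per_speed = base_reward_per_speed
        self.stddev = stddev

    def expected_reward_for_collision(self, speed):
        return self.base_reward_per_speed * speed

    def noisy_reward_for_collision(self, speed, rng=random):
        # Zero-mean Gaussian noise whose standard deviation scales with speed,
        # so stddev=0 (the default) reproduces the deterministic reward.
        noise = rng.gauss(0.0, self.stddev * speed)
        return self.expected_reward_for_collision(speed) + noise
```

Passing a seeded `random.Random` instance as `rng` keeps experiments reproducible.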

Have obstacles check if they should appear independently in each revealed row.

Right now, the probability that an obstacle appears on the next step is the obstacle's fixed prob_of_appearing regardless of how fast the car is traveling. But that means that the car will, in expectation, encounter fewer obstacles over the same distance if it's driving faster. This both makes the simulator less realistic and allows agents to, potentially, avoid encountering obstacles by taking advantage of this simulator quirk.

Instead, prob_of_appearing should represent the probability of appearing in a single revealed row. So if the car is driving at speed 2, then we should check whether the obstacle appears in each of the two revealed rows on the horizon, where the probability of appearing on each check is prob_of_appearing. The order of the rows when doing these repeated checks should be from largest/closest to the car to smallest/farthest away so that the probabilities of obstacles appearing in each row match regardless of speed.
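The repeated per-row check can be sketched as follows. The function name is hypothetical, and two things are assumptions: that driving at `speed` reveals rows `0` to `speed - 1`, and that a single obstacle occupies at most one row, so the checks stop at the first success.

```python
import random


def first_revealed_row_with_obstacle(prob_of_appearing, speed, rng=random):
    """Return the revealed row where the obstacle appears, or None.

    Assumption: driving at `speed` reveals rows 0 .. speed - 1, with the
    largest index closest to the car.
    """
    # Check rows from closest to the car (largest index) to farthest away.
    for row in range(speed - 1, -1, -1):
        if rng.random() < prob_of_appearing:
            return row
    return None
```

Under these assumptions the chance that the obstacle appears somewhere is 1 - (1 - prob_of_appearing) ** speed, so a faster car no longer sees fewer obstacles per unit of distance.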

Add a disappear probability to obstacles

The interfaces to set and use this probability should resemble the probability of appearing.

Rename references to rows like `num_rows` to `headlight_range` to better reflect domain-specific terminology.

Allow the car to crash into the barrier

Rather than capping the car's column, we should set the discount to 0 and end the game when the car moves left or right into the barrier beyond the ditch. The reward should be -1 / (1 - gamma) to simulate crashing and experiencing a -1 reward forever.
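The -1 / (1 - gamma) value is the sum of the geometric series -1 - gamma - gamma^2 - ..., which is what the agent would collect if it received -1 on every step forever. A small sketch of the terminal outcome, with a hypothetical function name:

```python
def barrier_crash_outcome(gamma):
    """Return (reward, discount) for driving into the barrier.

    Receiving -1 on every future step has discounted value
    -1 - gamma - gamma**2 - ... = -1 / (1 - gamma), valid for 0 <= gamma < 1.
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must be in [0, 1)")
    # A discount of 0 tells the learner that the episode has ended.
    return -1.0 / (1.0 - gamma), 0.0
```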

Remove `pycolab` except for human UI.

Just need to implement my own game engine with:

An its_showtime method that returns observation, reward, discount.
A game_over property.
A play(action) method that returns observation, reward, discount.
A the_plot property that is a dict.
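A bare-bones engine with these four members could look like the sketch below. `MinimalEngine` and the `step_fn` callback are hypothetical names, and returning a reward of 0.0 from `its_showtime` is an assumption about what the first step should report.

```python
class MinimalEngine:
    """Hypothetical engine exposing the four members listed above."""

    def __init__(self, step_fn, initial_state, discount=0.99):
        # step_fn(state, action) -> (next_state, reward, done) is supplied by
        # the caller; it stands in for the road's transition logic.
        self._step_fn = step_fn
        self._state = initial_state
        self._discount = discount
        self._game_over = False
        self._the_plot = {}

    @property
    def game_over(self):
        return self._game_over

    @property
    def the_plot(self):
        # Free-form dict for passing extra information between components.
        return self._the_plot

    def its_showtime(self):
        # First observation; assumed to carry no reward yet.
        return self._state, 0.0, self._discount

    def play(self, action):
        if self._game_over:
            raise RuntimeError("play() called after the game ended")
        self._state, reward, done = self._step_fn(self._state, action)
        self._game_over = done
        # A terminal step reports a discount of 0.
        return self._state, reward, 0.0 if done else self._discount
```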

Add a car obstacle that drives down at a fixed speed

So as not to force Road to deal with a car obstacle as a special case, I would add a speed parameter and interface to Obstacle so that all obstacles can move. The default speed should be zero for Bump and Pedestrian, and one for the car obstacle.

Move reward logic out of `Obstacle`s and `Road`.

Road should take a reward function as an argument. This reward function should take the current road, the next road, and the list of obstacles involved in a collision with the car as arguments.

The default reward function should be the one that we're using now. All the reward logic in Obstacles and Road should be moved into this default reward function.

The standard deviation argument for Road should also be removed since it can now simply be included in the reward function's inner logic.
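One way the injected reward function could be wired up is sketched below. `SketchRoad`, `SketchObstacle`, and the reward model (a -1 baseline plus speed-scaled collision penalties) are hypothetical placeholders; only the three-argument signature comes from the description above.

```python
from collections import namedtuple

# Hypothetical stand-in; the repository's Obstacle classes differ.
SketchObstacle = namedtuple("SketchObstacle", ["penalty"])


def sketch_default_reward(road, next_road, collided_obstacles):
    """Assumed reward model: -1 baseline plus speed-scaled collision penalties."""
    reward = -1.0
    for obstacle in collided_obstacles:
        reward -= obstacle.penalty * road.speed
    return reward


class SketchRoad:
    """Hypothetical stand-in for Road that delegates all reward logic."""

    def __init__(self, speed, reward_function=sketch_default_reward):
        self.speed = speed
        self.reward_function = reward_function

    def reward_for(self, next_road, collided_obstacles):
        # The road no longer knows how rewards are computed.
        return self.reward_function(self, next_road, collided_obstacles)
```

Any noise can then live inside a custom reward function passed as `reward_function`, which is why a separate standard deviation argument becomes unnecessary.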

Enforce a hard cap on the speed limit in `Road` of the number of rows + 1

If the speed limit is larger than this, then the physical plausibility of the simulator breaks, because the number of possible obstacle encounters across a fixed distance can depend on the car's speed and the range of its headlights (the number of rows).
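A minimal sketch of such a check, assuming the limit is enforced by raising an error at construction time rather than by silently clamping. The function name is hypothetical.

```python
def checked_speed_limit(speed_limit, num_rows):
    """Return `speed_limit` if it respects the cap of num_rows + 1."""
    max_allowed = num_rows + 1
    if speed_limit > max_allowed:
        raise ValueError(
            "speed_limit {} exceeds the cap of {} (number of rows + 1)".format(
                speed_limit, max_allowed))
    return speed_limit
```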

import pickle


def save(self, data, path):
    # Serialize `data` to `path` using the highest pickle protocol available.
    with open(path, 'wb') as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)

Loading the file can then simply be done with pickle.load(f).
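A round trip might look like the following sketch. The module-level `save`, `load`, and `round_trip` helpers and the temporary file name are illustrative assumptions, not code from the repository.

```python
import os
import pickle
import tempfile


def save(data, path):
    with open(path, "wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)


def load(path):
    # Only unpickle files you trust: pickle can execute arbitrary code on load.
    with open(path, "rb") as f:
        return pickle.load(f)


def round_trip(data):
    """Save `data` to a temporary file, load it back, and return the copy."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "data.pkl")
        save(data, path)
        return load(path)
```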
