driving_gridworld's People

Contributors: daveloui, dmorrill10

driving_gridworld's Issues

Discrepancy between the default reward function `rewards/reward` and the default probability transition matrix in the no crashing case.

Consider the default setup (as outlined in exe/road.py), where allow_crashing is set to false, and a situation where the car is in column 0 traveling at speed 2 with an obstacle at (0, 0).

If I go LEFT here, the probability transition matrix considers this action to be equivalent to CRUISE | NO_OP, but the reward function considers this action to reduce the forward movement by 1, which isn't exactly CRUISE.

Add noise to rewards.

Thinking about it more, the randomness from the state dynamics will only rarely impact the rewards, so Mike is right: we should add some randomness.

For Obstacles, add an Obstacle#noisy_reward_for_collision(speed) method and a stddev=0 constructor argument to Obstacle. Obstacle#noisy_reward_for_collision(speed) should return the base (expected) reward for colliding at the given speed plus white noise (a sample from a zero-mean normal distribution) with standard deviation proportional to self.stddev. I think it makes sense to multiply the standard deviation by the speed, since driving faster probably makes the outcome of a collision more variable.

You'll also have to add a stddev=0 argument to Road with which to add white noise to the reward for driving off-road. I think it makes sense to again multiply the standard deviation by the speed.

I think it makes sense to not add noise to the reward from Car for making progress toward its destination or to the -1 baseline reward.
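
A minimal sketch of the Obstacle side of this, assuming an existing reward_for_collision(speed) method supplies the base (expected) reward (the method body below is only a placeholder):

import numpy as np


class Obstacle:
    def __init__(self, stddev=0):
        # Scale of the white noise added to collision rewards.
        self.stddev = stddev

    def reward_for_collision(self, speed):
        # Placeholder for the existing base (expected) collision reward.
        return -2.0 * speed

    def noisy_reward_for_collision(self, speed):
        # Base reward plus zero-mean Gaussian noise whose standard deviation
        # is proportional to self.stddev and scaled by speed, since faster
        # collisions should have more variable outcomes.
        return (self.reward_for_collision(speed) +
                np.random.normal(loc=0.0, scale=self.stddev * speed))

The same pattern would apply to Road's off-road reward via its own stddev=0 argument.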

Have obstacles check if they should appear independently in each revealed row.

Right now, the probability that an obstacle appears on the next step is the obstacle's fixed prob_of_appearing, regardless of how fast the car is traveling. That means the car will, in expectation, encounter fewer obstacles over the same distance when it drives faster, which both makes the simulator less realistic and potentially lets agents avoid obstacles by exploiting this quirk.

Instead, prob_of_appearing should represent the probability of appearing in a single revealed row. So if the car is driving at speed 2, we should check whether the obstacle appears in either of the two revealed rows on the horizon, where the probability of appearing on each check is prob_of_appearing. These repeated checks should run over rows from largest/closest to the car to smallest/farthest away, so that the probability of an obstacle appearing in each row matches regardless of speed.
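
A sampled sketch of the per-row check (the function name and return value are illustrative; building the probability transition matrix would combine the per-row probabilities analytically rather than sampling):

import numpy as np


def newly_revealed_rows_with_obstacle(prob_of_appearing, speed, rng=np.random):
    # One independent appearance check per newly revealed row, iterating
    # from the row closest to the car out to the farthest, as described above.
    appeared = []
    for offset in range(speed):  # offset 0 = closest newly revealed row
        appeared.append(rng.uniform() < prob_of_appearing)
    return appeared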

Allow the car to crash into the barrier

Rather than capping the car's column, we should set the discount to 0 and end the game when the car moves left or right into the barrier beyond the ditch. The reward should be -1 / (1 - gamma) to simulate crashing and experiencing a -1 reward forever.
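
This is just the discounted return of receiving a -1 reward on every step forever; for example (the discount value here is illustrative):

gamma = 0.99
crash_reward = -1 / (1 - gamma)  # -100.0
# Sanity check: the truncated geometric series converges to the same value.
approx = sum(-(gamma ** t) for t in range(10_000))  # about -100.0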

Remove `pycolab` except for human UI.

Just need to implement my own game engine with:

  • An its_showtime method that returns observation, reward, discount.
  • A game_over property.
  • A play(action) method that returns observation, reward, discount.
  • A the_plot property that is a dict.
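
A minimal sketch of such an engine; the Road-side observation() and step() helpers used here are assumptions about how the dynamics would be exposed:

class DrivingGridworldEngine:
    """Minimal replacement for the parts of pycolab's engine we actually use."""

    def __init__(self, road):
        self._road = road
        self._game_over = False
        self._the_plot = {}  # free-form scratch space, as in pycolab

    @property
    def game_over(self):
        return self._game_over

    @property
    def the_plot(self):
        return self._the_plot

    def its_showtime(self):
        # Initial (observation, reward, discount) triple.
        return self._road.observation(), 0.0, 1.0

    def play(self, action):
        # Hypothetical Road.step: advance the dynamics by one step.
        observation, reward, discount = self._road.step(action)
        self._game_over = discount == 0.0
        return observation, reward, discount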

Add a car obstacle that drives down at a fixed speed

So as not to force Road to deal with a car obstacle as a special case, I would add a speed parameter and interface to Obstacle so that all obstacles can move. The default speed should be zero for Bump and Pedestrian, and one for the car obstacle.
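
One way this could look (the CarObstacle name and the constructor details are assumptions):

class Obstacle:
    def __init__(self, speed=0):
        # Rows the obstacle advances toward the car on each step.
        self.speed = speed


class Bump(Obstacle):
    pass  # keeps the default speed of zero


class Pedestrian(Obstacle):
    pass  # keeps the default speed of zero


class CarObstacle(Obstacle):
    def __init__(self):
        super().__init__(speed=1)  # drives down at a fixed speed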

Move reward logic out of `Obstacle`s and `Road`.

Road should take a reward function as an argument. This reward function should take the current road, the next road, and the list of obstacles involved in a collision with the car as arguments.

The default reward function should be the one that we're using now. All the reward logic in Obstacles and Road should be moved into this default reward function.

The standard deviation argument for Road should also be removed since it can now simply be included in the reward function's inner logic.
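
A sketch of the intended shape (the argument and parameter names are assumptions):

def default_reward(road, next_road, collided_obstacles):
    # Reproduces the current behavior: the -1 baseline plus the progress,
    # off-road, and collision terms that currently live in the existing classes.
    reward = -1.0
    # ... progress, off-road, and collision terms would go here ...
    return reward


# Road would then accept it as a constructor argument, e.g.
# Road(..., reward_function=default_reward).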

Add a CLI option to `exe/road.py` to save experience data

Once the Road instance is turned into a simple data structure with the to_key method, the sequence of experience data can be saved easily with the pickle library:

import pickle


def save(self, data, path):
    with open(path, 'wb') as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)

Loading the file can then simply be done with pickle.load(f).
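
For example (the file name here is illustrative):

with open('experience.pkl', 'rb') as f:
    data = pickle.load(f)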
