
pytorch-ppuu's Introduction

Prediction and Policy-learning Under Uncertainty (PPUU)

Gitter chatroom, video summary, slides, poster, website.
Implementing Model-Predictive Policy Learning with Uncertainty Regularization for Driving in Dense Traffic in PyTorch.

(figure: planning)

The objective is to train an agent (the pink brain in the drawing) that plans its own trajectory on a highway with dense (stochastic) traffic. To do so, it minimises a few costs over trajectories unrolled while interacting with a world model (the blue world in the drawing). We therefore need to start by training the world model with observational data from the real world (the Earth photo), which needs to be downloaded from the Internet.

Getting the real data

To get started, you need to fetch the real-world data. Go to this address and download the TGZ file (330 MB) to your machine. Open a terminal, go to the location where you've downloaded the file, and type:

tar xf xy-trajectories.tgz

This will expand the compressed archive of the NGSIM (Next Generation Simulation) data set, consisting of all car trajectories for the 4 available maps (1.6 GB once extracted). Its content is the following:

xy-trajectories
├── i80
│   ├── trajectories-0400-0415.txt
│   ├── trajectories-0500-0515.txt
│   ├── trajectories-0515-0530.txt
│   └── trajectory-data-dictionary.htm
├── lanker
│   ├── trajectories-0830am-0845am.txt
│   ├── trajectories-0845am-0900am.txt
│   └── trajectory-data-dictionary.htm
├── peach
│   ├── trajectories-0400pm-0415pm.txt
│   ├── trajectories-1245pm-0100pm.txt
│   └── trajectory-data-dictionary.htm
└── us101
    ├── trajectories-0750am-0805am.txt
    ├── trajectories-0805am-0820am.txt
    ├── trajectories-0820am-0835am.txt
    └── trajectory-data-dictionary.htm

4 directories, 14 files

Finally, move the xy-trajectories directory inside a folder named traffic-data.

Setting up the environment

In this section we will fetch the repo, install the dependencies, and view the data we just downloaded, to check that everything runs fine. So, open up your terminal and type:

git clone git@github.com:Atcold/pytorch-PPUU.git
# or with the https protocol
# git clone https://github.com/Atcold/pytorch-PPUU

Now move (or symlink) the traffic-data folder inside the repo:

cd pytorch-PPUU
mv <traffic-data_folder_path> .
# or
# ln -s <traffic-data_folder_path>

Now install the PPUU environment (this assumes you have conda on your system; go here if that is not the case):

conda env create -f environment.yaml
#
# To activate this environment, use:
# > source activate PPUU
#
# To deactivate an active environment, use:
# > source deactivate
#

As prescribed, activate it by typing:

source activate PPUU  # or
conda activate PPUU

Finally, have a look at the four maps available in the NGSIM data set, namely: I-80, US-101, Lankershim, and Peachtree. There is a "bonus" map, called AI, where I've hard-coded a policy for the vehicles, which use a PID controller. Type the following command:

python play_maps.py -map <map>
# where <map> can be one of {i80,us101,peach,lanker,ai}
# add -h to see the full list of options available

The frame rate should be greater than 20 Hz, and will often exceed 60 Hz. Note that here the vehicles are performing the actions extracted from the trajectories, not simply replaying the original spatial coordinates.

Dumping the "state, action, cost" triple

In order to train both the world and agent models, we need to create the observations, starting from the NGSIM trajectories and the simulator. This can be done with the following command:

for t in 0 1 2; do python generate_trajectories.py -map i80 -time_slot $t; done
# to dump the triple for the i80 map, otherwise replace i80 with the map you want

When the script terminates, we will find a folder named state-action-cost within traffic-data, whose content is now the following:

traffic-data/
├── state-action-cost
│   └── data_i80_v0
│       ├── trajectories-0400-0415
│       │   ├── car1.pkl
│       │   └── ...
│       ├── trajectories-0500-0515
│       │   └── ...
│       └── trajectories-0515-0530
│           └── ...
└── xy-trajectories
    └── ...

Additional info

Each pickled vehicle observation is stored as car{idx}.pkl. Its content is a dict with the following items and their corresponding sizes (shapes):

images               (309, 3, 117, 24)
actions              (309, 2)
lane_cost            (309,)
pixel_proximity_cost (309,)
states               (309, 7, 4)
frames               (309,)

For example, this vehicle was alive for 309 frames (time steps). The images represent the occupancy grid, which spans the width of 4 lanes (24 pixels here).

  • The R channel represents the lane markings.
  • The G channel encodes the position and shape of the neighbouring vehicles.
  • The B channel depicts our own vehicle.

The actions is a collection of 2D vectors, encoding the (positive or negative) acceleration in both the x and y directions. The lane_cost and pixel_proximity_cost are the task-specific costs (see slides for details). The states encode the position and velocity of the current vehicle and of the 6 closest ones: left/current/right lanes, front/back. Finally, frames tells us the snapshot time stamps, so that we can go back to the simulator and inspect strange situations present in the observations.
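
To poke at one of these files, something like the following works (a minimal sketch; the path is just an example, and the stored values may be tensors or arrays, both of which expose a shape):

import pickle

# Load a single dumped vehicle observation (example path; any car{idx}.pkl works).
with open('traffic-data/state-action-cost/data_i80_v0/'
          'trajectories-0400-0415/car1.pkl', 'rb') as f:
    observation = pickle.load(f)

# Print each item together with its size, falling back to len() for plain sequences.
for key, value in observation.items():
    print(key, getattr(value, 'shape', None) or len(value))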

Finally (this will likely be automated soon, and made available for every map), extract the car sizes for the I-80 map with:

python extract_car_size.py

Training the world model

As stated above, we need to start by learning how the real world evolves. To do so, we train a neural net that tries to predict what happens next, given that we start in a given state and a specific action is performed. More precisely, we train an action-conditional variational predictive net, which closely resembles a variational autoencoder (VAE) with three inputs (a concatenated sequence of states, images, and the action), whose output is set to be the next item in the sequence (states, images).
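
To make the inputs and outputs concrete, here is a deliberately tiny, deterministic sketch of such an action-conditional predictor. It is not the repo's models.py (the real model is convolutional and, in its variational flavour, also samples a latent variable); it only shows the shapes involved, which follow the dumped observations above:

import torch
from torch import nn

class TinyForwardModel(nn.Module):
    # Flatten a history of `ncond` (image, state) pairs, concatenate the 2-D
    # action, and regress the next (image, state) pair.
    def __init__(self, ncond=20, hidden=256):
        super().__init__()
        self.frame_dim = 3 * 117 * 24   # one occupancy-grid image
        self.state_dim = 7 * 4          # ego vehicle + 6 neighbours, (x, y, dx, dy)
        in_dim = ncond * (self.frame_dim + self.state_dim) + 2
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, self.frame_dim + self.state_dim),
        )

    def forward(self, images, states, action):
        b = images.size(0)
        x = torch.cat([images.reshape(b, -1), states.reshape(b, -1), action], 1)
        out = self.net(x)
        next_image = out[:, :self.frame_dim].view(b, 3, 117, 24)
        next_state = out[:, self.frame_dim:].view(b, 7, 4)
        return next_image, next_state

# Smoke test on random data: batch of 8, 20 conditioning frames.
fm = TinyForwardModel()
ni, ns = fm(torch.rand(8, 20, 3, 117, 24), torch.rand(8, 20, 7, 4), torch.rand(8, 2))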

In the code, the world model is shortened to fm, which stands for forward dynamics model. So, let's train the forward dynamics model (fm) on the observational data set. This can be done by running:

python train_fm.py -model_dir <fm_save_path>

Training the cost model

Along with the dynamics model, we have a separate model to predict the costs of state and action pairs, which can be trained by running:

python train_cost.py

Training the agent

(figure: agent training)

(figure: uncertainty computation)

Once the dynamics model is trained, it can be used to train the policy network with MPUR, MPER, or IL. These correspond to:

  • MPUR: Model-based Policy learning with Uncertainty Regularisation (shown in the figure above)
  • MPER: Model-based Policy learning with Expert Regularisation (model-based IL)
  • IL: Imitation Learning (copying the expert actions given the past observations)

This is done by running:

python train_{MPUR,MPER,IL}.py -model_dir <fm_load_path> -mfile <fm_filename>
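
For intuition on the uncertainty regularisation, here is a sketch of the kind of signal MPUR penalises. It assumes a forward model fm with dropout layers (e.g. TinyForwardModel above plus nn.Dropout), kept active at prediction time so that repeated rollouts form a cheap ensemble; this is illustrative, not the repo's planning.py:

import torch

def prediction_uncertainty(fm, images, states, action, n_models=10):
    fm.train()  # keep the dropout layers sampling
    # n_models stochastic passes over the same input form an ensemble ...
    preds = torch.stack([fm(images, states, action)[0] for _ in range(n_models)])
    # ... and their variance is the uncertainty estimate added to the policy loss.
    return preds.var(dim=0).mean()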

Evaluating the agent

To evaluate a trained policy, run the script eval_policy.py in one of the following three modes. Type -h to see the other options and details.

python eval_policy.py -model_dir <load_path> -policy_model <policy_filename> -method policy-{MPUR,MPER,IL}

You can also specify -method bprop to perform "brute force" planning, which will be computationally expensive.
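
Conceptually, brute-force planning optimises the action sequence itself by gradient descent through the world model, instead of querying a policy network. The sketch below illustrates the idea with the hypothetical fm interface from the earlier snippet and a hypothetical differentiable cost_fn; it is not the repo's implementation:

import torch

def plan(fm, cost_fn, images, states, npred=20, steps=100, lr=0.1):
    # images: (b, ncond, 3, 117, 24), states: (b, ncond, 7, 4)
    actions = torch.zeros(images.size(0), npred, 2, requires_grad=True)
    optimiser = torch.optim.SGD([actions], lr=lr)
    for _ in range(steps):
        optimiser.zero_grad()
        img_hist, st_hist, total = images, states, 0.0
        for t in range(npred):
            # Unroll one step and accumulate its cost.
            ni, ns = fm(img_hist, st_hist, actions[:, t])
            total = total + cost_fn(ni, ns).mean()
            # Slide the conditioning window forward with the prediction.
            img_hist = torch.cat([img_hist[:, 1:], ni.unsqueeze(1)], 1)
            st_hist = torch.cat([st_hist[:, 1:], ns.unsqueeze(1)], 1)
        total.backward()   # gradients flow through the unrolled world model
        optimiser.step()
    return actions.detach()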

Parallel evaluation

Evaluation happens in parallel. By default, the evaluator script uses min(10, #cores_available) processes; it doesn't go above 10 because it would hit GPU memory limits. To change the number of processes, pass the -num-processes argument to eval_policy.py. For this to work under Slurm, you also need to request CPU cores with the --cpus-per-task=X argument. Slurm limits CPU usage to 64 cores per user and GPUs to 18 per user, so 3 CPUs per task is a reasonable choice that lets us use all the GPUs without hitting the CPU limit when running multiple evaluations. The CPU limit can be extended, but you need to email the IT helpdesk.
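
For reference, the default process count described above amounts to the following (a sketch, not the script's exact code):

import multiprocessing as mp

# Cap at 10 workers to avoid exhausting GPU memory; use fewer on small machines.
num_processes = min(10, mp.cpu_count())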

Pre-trained models

Here you can download the predictive model and the policy we've trained on our servers (they are bundled together in the model field of this Python dictionary). The agent achieves a success rate of 82.0%.
Here, instead, you can download only the predictive models (one for the state and one for the cost), and try to train the policy on your own.

pytorch-ppuu's People

Contributors

atcold, blazejosinski, jayabrata97, jiachenzhu, justinmae, mbhenaff, mikaelhenaff, nyufb, skarakulak, vladisai, yair-schiff


pytorch-ppuu's Issues

New failure visualisation tool

In order to be able to address each and every type of failure, we need to implement a visualisation tool that allows us to shed some light on what is going on.


same rendering error as before ;(

  File "/home/mbhenaff/projects/pytorch-Traffic-Simulator/traffic_gym.py", line 655, in render
    v.store('state_image', (max_extension, screen_surface, width_height, scale))
  File "/home/mbhenaff/projects/pytorch-Traffic-Simulator/traffic_gym.py", line 390, in store
    self._states_image.append(self._get_observation_image(*object_))
  File "/home/mbhenaff/projects/pytorch-Traffic-Simulator/traffic_gym.py", line 377, in _get_observation_image
    sub_rot_surface = rot_surface.subsurface(x, y, *width_height)
ValueError: subsurface rectangle outside surface area
>>> t
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 't' is not defined
>>> env.file_name
'./data_i80/trajectories-0500-0515.txt'

^^ shows the file name mentioned above

Predicted costs is list object in planning.py

I receive the following error when running train_MPUR.py and train_cost.py:

planning.py", line 86, in compute_uncertainty_batch
       pred_costs  = pred_costs. view(n_models, bsize, npred, -1)
       AttributeError: 'list' object has no attribute 'view'

Is this because I am using a regular fwd-cnn model? Do I need to use fwd-cnn-vae-fp in order to ensure that pred_costs is generated correctly?

running error of play_maps.py

python play_maps.py -map i80
Traceback (most recent call last):
  File "play_maps.py", line 81, in <module>
    observation, reward, done, info = env.step(action)
  File "/anaconda3/envs/PPUU/lib/python3.8/site-packages/gym/wrappers/order_enforcing.py", line 37, in step
    return self.env.step(action)
  File "/anaconda3/envs/PPUU/lib/python3.8/site-packages/gym/wrappers/step_api_compatibility.py", line 52, in step
    step_returns = self.env.step(action)
  File "/anaconda3/envs/PPUU/lib/python3.8/site-packages/gym/wrappers/env_checker.py", line 37, in step
    return env_step_passive_checker(self.env, action)
  File "/anaconda3/envs/PPUU/lib/python3.8/site-packages/gym/utils/passive_env_checker.py", line 269, in env_step_passive_checker
    assert isinstance(
AssertionError: The `info` returned by `step()` must be a python dictionary, actual type: <class 'NoneType'>

error when generating data

This error comes up from time to time when generating the data (after around 100-150 episodes have been generated):

File "/home/mbhenaff/projects/pytorch-Traffic-Simulator/traffic_gym.py", line 401, in step
current_lane_idx = lane_set.pop()
KeyError: 'pop from an empty set'

eval_policy - episode IDs are different from the past results

It seems that the episode IDs that eval_policy.py currently uses are not the same as the ones we have in the past results, even after adjusting for the change in directory structure made in commit 6860104. Before that commit, the episodes were saved under the folder ep{j} (the convention used in the past simulation results we have for the failure cases); after it, the simulations are saved under the folder ep{j+1}. So I am making this adjustment when comparing my current results with the past results.

I have confirmed that my data_splits.pth file is the same as the one in the traffic-data-atcold/data_i80_v0 folder.

For visual comparison I have uploaded the simulation videos to the drive folder below
https://drive.google.com/drive/u/1/folders/15jH0rCAMGELvoo6FBUu_Tn0kNUo3RTXs
The files that have the suffix 'policy_82' are the episodes that were previously saved. When I check the results of the same episodes using the latest code, I am getting the episodes that I have saved with the suffix 'policy_87'.

Understanding the Algorithm about the nearby cars

In building the forward model, as stated in your paper, the surrounding cars are not modelled with human-like behaviour; they just follow their trajectories from the dataset. Then why, in scenarios simulated by the forward model from the same initial state, are the surrounding cars' trajectories different across scenarios? Shouldn't they all be the same, since they just follow their trajectories from the dataset?

Generate data slurm scripts need named argument

Since run_generate_data.sh was changed to use named arguments, submit_generate_data_<map>.slurm needs to be changed accordingly:

This line should be changed from:
srun python -u generate_trajectories.py -time_slot $1 -map i80
to:
srun python -u generate_trajectories.py -time_slot $time_slot -map i80

The same change should be applied to the .slurm file for each map.

Imitation Learner (state list input)

Train a model to predict a sequence of actions from state inputs. The inputs are a list of neighbouring car states, i.e. (x, y, dx, dy). The idea is to learn a permutation-invariant model which can deal with variable list sizes and different list orders. Parameters are shared across the different cars, and the influence of the other cars is represented as a sum; both of these design choices enable the permutation invariance (see the sketch below).

Train the same model types as for the image inputs.
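
A Deep-Sets-style sketch of this permutation-invariant design (illustrative only; the names and sizes are not the repo's):

import torch
from torch import nn

class SetPolicy(nn.Module):
    # A shared encoder phi embeds each neighbour's (x, y, dx, dy); the
    # embeddings are summed (order- and count-invariant); a head rho maps
    # the pooled embedding to a 2-D action.
    def __init__(self, hidden=64):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(4, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2))

    def forward(self, neighbours):      # (batch, n_cars, 4); n_cars may vary
        return self.rho(self.phi(neighbours).sum(dim=1))

policy = SetPolicy()
actions = policy(torch.rand(8, 6, 4))   # shuffling the 6 cars leaves the output unchanged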

How are the maps created?

Hi,
I would like to know how the maps are created, for Lankershim in particular. I see a resource map file in the corresponding folder, but I am not sure how it is obtained. What are its origin and source?

Sampling rate of data used

Hi PPUU-Team,

I'd like to estimate the planning horizon in seconds of your algorithm.

I'm wondering what sample rate you used for generating your data (images, states, actions, costs, ...).
Judging by the "Global Time" entry in the file "./traffic-data/xy-trajectories/i80/trajectories-0500-0515.txt", the sampling rate should be 10 fps. Is this correct?

Thanks,
Timo

Issue in dumping "state, action, cost" triple

I was trying to run generate_trajectories.py. However, it was giving the error TypeError: step() missing 1 required positional argument: 'action' on line 90, observation, reward, done, info = env.step(). I changed it to observation, reward, done, info = env.step(np.zeros((2,))), and then it started building the pkl files. @Atcold, is this correct?

PyTorch v1.1.0 incompatibility

  • With the latest update, PyTorch v1.1.0 now requires an additional (no longer optional) argument for some convolutional layers.
  • When an older model is loaded, the code breaks.
    Simply running the code (and generating a new model) is expected to break as well.

The solution for both (which needs understanding / investigation) should be adding the missing argument in models.py. The environment requirements in environment.yaml also need to be updated (lift the pin to the former PyTorch v1.0.1).

get_batch_fm gets additional index

In the get_batch_fm method in dataloader.py, self.opt.ncond + npred + 1 images, states, and costs are loaded:

images.append(self.images[s][t:t+(self.opt.ncond+npred)+1].cuda())
actions.append(self.actions[s][t:t+(self.opt.ncond+npred)].cuda())
states.append(self.states[s][t:t+(self.opt.ncond+npred)+1].cuda())
costs.append(self.costs[s][t:t+(self.opt.ncond+npred)+1].cuda())

Note that only self.opt.ncond + npred actions are loaded.
It is not clear why there is a +1 for the other variables.

Additionally, when using the US101 dataset, there was an error because self.states[s].size(0) exceeded self.images[s].size(0); therefore line 154 of the code needs the following adjustment, from:

T = self.states[s].size(0)

to:

T = min(self.images[s].size(0), self.states[s].size(0))

train_fm.py: IndexError: list index out of range

python train_fm.py -model_dir ~/PPUU-models --model fwd-cnn-vae-fp

[loading data shard: traffic-data/state-action-cost/data_i80_v0/trajectories-0400-0415/all_data.pth]
[loading data shard: traffic-data/state-action-cost/data_i80_v0/trajectories-0500-0515/all_data.pth]
[loading data shard: traffic-data/state-action-cost/data_i80_v0/trajectories-0515-0530/all_data.pth]
Number of episodes: 5596
[loading data splits: traffic-data/state-action-cost/data_i80_v0/splits.pth]
[loading data stats: traffic-data/state-action-cost/data_i80_v0/data_stats.pth]
[loading car sizes: traffic-data/state-action-cost/data_i80_v0/car_sizes.pth]
[will save model as: ~/PPUU-models/model=fwd-cnn-vae-fp-layers=3-bsize=8-ncond=20-npred=20-lrt=0.0001-nfeature=256-dropout=0.0-nz=32-beta=0.0-zdropout=0.0-gclip=5.0-warmstart=0-seed=1]
[training]
Traceback (most recent call last):
  File "train_fm.py", line 203, in <module>
    train_losses = train(opt.epoch_size, opt.npred)
  File "train_fm.py", line 148, in train
    inputs, actions, targets, _, _ = dataloader.get_batch_fm('train', npred)
  File "~/pytorch-PPUU/dataloader.py", line 175, in get_batch_fm
    car_id = int(re.findall('car(\d+).pkl', splits[5])[0])
IndexError: list index out of range

Make evaluation parallel

Right now, the evaluation happens sequentially, but it should be parallel.
It needs to be implemented similarly to the link below:
run.py

train_fm.py: 'FwdCNN' object has no attribute 'intype'

python train_fm.py -model_dir ~/PPUU-models

[loading data shard: traffic-data/state-action-cost/data_i80_v0/trajectories-0400-0415/all_data.pth]
[loading data shard: traffic-data/state-action-cost/data_i80_v0/trajectories-0500-0515/all_data.pth]
[loading data shard: traffic-data/state-action-cost/data_i80_v0/trajectories-0515-0530/all_data.pth]
Number of episodes: 5596
[loading data splits: traffic-data/state-action-cost/data_i80_v0/splits.pth]
[loading data stats: traffic-data/state-action-cost/data_i80_v0/data_stats.pth]
[loading car sizes: traffic-data/state-action-cost/data_i80_v0/car_sizes.pth]

[will save model as: ~/PPUU-models/model=fwd-cnn-layers=3-bsize=8-ncond=20-npred=20-lrt=0.0001-nfeature=256-dropout=0.0-gclip=5.0-warmstart=0-seed=1]
Traceback (most recent call last):
  File "train_fm.py", line 107, in <module>
    model.intype('gpu')
  File "~/miniconda3/envs/PPUU/lib/python3.7/site-packages/torch/nn/modules/module.py", line 535, in __getattr__
    type(self).__name__, name))
AttributeError: 'FwdCNN' object has no attribute 'intype'

Scipy misc imresize absent

From pilutil.py we have that

@numpy.deprecate(message="`imresize` is deprecated in SciPy 1.0.0, "
                         "and will be removed in 1.3.0.\n"
                         "Use Pillow instead: ``numpy.array(Image.fromarray(arr).resize())``.")
def imresize(arr, size, interp='bilinear', mode=None):
    """
    Resize an image.

    This function is only available if Python Imaging Library (PIL) is installed.

    .. warning::

        This function uses `bytescale` under the hood to rescale images to use
        the full (0, 255) range if ``mode`` is one of ``None, 'L', 'P', 'l'``.
        It will also cast data for 2-D images to ``uint32`` for ``mode=None``
        (which is the default).

    Parameters
    ----------
    arr : ndarray
        The array of image to be resized.
    size : int, float or tuple
        * int   - Percentage of current size.
        * float - Fraction of current size.
        * tuple - Size of the output image (height, width).

    interp : str, optional
        Interpolation to use for re-sizing ('nearest', 'lanczos', 'bilinear',
        'bicubic' or 'cubic').
    mode : str, optional
        The PIL image mode ('P', 'L', etc.) to convert `arr` before resizing.
        If ``mode=None`` (the default), 2-D images will be treated like
        ``mode='L'``, i.e. casting to long integer.  For 3-D and 4-D arrays,
        `mode` will be set to ``'RGB'`` and ``'RGBA'`` respectively.

    Returns
    -------
    imresize : ndarray
        The resized array of image.

    See Also
    --------
    toimage : Implicitly used to convert `arr` according to `mode`.
    scipy.ndimage.zoom : More generic implementation that does not use PIL.

    """
    im = toimage(arr, mode=mode)
    ts = type(size)
    if issubdtype(ts, numpy.signedinteger):
        percent = size / 100.0
        size = tuple((array(im.size)*percent).astype(int))
    elif issubdtype(type(size), numpy.floating):
        size = tuple((array(im.size)*size).astype(int))
    else:
        size = (size[1], size[0])
    func = {'nearest': 0, 'lanczos': 1, 'bilinear': 2, 'bicubic': 3, 'cubic': 3}
    imnew = im.resize(size, resample=func[interp])
    return fromimage(imnew)

state image error when running with real data (comes after a while)

self._states_image.append(self._get_observation_image(*object_))

File "/home/mbhenaff/projects/pytorch-Traffic-Simulator/traffic_gym.py", line 367, in _get_observation_image
sub_rot_surface = rot_surface.subsurface(x, y, *width_height)
ValueError: subsurface rectangle outside surface area

car_size.pth not found

I get an error indicating that "car_size.pth" was not found.

python train_fm.py -model_dir <fm_save_path>

Funky copying syntax

/home/atcold/Work/GitHub/pytorch-Traffic-Simulator/utils.py:72: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).

Imitation Learner (image inputs)

Train a model to predict a sequence of actions from image inputs. The images are car-centric and have 2 channels: lane markings and neighbouring cars.

Several different modes:

  • Deterministic: a = pi(s)
  • Latent (EEN): a = pi(s, z)
  • Latent (VAE): a = pi(s, z)

The different latent models differ in how they encode and sample the latent variable z.

Having latent variables should account for uncertainty in behaviour, e.g. passing to the left/right, braking, etc. A sketch of this idea follows.
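
A sketch of the latent-action idea, a = pi(s, z); all names and sizes here are illustrative:

import torch
from torch import nn

class LatentPolicy(nn.Module):
    # The same state can yield different manoeuvres depending on the sampled z.
    def __init__(self, state_dim=100, z_dim=8, hidden=64):
        super().__init__()
        self.z_dim = z_dim
        self.net = nn.Sequential(nn.Linear(state_dim + z_dim, hidden),
                                 nn.ReLU(), nn.Linear(hidden, 2))

    def forward(self, s, z=None):
        if z is None:   # at test time, draw z from a standard normal prior
            z = torch.randn(s.size(0), self.z_dim, device=s.device)
        return self.net(torch.cat([s, z], dim=1))

pi = LatentPolicy()
s = torch.rand(4, 100)
a1, a2 = pi(s), pi(s)   # different z samples can give different actions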
