
2022NeurIPS_SMARTS_competition_code

1. Summary

The core idea of our approach is to train an online multi-task RL agent to solve the driving scenarios. In the following, we describe the MDP formulation, the RL algorithm design and the overall model architecture in detail.

2. MDP formulation

2.1 Observation Space

The observation contains three types of information: (1) the ego vehicle's information, (2) lane information and (3) neighboring vehicles' information. All information is transformed into the ego vehicle's frame. The final concatenated observation is a 100-dimensional vector; a sketch of how such a vector can be assembled is given after the lists below.

In detail, the lane information contains:

  • Relative positions and headings of 5 waypoints ahead.
  • Speed limit.
  • Lane width.
  • Target lane indicator.
  • Goal lane indicator.
  • The indicator of whether the ego vehicle can change lanes.
  • Relative lane index.

The neighboring vehicles' information contains:

  • Relative positions, speeds and headings of 5 nearest neighbor vehicles.
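A minimal sketch of how such an observation vector could be assembled is shown here. The exact feature layout, field names, padding, and the split of the 100 dimensions are specific to the repo, so everything below the list structure is an illustrative assumption.

```python
import numpy as np

def to_ego_frame(points_xy, ego_xy, ego_heading):
    """Rotate/translate world-frame (x, y) points into the ego vehicle's frame."""
    c, s = np.cos(-ego_heading), np.sin(-ego_heading)
    rot = np.array([[c, -s], [s, c]])
    return (points_xy - ego_xy) @ rot.T

def build_observation(ego, lanes, neighbors, n_lanes=3, n_neighbors=5, n_waypoints=5):
    """Illustrative assembly of ego / lane / neighbor features into one flat vector.

    `ego`, `lanes` and `neighbors` are assumed dict-like containers; the real repo
    extracts these quantities from the SMARTS observation object.
    """
    feats = [np.array([ego["speed"], ego["heading"]])]  # ego features (assumed subset)

    for lane in lanes[:n_lanes]:
        wp_xy = to_ego_frame(lane["waypoint_positions"][:n_waypoints],
                             ego["position"], ego["heading"])
        wp_heading = lane["waypoint_headings"][:n_waypoints] - ego["heading"]
        feats += [wp_xy.ravel(), wp_heading,
                  np.array([lane["speed_limit"], lane["width"],
                            lane["is_target"], lane["is_goal"],
                            lane["can_change"], lane["rel_index"]])]

    for nb in neighbors[:n_neighbors]:
        rel_xy = to_ego_frame(nb["position"][None, :], ego["position"], ego["heading"])[0]
        feats += [rel_xy, np.array([nb["speed"], nb["heading"] - ego["heading"]])]

    return np.concatenate(feats).astype(np.float32)
```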

2.2 Action Space

In our design, we define 11 high-level decisions forming a discrete action space (a sketch of the mapping is given after the list):

  • Change lanes left, speed = 0.4/0.7/1.0*speed_limit
  • Change lanes right, speed = 0.4/0.7/1.0*speed_limit
  • Keep lane, speed = 0.4/0.7/1.0*speed_limit
  • Brake
  • Reverse, speed = 0.2*speed_limit
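The following sketch maps the 11 discrete action indices to (lane-change, target-speed) commands. The ordering of the indices, the command names, and how the commands are passed to the SMARTS controller are assumptions, not the repo's exact interface.

```python
# Illustrative mapping from the 11 discrete actions to high-level commands.
SPEED_FACTORS = (0.4, 0.7, 1.0)

ACTIONS = (
    [("left", f) for f in SPEED_FACTORS] +    # actions 0-2: change lane left
    [("right", f) for f in SPEED_FACTORS] +   # actions 3-5: change lane right
    [("keep", f) for f in SPEED_FACTORS] +    # actions 6-8: keep lane
    [("brake", 0.0), ("reverse", 0.2)]        # actions 9-10: brake / reverse at 0.2 * limit
)

def decode_action(action_index, speed_limit):
    """Translate a discrete action index into a (lane_change, target_speed) pair."""
    lane_change, factor = ACTIONS[action_index]
    return lane_change, factor * speed_limit
```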

2.3 Reward

  • Distance travelled reward: the distance the ego vehicle has travelled along its mission route.
  • Events reward: +100 for *reached_goal*, -50 for *collisions*, -30 for *off_route*/*wrong_way*/*on_shoulder*/*off_road*.
  • Lane changing reward: to discourage useless lane changes, we give a reward of -3 for each lane changing action.
  • Goal reward: if the distance from the ego vehicle's end position to the goal position is smaller than 20 m, we give a reward of min(5 / distance to goal, 10).
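A sketch that combines the four terms above into a single scalar is given below. The event flags and per-step quantities are assumed to be read from the SMARTS step info; their names here are placeholders.

```python
def compute_reward(step):
    """Illustrative combination of the four reward terms described above.

    `step` is assumed to expose the per-step quantities (distance progressed,
    event flags, whether a lane change was commanded, distance to the goal).
    """
    reward = step["distance_travelled"]            # progress along the mission route

    if step["reached_goal"]:
        reward += 100.0
    if step["collision"]:
        reward -= 50.0
    if step["off_route"] or step["wrong_way"] or step["on_shoulder"] or step["off_road"]:
        reward -= 30.0

    if step["lane_change_commanded"]:              # discourage useless lane changes
        reward -= 3.0

    if step["done"] and step["distance_to_goal"] < 20.0:
        reward += min(5.0 / max(step["distance_to_goal"], 1e-6), 10.0)

    return reward
```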

3. RL Algorithm

We choose Discrete Soft Actor-Critic (SAC) as our backbone RL algorithm, and incorporate several carefully designed components to improve the performance of our model.
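For reference, the core critic and actor losses of discrete SAC are sketched below in a simplified form; the network interfaces and hyperparameters are placeholders, not the repo's implementation.

```python
import torch
import torch.nn.functional as F

def discrete_sac_losses(policy, q1, q2, q1_target, q2_target,
                        batch, gamma=0.99, alpha=0.2):
    """Simplified discrete-SAC losses (placeholder networks).

    Each Q network maps a state to one value per discrete action; `policy`
    returns per-action probabilities and log-probabilities.
    """
    obs, act, rew, next_obs, done = batch

    with torch.no_grad():
        next_probs, next_log_probs = policy(next_obs)
        next_q = torch.min(q1_target(next_obs), q2_target(next_obs))
        # Exact expectation over the discrete action set (no sampling needed).
        next_v = (next_probs * (next_q - alpha * next_log_probs)).sum(dim=-1)
        target = rew + gamma * (1.0 - done) * next_v

    q1_pred = q1(obs).gather(1, act.long().unsqueeze(-1)).squeeze(-1)
    q2_pred = q2(obs).gather(1, act.long().unsqueeze(-1)).squeeze(-1)
    critic_loss = F.mse_loss(q1_pred, target) + F.mse_loss(q2_pred, target)

    probs, log_probs = policy(obs)
    q_min = torch.min(q1(obs), q2(obs)).detach()
    actor_loss = (probs * (alpha * log_probs - q_min)).sum(dim=-1).mean()

    return critic_loss, actor_loss
```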

3.1 Multi-task Model Architecture

We consider the 8 training scenarios as 8 basic skills and assume that each testing scenario can be solved by composing these learned basic skills. Overall, we train a multi-task RL agent together with a scenario classifier that decides which policy to use at each decision step. The overall model architecture is shown in the figure below.

[Figure: softmodule_block — overall multi-task model architecture]

For the multi-task RL training, we adopt the SoftModule network as the backbone architecture of both the policy network and the Q network of the SAC algorithm. The scenario classifier is an MLP that takes the observation as input and outputs a one-hot encoding indicating which skill module to reuse.
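A simplified sketch of a SoftModule-style network is given below: a routing network produces soft weights that mix the outputs of shared modules layer by layer, conditioned on the task (here, the scenario classifier's one-hot output). It collapses the per-connection routing of the original SoftModule design into per-layer module weights for brevity, and all sizes are illustrative.

```python
import torch
import torch.nn as nn

class SoftModuleNet(nn.Module):
    """Simplified soft-modularized network: a routing net mixes module outputs."""
    def __init__(self, obs_dim, task_dim, out_dim, n_layers=2, n_modules=4, hidden=128):
        super().__init__()
        self.obs_enc = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.task_enc = nn.Sequential(nn.Linear(task_dim, hidden), nn.ReLU())
        self.modules_per_layer = nn.ModuleList([
            nn.ModuleList([nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())
                           for _ in range(n_modules)])
            for _ in range(n_layers)
        ])
        # Routing network: combined state/task features -> soft module weights per layer.
        self.router = nn.Linear(hidden, n_layers * n_modules)
        self.head = nn.Linear(hidden, out_dim)
        self.n_layers, self.n_modules = n_layers, n_modules

    def forward(self, obs, task_onehot):
        x = self.obs_enc(obs)
        z = self.task_enc(task_onehot)
        weights = self.router(x * z).view(-1, self.n_layers, self.n_modules).softmax(dim=-1)
        for l, layer in enumerate(self.modules_per_layer):
            outs = torch.stack([m(x) for m in layer], dim=1)     # (B, M, hidden)
            x = (weights[:, l].unsqueeze(-1) * outs).sum(dim=1)  # soft mix of modules
        return self.head(x)
```

The policy network and the Q networks would both follow this shape, differing only in their output dimension (action probabilities vs. per-action Q values).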

3.2 State Encoder Design

Since the observation consists of the 5 neighboring vehicles' information and 3 lanes' information, we use two separate networks to process these two types of information and then merge them at the output layer. The architecture is shown below.

[Figure: obs_encoder — state encoder architecture]
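A sketch of the two-branch encoder follows, assuming the neighbor features and lane features are sliced out of the flat observation and processed by separate MLPs before being merged; the dimensions are placeholders.

```python
import torch
import torch.nn as nn

class ObsEncoder(nn.Module):
    """Two-branch state encoder: one MLP per information type, merged at the output."""
    def __init__(self, neighbor_dim, lane_dim, hidden=64, out_dim=128):
        super().__init__()
        self.neighbor_net = nn.Sequential(nn.Linear(neighbor_dim, hidden), nn.ReLU())
        self.lane_net = nn.Sequential(nn.Linear(lane_dim, hidden), nn.ReLU())
        self.merge = nn.Sequential(nn.Linear(2 * hidden, out_dim), nn.ReLU())

    def forward(self, neighbor_feats, lane_feats):
        h = torch.cat([self.neighbor_net(neighbor_feats),
                       self.lane_net(lane_feats)], dim=-1)
        return self.merge(h)
```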

3.3 Collision Prediction and Lane Masking

In our experiments, we found it difficult to train the agent to actively avoid collisions and to change into the goal lane in a purely end-to-end way. We therefore incorporate an additional collision prediction module into the architecture. Besides, we represent the goal lane index as an additional mask and feed it into the model. Both types of information are incorporated by multiplying them element-wise with the output of the SoftModule. The complete network architecture is shown in the figure below.

[Figure: taskspec_sm — complete network architecture with collision prediction and lane masking]
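One illustrative way to apply the two masks to the per-action output of the SoftModule is sketched below. Whether the multiplication happens on logits or probabilities, and how the goal lane index expands into a per-action mask, are assumptions.

```python
def apply_masks(action_scores, collision_prob, goal_lane_mask):
    """Element-wise down-weighting of the SoftModule output (illustrative form).

    action_scores  : per-action output of the SoftModule network (length 11)
    collision_prob : predicted collision probability per action (from the SARSA Q)
    goal_lane_mask : 1 for actions consistent with the goal lane, a small value otherwise
    """
    return action_scores * (1.0 - collision_prob) * goal_lane_mask
```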

The collision probability predictor is trained via a separate SARSA algorithm. We set the reward to 1 when a collision occurs and 0 otherwise, so the Q value $Q_\pi(s,a)$ of SARSA indicates the probability of a collision when taking action $a$ in state $s$.
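A sketch of the on-policy SARSA update for the collision Q network is given below; the network interface, discount factor and optimizer handling are placeholders rather than the repo's exact training loop.

```python
import torch
import torch.nn.functional as F

def sarsa_collision_update(q_net, optimizer, batch, gamma=0.99):
    """One SARSA step for the collision predictor (illustrative).

    The batch stores the action actually taken at the next state (on-policy),
    and `collision` is 1.0 if a collision occurred at this transition, else 0.0.
    """
    obs, act, collision, next_obs, next_act, done = batch

    with torch.no_grad():
        next_q = q_net(next_obs).gather(1, next_act.long().unsqueeze(-1)).squeeze(-1)
        target = collision + gamma * (1.0 - done) * next_q

    q_pred = q_net(obs).gather(1, act.long().unsqueeze(-1)).squeeze(-1)
    loss = F.mse_loss(q_pred, target)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```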

3.4 Rule-based Policy Enhancing

Reasonable and effective rule constraints not only simplify the learning task, but also accelerate training and improve the model's performance. We carefully design a set of rules that help the agent change lanes safely and brake actively to avoid collisions, by checking and, when necessary, modifying the policy output of our RL agent.
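An illustrative shape for such a rule layer is sketched below: inspect the action chosen by the RL policy and override it when a simple rule fires. The specific thresholds, rules, and action indices (taken from the assumed mapping in Section 2.2) are placeholders, not the repo's actual rule set.

```python
def apply_rules(rl_action, obs_info, collision_prob, threshold=0.5):
    """Illustrative rule layer that post-processes the RL agent's action.

    `rl_action` is the discrete action chosen by the policy; `obs_info` is assumed
    to expose a few hand-picked quantities such as whether the adjacent lane is free.
    """
    BRAKE = 9  # assumed index of the brake action (see the action list above)

    # Rule 1: if the predicted collision risk of the chosen action is high, brake.
    if collision_prob[rl_action] > threshold:
        return BRAKE

    # Rule 2: veto lane changes when the target lane is occupied nearby.
    if rl_action in range(0, 6) and not obs_info["adjacent_lane_free"]:
        return rl_action % 3 + 6  # fall back to keep-lane at the same speed level

    return rl_action
```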

The overall training and testing procedures are described in the figure below.

[Figure: model-3 — overall training and testing procedures]
