GPPEVI

Gaussian Process for Offline Reinforcement Learning with Insufficient Data Coverage

In Gridworld:

+ BCQ.py, BEAR.py, and train_pql.py are used to train the PQL policy
+ collect_data.py is used for collecting data 
+ env.py stores the environment configuration, including:
    - HORIZON
    - BOARD_SIZE
    - ACTIONS set
    - Transition noise 
    - Feature map phi 
    - Other environment configurations
+ gen_weight.py generates random model parameters for Linear PEVI
+ train_gp.py is used for training GP PEVI policy 
+ train_fqi.py is used for training FQI policy 
+ utils.py stores the remaining problem configuration, including:
    - Dataset class
    - Number of trajectories in the dataset
+ Environment configurations:
    - Current BOARD_SIZE: 20
    - Current HORIZON: 40 
    - Transition noise: 0.05 
    - Feature map: phi(state, action) = np.array([1/(x**2+1), 1/(y**2+1), 1/(action_index+1)])
    - Reward:
        0 in the bottom-right half of the board
        1 at (1, 1)
        1/4 everywhere else
+ Hence, the optimal policy tries to reach the upper-left corner
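The Gridworld described above can be sketched in a few lines, together with the kind of random-policy rollout that collect_data.py is described as producing. This is a minimal illustration, not the contents of env.py: the action set, the coordinate convention (x and y in 1..BOARD_SIZE, (1, 1) in the upper-left), the reading of "bottom-right half" as the cells below the anti-diagonal, the reward depending only on the current state, and the noise model (with probability 0.05 a uniformly random action replaces the chosen one) are all assumptions. The helper names phi, reward, step and collect_trajectory are hypothetical; only the feature map formula is taken from the list above.

```python
import random

import numpy as np

BOARD_SIZE = 20
HORIZON = 40
TRANSITION_NOISE = 0.05
# Hypothetical action set (up, down, left, right); the real ACTIONS list lives in env.py.
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def phi(state, action_index):
    """Feature map exactly as stated in the README."""
    x, y = state
    return np.array([1 / (x**2 + 1), 1 / (y**2 + 1), 1 / (action_index + 1)])


def reward(state):
    x, y = state
    if (x, y) == (1, 1):
        return 1.0
    # Assumption: "bottom-right half" = cells strictly below the anti-diagonal.
    if x + y > BOARD_SIZE + 1:
        return 0.0
    return 0.25


def step(state, action_index, rng):
    # Assumption: with prob. TRANSITION_NOISE a uniformly random action is executed instead.
    if rng.random() < TRANSITION_NOISE:
        action_index = rng.randrange(len(ACTIONS))
    dx, dy = ACTIONS[action_index]
    # Moves that would leave the board are clipped at the walls.
    x = min(max(state[0] + dx, 1), BOARD_SIZE)
    y = min(max(state[1] + dy, 1), BOARD_SIZE)
    return (x, y)


def collect_trajectory(start, rng):
    """Roll out a uniformly random policy for HORIZON steps as (s, a, r, s') tuples."""
    traj, s = [], start
    for _ in range(HORIZON):
        a = rng.randrange(len(ACTIONS))
        r = reward(s)
        s_next = step(s, a, rng)
        traj.append((s, a, r, s_next))
        s = s_next
    return traj


if __name__ == "__main__":
    data = [collect_trajectory((10, 10), random.Random(seed)) for seed in range(3)]
    print(len(data), len(data[0]))  # 3 trajectories of HORIZON steps each
```

Seeding a separate random.Random per trajectory keeps the offline dataset reproducible, which matters when several offline methods are compared on the same data.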

+ Run python3 collect_data.py to collect data with a random policy
+ Run python3 train_fqi.py to train the FQI policy; run train_pql.py and train_gp.py the same way for the PQL and GP PEVI policies
+ To change where an offline policy is saved, edit the save location inside the corresponding training script
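How train_gp.py computes its policy is not described here. As a rough sketch, assume GP PEVI follows linear PEVI but replaces its uncertainty bonus beta * ||phi(s, a)||_{Lambda^-1} with a Gaussian process posterior standard deviation. One backward step then fits a GP to the Bellman targets r + V_{h+1}(s'), subtracts beta times the posterior std, and truncates to [0, H - h + 1]. The RBF kernel, noise level, beta, and the function names below are hypothetical choices for illustration.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)


def gp_posterior(X, y, X_query, noise=0.1, lengthscale=1.0):
    """Exact GP regression: posterior mean and standard deviation at X_query."""
    K = rbf_kernel(X, X, lengthscale) + noise**2 * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y via Cholesky
    K_s = rbf_kernel(X, X_query, lengthscale)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = 1.0 - np.sum(v**2, axis=0)  # prior variance k(x, x) = 1 for the RBF kernel
    return mean, np.sqrt(np.maximum(var, 0.0))


def pessimistic_q(X, rewards, next_values, X_query, h, horizon, beta=1.0):
    """One backward step of pessimistic value iteration with a GP uncertainty penalty.

    X: features phi(s, a) of the logged transitions, one row per transition.
    next_values: V_{h+1}(s') for each logged next state.
    """
    targets = rewards + next_values          # Bellman targets r + V_{h+1}(s')
    mean, std = gp_posterior(X, targets, X_query)
    q = mean - beta * std                    # penalise poorly covered (s, a) pairs
    return np.clip(q, 0.0, horizon - h + 1)  # PEVI-style truncation
```

Query points far from the logged data receive a posterior std close to the prior value of 1, so their pessimistic value is pushed toward 0; that is how the penalty discourages actions the dataset does not cover.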

In Pendulum:

+ BCQ.py, BEAR.py, and train_pql.py are used to train the PQL policy
+ collect_data.py is used for collecting data 
+ utils.py stores both the environment and the problem configuration, including:
    - HORIZON
    - BOARD_SIZE
    - ACTIONS set
    - Transition noise 
    - Feature map phi 
    - Other environment configurations
    - Dataset class
    - Number of trajectories in the dataset
+ Linear PEVI has not been set up for Pendulum
+ train_gp.py is used for training GP PEVI policy 
+ train_fqi.py is used for training FQI policy 
+ Environment configurations:
    - State space discretized into 10 bins per dimension
    - Action space discretized into 10 actions per dimension
    - State has 3 dimensions 
    - Action has 1 dimension 
    - Current HORIZON: 50 
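The discretization above can be illustrated with uniform binning. This sketch assumes the environment is Gym's Pendulum, whose observation is (cos theta, sin theta, theta_dot) with theta_dot in [-8, 8] and whose torque lies in [-2, 2]; the uniform bins and the helper names discretize_state and action_from_index are hypothetical and may differ from what utils.py actually does.

```python
import numpy as np

N_BINS = 10
# Assumed bounds from Gym's Pendulum: obs = (cos theta, sin theta, theta_dot).
STATE_LOW = np.array([-1.0, -1.0, -8.0])
STATE_HIGH = np.array([1.0, 1.0, 8.0])
ACTION_LOW, ACTION_HIGH = -2.0, 2.0

# 10 evenly spaced torques covering the (1-dimensional) action range.
ACTION_GRID = np.linspace(ACTION_LOW, ACTION_HIGH, N_BINS)


def discretize_state(obs):
    """Map a continuous 3-D observation to a tuple of bin indices in [0, N_BINS - 1]."""
    ratios = (np.asarray(obs, dtype=float) - STATE_LOW) / (STATE_HIGH - STATE_LOW)
    idx = np.floor(ratios * N_BINS).astype(int)
    # Values on the upper bound would land in bin N_BINS, so clip them into the last bin.
    return tuple(int(i) for i in np.clip(idx, 0, N_BINS - 1))


def action_from_index(i):
    """Turn a discrete action index back into the torque array the environment expects."""
    return np.array([ACTION_GRID[i]])
```

With 10 bins in each of the 3 state dimensions this gives 10**3 = 1000 discrete states and 10 discrete actions.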

+ Run python3 collect_data.py to collect data with a random policy
+ Run python3 train_fqi.py to train the FQI policy; run train_pql.py and train_gp.py the same way for the PQL and GP PEVI policies
+ To change where an offline policy is saved, edit the save location inside the corresponding training script
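In both environments train_fqi.py trains an FQI baseline, but its regressor is not described here. A generic finite-horizon fitted Q iteration, sketched under the assumption of ridge regression on a feature map phi(s, a) such as the Gridworld one, could look like the following; the function names and the regularisation constant are hypothetical.

```python
import numpy as np


def fitted_q_iteration(dataset, phi, n_actions, horizon, reg=1e-3):
    """Finite-horizon FQI with ridge regression on features phi(s, a).

    dataset: list of (s, a, r, s_next) transitions, reused at every step h.
    Returns weights w[h] such that Q_h(s, a) is approximated by phi(s, a) @ w[h].
    """
    feats = np.array([phi(s, a) for s, a, _, _ in dataset])
    rewards = np.array([r for _, _, r, _ in dataset], dtype=float)
    d = feats.shape[1]
    weights = [None] * horizon
    next_w = None
    for h in reversed(range(horizon)):
        if next_w is None:
            targets = rewards  # last step: no future value
        else:
            # Bellman target r + max_b Q_{h+1}(s', b) using the previously fitted weights.
            next_q = np.array([
                max(phi(s2, b) @ next_w for b in range(n_actions))
                for _, _, _, s2 in dataset
            ])
            targets = rewards + next_q
        A = feats.T @ feats + reg * np.eye(d)
        next_w = np.linalg.solve(A, feats.T @ targets)
        weights[h] = next_w
    return weights


def greedy_action(s, w, phi, n_actions):
    """Pick the action with the highest fitted Q-value."""
    return max(range(n_actions), key=lambda a: phi(s, a) @ w)
```

Unlike the pessimistic variant, plain FQI applies no uncertainty penalty, so it can overestimate the value of actions that the random-policy dataset rarely covers.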

Repository: shady2000/gppevi