
GPPEVI

Gaussian Process for Offline Reinforcement Learning with Insufficient Data Coverage

In Gridworld:

+ BCQ.py, BEAR.py, and train_pql.py are used for training the PQL policy
+ collect_data.py is used for collecting the offline dataset
+ env.py stores the environment configuration, including:
    - HORIZON
    - BOARD_SIZE
    - ACTIONS set
    - Transition noise 
    - Feature map phi 
    - Other environment configurations
+ gen_weight.py generates random model parameters for Linear PEVI
+ train_gp.py is used for training the GP PEVI policy
+ train_fqi.py is used for training the FQI policy
+ utils.py stores other problem configuration, including:
    - Dataset class
    - Number of trajectories in dataset 
+ Environment configurations (see the sketch after this list):
    - Current BOARD_SIZE: 20
    - Current HORIZON: 40 
    - Transition noise: 0.05 
    - Feature map: phi(state, action) = np.array([1/(x**2+1), 1/(y**2+1), 1/(action_index+1)])
    - Reward: 
        0 if the state is in the bottom-right half of the board
        1 if at (1, 1)
        1/4 otherwise 
+ The optimal strategy is therefore to try to reach the upper-left corner
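
A minimal sketch of how these settings could look in code, for orientation only: the numeric values and the phi formula come from this README, while the ACTIONS list, the coordinate convention, and the exact condition for the zero-reward region are assumptions (env.py is the source of truth).

```python
import numpy as np

# Values stated in this README; variable names are assumed to mirror env.py.
BOARD_SIZE = 20
HORIZON = 40
TRANSITION_NOISE = 0.05
ACTIONS = ["up", "down", "left", "right"]  # assumed action set

def phi(state, action_index):
    """Feature map from the README: phi(state, action)."""
    x, y = state
    return np.array([1 / (x**2 + 1), 1 / (y**2 + 1), 1 / (action_index + 1)])

def reward(state):
    """Reward as described above: 1 at (1, 1), 0 near the bottom-right corner, 1/4 otherwise."""
    x, y = state
    if (x, y) == (1, 1):
        return 1.0
    # Assumed interpretation of "the bottom-right half of the board"; env.py has the exact condition.
    if x > BOARD_SIZE // 2 and y > BOARD_SIZE // 2:
        return 0.0
    return 0.25
```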

+ run python3 collect_data.py to collect data with a random policy (see the sketch below)
+ run python3 train_fqi.py to train the FQI policy; similarly, run train_pql.py and train_gp.py for the PQL and GP PEVI policies
+ To configure where each offline policy is saved, edit the corresponding training script
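
Roughly, collect_data.py rolls out a random behavior policy for a fixed number of trajectories and stores the transitions. A minimal sketch under assumed names: the constants, the save path, and the env interface (reset()/step()) are hypothetical placeholders for what env.py and utils.py actually define.

```python
import pickle
import random

# Hypothetical values -- the real ones live in env.py / utils.py.
HORIZON = 40
NUM_ACTIONS = 4
NUM_TRAJECTORIES = 100
SAVE_PATH = "data/gridworld_random.pkl"

def collect_random_data(env):
    """Roll out a uniform-random policy and store (s, a, r, s') transitions per trajectory.

    `env` is assumed to expose reset() -> state and step(action) -> (next_state, reward, done).
    """
    dataset = []
    for _ in range(NUM_TRAJECTORIES):
        state = env.reset()
        trajectory = []
        for _ in range(HORIZON):
            action = random.randrange(NUM_ACTIONS)        # random behavior policy
            next_state, reward, done = env.step(action)   # assumed step signature
            trajectory.append((state, action, reward, next_state))
            state = next_state
            if done:
                break
        dataset.append(trajectory)
    with open(SAVE_PATH, "wb") as f:
        pickle.dump(dataset, f)
    return dataset
```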

In Pendulum:

+ BCQ.py, BEAR.py, and train_pql.py are used for training the PQL policy
+ collect_data.py is used for collecting the offline dataset
+ utils.py stores both the environment and problem configuration, including:
    - HORIZON
    - BOARD_SIZE
    - ACTIONS set
    - Transition noise 
    - Feature map phi 
    - Other environment configurations
    - Dataset class (see the sketch at the end of this section)
    - Number of trajectories in dataset 
+ Note: Linear PEVI is not set up for this environment yet
+ train_gp.py is used for training the GP PEVI policy
+ train_fqi.py is used for training the FQI policy
+ Environment configurations:
    - The state space is discretized into 10 bins per dimension (see the sketch after this list)
    - The action space is discretized into 10 actions per dimension
    - The state has 3 dimensions
    - The action has 1 dimension
    - Current HORIZON: 50 
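
One way the discretization could look, as a sketch rather than the repo's code: the README does not say which pendulum implementation is wrapped, so the bounds below are the standard Gym Pendulum ranges (cos theta, sin theta in [-1, 1], angular velocity in [-8, 8], torque in [-2, 2]) and should be treated as assumptions; utils.py defines the actual ranges.

```python
import numpy as np

N_BINS = 10  # 10 bins per state dimension and 10 discrete actions, as stated above

# Assumed bounds of the standard Gym Pendulum observation (cos theta, sin theta, angular velocity)
# and action (torque); the repo's utils.py defines the actual ranges.
STATE_LOW = np.array([-1.0, -1.0, -8.0])
STATE_HIGH = np.array([1.0, 1.0, 8.0])
ACTION_LOW, ACTION_HIGH = -2.0, 2.0

def discretize_state(state):
    """Map a continuous 3-d state to a tuple of bin indices in {0, ..., N_BINS - 1}^3."""
    ratios = (np.asarray(state, dtype=float) - STATE_LOW) / (STATE_HIGH - STATE_LOW)
    bins = np.clip((ratios * N_BINS).astype(int), 0, N_BINS - 1)
    return tuple(int(b) for b in bins)

def action_from_index(index):
    """Map a discrete action index in {0, ..., N_BINS - 1} to a torque value (bin center)."""
    return ACTION_LOW + (index + 0.5) * (ACTION_HIGH - ACTION_LOW) / N_BINS
```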

+ run python3 collect_data.py to collect data with a random policy
+ run python3 train_fqi.py to train the FQI policy; similarly, run train_pql.py and train_gp.py for the PQL and GP PEVI policies
+ To configure where each offline policy is saved, edit the corresponding training script
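
Both environments keep their offline data in a Dataset class defined in utils.py. A minimal sketch of what such a trajectory container might look like; the method and field names are assumptions, not the repo's actual API.

```python
class Dataset:
    """Minimal stand-in for the Dataset class in utils.py (method and field names are assumed)."""

    def __init__(self):
        self.trajectories = []   # list of trajectories, each a list of (s, a, r, s') tuples

    def add_trajectory(self, transitions):
        self.trajectories.append(transitions)

    def all_transitions(self):
        """Flatten all trajectories into one list of transitions for batch-style training."""
        return [t for traj in self.trajectories for t in traj]

    def __len__(self):
        return len(self.trajectories)
```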
