Policy Learning Landscape

This repository contains code to explore the policy optimization landscape.

Quick setup

To run CartPole:

python3 run_eager_policy_optimization.py --env CartPole-v0 --policy_type discrete

To run a MuJoCo environment, you must have MuJoCo installed along with its license. For example, to run Hopper-v1:

python3 run_eager_policy_optimization.py --env Hopper-v1 --policy_type normal --std 0.5

Parameters are saved to ./parameters as NumPy files. After collecting parameters from several runs, use the following commands to analyze the landscape.

  1. First install eager_pg: pip install -e .

  2. Random Perturbations Experiment:

cd interpolation_experiments
python paired_random_directions_experiment.py --p1 ./path/to/parameter/1/npy \
--save_dir ./path/to/save/in/ \
--alpha 0.5 --std 0.5 --n_directions 500
  3. Linear Interpolation Experiment:

cd interpolation_experiments
python simple_1d_interpolation_experiment.py --p1 ./path/to/parameter/1/npy \
--p2 ./path/to/parameter/2/npy --save_dir ./path/to/save/in/ \
--stds 5.0 --alpha_start -0.5 --alpha_end 1.5 --n_alphas 2
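
The idea behind a 1D linear interpolation between two parameter vectors can be sketched as follows. This is a minimal illustration, not the code in simple_1d_interpolation_experiment.py: the function name interpolate_parameters is hypothetical, and the sketch assumes each .npy file holds a flat parameter vector and that the script evaluates points of the form (1 - alpha) * p1 + alpha * p2 on an evenly spaced grid of alpha values.

```python
import numpy as np


def interpolate_parameters(theta_1, theta_2, alpha_start=-0.5, alpha_end=1.5, n_alphas=5):
    """Return the alpha grid and the interpolated parameter vectors.

    Hypothetical helper: alpha = 0 gives theta_1, alpha = 1 gives theta_2,
    and values outside [0, 1] extrapolate along the same line.
    """
    alphas = np.linspace(alpha_start, alpha_end, n_alphas)
    thetas = np.stack([(1.0 - a) * theta_1 + a * theta_2 for a in alphas])
    return alphas, thetas


# Toy vectors standing in for two saved parameter files (assumed flat arrays).
theta_1 = np.array([0.0, 0.0])
theta_2 = np.array([1.0, 2.0])
alphas, thetas = interpolate_parameters(theta_1, theta_2, 0.0, 1.0, 3)
```

In a real experiment, each row of thetas would be loaded into the policy and its return estimated, giving a 1D slice of the objective between the two solutions.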

Note that interpolation tools only work with continuous policies.
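
The random perturbations experiment in step 2 can be pictured with a sketch like the one below. It is an assumption about the general technique, not the implementation in paired_random_directions_experiment.py: it assumes that for each random unit direction d the objective is evaluated at theta + alpha * d and theta - alpha * d and compared with the value at theta. The names paired_random_directions and toy_objective are hypothetical, and the toy objective replaces an actual policy rollout.

```python
import numpy as np


def paired_random_directions(objective, theta, alpha=0.5, n_directions=500, seed=0):
    """Evaluate paired perturbations of theta along random unit directions.

    Hypothetical helper. Returns two arrays holding
    objective(theta + alpha * d) - objective(theta) and
    objective(theta - alpha * d) - objective(theta) for each direction d.
    """
    rng = np.random.default_rng(seed)
    base = objective(theta)
    delta_plus = np.empty(n_directions)
    delta_minus = np.empty(n_directions)
    for i in range(n_directions):
        d = rng.standard_normal(theta.shape)
        d /= np.linalg.norm(d)  # normalize so alpha is the step length
        delta_plus[i] = objective(theta + alpha * d) - base
        delta_minus[i] = objective(theta - alpha * d) - base
    return delta_plus, delta_minus


def toy_objective(theta):
    """Concave quadratic with its maximum at the origin (stand-in for a return)."""
    return -float(np.sum(theta ** 2))


theta = np.zeros(4)
delta_plus, delta_minus = paired_random_directions(
    toy_objective, theta, alpha=0.5, n_directions=10
)
slope_proxy = delta_plus - delta_minus      # near zero at a stationary point
curvature_proxy = delta_plus + delta_minus  # negative around a local maximum
```

Under these assumptions, the difference of the paired changes reflects the local slope along each direction and their sum reflects the local curvature, so a scatter of the two summarizes the landscape around a saved parameter vector.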

Code organization

  • eager_pg: contains a small library to enable quick research in policy gradient reinforcement learning.
  • analysis_tools: contains tooling to make nice figures in papers.
  • interpolation_experiments: contains experiments that explore the policy optimization landscape.

Citation

If you use the proposed method or code, we'd appreciate it if you could cite this work!

@article{ahmed2018understanding,
  title={Understanding the impact of entropy in policy learning},
  author={Ahmed, Zafarali and Le Roux, Nicolas and Norouzi, Mohammad and Schuurmans, Dale},
  journal={arXiv preprint arXiv:1811.11214},
  year={2018}
}

Disclaimer

This is not an official Google product.
