conrl's Introduction

Constrained episodic reinforcement learning in concave-convex and knapsack settings

This repository implements the algorithms presented in the paper of the same title.

Dependencies

  • We recommend using virtualenv so that the dependencies install into an isolated environment

Installation

python -m pip install -e .

Code Arguments

> python -u run.py --help
usage: run.py [-h] [--map MAP] [--alg {baseline,optimistic,appropo}]
              [--rounds ROUNDS] [--seed SEED]
              [--solver {ucbvi,policy_iteration,reinforce,value_iteration,a2c}]
              [--horizon HORIZON] [--output_dir OUTPUT_DIR]
              [--env {box_gridworld,gridworld}] [--budget BUDGET [BUDGET ...]]
              [--randomness RANDOMNESS] [--value_coef VALUE_COEF]
              [--entropy_coef ENTROPY_COEF]
              [--actor_critic_lr ACTOR_CRITIC_LR]
              [--discount_factor DISCOUNT_FACTOR]
              [--num_episodes NUM_EPISODES]
              [--conplanner_iter CONPLANNER_ITER] [--bonus_coef BONUS_COEF]
              [--planner {value_iteration,a2c}]
              [--optimistic_lambda_lr OPTIMISTIC_LAMBDA_LR]
              [--optomistic_reset {warm-start,scratch,continue}]
              [--num_fic_episodes NUM_FIC_EPISODES] [--mx_size MX_SIZE]
              [--proj_lr PROJ_LR] [--baseline_lambda_lr BASELINE_LAMBDA_LR]
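
To illustrate how a few of these flags behave, here is a minimal argparse sketch. It is a hypothetical reconstruction from the help text above, not the actual contents of run.py: the argument types are assumptions, no defaults are set because the help text shows none, and the `gridworld` choice is taken from the commands used elsewhere in this README.

```python
import argparse


def build_parser():
    # Hypothetical subset of the flags shown in the help text.
    # Types are assumed; the real run.py may differ.
    parser = argparse.ArgumentParser(prog="run.py")
    parser.add_argument("--alg", choices=["baseline", "optimistic", "appropo"])
    parser.add_argument("--env", choices=["box_gridworld", "gridworld"])
    parser.add_argument("--planner", choices=["value_iteration", "a2c"])
    # "BUDGET [BUDGET ...]" in the usage line means one or more values.
    parser.add_argument("--budget", nargs="+", type=float)
    parser.add_argument("--seed", type=int)
    parser.add_argument("--output_dir")
    return parser


# Parse an example command line instead of sys.argv.
args = build_parser().parse_args(
    ["--alg", "optimistic", "--env", "gridworld", "--budget", "0.5", "1.0"]
)
print(args.alg, args.env, args.budget)
```

Flags with a `{...}` list accept only the listed values, and `--budget` collects all of its values into a list.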

Running the code

To run the experiments, change to the directory rlwithknapsacks/.

To select the environment:

  • Gridworld: use the flag --env gridworld
  • Box Gridworld: use the flag --env box_gridworld

To select the instantiation of our algorithm:

  • Value-Iteration planner: use the flag --planner value_iteration
  • Actor-Critic planner: use the flag --planner a2c

Commands to reproduce results in our paper:

  • Gridworld + Value-Iteration: python -u run.py --alg optimistic --env gridworld --planner value_iteration
  • Gridworld + Actor-Critic: python -u run.py --alg optimistic --env gridworld --planner a2c
  • Box Gridworld + Value-Iteration: python -u run.py --alg optimistic --env box_gridworld --planner value_iteration
  • Box Gridworld + Actor-Critic: python -u run.py --alg optimistic --env box_gridworld --planner a2c
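
The four commands above are the product of two environments and two planners, so they can be generated rather than typed by hand. The script below is a minimal sketch and not part of the repository: the helper names `build_commands` and `run_all` are hypothetical, and it assumes it is run from the rlwithknapsacks/ directory so that run.py resolves. It defaults to a dry run that only prints the commands.

```python
import itertools
import subprocess
import sys

# Values taken from the commands listed above.
ENVS = ["gridworld", "box_gridworld"]
PLANNERS = ["value_iteration", "a2c"]


def build_commands(python=sys.executable):
    """Return one argument list per (environment, planner) pair."""
    return [
        [python, "-u", "run.py", "--alg", "optimistic",
         "--env", env, "--planner", planner]
        for env, planner in itertools.product(ENVS, PLANNERS)
    ]


def run_all(dry_run=True):
    """Print each command; execute it as well when dry_run is False."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on the first failing run.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Call `run_all(dry_run=False)` to launch the runs one after another.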

Results are written to the folder specified by --output_dir.

Acknowledgement

We would like to thank Tim Vieira for creating the repository that our codebase builds on.
