gym-pyro

OpenAI Gym environments for MDPs, POMDPs, and confounded-MDPs implemented as pyro-ppl probabilistic programs.

Installation

This package depends on the rl_parsers package. Install rl_parsers first, then install the packages listed in requirements.txt.

Contents

This repository provides the PyroMDP, PyroPOMDP, and PyroCMDP environments, whose dynamics are respectively loaded from the .mdp, .pomdp and .cmdp file formats.

See Cassandra's POMDP page for the specification of the POMDP file format. The MDP and CMDP formats follow suit. Also see the examples/ folder for sample files in the .mdp, .pomdp and .cmdp file formats.

Useful Attributes and Methods

PyroMDP

Attributes:

env.states : tuple of state names

env.actions : tuple of action names

Methods:

env.reset(keep_state=False) : Resets the time-index to 0 and the system state by sampling site S_0 from the starting distribution. If keep_state is True, the time-index is still reset, but the current system state is kept and resampled as site S_0; this is useful for running simulations from the current environment context.

env.step(action) : Performs the action, advances the time-index and the system state. State, reward and done variables are respectively sampled as sites S_{t}, R_{t}, and D_{t}, where t is the current time-index.

env.render(mode='human') : Renders the previous action and reward, and the current system state. Accepts modes human and ansi.
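The reset/step cycle described above can be sketched as a simple rollout loop. This is only an illustration: ToyMDP is a hypothetical stand-in written for this example (it is not the real PyroMDP), and the sketch assumes the classic Gym convention that reset() returns the initial state, that step() returns (state, reward, done, info), and that actions are passed as integer indices into env.actions. Check the package source before relying on any of these.

```python
import random


class ToyMDP:
    """Hypothetical stand-in exposing the reset/step interface described above.

    NOT the real PyroMDP; it only lets the sketch run without gym-pyro.
    """

    states = ("s0", "s1")
    actions = ("stay", "move")

    def __init__(self):
        self.t = 0
        self.state = 0

    def reset(self, keep_state=False):
        self.t = 0
        if not keep_state:
            self.state = 0
        return self.state  # assumed: reset returns the initial state

    def step(self, action):
        if self.actions[action] == "move":
            self.state = 1 - self.state
        self.t += 1
        reward = 1.0 if self.state == 1 else 0.0
        done = self.t >= 5
        # Assumed Gym-style return value: (state, reward, done, info).
        return self.state, reward, done, {}


def rollout(env, policy, max_steps=100):
    """Run one episode and return a list of (state, action, reward) tuples."""
    state = env.reset()
    trajectory = []
    for _ in range(max_steps):
        action = policy(state)
        next_state, reward, done, _info = env.step(action)
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory


def random_policy(env, rng):
    """Build a policy that picks a uniformly random action index."""
    return lambda _state: rng.randrange(len(env.actions))


env = ToyMDP()
trajectory = rollout(env, random_policy(env, random.Random(0)))
print(len(trajectory), sum(reward for _, _, reward in trajectory))
```

The rollout function only touches reset() and step(), so it should work with any environment that follows the same conventions.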

PyroPOMDP

Attributes:

env.states : tuple of state names

env.actions : tuple of action names

env.observations : tuple of observation names

Methods:

env.reset(keep_state=False) : Resets the time-index to 0 and the system state by sampling site S_0 from the starting distribution. If keep_state is True, the time-index is still reset, but the current system state is kept and resampled as site S_0; this is useful for running simulations from the current environment context.

env.step(action) : Performs the action, advances the time-index and the system state. State, observation, reward and done variables are respectively sampled as sites S_{t}, O_{t}, R_{t}, and D_{t}, where t is the current time-index.

env.render(mode='human') : Renders the previous action and reward, and the current system state. Accepts modes human and ansi.
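In a POMDP the agent does not see the system state, so a policy typically acts on the history of past actions and observations. The sketch below illustrates that pattern. ToyPOMDP is a hypothetical stand-in written for this example (it is not the real PyroPOMDP), and the sketch assumes that step() returns (observation, reward, done, info) and that actions and observations are integer indices; these are assumptions, not documented behaviour.

```python
class ToyPOMDP:
    """Hypothetical stand-in, NOT the real PyroPOMDP."""

    states = ("left", "right")
    actions = ("listen", "switch")
    observations = ("hear-left", "hear-right")

    def __init__(self):
        self.t = 0
        self.state = 0

    def reset(self, keep_state=False):
        self.t = 0
        if not keep_state:
            self.state = 0

    def step(self, action):
        if self.actions[action] == "switch":
            self.state = 1 - self.state
        self.t += 1
        observation = self.state  # noiseless observation, for simplicity
        reward = 1.0 if self.state == 1 else 0.0
        done = self.t >= 4
        # Assumed return value: (observation, reward, done, info);
        # the hidden state itself is not handed to the agent.
        return observation, reward, done, {}


def rollout_with_history(env, policy, max_steps=100):
    """Run one episode with a policy that only sees (action, observation) pairs."""
    env.reset()
    history = []
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(history)
        observation, reward, done, _info = env.step(action)
        history.append((action, observation))
        total_reward += reward
        if done:
            break
    return history, total_reward


def switch_once(history):
    """Switch on the first step, then keep listening."""
    return 1 if not history else 0


history, total_reward = rollout_with_history(ToyPOMDP(), switch_once)
print(history, total_reward)
```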

PyroCMDP

Attributes:

env.confounders : tuple of confounder names

env.states : tuple of state names

env.actions : tuple of action names

Methods:

env.reset(keep_state=False) : Resets the time-index to 0, the confounder by sampling site U, and the system state by sampling site S_0 from the starting distribution. If keep_state is True, the time-index is still reset, but the confounder and the current system state are kept and resampled as sites U and S_0; this is useful for running simulations from the current environment context.

env.step(action) : Performs the action, advances the time-index and the system state. State, reward and done variables are respectively sampled as sites S_{t}, R_{t}, and D_{t}, where t is the current time-index.

env.render(mode='human') : Renders the previous action and reward, and the current system state. Accepts modes human and ansi.
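One way to use keep_state=True is to estimate the return of a policy from the current context by simulating on copies of the environment. The sketch below is an illustration under several assumptions: ToyCMDP is a hypothetical stand-in written for this example (not the real PyroCMDP), the environment is assumed to survive copy.deepcopy, reset() is assumed to return the current state, and step() is assumed to return (state, reward, done, info).

```python
import copy
import random


class ToyCMDP:
    """Hypothetical stand-in, NOT the real PyroCMDP."""

    confounders = ("u0", "u1")
    states = ("s0", "s1")
    actions = ("a0", "a1")

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.reset()

    def reset(self, keep_state=False):
        self.t = 0
        if not keep_state:
            self.confounder = self.rng.randrange(len(self.confounders))
            self.state = 0
        return self.state  # assumed: reset returns the current state

    def step(self, action):
        self.state = (self.state + action) % len(self.states)
        self.t += 1
        # Toy reward: acting in agreement with the hidden confounder pays off.
        reward = 1.0 if action == self.confounder else 0.0
        done = self.t >= 3
        # Assumed Gym-style return value: (state, reward, done, info).
        return self.state, reward, done, {}


def simulate_from_current(env, policy, num_simulations=10, max_steps=50):
    """Average return of `policy` when started from the env's current context."""
    returns = []
    for _ in range(num_simulations):
        sim = copy.deepcopy(env)            # leave the real env untouched
        state = sim.reset(keep_state=True)  # time-index back to 0, context kept
        total = 0.0
        for _ in range(max_steps):
            state, reward, done, _info = sim.step(policy(state))
            total += reward
            if done:
                break
        returns.append(total)
    return sum(returns) / len(returns)


env = ToyCMDP(seed=0)
env.step(1)  # move the real environment away from its initial context
estimate = simulate_from_current(env, policy=lambda _state: 0)
print(estimate)
```

Copying before calling reset(keep_state=True) keeps the simulations from advancing the real environment; whether the actual environments can be deep-copied this way would need to be checked against the package.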

Contributors

abaisero
