Code Monkey home page Code Monkey logo

graph-rl's Introduction

Graph RL

Prerequisites

numpy, matplotlib, gensim, pygsp

Code structure

lspi

The code in this folder was built on top of this LSPI python package.

Least Squares Policy Iteration (LSPI) implementation.

Implements the algorithms described in the paper

"Least-Squares Policy Iteration.โ€ Lagoudakis, Michail G., and Ronald Parr. Journal of Machine Learning Research 4, 2003.

The implementation is based on the Matlab implementation provided by the authors.

basis_funciton.py

This file implements the abstract class BasisFunction. A basis function is a function that takes in a state vector and an action index and returns a vector of features.

A few specific types of basis functions are further implemented in this file:

  • Fake Basis Funtion: it simply ignores all inputs an returns a constant basis vector. Can be useful for random sampling.
  • One Dimensional Polynomial Basis Funcion: simple polynomial features for a state with one dimension.
  • Radial Basis Function: the Gaussian multidimensional radial basis function (RBF).
  • Proto-Value Basis Function: the PVFs as described in Mahadevan and Maggioni's work.
  • Node2Vec: automatically learnt basis function using the node2vec algorithm.

domains.py

Contains example domains that LSPI works on. In particular, it implements the grid maze domain in which the state space is a set of nodes on a N1 by N2 grid. Most nodes are always accessible (= rooms with 1. transition probability), some nodes might be inaccessible (= walls with 0. transition probability), and some nodes might be difficult to access (= obstacles with p transition probability 0 < p < 1). There is one absorbing goal state that gives reward of 100; all other states are non absorbing and do not give any reward.

lspi.py

Contains the main interface to LSPI algorithm.

node2vec.py

Implements the node2vec algorithm to learn node embeddings.

policy.py

Contains the LSPI policy class used for learning, executing and sampling policies.

sample.py

Contains the Sample class that respresents an LSPI sample tuple (s, a, r, s', absorb) : (state, action, observed reward, future state, absorbing state)

solvers.py

Implementation of LSTDQ solver with standard matrix solver (the algorithm from Figure 5 of the LSPI paper)

learning_maze.py

This class implements maze environments such as the one depicted below.

alt text

In such environment, the green states are accessible rooms, the dark purple states are strict walls and the yellow state is the goal state. An agent can be initially placed in any accessible state and it aims at reaching the goal state.

The class implements methods for learning the PVF basis functions as well as polynomial and node2vec basis functions.

optimise.py

This code implements functions for approximating the true value function using least-square optimisation from the basis functions (that is, by minimising the mean squared error). Note that this is only possible in small environment since the true value function computed using value iteration algorithm.

For the environment described in the previous subsection, the goal is to approximate the following value functions as closely as possible.

alt text

PVF_simulation.py

Code for running simulations. It can be called directly from the terminal as follows
>> python PVF_simulations.py

Running this code will run LSPI algorithm on simple maze envrionments (no walls, no obstacles) of different size and with different LSPI hyperparameters. All the parameters are defined as constants at the beginning of the file and can be changed accordingly. After It will save plots (averages over 20 runs of the number of steps to reach the goal and the cumulative reward) in a folder plots/ (make sure such a folder exists in the GraphRL/ folder).

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.