infinite-horizon-off-policy-estimation's Introduction

infinite-horizon-off-policy-estimation

This repository contains an implementation of the following paper: Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation

Citation

If you find this paper useful, please consider citing:

	@article{liu2018breaking,
	  title={Breaking the curse of horizon: Infinite-horizon off-policy estimation},
	  author={Liu, Qiang and Li, Lihong and Tang, Ziyang and Zhou, Dengyong},
	  journal={arXiv preprint arXiv:1810.12429},
	  year={2018}
	}

infinite-horizon-off-policy-estimation's People

Contributors

zt95

infinite-horizon-off-policy-estimation's Issues

Question about discrete case

Hi!

In the density_ratio_estimate function, you have "x = quadratic_solver(n, G/50., regularizer)". Where does the 50. come from?

Generating SARS data

Hello,

I am new to SUMO. When I run sumo/collect_data.py, I get the following error:

Error: Invalid vehicle id '0_NE_(1,1)'. Contains invalid characters.

I then deleted all the files in the ./data directory and generated a new grids.net_0.xml file using
netconvert -n data/grids.nod.xml -e data/grids.edg.xml -i data/grids.tlLogic.xml -o data/grids.net_0.xml
which fails with:

Error: value '(0,1)' does not match regular expression facet '[^ \t\n\r|\\;,']+'

I wonder whether the generate_nodes function produces a valid grids.nod.xml file.
How should I handle this case so that the SARS data is generated successfully?
Best
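
A possible workaround, sketched under assumptions rather than taken from the repository: the facet quoted in the second error rejects spaces, tabs, newlines, carriage returns, '|', '\', ';', ',' and single quotes, so the comma in IDs such as '(0,1)' is the offending character. The helper names sanitize_sumo_id and is_valid_sumo_id below are hypothetical; where such a helper would be called inside generate_nodes is not shown in this thread.

```python
import re

# Characters rejected by the facet quoted in the netconvert error above:
# space, tab, newline, carriage return, '|', backslash, ';', ',' and "'".
_FORBIDDEN = re.compile(r"[ \t\n\r|\\;,']")

# The facet itself, applied to the whole ID.
_FACET = re.compile(r"[^ \t\n\r|\\;,']+")


def sanitize_sumo_id(raw_id, replacement="_"):
    """Replace every rejected character with `replacement` (hypothetical helper)."""
    return _FORBIDDEN.sub(replacement, raw_id)


def is_valid_sumo_id(candidate):
    """Return True if `candidate` matches the facet from the error message."""
    return _FACET.fullmatch(candidate) is not None


if __name__ == "__main__":
    for raw in ["(0,1)", "0_NE_(1,1)"]:
        print(raw, "->", sanitize_sumo_id(raw))
```

If IDs are rewritten this way, the same mapping would presumably have to be applied consistently to the node, edge, and traffic-light files, since they refer to each other by ID.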

Negative loss

Hi! I have a question about how loss is defined.

In the paper, the loss takes the form D(w) = L^2 = E[ d(w,s,a,s') d(w1,s1,a1,s1') k(s,s') ]. In other words, it has the form E[x^T K y] for x = d(w,s,a,s') and y = d(w1,s1,a1,s1'). This means that x^T K y is always positive (since E[x^T K y] = L^2 > 0). Empirically, however, when running the SUMO code, I'm seeing negative values for loss_xx, which confuses me. Is this a bug, or is a negative loss allowed?

[Screenshot, 2019-03-08: loss_xx and self.loss for a few epochs of training. The loss is negative in some cases.]
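
A small numerical sketch of one way a kernel loss of this form can come out negative. For a positive semi-definite kernel, x^T K x computed on a single batch is non-negative, but x^T K y computed across two different batches has no sign guarantee, even though its expectation is non-negative. Whether the SUMO code computes loss_xx across two batches is an assumption here and is not confirmed by this thread; the Gaussian kernel, the bandwidth, and the function names are illustrative choices.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between two 1-D arrays of states."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def quadratic_form(x, s_x, y, s_y):
    """Empirical estimate of E[x k(s_x, s_y) y] from two batches."""
    return float(x @ rbf_kernel(s_x, s_y) @ y / (len(x) * len(y)))


if __name__ == "__main__":
    # Same batch on both sides: x^T K x with K positive semi-definite.
    x = np.array([1.0, -2.0, 0.5])
    s = np.array([0.0, 1.0, 3.0])
    print("same batch :", quadratic_form(x, s, x, s))

    # Two different batches: the residuals can have opposite signs.
    x1, s1 = np.array([1.0]), np.array([0.0])
    y1, t1 = np.array([-1.0]), np.array([0.0])
    print("cross batch:", quadratic_form(x1, s1, y1, t1))
```

In the second case the two residuals have opposite signs at the same state, so the product is negative even though the kernel value itself is positive.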

Discrete G

I'm trying to get the discrete (discounted) case to work for a toy MDP, but it doesn't seem to be giving sensible results. I think I may be doing something wrong.

Could you explain what G, Nstate, Ghat are and how they relate to \Delta and k(s,s') in the paper?

Confusion about Algorithm 1 output

Hi!

I'm slightly confused about the output of Algorithm 1. It says that the off-policy estimate of Pi1 is given by v^T r / sum(v). Consider the case where the reward r is identically -1 for any transition (s,a,r,s'). Then the OPE estimate evaluates to -sum(v)/sum(v) = -1, which is the correct average reward.

However, if we're considering a (long) variable-length finite horizon with gamma=1, then this algorithm will clearly not work. Would you suggest using the second algorithm instead, with gamma very close to 1?

Thanks!!!
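
The estimate discussed in this issue, v^T r / sum(v), is a self-normalized weighted average of the observed rewards. A minimal sketch of that formula follows; the function name and the example weights are made up for illustration and are not taken from the repository.

```python
import numpy as np


def self_normalized_estimate(v, r):
    """Return v^T r / sum(v) for per-transition weights v and rewards r."""
    v = np.asarray(v, dtype=float)
    r = np.asarray(r, dtype=float)
    if v.shape != r.shape:
        raise ValueError("v and r must have the same shape")
    total = v.sum()
    if total == 0.0:
        raise ValueError("weights sum to zero; the estimate is undefined")
    return float(v @ r / total)


if __name__ == "__main__":
    # Constant reward of -1: the weights cancel, so the estimate is
    # (up to floating-point rounding) -1 whatever the weights are.
    weights = [0.2, 1.5, 0.7]
    rewards = [-1.0, -1.0, -1.0]
    print(self_normalized_estimate(weights, rewards))
```

Because the weights cancel whenever the reward is constant, the estimate carries no information about episode length, which is the concern raised above for the gamma=1 finite-horizon setting.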
