Comments (6)

zsunberg commented on July 22, 2024

I'm not familiar with this software, but we have often come across this issue in developing POMDPs.jl. In that software, the reward is defined in terms of the current state, action, and next state (s, a, and s'), but some solvers require a reward function that's only dependent on s and a, just like you describe.

I think that if you just define the s-a reward function as the expectation of the s-a-s' reward function, that is

$$R(s, a) = \sum_{s'} T(s' \mid s, a) \, R(s, a, s'),$$

where T(s' | s, a) is the transition probability, most algorithms should find a solution that is equivalent to the case where you defined R(s,a,s').
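As a rough illustration of that expectation, here is a minimal Python sketch. It does not use POMDPs.jl or AI-Toolbox; the nested-list layout `T[s][a][s1]` / `R3[s][a][s1]`, the function name `expected_reward`, and the toy numbers are all assumptions made for the example.

```python
def expected_reward(T, R3):
    """Collapse R(s, a, s') into R(s, a) by taking the expectation over s'.

    T[s][a][s1]  -- probability of reaching s1 after taking action a in state s
    R3[s][a][s1] -- reward received for that specific transition
    (Both layouts are assumptions for this sketch.)
    """
    return [
        [sum(p * r for p, r in zip(T[s][a], R3[s][a])) for a in range(len(T[s]))]
        for s in range(len(T))
    ]


# Toy model with 2 states and 2 actions; the numbers are made up.
T = [
    [[0.5, 0.5], [1.0, 0.0]],
    [[0.0, 1.0], [0.25, 0.75]],
]
R3 = [
    [[0.0, 10.0], [1.0, 0.0]],
    [[0.0, 2.0], [4.0, 0.0]],
]

R2 = expected_reward(T, R3)
print(R2)  # [[5.0, 1.0], [2.0, 1.0]]
```

Each entry of `R2` is the transition-weighted average of the corresponding row of `R3`, so a solver that only accepts R(s, a) can be given `R2` instead.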

from ai-toolbox.

Svalorzen commented on July 22, 2024

You can very easily define your own reward matrices SxAxS', such that for each pair S-S' the reward is the same independently of the action (and put it in the model using this function).
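A minimal sketch of building such an action-independent SxAxS' table from an SxS' one, in plain Python. The helper name `broadcast_over_actions` and the nested-list layout are hypothetical; the actual AI-Toolbox call that loads the table into a model is not reproduced here.

```python
def broadcast_over_actions(R_ss, num_actions):
    """Build R[s][a][s1] from an action-independent R_ss[s][s1].

    Every action gets its own copy of the same row, so the reward for a
    given (s, s1) pair does not depend on which action was taken.
    """
    return [[list(row) for _ in range(num_actions)] for row in R_ss]


# Made-up rewards that depend only on the state and the next state.
R_ss = [
    [0.0, 1.0],
    [5.0, 0.0],
]

R3 = broadcast_over_actions(R_ss, num_actions=3)
print(R3[1][0], R3[1][2])  # same row for every action
```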

The code for MDP::Model internally simplifies such a matrix to an equivalent SxA matrix, using the equivalence @zsunberg mentioned. This is done mainly to reduce the amount of computation to be done later on, since it is a lossless operation. However, an SxAxS' matrix cannot in general be reduced to an SxS' matrix without losing information.
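To see why the SxA form loses nothing for planning, one can compare a one-step Bellman backup computed from the full SxAxS' rewards with the same backup computed from the collapsed SxA rewards. The sketch below is plain Python, not AI-Toolbox code; the function names, the nested-list layout, and the numbers are assumptions for illustration.

```python
import math


def q_from_triples(T, R3, V, gamma):
    """Q(s,a) = sum over s1 of T(s1|s,a) * (R(s,a,s1) + gamma * V(s1))."""
    n = len(V)
    return [
        [sum(T[s][a][s1] * (R3[s][a][s1] + gamma * V[s1]) for s1 in range(n))
         for a in range(len(T[s]))]
        for s in range(len(T))
    ]


def q_from_pairs(T, R2, V, gamma):
    """Q(s,a) = R(s,a) + gamma * sum over s1 of T(s1|s,a) * V(s1)."""
    n = len(V)
    return [
        [R2[s][a] + gamma * sum(T[s][a][s1] * V[s1] for s1 in range(n))
         for a in range(len(T[s]))]
        for s in range(len(T))
    ]


# Toy model with 2 states and 2 actions; the numbers are made up.
T = [[[0.5, 0.5], [1.0, 0.0]], [[0.0, 1.0], [0.25, 0.75]]]
R3 = [[[0.0, 10.0], [1.0, 0.0]], [[0.0, 2.0], [4.0, 0.0]]]
V = [1.0, 2.0]   # an arbitrary value estimate for each state
gamma = 0.5      # discount factor

# Collapse R(s,a,s1) to R(s,a) by taking the expectation over s1.
R2 = [[sum(p * r for p, r in zip(T[s][a], R3[s][a])) for a in range(2)]
      for s in range(2)]

q_full = q_from_triples(T, R3, V, gamma)
q_collapsed = q_from_pairs(T, R2, V, gamma)
same = all(math.isclose(q_full[s][a], q_collapsed[s][a])
           for s in range(2) for a in range(2))
print(same)
```

The two backups agree because the expectation is linear: the reward term can be averaged over s' ahead of time, once, instead of inside every backup.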

Note that this transformation is not required by any algorithm. If the SxA matrix is not available (as in, there is no getRewardFunction() which returns a 2D Eigen matrix in your particular model class), all algorithms will simply iterate over all SxAxS' combinations using the getExpectedReward(size_t, size_t, size_t) function (see for example this function).

This is what the code does for example when using this older MDP model class which I keep to test such functionality. In any case remember that you are always free to implement your own classes and use them freely as the algorithms are templated - as long as they satisfy the basic interfaces everything will work out.
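The idea of "any class that satisfies the interface works" can be sketched in Python with duck typing standing in for C++ templates. Everything below is hypothetical: the method names are snake_case stand-ins loosely modelled on the `getExpectedReward` and `getRewardFunction` names mentioned above, and `immediate_rewards` is not an AI-Toolbox function.

```python
class CallbackModel:
    """A hypothetical user-defined model that exposes only per-triple accessors."""

    def __init__(self, num_states, num_actions, transition, reward):
        self.num_states = num_states
        self.num_actions = num_actions
        self._transition = transition  # callable (s, a, s1) -> probability
        self._reward = reward          # callable (s, a, s1) -> reward

    def get_transition_probability(self, s, a, s1):
        return self._transition(s, a, s1)

    def get_expected_reward(self, s, a, s1):
        return self._reward(s, a, s1)


def immediate_rewards(model):
    """Return an SxA reward table for any model with the accessors above.

    If the model offers a precomputed table, use it; otherwise fall back to
    iterating over every (s, a, s1) combination.
    """
    if hasattr(model, "get_reward_function"):
        return model.get_reward_function()
    S, A = model.num_states, model.num_actions
    return [
        [sum(model.get_transition_probability(s, a, s1)
             * model.get_expected_reward(s, a, s1) for s1 in range(S))
         for a in range(A)]
        for s in range(S)
    ]


# Toy dynamics: action 0 stays in place, action 1 flips the state.
# Reward is 1.0 for landing in state 1, regardless of the action.
model = CallbackModel(
    num_states=2,
    num_actions=2,
    transition=lambda s, a, s1: 1.0 if s1 == (s if a == 0 else 1 - s) else 0.0,
    reward=lambda s, a, s1: 1.0 if s1 == 1 else 0.0,
)

table = immediate_rewards(model)
print(table)  # [[0.0, 1.0], [1.0, 0.0]]
```

The algorithm never inspects the model's concrete type, only the handful of methods it calls, which is the same contract a templated C++ algorithm places on its model argument.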

hifzajaved commented on July 22, 2024

Thanks for your response.
Can you tell me which solvers in your library (offline and online) accept the s-a-s' reward function?

zsunberg commented on July 22, 2024

Hi @hifzajaved, the complete list of POMDP solvers can be found here: https://github.com/JuliaPOMDP/POMDPs.jl#pomdp-solvers. I believe the breakdown of reward function support looks like this:

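|         | Supports R(s,a,s')            | Supports only R(s,a) directly          |
|---------|-------------------------------|----------------------------------------|
| Online  | BasicPOMCP, ARDESPOT, POMCPOW | AEMS                                   |
| Offline | QMDP, MCVI, FIB               | POMDPSolve, SARSOP, IncrementalPruning |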

If you have further questions about specific solvers, feel free to ask on our forum: https://groups.google.com/forum/#!forum/pomdps-users

hifzajaved commented on July 22, 2024

Thanks, @zsunberg !

Svalorzen commented on July 22, 2024

@hifzajaved Btw, I've checked again and the POMDP example rewards are defined as an SxAxS' matrix, so I'm still not sure where the problem is.
