Hello, I referred to your tiger_door project to create my own POMDP model that inv


model.getRewardFunction() about ai-toolbox HOT 6 CLOSED

svalorzen commented on July 22, 2024

model.getRewardFunction()

from ai-toolbox.

Comments (6)

zsunberg commented on July 22, 2024

I'm not familiar with this software, but we have often come across this issue in developing POMDPs.jl. In that software, the reward is defined in terms of the current state, action, and next state (s, a, and s'), but some solvers require a reward function that's only dependent on s and a, just like you describe.

I think that if you just define the s-a reward function as the expectation of the s-a-s' reward function, that is

$$R(s, a) = \sum_{s'} T(s' \mid s, a) \, R(s, a, s')$$

most algorithms should find a solution that is equivalent to the case where you defined R(s,a,s').
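As a rough illustration of that expectation, here is a minimal C++ sketch that collapses an s-a-s' reward table into an s-a one by weighting each entry with the transition probability. The `Table3D`/`Table2D` aliases and the `expectedReward` name are hypothetical, made up for this example; they are not taken from POMDPs.jl or ai-toolbox.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dense tables indexed as table[s][a][s1]. These aliases are illustrative only.
using Table3D = std::vector<std::vector<std::vector<double>>>;
using Table2D = std::vector<std::vector<double>>;

// Collapse R(s,a,s') into R(s,a) = sum over s' of T(s'|s,a) * R(s,a,s').
// T and R are assumed to have identical shapes.
Table2D expectedReward(const Table3D& T, const Table3D& R) {
    assert(T.size() == R.size());
    Table2D out(T.size());
    for (std::size_t s = 0; s < T.size(); ++s) {
        assert(T[s].size() == R[s].size());
        out[s].assign(T[s].size(), 0.0);
        for (std::size_t a = 0; a < T[s].size(); ++a) {
            assert(T[s][a].size() == R[s][a].size());
            for (std::size_t s1 = 0; s1 < T[s][a].size(); ++s1)
                out[s][a] += T[s][a][s1] * R[s][a][s1];
        }
    }
    return out;
}
```

Note that a reward attached to a transition with probability zero contributes nothing to the result, which is why the collapsed table can stand in for the full one inside an expected-value backup.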


Svalorzen commented on July 22, 2024

You can very easily define your own reward matrices SxAxS', such that for each pair S-S' the reward is the same independently of the action (and put it in the model using this function).

The code for MDP::Model internally simplifies such a matrix to an equivalent SxA matrix, using the equivalence @zsunberg mentioned. This is done mainly to reduce the amount of computation to be done later on, since it is a lossless operation. However, an SxAxS' matrix can't in general be reduced to an SxS' matrix without losing information.

Note that this transformation is not required by any algorithm. If the SxA matrix is not available (as in, there is no getRewardFunction() which returns a 2D Eigen matrix in your particular model class), all algorithms will simply iterate over all SxAxS' combinations using the getExpectedReward(size_t, size_t, size_t) function (see for example this function).

This is what the code does for example when using this older MDP model class which I keep to test such functionality. In any case remember that you are always free to implement your own classes and use them freely as the algorithms are templated - as long as they satisfy the basic interfaces everything will work out.
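To make the two ideas above concrete, here is a small self-contained C++ sketch: a reward that depends only on the S-S' pair is copied across every action into an SxAxS' table, and a templated function then computes the per-(s, a) expected reward by visiting every s'. `ToyModel`, `makeActionIndependent`, and `immediateRewards` are hypothetical names invented for this example and are not ai-toolbox classes or functions. The accessors other than `getExpectedReward(size_t, size_t, size_t)` are assumptions about what a model interface would expose.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical toy model exposing only per-triple accessors.
// Both tables are flattened in [s][a][s1] order.
struct ToyModel {
    std::size_t S = 0, A = 0;
    std::vector<double> transitions;
    std::vector<double> rewards;

    std::size_t getS() const { return S; }
    std::size_t getA() const { return A; }
    double getTransitionProbability(std::size_t s, std::size_t a, std::size_t s1) const {
        return transitions[(s * A + a) * S + s1];
    }
    double getExpectedReward(std::size_t s, std::size_t a, std::size_t s1) const {
        return rewards[(s * A + a) * S + s1];
    }
};

// Build a model whose reward depends only on (s, s1) by repeating the same
// S x S' reward table (flattened as [s][s1]) for every action.
ToyModel makeActionIndependent(std::size_t S, std::size_t A,
                               std::vector<double> transitions,
                               const std::vector<double>& rewardSS1) {
    assert(transitions.size() == S * A * S);
    assert(rewardSS1.size() == S * S);
    ToyModel m;
    m.S = S;
    m.A = A;
    m.transitions = std::move(transitions);
    m.rewards.resize(S * A * S);
    for (std::size_t s = 0; s < S; ++s)
        for (std::size_t a = 0; a < A; ++a)
            for (std::size_t s1 = 0; s1 < S; ++s1)
                m.rewards[(s * A + a) * S + s1] = rewardSS1[s * S + s1];
    return m;
}

// Expected immediate reward for every (s, a), computed by iterating over all
// S x A x S' combinations. Works for any type M with the accessors used here.
template <typename M>
std::vector<std::vector<double>> immediateRewards(const M& model) {
    std::vector<std::vector<double>> ir(model.getS(),
                                        std::vector<double>(model.getA(), 0.0));
    for (std::size_t s = 0; s < model.getS(); ++s)
        for (std::size_t a = 0; a < model.getA(); ++a)
            for (std::size_t s1 = 0; s1 < model.getS(); ++s1)
                ir[s][a] += model.getTransitionProbability(s, a, s1) *
                            model.getExpectedReward(s, a, s1);
    return ir;
}
```

Even with action-independent S-S' rewards, the resulting SxA values generally differ between actions, because each action weights the next states differently. This is one way to see why the action dimension cannot simply be dropped.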


hifzajaved commented on July 22, 2024

> I'm not familiar with this software, but we have often come across this issue in developing POMDPs.jl. In that software, the reward is defined in terms of the current state, action, and next state (s, a, and s'), but some solvers require a reward function that's only dependent on s and a, just like you describe.
>
> I think that if you just define the s-a reward function as the expectation of the s-a-s' reward function, that is
>
> $$R(s, a) = \sum_{s'} T(s' \mid s, a) \, R(s, a, s')$$
>
> most algorithms should find a solution that is equivalent to the case where you defined R(s,a,s').

Thanks for your response.
Can you tell me which solvers in your library (offline and online) accept the s-a-s' reward function?


zsunberg commented on July 22, 2024

Hi @hifzajaved, the complete list of POMDP solvers can be found here: https://github.com/JuliaPOMDP/POMDPs.jl#pomdp-solvers. I believe the breakdown of reward function support looks like this:

| | Support R(s,a,s') | Support only R(s,a) directly |
|---|---|---|
| Online | BasicPOMCP | AEMS |
| | ARDESPOT | |
| | POMCPOW | |
| Offline | QMDP | POMDPSolve |
| | MCVI | SARSOP |
| | FIB | IncrementalPruning |

If you have further questions about specific solvers, feel free to ask on our forum: https://groups.google.com/forum/#!forum/pomdps-users


hifzajaved commented on July 22, 2024

Thanks, @zsunberg !


Svalorzen commented on July 22, 2024

@hifzajaved Btw, I've checked again and the POMDP example rewards are defined as an SxAxS' matrix, so I'm still not sure where the problem is.
