Comments (6)
I'm not familiar with this software, but we have often come across this issue in developing POMDPs.jl. In that software, the reward is defined in terms of the current state, action, and next state (s, a, and s'), but some solvers require a reward function that's only dependent on s and a, just like you describe.
I think that if you just define the s-a reward function as the expectation of the s-a-s' reward function, that is, $R(s,a) = \sum_{s'} T(s' \mid s,a)\, R(s,a,s')$, most algorithms should find a solution that is equivalent to the case where you defined R(s,a,s').
from ai-toolbox.
You can very easily define your own reward matrices SxAxS', such that for each pair S-S' the reward is the same independently of the action (and put it in the model using this function).
The code for `MDP::Model` internally simplifies such a matrix to an equivalent SxA matrix, using the equivalence @zsunberg mentioned. This is done mainly to reduce the amount of computation to be done later on, since it is a lossless operation. However, SxAxS' can't in general be reduced to SxS' without losing information.
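One way to see why the reduction is lossless is to look at the Bellman backup, sketched here in standard MDP notation (the symbols $T$, $\gamma$, $V$ and $Q$ are the usual textbook ones, not names taken from the library):

$$
Q(s,a) = \sum_{s'} T(s' \mid s,a)\,\bigl[R(s,a,s') + \gamma V(s')\bigr]
       = \underbrace{\sum_{s'} T(s' \mid s,a)\,R(s,a,s')}_{R(s,a)} + \gamma \sum_{s'} T(s' \mid s,a)\,V(s')
$$

The reward only ever enters through its expectation over $s'$, so storing $R(s,a)$ gives the same values. No similar step removes the action index, since the backup still has to compare $Q(s,a)$ across different actions.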
Note that this transformation is not required by any algorithm. If the SxA matrix is not available (as in, there is no `getRewardFunction()` which returns a 2D Eigen matrix in your particular model class), all algorithms will simply iterate over all SxAxS' combinations using the `getExpectedReward(size_t, size_t, size_t)` function (see for example this function).
This is what the code does for example when using this older MDP model class which I keep to test such functionality. In any case remember that you are always free to implement your own classes and use them freely as the algorithms are templated - as long as they satisfy the basic interfaces everything will work out.
Thanks for your response.
Can you tell me which solvers in your library (offline and online) accept the s-a-s' reward function?
Hi @hifzajaved, the complete list of POMDP solvers can be found here: https://github.com/JuliaPOMDP/POMDPs.jl#pomdp-solvers. I believe the breakdown for reward function support looks like this:
| | Support R(s,a,s') | Support only R(s,a) directly |
|---|---|---|
| Online | BasicPOMCP | AEMS |
| | ARDESPOT | |
| | POMCPOW | |
| Offline | QMDP | POMDPSolve |
| | MCVI | SARSOP |
| | FIB | IncrementalPruning |
If you have further questions about specific solvers, feel free to ask on our forum: https://groups.google.com/forum/#!forum/pomdps-users
Thanks, @zsunberg !
@hifzajaved Btw, I've checked again and the POMDP example rewards are defined as an SxAxS' matrix, so I'm still not sure where the problem is.