
Comments (16)

Svalorzen avatar Svalorzen commented on July 3, 2024

Mmmh.. I think this could be implemented nicely in a new separate MCTS implementation in the Factored::MDP part of the library. Not sure if you have ever looked at it; there, states and actions are represented as std::vector<size_t> (or even just one or the other, depending on the algo), so maybe you could use those for your representation.

In general, I'm not too keen on too much customizability directly in the library, as it results in stuff that is harder to digest for new users (at least in my opinion). Once one knows the code, they can directly patch it, as I'm happy to learn you have been doing :).
However, in a factored state/action space context, the implementation of MCTS should be different enough to warrant making a separate class for it anyway, so I approve the idea!

I'm curious tho: having a map only makes sense if you plan to only access part of the actions. How would you then make MCTS explore the tree? UCB usually tries each action at least once, otherwise you have no guarantees, so should we use something else?

About reusage, I try to make each method kind of readable by itself without too much jumping around, to help people that are interested in only parts of the library. While POMCP and MCTS are indeed quite similar, I don't think it would be easy to extract functionality out of them without leaving the final implementation looking like a weird puzzle. In any case, I would first implement this new MCTS version, and then see exactly what is there that we can share easily.

I'm probably going to have my hands full until March as well, but feel free to elaborate on your ideas here so we can decide exactly what to do!

from ai-toolbox.

EHadoux avatar EHadoux commented on July 3, 2024

I'll answer first and give you more concrete details afterwards because I feel that I was not clear enough.

Mmmh.. I think this could be implemented nicely in a new separate MCTS implementation in the Factored::MDP part of the library. Not sure if you have ever looked at it; there, states and actions are represented as std::vector<size_t> (or even just one or the other, depending on the algo), so maybe you could use those for your representation.

I need to have a look at that a bit more but I don't think I really have a Factored MDP here. It's more an alternative representation of the states and actions but they are not like an aggregation of heterogeneous pieces.

In general, I'm not too keen on too much customizability directly in the library, as it results in stuff that is harder to digest for new users (at least in my opinion). Once one knows the code, they can directly patch it, as I'm happy to learn you have been doing :).

I think you are right.

However, in a factored state/action space context, the implementation of MCTS should be different enough to warrant making a separate class for it anyway, so I approve the idea!

Good, actually what can be done (it will kind of answer your previous point as well) would be to have a base MCTS class with templates, but in a part of the library that only people with precise use cases would ever see, and then the current MCTS and POMCP would be specialisations of this class, in parts that are easier to access for people who only want to use the standard algorithms.

I'm curious tho: having a map only makes sense if you plan to only access part of the actions. How would you then make MCTS explore the tree? UCB usually tries each action at least once, otherwise you have no guarantees, so should we use something else?

See below, it's not the actions that are maps but rather that all the children actions of a state are held in a map instead of in a vector.

About reusage, I try to make each method kind of readable by itself without too much jumping around, to help people that are interested in only parts of the library. While POMCP and MCTS are indeed quite similar, I don't think it would be easy to extract functionality out of them without leaving the final implementation looking like a weird puzzle. In any case, I would first implement this new MCTS version, and then see exactly what is there that we can share easily.

This is wise. Striking the right balance between no repetition and clarity in the code is quite hard to achieve. That being said, the fact that you had to put your fix for the termination of the simulations in the rollout in both files is error prone.

I'm probably going to have my hands full until March as well, but feel free to elaborate on your ideas here so we can decide exactly what to do!

Brilliant, so here is my use case more precisely:

  • I have a graph of arguments (https://en.wikipedia.org/wiki/Argumentation_framework) that represents all the arguments available to two players in a dialogue
  • A state is the set of all the arguments played so far in the dialogue by both players
  • An action is the set of the arguments played in the next move by only one of the players (always the same player)

So basically, to represent an action, I have a bitset where bit 0 is whether or not argument 0 will be played this turn, bit 1 argument 1, and so on. Because not all the arguments can be played in every state (arguments already played, arguments unrelated or irrelevant in this state, etc.) it's suboptimal to have every single action as a child of the state in the tree. In fact, once we have more than 128 arguments, the bitset has to stay a bitset; it can't be converted back into an unsigned int. It would also be a mess to keep a vector to hold them but map the indexes back to different actions each time, depending on which ones are allowed in each particular state.
That's why I have changed, in MCTS.hpp:

  • line 61 to using ActionNodes = std::map<boost::dynamic_bitset<>, ActionNode>;
  • both resize calls at lines 187 and 210 to a bit of code initialising with the allowed actions in the particular state
  • line 54 size_t to boost::dynamic_bitset to account for the change of representation
    And obviously all the signatures when needed.
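
To make it concrete, this is roughly what the changed representation looks like on my side (a minimal self-contained sketch: ActionNode is reduced to a stub, and the alias names only mirror the ones in MCTS.hpp):

#include <map>
#include <boost/dynamic_bitset.hpp>

struct ActionNode { double V = 0.0; unsigned N = 0; };    // stub; the real struct holds more

using Action      = boost::dynamic_bitset<>;              // bit i set means argument i is played this turn
using ActionNodes = std::map<Action, ActionNode>;         // children keyed by the legal moves of a state

int main() {
    Action move(200);       // 200 possible arguments, too many for any integer type
    move.set(0);            // play argument 0...
    move.set(42);           // ...together with argument 42
    ActionNodes children;
    children[move].N += 1;  // lazily creates the child for this move and visits it once
}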

I hope this is clearer now.
So the proposition would be:

  1. to have a MCTS_base<container, index> where the original MCTS would be MCTS : MCTS_base<std::vector, size_t>
  2. to replace children[a] by children.at(a) to be able to use any std container
  3. to replace the graph_.children.resize(A); with a function like construct_children(graph), to be able to extend the original MCTS or POMCP with extensions of ActionNode (note that in the previous point we use the same ActionNode struct) just by overriding it. In the default implementation it would be graph_.children.resize(A);, but it could be overridden to construct subtypes of ActionNode (I don't think we can do that currently without basically copy-pasting and modifying everything)

Alternatively, everything can just be extended/inherited/overridden, but the purpose of the templating is to avoid having the ActionNodes member of the base class hanging there useless and empty in the subclasses because it's not the right container.
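
To give a rough idea of the shape I have in mind (all names invented, and for simplicity the base is templated on the full container type rather than on the container template itself):

#include <cstddef>
#include <map>
#include <vector>
#include <boost/dynamic_bitset.hpp>

struct ActionNode { double V = 0.0; unsigned N = 0; };

// Purely illustrative base: templated on how the children of a state are stored and indexed.
template <typename ActionNodes, typename ActionId>
class MCTS_base {
    public:
        struct StateNode { ActionNodes children; unsigned N = 0; };
        virtual ~MCTS_base() = default;
    protected:
        // Overridable hook replacing the hard-coded graph_.children.resize(A).
        virtual void construct_children(StateNode & sn) = 0;
};

// The current algorithm would be the vector/size_t specialisation...
class MCTS : public MCTS_base<std::vector<ActionNode>, std::size_t> {
    void construct_children(StateNode & sn) override { sn.children.resize(4); /* model.getA() in reality */ }
};

// ...while my use case would key children by bitsets and insert only the legal moves.
class BitsetMCTS : public MCTS_base<std::map<boost::dynamic_bitset<>, ActionNode>, boost::dynamic_bitset<>> {
    void construct_children(StateNode &) override { /* insert only the allowed actions of this state */ }
};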

I hope all of this makes sense.
By the way, earlier when writing the first post, Github told me that my last interaction with the project was 2014, I am glad you managed to keep this up to date for so long, that's impressive.

from ai-toolbox.

Svalorzen avatar Svalorzen commented on July 3, 2024

I need to have a look at that a bit more but I don't think I really have a Factored MDP here. It's more an alternative representation of the states and actions but they are not like an aggregation of heterogeneous pieces.

The Factored::MDP namespace is for all MDPs where states/actions are represented in vector form, so in your case I think this applies. For example, to represent your action space I would use an Actions A{2, 2, 2, ...., 2}, since for each argument you have the binary choice of having it or not. Still, probably using bitsets in your case is more efficient, so feel free to keep using them :)

Good, actually what can be done (it will kind of answer your previous point as well) would be to have a base MCTS class with templates, but in a part of the library that only people with precise use cases would ever see, and then the current MCTS and POMCP would be specialisations of this class, in parts that are easier to access for people who only want to use the standard algorithms.

100% agreed.

This is wise. Striking the right balance between no repetition and clarity in the code is quite hard to achieve. That being said, the fact that you had to put your fix for the termination of the simulations in the rollout in both files is error prone.

This is also true, and I admit that I have no real rules about it. My general idea, I guess, is that as long as something is in only two or three places it doesn't necessarily have to be abstracted. However, I've got some ideas of things we could extract here that would make sense.

That's why I have changed, in MCTS.hpp:

  • line 61 to using ActionNodes = std::map<boost::dynamic_bitset<>, ActionNode>;
  • both resize calls at lines 187 and 210 to a bit of code initialising with the allowed actions in the particular state
  • line 54 size_t to boost::dynamic_bitset to account for the change of representation
    And obviously all the signatures when needed.
  • to have a MCTS_base<container, index> where the original MCTS would be MCTS : MCTS_base<std::vector, size_t>

I think we can avoid using a map for the ActionNodes, and I'll explain why and how. Ideally, using a vector for the ActionNodes would be better if possible, as you need to iterate a lot over the actions to select the optimal one, and for this std::vector > std::unordered_map. Additionally, the main performance problem with MCTS as of now is memory allocations: in order to create and shuffle all the nodes all the time there are tons of implied allocations for the vectors and maps. Allocating a vector is a single allocation, while with a map every time you reference a new node you need to do a separate alloc, which would also slow things down quite a bit. It's not my main priority, but if we can avoid it it'd be nice.

So first, things to extract from MCTS: I think rollout could work fine as a free function in the MDP/Utils.hpp file. Also, I think we could extract the findBestBonusA into a separate UCB class. And here's the trick: just as you said, we need an entry point in the class, a construct_children function. My idea would be to templatize the class as MCTS<State, Action, Selector> (with default MCTS<size_t, size_t, UCB>), where Selector would be the class that both initializes and selects between children (in the default case UCB). At the same time, we modify ActionNode to contain a field action which holds the "true" representation of the action for that ActionNode, to pass to the model for sampling.
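
Roughly, the modified ActionNode would look like this (just a sketch; the real struct keeps its other members):

template <typename Action>
struct ActionNode {
    Action action;     // the "true" action representation, e.g. size_t by default or a bitset in your case
    double V = 0.0;    // estimated value of this action
    unsigned N = 0;    // how many times this action has been tried
    // ...plus the usual bookkeeping the struct already has.
};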

So at line 231, the code would look like:

sn.N++;

const auto aIt = Selector(model_, sn.children, sn.N);
const size_t a = std::distance(std::begin(sn.children), aIt);

auto [s1, rew] = model_.sampleSR(s, aIt->action);

where Selector in the case of UCB would look something like:

if (children.size() == 0) {
    children.resize(model.getA());
    for (size_t a = 0; a < model.getA(); ++a) children[a].action = a;
}

// rest of UCB...

In your case, I assume that for each state you still know how many actions and which actions are available. So you'd need to modify the UCB init part to something like:

if (children.size() == 0) {
    auto availableActions = model.getAvailableActions(s);
    children.resize(availableActions.size());
    for (size_t a = 0; a < availableActions.size(); ++a)
        children[a].action = availableActions[a]; // Set bitset for this child.
}

// rest of UCB...

In this way the MCTS internals do not change with templating, we can iterate over actions fast to keep MCTS fast, we need fewer allocations, and we gain the ability to handle variable action spaces as well as to change the exploration algorithm. We need to define exactly what we need to pass to the Selector, but I guess the children plus a model reference would suffice.
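
To give an idea, the whole Selector could end up looking something like this (a self-contained sketch: the names and the exploration constant are placeholders, and in your case the init part would be the per-state version above):

#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct ActionNode { std::size_t action; double V = 0.0; unsigned N = 0; };

// Hypothetical Selector: lazily initializes the children, then returns an iterator
// to the child with the best UCB1 score.
struct UCB {
    double exploration = 2.0;

    template <typename Model>
    std::vector<ActionNode>::iterator operator()(const Model & model,
                                                 std::vector<ActionNode> & children,
                                                 unsigned visits) const {
        if (children.empty()) {
            children.resize(model.getA());
            for (std::size_t a = 0; a < model.getA(); ++a) children[a].action = a;
        }
        auto best = children.begin();
        double bestScore = -std::numeric_limits<double>::infinity();
        for (auto it = children.begin(); it != children.end(); ++it) {
            // Untried children get an infinite score, so every action is tried at least once.
            const double score = (it->N == 0)
                ? std::numeric_limits<double>::infinity()
                : it->V + exploration * std::sqrt(std::log(visits) / it->N);
            if (score > bestScore) { bestScore = score; best = it; }
        }
        return best;
    }
};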

What do you think?

from ai-toolbox.

EHadoux avatar EHadoux commented on July 3, 2024

The Factored::MDP namespace is for all MDPs where states/actions are represented in vector form, so in your case I think this applies. For example, to represent your action space I would use an Actions A{2, 2, 2, ...., 2}, since for each argument you have the binary choice of having it or not. Still, probably using bitsets in your case is more efficient, so feel free to keep using them :)

Neat, I'll have a look at that. It's true that boost::bitsets have some convenient helpers (like jumping directly to the first set bit) but nothing that can't be done with a few helper methods.

What do you think?

I think your point on vectors is good. I didn't want to change the ActionNode struct, so I changed the container, but putting the actual action in the node in order to keep a vector is definitely the best idea.
Having the initialization in UCB feels a bit counterintuitive though, because basically it's supposed to work on anything as long as it has a value. So it shouldn't care whether the action space of a state is sparse or not, as long as it has the vector to iterate on.

What do you think about changing neither MCTS nor UCB but having the template like MCTS<State, Action> with MCTS<StateNode, ActionNode> as default?
StateNode and ActionNode become extendable classes, where the resize and children[a]/children.at(a) can be defined according to the type needed for the actual action (size_t, bitset, etc.).
So children.resize(model.getA()); and children.resize(availableActions.size()); would become node.resize_children(model).
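
Concretely, the nodes could look something like this (just a sketch with made-up names; getAvailableActions is the hypothetical model method from your snippet above):

#include <cstddef>
#include <map>
#include <vector>
#include <boost/dynamic_bitset.hpp>

struct ActionNode { double V = 0.0; unsigned N = 0; };

// Default nodes: children indexed directly by action id, sized from the model.
struct DefaultStateNode {
    std::vector<ActionNode> children;
    template <typename Model, typename State>
    void resize_children(const Model & model, const State &) { children.resize(model.getA()); }
    ActionNode & child(std::size_t a) { return children[a]; }
};

// Bitset nodes: children created only for the moves that are legal in this particular state.
struct BitsetStateNode {
    std::map<boost::dynamic_bitset<>, ActionNode> children;
    template <typename Model, typename State>
    void resize_children(const Model & model, const State & s) {
        for (const auto & move : model.getAvailableActions(s)) children[move];
    }
    ActionNode & child(const boost::dynamic_bitset<> & a) { return children.at(a); }
};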

You have the last call anyway, so I am good with your solution as well, because I don't know which is better between modifying the internals of MCTS (StateNode, ActionNode) and modifying the selection algorithm (UCB). They both feel hard to do for a newcomer anyway (or at least to conceptualise, because the actual changes are minimal in both cases).

from ai-toolbox.

Svalorzen avatar Svalorzen commented on July 3, 2024

I think template parameters for the structs themselves are suboptimal for a couple of reasons.

First, the structures define the tree structure of MCTS, and are thus a pretty heavy part of its "definition". If you remove them, a reader of the class would need to magically infer what they look like just by reading the code, which I believe would be very difficult. They would need to first look for an existing implementation, see how the structures are supposed to look, and then go back to reading the code.

Second, redefining the StateNode is pretty much always redundant, since what we are doing does not change how StateNodes work. So either you force people to always redefine StateNode in the exact same way (barring the new type of ActionNode that goes in), or you change to MCTS<State, ActionNode>, which is somewhat ugly.

Another point is that by not extracting UCB you lose an opportunity to remove duplication, since UCB is exactly the same between MCTS and POMCP - and we started by saying that this was one of the goals. Whereas if you extract the nodes, they look different between MCTS and POMCP, so you haven't really improved much there - you can't reuse them between classes anyway.

I agree that having the initialization in the UCB class is also not the prettiest solution. At the same time, a function that initializes the vector in some way must be passed in, and if UCB becomes a template parameter, at that point we might as well make it do the work (maybe by giving it an initialize function and a findBestAction function and calling those from MCTS).

If we really felt like refactoring, we could even move the value update parts into UCB (the aNode.V += ( rew - aNode.V ) / static_cast<double>(aNode.N);), and make everything configurable. But for now I'd stop here; there's always time to do this stuff if really needed.

from ai-toolbox.

Svalorzen avatar Svalorzen commented on July 3, 2024

Ah, and I forgot:

By the way, earlier when writing the first post, Github told me that my last interaction with the project was 2014, I am glad you managed to keep this up to date for so long, that's impressive.

Thanks a lot for the encouragement, it's really really appreciated. I don't plan to stop anytime soon, so hopefully the library will continue to expand. Also knowing that there's people out there that actually use my code makes me feel really good about it :D

from ai-toolbox.

EHadoux avatar EHadoux commented on July 3, 2024

I am starting to think about all of that. I'll properly start some coding in a couple of weeks. In the meantime, I am trying to wrap my head around the Factored MDP part for my use case.
Also, I've seen that MCTS takes a generative_model, meaning that the model has to implement getS() and getA(), which is a real problem for me as they can't be kept in size_ts (hence the dynamic_bitset). I can have this getAvailableActionSize() business though.
But maybe I have missed something on the Factored MDPs that allows me to skip these two functions.
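
For reference, this is roughly the shape of the generative model I would need; getS()/getA() are the existing requirements, sampleSR() matches your earlier snippets, and the getAvailable* methods are just what I am imagining, not anything from the library:

#include <cstddef>
#include <tuple>
#include <vector>
#include <boost/dynamic_bitset.hpp>

using ArgumentSet = boost::dynamic_bitset<>;   // one bit per argument

// Hypothetical model for the dialogue use case (signatures are sketches only).
struct DialogueModel {
    // The current generative-model requirements, which don't fit in a size_t here:
    std::size_t getS() const;
    std::size_t getA() const;

    // What the search actually needs: sample a successor state and a reward for (state, action).
    std::tuple<ArgumentSet, double> sampleSR(const ArgumentSet & s, const ArgumentSet & a) const;

    // Per-state action information, as discussed above.
    std::size_t getAvailableActionSize(const ArgumentSet & s) const;
    std::vector<ArgumentSet> getAvailableActions(const ArgumentSet & s) const;
};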

Cheers

from ai-toolbox.

Svalorzen avatar Svalorzen commented on July 3, 2024

The Factored MDP part is somewhat of a work in progress, and it's a bit less settled than the MDP and POMDP part. Also, MCTS as is now only accepts normal MDPs, so I don't think it would currently help you.

An MCTS for a Factored MDP would still require a generative model with the same API; the only difference is that getS() and getA() in a Factored MDP return vectors, so in your case for example getA() would return your Actions {2, 2, 2, 2, 2..., 2} (since you have a long action made of binary values).

If you have specific questions about the code and stuff is not clear, feel free to open new issues about specific things you don't understand, and I'll happily try to add more docs where needed. In the meantime, maybe try to give a look at this file and, maybe, this class (although this last one is unlikely to help you as-is, but can give you an idea of what the methods for a generative factored MDP class will look like).

from ai-toolbox.

Svalorzen avatar Svalorzen commented on July 3, 2024

Hey @EHadoux, have you been able to start working on this? Let me know if you need any help :)

from ai-toolbox.

EHadoux avatar EHadoux commented on July 3, 2024

Hey, sorry for the delay. I'm actually preparing my code that uses all of that, so I'll be able to test in real-life conditions. I'm going to start the actual modifications on the library real soon.
I only really need the MCTS classes in my code, so I'm still wondering whether I should just copy them over instead of using the library as a dependency, given that I don't want to install the LP package and all that stuff.

from ai-toolbox.

Svalorzen avatar Svalorzen commented on July 3, 2024

Hey @EHadoux, any news on this?

from ai-toolbox.

EHadoux avatar EHadoux commented on July 3, 2024

@Svalorzen really sorry about the delay. I had a lot to take care of around all of that, to be sure I will exploit what we will be doing here to its maximum.
I am forking the project right now and will work on this for the next few days to a week. I will open a pull request quite soon so you can see whether it's going in the right direction, even though the commits might not even compile, just so I don't have a lot to undo if you don't like the interfaces.

from ai-toolbox.

EHadoux avatar EHadoux commented on July 3, 2024

Do you think we can add flags and tests in the CMakeLists.txt to remove the hard requirements on Eigen and LPSolve? I don't think they are used if I stick to MCTS. Just -DNO_EIGEN=1 -DNO_LPSOLVE=1, with both defaulting to 0, so the behaviour is not changed for most people.
With that we can compile only the parts that don't rely on them. The final static lib will be smaller as well. I'll make a PR for that first if you want to see it.

Actually, MCTS is header only, so I can remove everything that is not used for its testing on my side and commit only the header if the test is passing.

from ai-toolbox.

Svalorzen avatar Svalorzen commented on July 3, 2024

Cool, thanks for the work :)

I thought about adding switches to build only the parts of the library that don't depend on external parties, but I decided against it. The reason is that if somebody really needs only some part of the library they can probably easily pull it out, but I find it hard to come up with a use case where one would only want to build the stuff that does not depend on third parties. Either you want to play with the full thing or you don't, I guess :P

There's also the "more switches, more maintenance" factor, which also didn't really help convince me to do it.

from ai-toolbox.

Svalorzen avatar Svalorzen commented on July 3, 2024

So, it has been a long time, and I'm finally ready to tackle this.

I have given a new look at the state of the PR, and while the work in there is impressive, it was going in the direction I was trying to avoid, specifically having a ton of customization points.

In particular, the current PR has a template argument, Sel, that is basically a visitor and is supposed to implement a million methods to satisfy all the possible ways a user might want to use MCTS. This is not good, as it makes MCTS essentially a skeleton class that does very little aside from its main loop, with all the interesting parts injected via Sel.

In particular, the main problems we have are:

  • Having states not bound by size_t for extremely large state-spaces.
  • Possibly having actions not bound by size_t.
  • Possibly having variable number of actions.
  • If actions are not size_t, we must have a way to know how many we need for any state, and specify a way for UCT to compare them and select the best.

This last point is one of the most annoying, because it basically requires a customization point, with no workaround.


Thus, I've decided the following:

  • Having states not bound by size_t should be the simplest change.

I'm not going to simply use the state type in the StateNodes map though, as if the states were vectors the map would keep all of them around in the keys, which would increase the number of required allocations significantly. Instead, we hash the state ourselves in the code, and keep the map working with size_t keys. The probability that this will cause collision problems is quite small, and performance should be much better.

  • Having actions not bound by size_t is not really feasible.

Keep in mind that, to make any sense, UCT needs to work with a limited set of actions. Thus, no matter their type, there can't be a huge number of actions, otherwise MCTS will not work. This means we can simply leave to the user the task of converting from whatever action format they have into size_t, with no real loss, which simplifies our work a lot.

  • Variable number of actions should also be reasonably simple.

The only thing this requires is that when we do a resize, we poll the model for how many actions the given state has. We can simply require a method getA(size_t s), rather than the usual getA(), and resize each node accordingly.

Since then the number of actions at each node is bounded and known, UCT can easily compare and select and do its work.
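
Roughly, the two pieces fit together like this (just a sketch of the idea, using boost::hash_range as a stand-in for however we end up hashing; the real code will differ):

#include <cstddef>
#include <iterator>
#include <unordered_map>
#include <vector>
#include <boost/functional/hash.hpp>

struct ActionNode { double V = 0.0; unsigned N = 0; };
struct StateNode  { std::vector<ActionNode> children; unsigned N = 0; };

// The tree maps hashed states to nodes, so the keys stay cheap size_t values whatever the state type is.
using StateNodes = std::unordered_map<std::size_t, StateNode>;

template <typename Model, typename State>
StateNode & getNode(StateNodes & graph, const Model & model, const State & s) {
    // Assuming the state is an iterable container (e.g. std::vector<size_t>); collisions are unlikely.
    const std::size_t key = boost::hash_range(std::begin(s), std::end(s));
    auto & node = graph[key];
    if (node.children.empty())
        node.children.resize(model.getA(s));    // per-state action count via getA(s)
    return node;
}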


I already have a basic implementation of all of this which does not require massive changes to the code base. I now "only" have to test it :P

from ai-toolbox.

Svalorzen avatar Svalorzen commented on July 3, 2024

It's finally done! You can find the new code in master: c398fd7

I've also added a small test in MCTSTests.cpp with a variable-action, non-size_t state-space model. It seems to work, so I am happy :)

I can finally close your pull request and this issue. @EHadoux Thanks again a ton for the effort, it really helped me figure out the best way to approach the problem!

from ai-toolbox.
