
Comments (4)

krumiaa commented on July 19, 2024

I believe your issue is that rewards are being delayed, so the agent is not getting useful real-time feedback about whether the move it just made was helpful or not.
As such, I would try using the distance between the target and the AI as an immediate reward each episode, and then a further reward if it actually catches the target.
In your case, every move the agent makes can be considered an episode, with the corresponding reward being the distance to the target.
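As a minimal sketch of that reward shape, assuming a Gym-style step where the agent and target positions are readable each move (the function name, the `catch_radius` parameter, and the choice of rewarding the negative distance are illustrative assumptions, not part of the MindMaker API):

```python
import numpy as np

def step_reward(agent_pos, target_pos, catch_radius=100.0):
    """Per-step reward: negative distance to the target, plus a bonus on capture.

    agent_pos / target_pos are 2D or 3D coordinates (e.g. Unreal world units).
    """
    distance = float(np.linalg.norm(np.asarray(agent_pos) - np.asarray(target_pos)))
    reward = -distance              # closer to the target -> larger (less negative) reward
    caught = distance <= catch_radius
    if caught:
        reward += 100.0             # extra bonus for actually catching the target
    return reward, caught
```

Using the (negative) distance keeps the per-step signal dense, while the capture bonus keeps the actual goal dominant; the same logic can be wired up in blueprints and sent to the learning algorithm each step.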
Once you get farther along, would you be open to sharing this as an example for other MindMaker users? I think it would be really useful.
I also do consulting work for these kinds of applications, so if you need more hands-on assistance with writing code or debugging, you can reach out to me at [email protected]


EmberAmbassador commented on July 19, 2024

Currently I do not delay the rewards. I reward the network with points for getting closer to the opponent immediately after each action. Maybe I worded my question poorly: the problem is not that my rewards are delayed, but rather that they are not. I would like the agent to plan over multiple actions rather than just optimizing a single action.

Maybe an example would help here. Let's say the agent receives 1 point for getting closer to the opponent and 100 points for actually reaching him. The agent is now in a position where he can take 10 steps directly towards the opponent but then end up in a dead end and never reach him. Or he can take 20 steps around an obstacle (where he receives no points, since he is not closing in on the opponent) but then actually reach his target.
As far as I can tell, the agent will learn that the first option is the best one, since he receives his reward after each step: he would receive 10 points in 10 actions and then have to back out of the dead end. If he instead received his points over 20 actions, he would likely learn that the latter option is preferable.
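For concreteness, the undiscounted episode totals in that example work out as follows (the step counts and point values are taken from the paragraph above; this is just the arithmetic, not anything MindMaker computes):

```python
# Option A: 10 greedy steps toward the opponent, 1 point each, then a dead end.
return_dead_end = 10 * 1          # = 10, opponent never reached

# Option B: 20 steps around the obstacle with no shaping points, then the catch.
return_detour = 20 * 0 + 100      # = 100

print(return_dead_end, return_detour)   # -> 10 100
```

Over the whole episode the detour is clearly worth more, so the question is whether the credit for that final 100 points reaches the early "go around the obstacle" actions during learning.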

The thing I am hoping for is some semblance of human strategy. A human would plan to catch his opponent by, say, trapping him in a corner. He would see that just getting closer is not always the best plan, especially if the opponent is just as fast as he is.
To do this I would like to experiment with longer episodes, i.e. sending a reward to the network only after X episodes. But if that is not possible with the current setup, I will just keep experimenting with a reward after each action and see where that gets me.

Concerning the sharing of my project I will have to check with my university. Since this is a thesis I might not have the right to do so. I'll get back to you.

Anyways, thanks for the response =)


krumiaa commented on July 19, 2024

Ok, I understand the issue now. You can change the reward frequency easily using a counter in the blueprints: for instance, the reward (or rather, what is being sent to the learning algorithm) is set to zero until the counter hits a certain number of episodes, and then you average all of the intervening rewards (distances from the player) and send that as a single cumulative reward. This could be done over an arbitrarily long period. I'm not sure that simply changing the reward frequency will solve the issue, though. I would look at some of the hide-and-seek type implementations others have used, especially if they are based on the OpenAI Gym environment protocol, which is what MindMaker uses.
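A minimal sketch of that counter logic, written in Python only to show the idea (the real bookkeeping would live in your blueprints; the class and parameter names are illustrative, not MindMaker API):

```python
class RewardAccumulator:
    """Release an averaged reward every `interval` steps; send 0 in between.

    Mirrors the blueprint counter idea described above.
    """

    def __init__(self, interval=20):
        self.interval = interval
        self.buffer = []

    def step(self, shaping_reward):
        self.buffer.append(shaping_reward)
        if len(self.buffer) < self.interval:
            return 0.0                      # counter not reached yet, send zero
        averaged = sum(self.buffer) / len(self.buffer)
        self.buffer.clear()
        return averaged                     # single cumulative reward for the block
```

You could also send the sum instead of the average, or release the accumulated reward on the step where the target is actually caught; which variant works best is something to experiment with.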

It could also be useful to frame this as a path-planning problem and see how others have structured the reward signal; here are some sample implementations:

https://github.com/naderAsadi/Optimal-Path-Planning-Deep-Reinforcement-Learning
https://www.sciencedirect.com/science/article/pii/S1877050918300553


EmberAmbassador commented on July 19, 2024

Yeah, that is sort of what I was doing for a while, but I was worried that setting the reward to 0 for so many episodes might screw with the results. But if you also think that's the way to go, I'll just tinker with the rewards some more.
Thank you so much for the quick response and the further reading material. This will help a great deal.

