
Comments (4)

krumiaa commented on July 19, 2024

I believe your issue is that rewards are being delayed, so the agent is not getting useful real-time feedback about whether the move it just made was helpful or not.
As such, I would try using the distance between the target and the AI as an immediate reward each episode, and then a further reward if it actually catches the target.
In your case, every move the agent makes can be considered an episode, with the corresponding reward being the distance to the target.
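As a minimal sketch of that reward shape, assuming a Gym-style step where the agent and target positions are readable each move (the function name, the `catch_radius` parameter, and the choice of rewarding the negative distance are illustrative assumptions, not part of the MindMaker API):

```python
import numpy as np

def step_reward(agent_pos, target_pos, catch_radius=100.0):
    """Per-step reward: negative distance to the target, plus a bonus on capture.

    agent_pos / target_pos are 2D or 3D coordinates (e.g. Unreal world units).
    """
    distance = float(np.linalg.norm(np.asarray(agent_pos) - np.asarray(target_pos)))
    reward = -distance              # closer to the target -> larger (less negative) reward
    caught = distance <= catch_radius
    if caught:
        reward += 100.0             # extra bonus for actually catching the target
    return reward, caught
```

Using the (negative) distance keeps the per-step signal dense, while the capture bonus keeps the actual goal dominant; the same logic can be wired up in blueprints and sent to the learning algorithm each step.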
Once you get farther along, would you be open to sharing this as an example for other MindMaker users? I think it would be really useful.
I also do consulting work for these kinds of applications, so if you need more hands-on assistance with writing code or debugging, you can reach out to me at [email protected]


EmberAmbassador commented on July 19, 2024

Currently I do not delay the rewards. I reward the network with points for getting closer to the opponent immediately after each action. Maybe I worded my question poorly: the problem is not that my rewards are delayed, but rather that they are not. I would like the agent to plan over multiple actions rather than just optimizing a single action.

Maybe an example would help here. Let's say the agent receives 1 point for getting closer to the opponent and 100 points for actually reaching him. The agent is now in a position where he can take 10 steps directly towards the opponent but then end up in a dead end and never reach him. Or he can take 20 steps around an obstacle (where he receives no points, since he is not closing in on the opponent) but then actually reach his target.
As far as I can tell, the agent will learn that the first option is the best one, since he receives his reward after each step: he would receive 10 points in 10 actions and then have to back out of the dead end. If he instead received his points over 20 actions, he would likely learn that the latter option is preferable.
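For concreteness, the undiscounted episode totals in that example work out as follows (the step counts and point values are taken from the paragraph above; this is just the arithmetic, not anything MindMaker computes):

```python
# Option A: 10 greedy steps toward the opponent, 1 point each, then a dead end.
return_dead_end = 10 * 1          # = 10, opponent never reached

# Option B: 20 steps around the obstacle with no shaping points, then the catch.
return_detour = 20 * 0 + 100      # = 100

print(return_dead_end, return_detour)   # -> 10 100
```

Over the whole episode the detour is clearly worth more, so the question is whether the credit for that final 100 points reaches the early "go around the obstacle" actions during learning.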

The thing I am hoping for is some semblance of human strategy. A human would plan to catch his opponent by, say, trapping him in a corner. He would see that just getting closer is not always the best plan, especially if the opponent is just as fast as he is.
To do this I would like to experiment with longer episodes, i.e. sending a reward to the network only after X episodes. But if that is not possible with the current setup, I will just keep experimenting with a reward after each action and see where that gets me.

Concerning the sharing of my project I will have to check with my university. Since this is a thesis I might not have the right to do so. I'll get back to you.

Anyways, thanks for the response =)


krumiaa commented on July 19, 2024

Ok, I understand the issue now. You can change the reward frequency easily using a counter in the blueprints: for instance, the reward (or rather, what is being sent to the learning algorithm) is set to zero until the counter hits a certain number of episodes, and then you average all of the intervening rewards (distances from the player) and send that as a single cumulative reward. This could be done over an arbitrarily long period. I'm not sure that simply changing the reward frequency will solve the issue, though. I would look at some of the hide-and-seek type implementations others have used, especially if they are based on the OpenAI Gym environment protocol, which is what MindMaker uses.
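A minimal sketch of that counter logic, written in Python only to show the idea (the real bookkeeping would live in your blueprints; the class and parameter names are illustrative, not MindMaker API):

```python
class RewardAccumulator:
    """Release an averaged reward every `interval` steps; send 0 in between.

    Mirrors the blueprint counter idea described above.
    """

    def __init__(self, interval=20):
        self.interval = interval
        self.buffer = []

    def step(self, shaping_reward):
        self.buffer.append(shaping_reward)
        if len(self.buffer) < self.interval:
            return 0.0                      # counter not reached yet, send zero
        averaged = sum(self.buffer) / len(self.buffer)
        self.buffer.clear()
        return averaged                     # single cumulative reward for the block
```

You could also send the sum instead of the average, or release the accumulated reward on the step where the target is actually caught; which variant works best is something to experiment with.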

It could also be useful to frame this as a path-planning problem and see how others have structured the reward signal; here are some sample implementations:

https://github.com/naderAsadi/Optimal-Path-Planning-Deep-Reinforcement-Learning
https://www.sciencedirect.com/science/article/pii/S1877050918300553


EmberAmbassador commented on July 19, 2024

Yeah, that is sort of what I was doing for a while, but I was worried that setting the reward to 0 for so many episodes might screw with the results. But if you also think that's the way to go, I'll just tinker with the rewards some more.
Thank you so much for the quick response and the further reading material. This will help a great deal.

