Comments (4)
I believe your issue is that rewards are being delayed, so the agent is not getting useful real-time feedback about whether the move it just made was helpful.
As such, I would try using the distance between the agent and the target as an immediate reward each episode, plus another reward if the agent actually catches the target.
In your case, every move the agent makes can be considered an episode, with the corresponding reward being the distance to the target.
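For concreteness, the shaping described above could look something like this (a hypothetical helper, not MindMaker API; the positions, catch radius, and bonus value are assumptions):

```python
import math

def step_reward(agent_pos, target_pos, prev_distance,
                catch_radius=1.0, catch_bonus=100.0):
    """Immediate per-move reward: positive when the move closed the
    distance to the target, plus a large bonus on an actual catch."""
    distance = math.dist(agent_pos, target_pos)
    reward = prev_distance - distance      # closer -> positive, farther -> negative
    if distance <= catch_radius:
        reward += catch_bonus              # reward for actually catching the target
    return reward, distance
```

Called once per move, this gives the learner an immediate signal instead of a delayed one.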
Once you get farther along, would you be open to sharing this as an example for other MindMaker users? I think it would be really useful.
I also do consulting work for these kinds of applications, so if you need more hands-on assistance writing code or debugging, you can reach out to me at [email protected]
from mindmaker.
Currently I do not delay the rewards. I reward the network with points for getting closer to the opponent immediately after each action. Maybe I worded my question poorly: the problem is not that my rewards are delayed, but rather that they are not. I would like the agent to plan over multiple actions rather than just optimizing a single action.
Maybe an example would help here. Let's say the agent receives 1 point for getting closer to the opponent and 100 points for actually reaching him. The agent is now in a position where he can take 10 steps directly towards the opponent but then end up in a dead end, unable to reach him. Or he can take 20 steps around an obstacle (where he receives no points, since he is not closing in on the opponent) but then actually reach his target.
As far as I can tell, the agent will learn that the first option is the best one, since he receives his reward after each step. He would thus receive 10 points in 10 actions and then have to back out of the dead end. If he instead received his points over 20 actions, he would likely learn that the latter option is preferable.
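Whether that actually happens depends on the horizon the algorithm optimizes over: under a standard discounted return, the detour can still come out ahead. A quick check with the numbers from the example (the discount factor of 0.99 is an assumption):

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of gamma**t * r_t over one trajectory."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

# +1 per step that closes distance, +100 for reaching the opponent:
dead_end = [1.0] * 10                # 10 rewarded steps, then stuck
detour   = [0.0] * 19 + [100.0]      # 20 unrewarded steps, then the catch

print(discounted_return(dead_end))   # ~9.56
print(discounted_return(detour))     # ~82.62 -> the detour wins over a long horizon
```

So the dead end only looks best if training is effectively myopic (small discount factor or episodes that end too early), which is worth checking before restructuring the rewards.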
The thing I am hoping for is some semblance of human strategy. A human would plan to catch his opponent by, for example, trapping him in a corner. He would see that just getting closer is not always the best plan, especially if the opponent is just as fast as he is.
To do this I would like to experiment with longer episodes, i.e. sending a reward to the network only after X episodes. But if that is not possible with the current setup, I will just keep experimenting with a reward after each action and see where that gets me.
Concerning the sharing of my project, I will have to check with my university. Since this is a thesis, I might not have the right to do so. I'll get back to you.
Anyways, thanks for the response =)
Ok, I understand the issue now. You can change the reward frequency easily using a counter in the Blueprints: set the reward (or at least what is sent to the learning algorithm) to zero until the counter hits a certain number of episodes, then average all of the intervening rewards (distances from the player) and send that as a single cumulative reward. This could be done over an arbitrarily long period. I'm not sure that simply changing the reward frequency will solve the issue, though. I would look at some of the hide-and-seek type implementations others have used, especially if they are based on the OpenAI Gym environment protocol, which is what MindMaker uses.
It could also be useful to look at this as a path-planning problem and see how others have structured the reward signal; here are some sample implementations:
https://github.com/naderAsadi/Optimal-Path-Planning-Deep-Reinforcement-Learning
https://www.sciencedirect.com/science/article/pii/S1877050918300553
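The counter scheme described above can be prototyped outside the Blueprints as plain Python (a minimal sketch; the interval length and averaging rather than summing are the assumptions from the comment):

```python
class IntervalReward:
    """Send 0 to the learner on most steps; every n steps (or at episode
    end) release the average of the rewards accumulated in between."""
    def __init__(self, n=20):
        self.n = n
        self._buffer = []

    def __call__(self, reward, done=False):
        self._buffer.append(reward)
        if len(self._buffer) >= self.n or done:
            avg = sum(self._buffer) / len(self._buffer)
            self._buffer.clear()
            return avg
        return 0.0
```

Wrapped around a Gym-style `step()` call, this replaces the per-step distance reward with one averaged signal every `n` steps, which is exactly what the counter in the Blueprints would do.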
Yeah, that was sort of what I was doing for a while, but I was worried that setting the reward to 0 for so many episodes might skew the results. But if you also think that's the way to go, I'll just tinker with the rewards some more.
Thank you so much for the quick response and the further reading material. This will help a great deal.