The db-football from quangr

A Simple, Distributed and Asynchronous Multi-Agent Reinforcement Learning Framework for Google Research Football AI.

License: Other

Shell 0.31% Python 99.69%

Why is the GPU not working while the mean reward keeps going up?
Because rollout_metric_cfgs.reward.init_list is set too low.
How is the data from two agents trained as a shared network?
The collected data is split per agent and flattened during training:

DB-Football/light_malib/training/data_generator.py

Line 111 in d5ae999

batch[k] = batch[k].reshape(-1, *batch[k].shape[3:])
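To make the effect of that reshape concrete, here is a minimal sketch using NumPy. The axis layout (batch, agents, time, features) and the array sizes are assumptions for illustration only; the quoted line only shows that the first three axes are merged into one.

```python
import numpy as np

# Hypothetical layout: (batch, num_agents, time, feature...).
# The real meaning of the leading axes in DB-Football is an assumption here.
batch = {
    "observation": np.zeros((4, 2, 10, 115)),
    "action": np.zeros((4, 2, 10)),
}

for k in batch:
    # Same expression as the quoted line: merge the first three axes into one,
    # so samples from both agents end up in a single flat batch dimension.
    batch[k] = batch[k].reshape(-1, *batch[k].shape[3:])

obs_shape = batch["observation"].shape  # (80, 115)
act_shape = batch["action"].shape       # (80,)
```

After the reshape, the network no longer sees which agent a sample came from, which is what lets one shared set of weights train on both agents' data.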
What does the prefetcher do?
It fetches data from rollout asynchronously.
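The general idea of asynchronous prefetching can be sketched with a background thread and a bounded queue. This is not the repository's prefetcher; the class name, the fetch_fn callback and the max_prefetch parameter are all hypothetical.

```python
import queue
import threading


class Prefetcher:
    """Hypothetical prefetcher: a background thread keeps a bounded queue filled."""

    def __init__(self, fetch_fn, num_batches, max_prefetch=2):
        self._fetch_fn = fetch_fn
        self._num_batches = num_batches
        self._queue = queue.Queue(maxsize=max_prefetch)
        self._thread = threading.Thread(target=self._worker, daemon=True)
        self._thread.start()

    def _worker(self):
        for i in range(self._num_batches):
            # put() blocks while the queue is full, so the worker stays
            # at most max_prefetch batches ahead of the consumer.
            self._queue.put(self._fetch_fn(i))
        self._queue.put(None)  # sentinel: no more batches

    def __iter__(self):
        while True:
            item = self._queue.get()
            if item is None:
                return
            yield item


def fake_rollout_fetch(i):
    # Stand-in for pulling one batch of rollout data (assumption).
    return {"batch_id": i}


fetched = [b["batch_id"] for b in Prefetcher(fake_rollout_fetch, num_batches=5)]
```

The trainer iterates over the prefetcher as if it were a plain list of batches, while fetching overlaps with whatever work the consumer does between batches.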
Where does the rollout combination come from?
It comes from strategy planning: PSRO computes the Nash equilibrium.
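As an illustration of computing an approximate Nash equilibrium from a payoff matrix, here is a minimal sketch using fictitious play on a two-player zero-sum game. Fictitious play is a stand-in chosen for brevity; the source does not say which solver the repository's PSRO code uses, and the function name and the rock-paper-scissors matrix are assumptions.

```python
def fictitious_play(payoff, iterations=10000):
    """Approximate a zero-sum Nash equilibrium by fictitious play.

    payoff[i][j] is the row player's payoff; the column player receives
    the negative. Returns the empirical mixed strategies of both players.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    for _ in range(iterations):
        # Row player: best response to the column player's empirical play.
        row_values = [
            sum(payoff[i][j] * col_counts[j] for j in range(n_cols))
            for i in range(n_rows)
        ]
        # Column player: minimizes the row player's payoff against
        # the row player's empirical play.
        col_values = [
            sum(payoff[i][j] * row_counts[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        row_counts[row_values.index(max(row_values))] += 1
        col_counts[col_values.index(min(col_values))] += 1
    row_mix = [c / iterations for c in row_counts]
    col_mix = [c / iterations for c in col_counts]
    return row_mix, col_mix


# Rock-paper-scissors: the unique equilibrium is uniform over the three actions.
rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
row_mix, col_mix = fictitious_play(rps)
```

In a PSRO-style loop, the resulting mixtures would serve as the probabilities for sampling which existing policies are matched against each other in rollouts.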
Is the asynchronous data on-policy?
The psro_scheduler generates a training_desc that reaches the Nash equilibrium over the former policies. If share_policies is set to 1, the training agent is always set to agent_0, and a random_permute changes the agents' positions. So when the setup is asymmetric, the data is not on-policy.
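The position swap can be pictured with a small sketch. The function assign_slots and its arguments are hypothetical and only illustrate the idea of randomly permuting which slot the training policy occupies; they are not taken from the repository.

```python
import random


def assign_slots(training_policy, opponent_policy, rng):
    """Hypothetical: randomly decide which environment slot each policy occupies."""
    slots = ["agent_0", "agent_1"]
    rng.shuffle(slots)  # random permutation of the two positions
    return {slots[0]: training_policy, slots[1]: opponent_policy}


rng = random.Random(0)
assignments = [assign_slots("training", "opponent", rng) for _ in range(100)]
# Slots in which the training policy ended up across the 100 draws.
seen_slots = {
    slot
    for assignment in assignments
    for slot, policy in assignment.items()
    if policy == "training"
}
```

If the two positions behave differently, data gathered while the training policy sits in the other slot does not match the distribution it would produce in its own slot, which is the asymmetry concern raised in the answer.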
What does update_func do?
It collects data and computes the payoff matrix.
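A payoff matrix built up from rollout results can be sketched as a table of running averages. The PayoffTable class and its methods are hypothetical and are not the repository's update_func; they only show one simple way to turn per-match rewards into a matrix.

```python
from collections import defaultdict


class PayoffTable:
    """Hypothetical payoff table: average reward per (row, column) policy pair."""

    def __init__(self):
        self._sum = defaultdict(float)
        self._count = defaultdict(int)

    def update(self, row_policy, col_policy, reward):
        # Accumulate one rollout result for this pairing.
        key = (row_policy, col_policy)
        self._sum[key] += reward
        self._count[key] += 1

    def matrix(self, row_policies, col_policies):
        # Mean reward per pairing; pairings never played default to 0.0.
        return [
            [
                self._sum.get((r, c), 0.0) / self._count[(r, c)]
                if self._count.get((r, c), 0)
                else 0.0
                for c in col_policies
            ]
            for r in row_policies
        ]


table = PayoffTable()
table.update("a", "b", 1.0)
table.update("a", "b", 0.0)
table.update("b", "a", -1.0)
payoff = table.matrix(["a", "b"], ["a", "b"])  # [[0.0, 0.5], [-1.0, 0.0]]
```

A matrix of this kind is what an equilibrium solver would take as input when deciding how to weight existing policies.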
