What does the prefetcher do?
It fetches data from rollouts asynchronously.
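A minimal sketch of asynchronous prefetching, assuming a single background thread and a bounded queue. The names `Prefetcher`, `fetch_fn`, and `fake_rollout` are hypothetical and only illustrate the idea; they are not the framework's actual API.

```python
import queue
import threading


class Prefetcher:
    """Hypothetical sketch: pulls batches from a rollout source on a background thread."""

    def __init__(self, fetch_fn, num_batches, capacity=4):
        # A bounded queue keeps the producer from running far ahead of the consumer.
        self._queue = queue.Queue(maxsize=capacity)
        self._thread = threading.Thread(
            target=self._worker, args=(fetch_fn, num_batches), daemon=True
        )
        self._thread.start()

    def _worker(self, fetch_fn, num_batches):
        for i in range(num_batches):
            self._queue.put(fetch_fn(i))  # blocks while the buffer is full
        self._queue.put(None)  # sentinel: no more data

    def __iter__(self):
        while True:
            batch = self._queue.get()  # blocks until a batch is ready
            if batch is None:
                return
            yield batch


def fake_rollout(i):
    # Stand-in for a rollout worker returning a batch of transitions (assumption).
    return {"batch_id": i, "obs": [i] * 3}


batches = list(Prefetcher(fake_rollout, num_batches=5))
```

Because the data is produced ahead of time, a batch may have been generated before the learner's latest update, which is why the on-policy question arises.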
Where do the rollout combinations come from?
They come from strategy planning, where PSRO computes the Nash equilibrium.
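To make the equilibrium step concrete, here is a minimal sketch that approximates a Nash equilibrium of a two-player zero-sum matrix game with fictitious play. Fictitious play is a stand-in chosen for brevity; the solver the framework actually uses is not stated here, and `fictitious_play` and the rock-paper-scissors payoff table are illustrative assumptions.

```python
def fictitious_play(payoff, iterations=20000):
    """Approximate a Nash equilibrium of a two-player zero-sum matrix game.

    payoff[i][j] is the row player's payoff; the column player receives the negative.
    Returns the empirical mixed strategies of both players.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    row_counts[0] = 1  # arbitrary initial plays
    col_counts[0] = 1
    for _ in range(iterations):
        # Each player best-responds to the opponent's empirical play so far.
        row_values = [
            sum(payoff[i][j] * col_counts[j] for j in range(n_cols))
            for i in range(n_rows)
        ]
        col_values = [
            sum(payoff[i][j] * row_counts[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        best_row = max(range(n_rows), key=lambda i: row_values[i])
        best_col = min(range(n_cols), key=lambda j: col_values[j])
        row_counts[best_row] += 1
        col_counts[best_col] += 1
    row_total, col_total = sum(row_counts), sum(col_counts)
    return (
        [c / row_total for c in row_counts],
        [c / col_total for c in col_counts],
    )


# Rock-paper-scissors: the equilibrium mixes uniformly over the three actions.
rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
row_mix, col_mix = fictitious_play(rps)
```

In a PSRO-style loop, the resulting mixtures would be used as sampling weights over the existing policies when choosing which combinations to roll out.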
Is the asynchronous data on-policy?
The psro_scheduler generates a training_desc that reaches a Nash equilibrium over the previous policies. If share_policies is set to 1, the training agent is always set to agent_0, and a random_permute changes the agents' positions. So when the game is asymmetric, the data is not on-policy.
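The effect of the permutation can be sketched as follows. This is an illustration only: `assign_seats` is a hypothetical helper, and `rng.shuffle` stands in for whatever random_permute does internally.

```python
import random


def assign_seats(training_policy, opponent_policies, rng):
    """Hypothetical: the training policy starts as agent_0, then seats are shuffled."""
    agents = ["agent_%d" % i for i in range(1 + len(opponent_policies))]
    policies = [training_policy] + list(opponent_policies)
    rng.shuffle(policies)  # assumed stand-in for random_permute
    return dict(zip(agents, policies))


rng = random.Random(0)
assignment = assign_seats("training_policy", ["opponent_policy"], rng)
```

Across episodes the training policy ends up in different seats. In a symmetric game the seat does not matter, but in an asymmetric game each seat faces a different observation and reward structure, which is consistent with the answer above.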
What does update_func do?
It collects data and computes the payoff matrix.
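A minimal sketch of how collected results could be turned into a payoff matrix, assuming each result is a (row policy, column policy, row player's return) tuple. `build_payoff_matrix` and the policy names are hypothetical; the real update_func may store and aggregate results differently.

```python
def build_payoff_matrix(results, policies):
    """Average the row player's return for every (row policy, column policy) pair."""
    sums = {}
    counts = {}
    for row_policy, col_policy, row_return in results:
        key = (row_policy, col_policy)
        sums[key] = sums.get(key, 0.0) + row_return
        counts[key] = counts.get(key, 0) + 1
    # Pairs that were never played default to 0.0 (an assumption of this sketch).
    return [
        [
            sums[(r, c)] / counts[(r, c)] if (r, c) in counts else 0.0
            for c in policies
        ]
        for r in policies
    ]


results = [
    ("pi_0", "pi_1", 1.0),
    ("pi_0", "pi_1", 0.0),
    ("pi_1", "pi_0", -0.5),
]
matrix = build_payoff_matrix(results, ["pi_0", "pi_1"])
```

The matrix is what an equilibrium solver would consume on the next strategy-planning step.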