Discussion channel for how to apply self-play to custom env? about di-engine HOT 10 CLOSED

opendilab commented on July 21, 2024

Discussion channel for how to apply self-play to custom env?

from di-engine.

Comments (10)

liuyuisanai commented on July 21, 2024

@zxzzz0 You can ask our WeChat assistant for the group entry. Assistant WeChat ID: OpenDILab

PaParaZz1 commented on July 21, 2024

Hi all,

Nice project. We want to start using it. After reading the doc and the config dizoo/competitive_rl/entry/cpong_dqn_default_config.py for league training, some things are still not clear to us. Do you have a channel where we can frequently discuss trivial questions, like a WeChat group or a Slack channel?

cc: [email protected]

Thank you for your participation. You can add our WeChat assistant using the ID listed above. Our Slack channel will also be available soon. To be specific, what kind of training tasks do you want to run? I guess you want to train a self-play league in the cpong environment, and dizoo/competitive_rl/entry/cpong_dqn_default_config.py is just an agent vs. bot training config. If you are interested in league training, we can discuss further.

zxzzz0 commented on July 21, 2024

@PaParaZz1 This is the exact config file mentioned in the League Overview doc, which says it is a demo of league training.

If it's just an agent vs. bot training config, it's not a correct config for self-play with league training. Please correct this config.

We are setting up league training and look forward to seeing the Slack channel.

PaParaZz1 commented on July 21, 2024

Sorry, I misunderstood something. cpong_dqn_config is actually the demo config for league training. Our agent vs. bot config has not been uploaded in this version. Which parts of league training are not clear enough? For example, how to define your own player and apply it?

zxzzz0 commented on July 21, 2024

Yes. We don't understand how to apply it.
To help other people understand this and to hide unnecessary details, we wrote a simple two-player game environment below for demo purposes.

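class Game:
    """One-step, two-player matrix game."""

    def reset(self):
        return [[0, 1], [1, 0]]  # trivial observation, one entry per player

    def step(self, actions):
        # actions: [action of player 1, action of player 2], each 0 or 1
        actions = list(actions)
        if actions == [0, 0]:
            rewards = -10, -10
        elif actions == [0, 1]:
            rewards = -1, +1
        elif actions == [1, 0]:
            rewards = +1, -1
        elif actions == [1, 1]:
            rewards = 0, 0
        else:
            raise ValueError(f"invalid actions: {actions}")
        observations = [[0, 1], [1, 0]]
        dones = True, True  # the episode ends after a single step
        infos = None, None
        return observations, rewards, dones, infos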

This is a simple multi-agent environment for 2 players. The game is played for just one step and then finishes. Each player picks one action at this step, action 0 or action 1, and receives a reward accordingly. A human can easily calculate the mixed-strategy Nash equilibrium for this game (select action 0 with probability 1/10 and action 1 with probability 9/10).
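
The equilibrium of a 2x2 game like this can be checked mechanically. The sketch below is only an illustration and is not part of DI-engine: `PAYOFFS`, `SWAPPED`, `pure_nash`, and `indifference_prob` are hypothetical names, `PAYOFFS` is copied from the rewards in `Game.step` as posted, and `SWAPPED` is an assumed variant in which the (0, 1) and (1, 0) rewards are exchanged.

```python
from fractions import Fraction
from itertools import product

# PAYOFFS[(a1, a2)] = (reward of player 1, reward of player 2), as posted in Game.step.
PAYOFFS = {(0, 0): (-10, -10), (0, 1): (-1, 1), (1, 0): (1, -1), (1, 1): (0, 0)}
# Hypothetical variant with the two asymmetric outcomes swapped (a Chicken-style game).
SWAPPED = {(0, 0): (-10, -10), (0, 1): (1, -1), (1, 0): (-1, 1), (1, 1): (0, 0)}


def pure_nash(payoffs):
    """Return action pairs where neither player gains by deviating alone."""
    found = []
    for a1, a2 in product((0, 1), repeat=2):
        r1, r2 = payoffs[(a1, a2)]
        ok1 = all(payoffs[(d, a2)][0] <= r1 for d in (0, 1))
        ok2 = all(payoffs[(a1, d)][1] <= r2 for d in (0, 1))
        if ok1 and ok2:
            found.append((a1, a2))
    return found


def indifference_prob(payoffs):
    """Probability of action 0 for player 2 that makes player 1 indifferent.

    Returns None when no such probability lies strictly between 0 and 1.
    """
    u = {a: payoffs[a][0] for a in payoffs}  # player 1's rewards only
    denom = u[0, 0] - u[0, 1] - u[1, 0] + u[1, 1]
    if denom == 0:
        return None
    p = Fraction(u[1, 1] - u[0, 1], denom)
    return p if 0 < p < 1 else None


print(pure_nash(PAYOFFS), indifference_prob(PAYOFFS))
print(pure_nash(SWAPPED), indifference_prob(SWAPPED))
```

For the rewards exactly as posted, action 1 earns more than action 0 against either opponent action, so the check reports (1, 1) as the only pure equilibrium and finds no interior mixing probability. The 1/10 and 9/10 mix quoted above is what the indifference condition yields for the swapped table, so that may be the game that was intended.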

After reading the League Overview doc, it is still not clear to us which files should be changed.
Code is worth a thousand words.
@PaParaZz1 Could you create a branch with this simple game as a demo of using opendilab's league training for a custom environment? Once the branch is ready, we will run the code and see whether it converges to the Nash equilibrium mentioned above.

cc: @liuyuisanai for more inputs!

zxzzz0 commented on July 21, 2024

We further added the following two simple rule-based agents for evaluating the main agent during self-play.

import numpy as np

class Agent1:
    def step(self, obs):
        return 0  # always return fixed action 0

class Agent2:
    def step(self, obs):
        # pick action 0 or 1 uniformly at random
        return np.random.choice([0, 1], p=[0.5, 0.5])

Agent1 always picks one fixed action (action 0 here). Agent2 picks an action uniformly at random from the action set, which is not the Nash equilibrium distribution.

If league training is going well, we expect the win rate of the main agent against these two rule-based agents to increase nearly monotonically. Please also include these two agents in your demo branch, as evaluation is a really important part of league training. Without a good set of metrics, we won't know whether the main agent is indeed becoming stronger.

Feel free to let us know if there are any questions. @PaParaZz1
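
One way to track this is to play many one-step episodes against each rule-based agent and report both the win rate and the mean reward. The sketch below is an illustration under stated assumptions, not DI-engine's evaluator: `evaluate`, `always_one`, `make_agent2`, and the `PAYOFFS` table are hypothetical helpers, the agents are written as plain functions for brevity, and a "win" is assumed to mean that the main agent's reward is strictly higher than the opponent's.

```python
import random

# Rewards for (main agent, opponent), copied from the Game.step table in this thread.
PAYOFFS = {(0, 0): (-10, -10), (0, 1): (-1, 1), (1, 0): (1, -1), (1, 1): (0, 0)}


def agent1(obs):
    return 0  # rule-based Agent1: always action 0


def make_agent2(rng):
    def agent2(obs):
        return rng.choice([0, 1])  # rule-based Agent2: uniform over actions
    return agent2


def always_one(obs):
    return 1  # hypothetical stand-in for a trained main agent


def evaluate(main_agent, opponent, episodes=1000):
    """Return (win rate, mean reward) of main_agent playing as player 1."""
    wins, total = 0, 0
    for _ in range(episodes):
        obs = [[0, 1], [1, 0]]  # same trivial observation as Game.reset
        actions = (main_agent(obs[0]), opponent(obs[1]))
        r_main, r_opp = PAYOFFS[actions]
        wins += int(r_main > r_opp)  # assumed definition of a win
        total += r_main
    return wins / episodes, total / episodes


rng = random.Random(0)  # fixed seed so repeated evaluations are comparable
print("vs Agent1:", evaluate(always_one, agent1))
print("vs Agent2:", evaluate(always_one, make_agent2(rng)))
```

Reporting the mean reward next to the win rate is a design choice: in a game with draws or mixed strategies, the win rate alone can stay flat while the expected reward still improves.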

PaParaZz1 commented on July 21, 2024

@zxzzz0 https://join.slack.com/t/opendilab/shared_invite/zt-sqsd142v-N_l~EHLPYF1jr4c9PkqDuA You can join our slack team and discuss further

PaParaZz1 commented on July 21, 2024

Anyone interested in league training can follow this branch (https://github.com/opendilab/DI-engine/tree/dev-league-demo) for ongoing updates.

PaParaZz1 commented on July 21, 2024

I have tested two groups of game environments, and both approximately converged to the Nash equilibrium (NE).

The first one: zero-sum

The second one: prisoner's dilemma

PaParaZz1 commented on July 21, 2024

This issue has been solved in #12
