
Comments (10)

liuyuisanai commented on July 21, 2024

@zxzzz0 You can ask our WeChat assistant for an invitation to the group. Assistant WeChat ID: OpenDILab

from di-engine.

PaParaZz1 commented on July 21, 2024

Hi all,

Nice project. We want to start using it. After reading the doc and the config dizoo/competitive_rl/entry/cpong_dqn_default_config.py for league training, some things are still unclear to us. Do you have a channel where we can discuss small questions frequently, such as a WeChat group or a Slack channel?

cc: [email protected]

Thank you for your interest. You can add our WeChat assistant using the ID listed above. Our Slack channel will also be available soon. To be specific, what kind of training tasks do you want to run? I guess you want to train a self-play league in the cpong environment, and dizoo/competitive_rl/entry/cpong_dqn_default_config.py is just an agent VS bot training config. If you are interested in league training, we can discuss further.


zxzzz0 commented on July 21, 2024

@PaParaZz1 This is the exact config file mentioned in the League Overview doc, which says it is a demo of league training.

If it is just an agent VS bot training config, it is not a correct config for self-play with league training. Please correct this config.

We are setting up league training and look forward to seeing the Slack channel.


PaParaZz1 commented on July 21, 2024

Sorry, I misunderstood. cpong_dqn_config is in fact the demo config for league training. Our agent VS bot config has not been uploaded in this version. Which parts of league training are unclear, for example how to define your own player and apply it?


zxzzz0 commented on July 21, 2024

Yes. We don't understand how to apply it.
To help other people understand this while hiding unnecessary details, we wrote a simple two-player game environment, shown below, for demo purposes.

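class Game:
    """One-step, two-player matrix game."""

    def reset(self):
        return [[0, 1], [1, 0]]  # trivial observation, one per player

    def step(self, actions):
        actions = list(actions)  # accept a list or a tuple of two actions
        if actions == [0, 0]:
            rewards = -10, -10
        elif actions == [0, 1]:
            rewards = -1, +1
        elif actions == [1, 0]:
            rewards = +1, -1
        elif actions == [1, 1]:
            rewards = 0, 0
        else:
            raise ValueError(f"invalid actions: {actions}")
        observations = [[0, 1], [1, 0]]
        dones = True, True  # the game ends after a single step
        infos = None, None
        return observations, rewards, dones, infos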

This is a simple multi-agent environment for two players. The game lasts a single step. Each player picks action 0 or action 1 at this step, and both receive rewards accordingly. The mixed-strategy Nash equilibrium for this game can easily be calculated by hand (select action 0 with probability 1/10 and action 1 with probability 9/10).
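The equilibrium figure can be checked with a short indifference calculation. The sketch below is not part of DI-engine: `mixed_equilibrium_prob` and `ASSUMED_PAYOFF` are hypothetical names introduced here. It assumes the row player's payoffs are -10 for (0, 0), +1 for (0, 1), -1 for (1, 0) and 0 for (1, 1), which is the assignment under which the 1/10 and 9/10 probabilities hold. With the off-diagonal signs as written in the `Game` snippet (-1 for (0, 1), +1 for (1, 0)), action 1 strictly dominates action 0 and the helper reports that no interior mixed equilibrium exists.

```python
from fractions import Fraction

# ASSUMPTION: row player's payoff[a][b] for playing a against b, with the
# aggressive action 0 winning against action 1. Not taken from DI-engine.
ASSUMED_PAYOFF = [[-10, 1], [-1, 0]]


def mixed_equilibrium_prob(payoff):
    """Return the probability q of action 0 in a symmetric 2x2 mixed equilibrium.

    q is the opponent's probability of action 0 that makes the row player
    indifferent between its two actions.
    """
    a, b = payoff[0]  # row plays 0 against opponent 0 / opponent 1
    c, d = payoff[1]  # row plays 1 against opponent 0 / opponent 1
    # Indifference condition: a*q + b*(1 - q) == c*q + d*(1 - q)
    denom = (a - b) - (c - d)
    if denom == 0:
        raise ValueError("no unique indifference point")
    q = Fraction(d - b, denom)
    if not 0 < q < 1:
        raise ValueError("no interior mixed equilibrium (one action dominates)")
    return q


if __name__ == "__main__":
    q = mixed_equilibrium_prob(ASSUMED_PAYOFF)
    print(f"P(action 0) = {q}, P(action 1) = {1 - q}")
```

Using `Fraction` keeps the result exact, so it can be compared directly with the hand calculation.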

After reading the League Overview doc, it is still not clear to us which files should be changed.
Code is worth a thousand words.
@PaParaZz1 Could you create a branch with this simple game as a demo of using OpenDILab's league training for a custom environment? Once the branch is ready, we will run the code and see whether it converges to the Nash equilibrium mentioned above.

cc: @liuyuisanai for more inputs!


zxzzz0 commented on July 21, 2024

We further added the following two simple rule-based agents for evaluating the main agent during self-play.

import numpy as np

class Agent1:
    def step(self, obs):
        return 0  # always return fixed action 0

class Agent2:
    def step(self, obs):
        # pick action 0 or 1 uniformly at random
        return int(np.random.choice([0, 1], p=[0.5, 0.5]))

Agent1 always picks one fixed action (action 0 here). Agent2 picks an action uniformly at random from the action set, which is not the Nash equilibrium strategy.

If league training is going well, we expect the win rate of the main agent against these two rule-based agents to increase nearly monotonically. Please also include these two agents in your demo branch, as evaluation is an important part of league training. Without a good set of metrics, we cannot tell whether the main agent is actually becoming stronger.
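A minimal sketch of such an evaluation loop is shown here. It is independent of DI-engine and makes several assumptions: the reward table is copied from the `Game` snippet in this thread, policies are plain functions taking a random generator, and a "win" is defined as earning a strictly higher reward than the opponent in an episode (the thread does not define the term). All function names are hypothetical.

```python
import random

# Rewards as written in the Game snippet: (action_a, action_b) -> (reward_a, reward_b)
REWARDS = {
    (0, 0): (-10, -10),
    (0, 1): (-1, 1),
    (1, 0): (1, -1),
    (1, 1): (0, 0),
}


def always_zero(rng):
    """Rule-based opponent in the style of Agent1: always plays action 0."""
    return 0


def always_one(rng):
    """Stand-in for a trained main agent that always plays action 1."""
    return 1


def uniform_random(rng):
    """Rule-based opponent in the style of Agent2: uniform over both actions."""
    return rng.choice([0, 1])


def evaluate(policy, opponent, episodes=1000, seed=0):
    """Return (win_rate, mean_reward) of `policy` against `opponent`.

    ASSUMPTION: a win means a strictly higher reward than the opponent.
    """
    rng = random.Random(seed)
    wins = 0
    total_reward = 0
    for _ in range(episodes):
        reward, opp_reward = REWARDS[(policy(rng), opponent(rng))]
        wins += reward > opp_reward
        total_reward += reward
    return wins / episodes, total_reward / episodes


if __name__ == "__main__":
    for name, opponent in [("Agent1", always_zero), ("Agent2", uniform_random)]:
        win_rate, mean_reward = evaluate(always_one, opponent)
        print(f"vs {name}: win rate {win_rate:.2f}, mean reward {mean_reward:.2f}")
```

Tracking the mean reward alongside the win rate helps here, because ties such as (1, 1) count as neither a win nor a loss under this definition.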

Feel free to let us know if there are any questions. @PaParaZz1


PaParaZz1 commented on July 21, 2024

@zxzzz0 https://join.slack.com/t/opendilab/shared_invite/zt-sqsd142v-N_l~EHLPYF1jr4c9PkqDuA You can join our Slack workspace and discuss further there.


PaParaZz1 commented on July 21, 2024

Anyone interested in league training can follow this branch (https://github.com/opendilab/DI-engine/tree/dev-league-demo) for updates.


PaParaZz1 commented on July 21, 2024

I have tested two game environments, and both approximately converged to a Nash equilibrium.

The first one: zero-sum

[Two screenshots, dated 2021-07-30]

The second one: prisoner's dilemma

[Two screenshots, dated 2021-07-30]


PaParaZz1 commented on July 21, 2024

This issue has been solved in #12

