Comments (10)
@zxzzz0 You can ask our WeChat assistant for the group entry. Assistant WeChat ID: OpenDILab
from di-engine.
Hi all,
Nice project. We want to start using it. After reading the docs and the config dizoo/competitive_rl/entry/cpong_dqn_default_config.py for league training, some things are still unclear to us. Do you have a channel where we can discuss small questions frequently, such as a WeChat group or a Slack channel?
Thank you for your participation. You can add our WeChat assistant using the ID listed above. Our Slack channel will also be available soon. To be specific, what kind of training task do you want to run? I guess you want to train a self-play league in the cpong environment, but dizoo/competitive_rl/entry/cpong_dqn_default_config.py is just an agent-vs-bot training config. If you are interested in league training, we can discuss further.
from di-engine.
@PaParaZz1 This is the exact config file mentioned in the League Overview doc, which describes it as a league demo.
If it's just an agent-vs-bot training config, it's not a correct config for self-play with league training. Please correct this config.
We are setting up league training and look forward to seeing the Slack channel.
from di-engine.
Sorry, I misunderstood something. cpong_dqn_config is actually the demo config for league training. Our agent-vs-bot config has not been uploaded in this version. Which parts of league training are not clear enough? For example, how to define your own player and apply it?
from di-engine.
Yes. We don't understand how to apply it.
To help other people understand this and to hide unnecessary details, we wrote a simple two-player game environment below for demo purposes.
class Game:
    """One-step, two-player matrix game."""

    def reset(self):
        return [[0, 1], [1, 0]]  # trivial observation

    def step(self, actions):
        # `actions` is a two-element list: one action (0 or 1) per player.
        if actions == [0, 0]:
            rewards = -10, -10
        elif actions == [0, 1]:
            rewards = -1, +1
        elif actions == [1, 0]:
            rewards = +1, -1
        elif actions == [1, 1]:
            rewards = 0, 0
        observations = [[0, 1], [1, 0]]
        dones = True, True  # the episode ends after a single step
        infos = None, None
        return observations, rewards, dones, infos
This is a simple multi-agent environment for 2 players. The game lasts a single step. Each player picks one action at this step, action 0 or action 1, and receives a reward accordingly. The mixed-strategy Nash equilibrium for this game can easily be calculated by hand (select action 0 with prob 1/10 and action 1 with prob 9/10).
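The hand calculation can be sketched as an indifference check: find the opponent probability q of action 0 at which a player's two actions have equal expected reward. The helper name `indifference_prob` and the payoff-matrix layout are assumptions made for illustration. They are not part of DI-engine or the original post.

```python
from fractions import Fraction


def indifference_prob(payoff):
    """Return the opponent's probability q of playing action 0 that makes
    the row player indifferent between actions 0 and 1, or None if no
    such q lies strictly between 0 and 1.

    payoff[a][b] is the row player's reward when it plays a and the
    opponent plays b (hypothetical layout chosen for this sketch).
    """
    (a00, a01), (a10, a11) = payoff
    # Solve q*a00 + (1-q)*a01 == q*a10 + (1-q)*a11 for q.
    denom = (a00 - a10) - (a01 - a11)
    if denom == 0:
        return None
    q = Fraction(a11 - a01, denom)
    return q if 0 < q < 1 else None


# Row-player payoffs read directly from Game.step as posted.
as_posted = [[-10, -1], [1, 0]]
# The same game with the two off-diagonal rewards swapped
# (the classic "chicken" layout); this variant is an assumption.
swapped = [[-10, 1], [-1, 0]]

print(indifference_prob(as_posted))
print(indifference_prob(swapped))
```

With the rewards exactly as posted, action 1 earns more than action 0 against every opponent mix, so there is no interior mixed equilibrium. The 1/10 and 9/10 figures match the variant in which the (0, 1) and (1, 0) rewards are swapped. That is an inference from the arithmetic, not something confirmed in the thread.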
After reading the League Overview doc, it is still not clear to us which files should be changed.
Code is worth a thousand words.
@PaParaZz1 Could you create a branch for this simple game as a demo of using OpenDILab's league training with a custom environment? Once the branch is ready, we will run the code and see whether it converges to the Nash equilibrium mentioned above.
cc: @liuyuisanai for more inputs!
from di-engine.
We also added the following two simple rule-based agents for evaluating the main agent during self-play.
import numpy as np


class Agent1:
    def step(self, obs):
        return 0  # always return fixed action 0


class Agent2:
    def step(self, obs):
        # sample an action uniformly at random
        return np.random.choice([0, 1], p=[0.5, 0.5])
Agent1 always picks one fixed action (action 0 here). Agent2 picks an action uniformly at random from the action set, which is not the Nash equilibrium distribution.
If league training is going well, we expect the win rate of the main agent against these two rule-based agents to increase nearly monotonically. Please also include these two agents in your demo branch, as evaluation is an important part of league training. Without a good set of metrics, we won't know whether the main agent is actually becoming stronger.
Feel free to let us know if there are any questions. @PaParaZz1
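The evaluation described above could be sketched roughly as follows. This is a standalone illustration, not DI-engine's league or evaluator API. `MixedAgent` is a hypothetical stand-in for the trained main agent, `evaluate` is a made-up helper, the rule-based agents use the standard-library `random` module instead of NumPy so the sketch has no dependencies, and a "win" is assumed to mean a strictly higher reward than the opponent in the single step.

```python
import random


class Game:
    """One-step matrix game with the rewards from the post above."""
    REWARDS = {(0, 0): (-10, -10), (0, 1): (-1, 1),
               (1, 0): (1, -1), (1, 1): (0, 0)}

    def reset(self):
        return [[0, 1], [1, 0]]

    def step(self, actions):
        rewards = self.REWARDS[tuple(actions)]
        return [[0, 1], [1, 0]], rewards, (True, True), (None, None)


class FixedAgent:
    """Plays action 0 every time (Agent1 in the post)."""
    def step(self, obs):
        return 0


class UniformAgent:
    """Plays each action with probability 1/2 (Agent2 in the post)."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def step(self, obs):
        return self.rng.choice([0, 1])


class MixedAgent:
    """Hypothetical main agent: plays action 0 with probability p_action0."""
    def __init__(self, p_action0, seed=0):
        self.p_action0 = p_action0
        self.rng = random.Random(seed)

    def step(self, obs):
        return 0 if self.rng.random() < self.p_action0 else 1


def evaluate(main, opponent, episodes=1000):
    """Return (mean reward, win rate) of `main` over one-step episodes."""
    env = Game()
    total, wins = 0.0, 0
    for _ in range(episodes):
        obs = env.reset()
        actions = [main.step(obs[0]), opponent.step(obs[1])]
        _, rewards, _, _ = env.step(actions)
        total += rewards[0]
        wins += rewards[0] > rewards[1]  # assumed definition of a win
    return total / episodes, wins / episodes


if __name__ == "__main__":
    for name, opp in [("Agent1", FixedAgent()), ("Agent2", UniformAgent())]:
        mean_reward, win_rate = evaluate(MixedAgent(0.1), opp)
        print(name, mean_reward, win_rate)
```

Logging both numbers at each league checkpoint would give the kind of curve the post asks for. Mean reward is included because win rate alone hides the size of the -10 penalty.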
from di-engine.
@zxzzz0 https://join.slack.com/t/opendilab/shared_invite/zt-sqsd142v-N_l~EHLPYF1jr4c9PkqDuA You can join our slack team and discuss further
from di-engine.
Anyone interested in league training can follow this branch (https://github.com/opendilab/DI-engine/tree/dev-league-demo) for updates.
from di-engine.
I have tested two game environments, and both approximately converged to a Nash equilibrium (NE).
The first one: zero-sum
The second one: prisoner's dilemma
from di-engine.
This issue has been solved in #12
from di-engine.