
rlcard's People

Contributors

adrianp-, adrianpgob, andrewnc, aypee19, benblack769, billh0420, caoyuanpu, clarit7, clarivy, daochenzha, hsywhu, ismael-elatifi, jeremy-feng, jkterry1, junyuguo, kaanozdogru, kaiks, kingyiusuen, lhenry15, mia1996, michael1015198808, mjudell, rishabhvarshney14, rodrigodelazcano, ruzhwei, rxng8, saerdna, xixo99, zhengsx, zhigal


rlcard's Issues

Expansion into Rummy

Normalize the state

Hey, could you give an example for No-Limit Texas Hold'em to help me understand how the "Normalizer" works?
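For what it's worth, here is a minimal sketch of what a running state normalizer could look like for No-Limit Texas Hold'em observations; the class and method names are illustrative and not necessarily rlcard's exact Normalizer API.

import numpy as np

class RunningNormalizer:
    ''' Illustrative normalizer: keep running mean/std of observed state
        vectors (Welford's algorithm) and rescale new states with them. '''

    def __init__(self, size):
        self.count = 0
        self.mean = np.zeros(size)
        self.m2 = np.zeros(size)      # running sum of squared deviations

    def append(self, state):
        state = np.asarray(state, dtype=float)
        self.count += 1
        delta = state - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (state - self.mean)

    def normalize(self, state):
        state = np.asarray(state, dtype=float)
        if self.count < 2:            # not enough statistics yet
            return state
        std = np.sqrt(self.m2 / (self.count - 1))
        std[std == 0] = 1.0           # guard against constant features
        return (state - self.mean) / std

# Hypothetical usage with a No-Limit Texas Hold'em observation vector:
# normalizer = RunningNormalizer(size=env.state_shape[0])
# normalizer.append(state['obs'])
# normalized_obs = normalizer.normalize(state['obs'])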

adding-models.md

<Wrap models. You need to inherit the Model class in rlcard/models/model.py. Then put all the models for the players into a list. Rewrite the get_agent function and return this list.> I cannot find the get_agent function.

<Load the model in the environment. To load the model, modify load_pretrained_models in the corresponding game environment in rlcard/envs. Use the registered name to load the model.>
I cannot find the load_pretrained_models function.
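For reference, a rough sketch of what the documentation seems to describe; the exact method name (get_agent versus an agents property) differs between rlcard versions, so treat the names below as assumptions and check rlcard/models/model.py in your installed copy.

from rlcard.models.model import Model   # base class referenced in adding-models.md

class MyDoudizhuRuleModel(Model):
    ''' Hypothetical wrapper: one rule-based agent per player. '''

    def __init__(self):
        # MyRuleAgent is a placeholder for your own agent class.
        self.rule_agents = [MyRuleAgent() for _ in range(3)]

    def get_agents(self):
        # The doc calls this get_agent; some versions expose an `agents`
        # property instead -- match whatever your rlcard version defines.
        return self.rule_agents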

key error: 34445555

It seems that '33334444' is legal for the four_two_pair type and '3333444555' is legal for the trio_pair_chain_2 type, but '34445555' is illegal for the trio_solo_chain_2 type. Is this a bug?

State Encoding of Uno

I am confused by the state encoding of Uno. According to the documentation, the default state is encoded into 7 feature planes with each plane having a one-hot encoding of all possible cards. Planes 0 to 2 represent the player's hand, as seen in the example below. However, Plane 0 is just the inverse of Plane 1 and Plane 2 is always all zeros. The same pattern is repeated for Planes 4 to 6. Is there any reason for this?

State example obtained during an Uno game.
[[[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
  [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
  [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
  [1 1 1 1 1 1 1 1 0 1 1 1 1 1 1]]

 [[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0 0 1 0 0 0 0 0 0]]

 [[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]

 [[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0 1 0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]

 [[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
  [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
  [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
  [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]]

 [[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]

 [[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]]
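To reproduce and inspect this encoding yourself, something along these lines should work (a sketch against the rlcard 0.x API; newer versions use env.reset() instead of env.init_game()):

import numpy as np
import rlcard

env = rlcard.make('uno')
state, player_id = env.init_game()   # older API; newer versions: env.reset()

obs = np.array(state['obs'])
print(obs.shape)    # expected (7, 4, 15): 7 planes, 4 colors, 15 card indices
print(obs[0:3])     # planes 0-2: the current player's hand encoding
print(obs[4:7])     # planes 4-6: the second group showing the same pattern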

Code for new game Gin Rummy

I have finished code for the card game Gin Rummy. How do I submit it if that is ok with you?

Note that DQN training on it performed very poorly (essentially nothing was learned). I have an option to specify an extremely simple version where the actions are essentially just discarding cards, and the player scores 1 if there are no kings or queens in the hand and 0 otherwise. This reached an average reward of 0.7 halfway through training, but then fell to 0.2 and stayed there.

I am not sure that I am using the training methods correctly. I just modified how Mahjong did DQN learning.

GUI for Gin Rummy program

I am working on a GUI for my Gin Rummy program.

Is it ok with you for me to submit it?

There are two parts. Do you want the smaller part submitted first or both parts submitted at once?

The first part is a simple GUI program with 8 Python files. It does not interface with the rlcard environment. It has a menu bar, a preferences window, an about window, and a main window with 52 cards laid out in a 4 by 13 grid. A card can be clicked and its name is printed in the console. A card can be right-clicked or shift-clicked to flip it over.

The second part has 22 Python files. It interfaces with the rlcard environment for Gin Rummy.

Expansion into other Shedding games

DQNAgent net is too simple?

I notice that DQNAgent._build_model has only two fully connected layers. Is that too simple to reach good performance? Why not use a convolutional net?
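For what it's worth, the size of the fully connected estimator is controlled by the mlp_layers argument, so a deeper or wider net can be tried without touching _build_model. A sketch (sess and env are assumed to be set up as in the training examples elsewhere in this thread):

from rlcard.agents.dqn_agent import DQNAgent   # TF version of the agent

agent = DQNAgent(sess,
                 scope='dqn',
                 action_num=env.action_num,
                 state_shape=env.state_shape,
                 mlp_layers=[512, 512, 256, 256])   # four hidden layers instead of two

A convolutional net is possible in principle, since the card observations are shaped as planes, but that would mean replacing the agent's internal estimator network rather than just passing a different mlp_layers list.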

Can I add my own rule-based model for Dou dizhu?

Can I use the agents below for the game Dou Dizhu?

First, add my own rule-based model agent, then:

agent_CFR = cfr_agent()
agent_RuleBased = MyRuleAgent()
agent_NFSP = nfsp_agent()
env = rlcard.make('doudizhu')
env.set_agents([agent_CFR, agent_RuleBased, agent_NFSP])
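In principle yes, as long as the rule-based object exposes the same interface the built-in agents do. A hypothetical sketch (method names follow the built-in agents; whether eval_step returns a bare action or an (action, info) pair depends on the rlcard version):

class MyRuleAgent:
    ''' Hypothetical rule-based agent for Dou Dizhu. '''
    use_raw = False   # newer rlcard versions check this flag on agents

    def step(self, state):
        # Placeholder rule: always play the first legal action.
        legal_actions = list(state['legal_actions'])
        return legal_actions[0]

    def eval_step(self, state):
        # Some versions expect (action, info) here; others just the action.
        return self.step(state), {}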

How to push a single file from my dev repo to main dev repo ?

I want to push rlcard/tests/games/test_gin_rummy_games.py from my GitHub repo to the main dev repo. When I try to do that, it seems I only have the option to push all my changes; however, I just want to push this single file (which you requested that I do).

Right now, my GitHub repo has a lot of files that I did not intend to commit from my local repo. I am still learning git. I would think you don't care what is in my GitHub repo except for the pushes that I request. I now have incomplete versions of files that I am working on locally that got committed to my GitHub repo and which shouldn't be pushed to the dev repo.

cfr for doudizhu: TypeError: 'NoneType' object is not iterable

File "/rlcard/agents/cfr_agent.py", line 72, in traverse_tree
utility = self.traverse_tree(new_probs, player_id)
File "/rlcard/rlcard/agents/cfr_agent.py", line 71, in traverse_tree
self.env.step(action)
File "/rlcard/rlcard/envs/env.py", line 62, in step
next_state, player_id = self.game.step(self.decode_action(action))
File "/rlcard/envs/doudizhu.py", line 94, in decode_action
for legal_action in legal_actions:
TypeError: 'NoneType' object is not iterable

PyPI Release?

Hey, could you please release this library on PyPI, so people can just do pip install rlcard instead of having to clone the repo first? It makes it easier to use your code.

Specifically, I'm going to release an RL library on PyPI soon that uses various RL environment libraries. I'd like it to use rlcard in addition to the others, but to depend on rlcard it would have to either be included with my package (which is undesirable) or be installable from pip via a requirements.txt file (and thus hosted on PyPI).

nfsp_agent samples best-response instead of average policy

It looks like nfsp_agent samples the best-response network in evaluation mode. I copied this behavior in the PyTorch implementation. However, Theorem 7 in [1] argues that it is the average strategy profile that converges to a Nash equilibrium. Sampling the best-response network produces a deterministic pure strategy, while the average policy network produces a stochastic behavioural strategy. This is discussed in Section 4.2 of [2]. Also, it looks like DeepMind's implementation [3] samples the average policy network in evaluation mode.

Am I missing something?

References:
[1] Heinrich et al. (2015) "Fictitious Self-Play in Extensive-Form Games"
[2] Heinrich and Silver (2016) "Deep Reinforcement Learning from Self-Play in Imperfect Information Games"
[3] Lanctot et al. (2019) "OpenSpiel: A Framework for Reinforcement Learning in Games"
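For concreteness, the change being suggested amounts to something like the sketch below in evaluation mode (the average-policy accessor name is hypothetical; remove_illegal is the rlcard helper that masks illegal actions and renormalizes):

import numpy as np
from rlcard.utils.utils import remove_illegal

def eval_step_with_average_policy(agent, state):
    ''' Sample the average-policy network instead of acting greedily with the
        best-response network.  `average_policy_probs` is a hypothetical
        accessor -- substitute whatever your NFSP implementation exposes. '''
    probs = agent.average_policy_probs(state['obs'])        # hypothetical
    probs = remove_illegal(probs, state['legal_actions'])   # mask + renormalize
    return np.random.choice(len(probs), p=probs)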

torch or pytorch at setup.py

I'm facing an issue when trying to install torch; my workaround is to comment out the torch requirement in setup.py.

I saw this error when running pip install -e .

ERROR: Could not find a version that satisfies the requirement torch>=1.3 (from rlcard==0.1.6) (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2)
ERROR: No matching distribution found for torch>=1.3 (from rlcard==0.1.6)

blackjack env works fine without torch.

Setup Versions:
conda 4.7.11
Python 3.7.5

module 'torch.nn' has no attribute 'Flatten'

agent = dqn_agent_pytorch.DQNAgent("dqn {}".format(i), action_num=env.action_num, state_shape=env.state_shape, mlp_layers=[128,128])

When trying to initialize some DQN PyTorch agents I am getting the above error.
Am I doing something wrong here? And how can I solve the issue?
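A likely cause: torch.nn.Flatten was only added in PyTorch 1.2, so the error usually means an older torch is installed. Upgrading torch is the clean fix; otherwise a drop-in replacement module can be defined, roughly:

import torch.nn as nn

class Flatten(nn.Module):
    ''' Minimal stand-in for nn.Flatten on PyTorch < 1.2. '''
    def forward(self, x):
        return x.view(x.size(0), -1)   # flatten everything except the batch dim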

Edit:
In general, are there any guides on how to build my own games and how to use PyTorch for training?

Uno is not learning anything?

Hey there,

I installed rlcard via pip install rlcard. When trying the example uno.py I had to make some modifications, as the code installed with pip was not the most recent; I got the example to run after pulling in the most recent code samples.

Question
However, after a long training run I still get a very, very small reward:

timestep | 3939973
reward | 0.004

timestep | 3945224
reward | 0.036

timestep | 3949951
reward | -0.04

What parameters do you use for training?
How long do you train?
What am I missing?
Does anyone here have different results?

I did not change any params:



import tensorflow as tf

import rlcard
from rlcard.agents.dqn_agent import DQNAgent
from rlcard.agents.random_agent import RandomAgent

# Training and evaluation environments, as in the uno.py example
env = rlcard.make('uno')
eval_env = rlcard.make('uno')
memory_init_size = 1000   # placeholder; use the value from the example script

with tf.Session() as sess:

    # Initialize a global step
    global_step = tf.Variable(0, name='global_step', trainable=False)

    # Set up the agents: one DQN learner against three random opponents
    agent = DQNAgent(sess,
                     scope='dqn',
                     action_num=env.action_num,
                     replay_memory_size=20000,
                     replay_memory_init_size=memory_init_size,
                     state_shape=env.state_shape,
                     mlp_layers=[512, 512])
    random_agent = RandomAgent(action_num=eval_env.action_num)
    env.set_agents([agent, random_agent, random_agent, random_agent])
    eval_env.set_agents([agent, random_agent, random_agent, random_agent])
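For comparison, the training loop in the example scripts looks roughly like the sketch below (recalled from the rlcard examples, so the episode budget and evaluation interval are placeholders; evaluation averages payoffs over a batch of games against the random agents):

    # Continuing inside the tf.Session block above:
    sess.run(tf.global_variables_initializer())

    for episode in range(100000):                       # placeholder budget
        # Play one training game and feed the transitions to the DQN agent
        trajectories, _ = env.run(is_training=True)
        for ts in trajectories[0]:
            agent.feed(ts)

        # Periodically evaluate against the random agents
        if episode % 1000 == 0:
            payoffs = [eval_env.run(is_training=False)[1][0] for _ in range(1000)]
            print('episode', episode, 'average reward', sum(payoffs) / len(payoffs))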

i can not find the model save code

When I look for how to save the agent model, I cannot find the model-saving code, but the pretrained model leduc_holdem_nfsp exists.
saver = tf.train.Saver(tf.model_variables())
saver.restore(self.sess, tf.train.latest_checkpoint(check_point_path))
So where is saver.save?
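Saving is just the counterpart of the restore call above; a sketch, assuming sess and check_point_path come from the same training code:

import os
import tensorflow as tf

saver = tf.train.Saver()
save_path = saver.save(sess, os.path.join(check_point_path, 'model'))
print('Model saved to', save_path)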

Implement smaller versions of games

Human-sized games could be too complex for the algorithms. We will implement smaller versions of games like Dou Dizhu, Mahjong, and UNO to make them feasible for research. Thanks for the feedback from the anonymous reviewers.

how to get the perfect information?

Hi rlcard team, awesome work!

I'd like to know whether I can get perfect information for a game, and how. For example, can I get the card information of all three players in Doudizhu?

Thanks!
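A sketch of what I would try (whether get_perfect_information exists depends on the rlcard version, and the player attribute names may differ, so both branches below are assumptions to verify against your copy):

import rlcard

env = rlcard.make('doudizhu')
state, player_id = env.init_game()   # older API; newer versions: env.reset()

try:
    # Some versions expose a perfect-information view of the environment.
    print(env.get_perfect_information())
except AttributeError:
    # Fallback: read the hands directly off the underlying game object.
    print([player.current_hand for player in env.game.players])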

how to use the pretrained model

How do I use the pretrained models, such as the NFSP agents? I want to play Doudizhu with 3 players, all of them loading the pretrained model.
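For reference, pretrained models are loaded by their registered name through rlcard.models. A sketch using the Leduc Hold'em NFSP model mentioned in the issue above; the same pattern would apply to a Doudizhu model once one is registered:

import rlcard
from rlcard import models

env = rlcard.make('leduc-holdem')
model = models.load('leduc-holdem-nfsp')   # registered name (may use hyphens rather than underscores)
env.set_agents(model.agents)               # one pretrained agent per player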

doudizhu determine landlord

In the function determine_role of the Doudizhu game, you choose index 0 as the landlord by default. Why not add an action named "determine landlord", so we can train agents to decide which player should be the landlord? I am just confused.

bad performance

########## Evaluation ##########
Timestep: 629402 Average reward is 0.458

########## Evaluation ##########
Timestep: 1258220 Average reward is 0.46

########## Evaluation ##########
Timestep: 1888626 Average reward is 0.514

########## Evaluation ##########
Timestep: 2516620 Average reward is 0.506

########## Evaluation ##########
Timestep: 3144764 Average reward is 0.492

########## Evaluation ##########
Timestep: 3774566 Average reward is 0.468

########## Evaluation ##########
Timestep: 4402996 Average reward is 0.422

This is my doudizhu_nfsp_result log. The longer it trains, the worse the result. Why?

pip packaging issue

Your actual environments only depend on numpy and matplotlib. When you install rlcard via pip, it's because you're using the environments as part of a larger thing (in my case as a dependency of a package I'm going to release), not because you want to reproduce experiments with sample code.

The specific problem I have is that, as previously mentioned, I'm releasing a large library that depends on rlcard. Having that library in turn depend on tensorflow, tensorflow probability and sonnet is undesirable for me, as it will be for many people who'd like to use rlcard environments (the main use case), especially since you restrict TF to 1.14 or 1.15.

Can you remove those as requirements of RLCard in the PyPI release? Per the above, I think that removing the demo code from the PyPI release, or having people separately install an appropriate version of tensorflow etc., would be the usual approach in a situation like this.
