
rlcard's People

Contributors

adrianp-, adrianpgob, andrewnc, aypee19, benblack769, billh0420, caoyuanpu, clarit7, clarivy, daochenzha, hsywhu, ismael-elatifi, jeremy-feng, jkterry1, junyuguo, kaanozdogru, kaiks, kingyiusuen, lhenry15, mia1996, michael1015198808, mjudell, rishabhvarshney14, rodrigodelazcano, ruzhwei, rxng8, saerdna, xixo99, zhengsx, zhigal


rlcard's Issues

Expansion into Rummy

Normalize the state

Hey, could you give an example for No-Limit Texas Hold'em to help me understand how the "Normalizer" works?
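For what it's worth, here is a minimal sketch of what a running state normalizer could look like for No-Limit Texas Hold'em observations; the class and method names are illustrative and not necessarily rlcard's exact Normalizer API.

import numpy as np

class RunningNormalizer:
    ''' Illustrative normalizer: keep running mean/std of observed state
        vectors (Welford's algorithm) and rescale new states with them. '''

    def __init__(self, size):
        self.count = 0
        self.mean = np.zeros(size)
        self.m2 = np.zeros(size)      # running sum of squared deviations

    def append(self, state):
        state = np.asarray(state, dtype=float)
        self.count += 1
        delta = state - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (state - self.mean)

    def normalize(self, state):
        state = np.asarray(state, dtype=float)
        if self.count < 2:            # not enough statistics yet
            return state
        std = np.sqrt(self.m2 / (self.count - 1))
        std[std == 0] = 1.0           # guard against constant features
        return (state - self.mean) / std

# Hypothetical usage with a No-Limit Texas Hold'em observation vector:
# normalizer = RunningNormalizer(size=env.state_shape[0])
# normalizer.append(state['obs'])
# normalized_obs = normalizer.normalize(state['obs'])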

adding-models.md

<Wrap models. You need to inherit the Model class in rlcard/models/model.py. Then put all the models for the players into a list. Rewrite the get_agent function and return this list.> I cannot find the get_agent function.

<Load the model in the environment. To load the model, modify load_pretrained_models in the corresponding game environment in rlcard/envs. Use the registered name to load the model.>
I cannot find the load_pretrained_models function.
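For reference, a rough sketch of what the documentation seems to describe; the exact method name (get_agent versus an agents property) differs between rlcard versions, so treat the names below as assumptions and check rlcard/models/model.py in your installed copy.

from rlcard.models.model import Model   # base class referenced in adding-models.md

class MyDoudizhuRuleModel(Model):
    ''' Hypothetical wrapper: one rule-based agent per player. '''

    def __init__(self):
        # MyRuleAgent is a placeholder for your own agent class.
        self.rule_agents = [MyRuleAgent() for _ in range(3)]

    def get_agents(self):
        # The doc calls this get_agent; some versions expose an `agents`
        # property instead -- match whatever your rlcard version defines.
        return self.rule_agents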

key error: 34445555

It seems that '33334444' is legal for the four_two_pair type and '3333444555' is legal for the trio_pair_chain_2 type, but '34445555' is illegal for the trio_solo_chain_2 type. Is this a bug?

State Encoding of Uno

I am confused by the state encoding of Uno. According to the documentation, the default state is encoded into 7 feature planes with each plane having a one-hot encoding of all possible cards. Planes 0 to 2 represent the player's hand, as seen in the example below. However, Plane 0 is just the inverse of Plane 1 and Plane 2 is always all zeros. The same pattern is repeated for Planes 4 to 6. Is there any reason for this?

State example obtained during an Uno game.
[[[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
  [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
  [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
  [1 1 1 1 1 1 1 1 0 1 1 1 1 1 1]]

 [[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0 0 1 0 0 0 0 0 0]]

 [[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]

 [[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0 1 0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]

 [[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
  [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
  [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
  [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]]

 [[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]

 [[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
  [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]]
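To reproduce and inspect this encoding yourself, something along these lines should work (a sketch against the rlcard 0.x API; newer versions use env.reset() instead of env.init_game()):

import numpy as np
import rlcard

env = rlcard.make('uno')
state, player_id = env.init_game()   # older API; newer versions: env.reset()

obs = np.array(state['obs'])
print(obs.shape)    # expected (7, 4, 15): 7 planes, 4 colors, 15 card indices
print(obs[0:3])     # planes 0-2: the current player's hand encoding
print(obs[4:7])     # planes 4-6: the second group showing the same pattern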

Code for new game Gin Rummy

I have finished code for the card game Gin Rummy. How do I submit it if that is ok with you?

Note that DQN training on it performed very poorly (essentially nothing was learned). I have an option to specify an extremely simple version where the actions are essentially just discarding cards, and the player scores 1 if there are no kings or queens in the hand and 0 otherwise. This reached an average reward of 0.7 halfway through training, but then fell to 0.2 and stayed there.

I am not sure that I am using the training methods correctly. I just modified how Mahjong did DQN learning.

GUI for Gin Rummy program

I am working on a GUI for my Gin Rummy program.

Is it ok with you for me to submit it?

There are two parts. Do you want the smaller part submitted first or both parts submitted at once?

The first part is a simple GUI program with 8 Python files. It does not interface with the rlcard environment. It has a menu bar, a preferences window, an about window, and a main window with 52 cards laid out in a 4 by 13 grid. A card can be clicked and its name is printed in the console. A card can be right-clicked or shift-clicked to flip it over.

The second part has 22 Python files. It interfaces with the rlcard environment for Gin Rummy.

Expansion into other Shedding games

DQNAgent net is too simple?

I notice that DQNAgent._build_model has only two fully connected layers. Is that too simple to reach good performance? Why not use a convolutional net?
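For what it's worth, the size of the fully connected estimator is controlled by the mlp_layers argument, so a deeper or wider net can be tried without touching _build_model. A sketch (sess and env are assumed to be set up as in the training examples elsewhere in this thread):

from rlcard.agents.dqn_agent import DQNAgent   # TF version of the agent

agent = DQNAgent(sess,
                 scope='dqn',
                 action_num=env.action_num,
                 state_shape=env.state_shape,
                 mlp_layers=[512, 512, 256, 256])   # four hidden layers instead of two

A convolutional net is possible in principle, since the card observations are shaped as planes, but that would mean replacing the agent's internal estimator network rather than just passing a different mlp_layers list.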

Can I add my own rule-based model for Dou dizhu?

Can I use the agents below for the game Dou Dizhu?

First, add my own rule-based model agent, then:

agent_CFR = cfr_agent()
agent_RuleBased = MyRuleAgent()
agent_NFSP = nfsp_agent()
env = rlcard.make('doudizhu')
env.set_agents([agent_CFR, agent_RuleBased, agent_NFSP])
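In principle yes, as long as the rule-based object exposes the same interface the built-in agents do. A hypothetical sketch (method names follow the built-in agents; whether eval_step returns a bare action or an (action, info) pair depends on the rlcard version):

class MyRuleAgent:
    ''' Hypothetical rule-based agent for Dou Dizhu. '''
    use_raw = False   # newer rlcard versions check this flag on agents

    def step(self, state):
        # Placeholder rule: always play the first legal action.
        legal_actions = list(state['legal_actions'])
        return legal_actions[0]

    def eval_step(self, state):
        # Some versions expect (action, info) here; others just the action.
        return self.step(state), {}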

How to push a single file from my dev repo to main dev repo ?

I want to push rlcard/tests/games/test_gin_rummy_games.py from my GitHub repo to the main dev repo. When I try to do that, it seems I only have the option to push all my changes; however, I just want to push this single file (which you requested that I do).

Right now, my GitHub repo has a lot of files that I did not intend to commit from my local repo. I am still learning git. I would think you don't care what is in my GitHub repo except for the pushes that I request. I now have incomplete versions of files that I am working on locally that got committed to my GitHub repo and which shouldn't be pushed to the dev repo.

cfr for doudizhu: TypeError: 'NoneType' object is not iterable

File "/rlcard/agents/cfr_agent.py", line 72, in traverse_tree
utility = self.traverse_tree(new_probs, player_id)
File "/rlcard/rlcard/agents/cfr_agent.py", line 71, in traverse_tree
self.env.step(action)
File "/rlcard/rlcard/envs/env.py", line 62, in step
next_state, player_id = self.game.step(self.decode_action(action))
File "/rlcard/envs/doudizhu.py", line 94, in decode_action
for legal_action in legal_actions:
TypeError: 'NoneType' object is not iterable

PyPI Release?

Hey, could you please release this library on PyPI, so people can just do pip install rlcard instead of having to clone the repo first? It makes it easier to use your code.

Specifically, I'm going to release an RL library on PyPI soon that uses various RL environment libraries. I'd like it to use rlcard in addition to the others, but to depend on rlcard it would have to either be included with my package (which is undesirable) or be installable from pip via a requirements.txt file (and thus hosted on PyPI).

nfsp_agent samples best-response instead of average policy

It looks like nfsp_agent samples the best-response network in evaluation mode. I copied this behavior in the PyTorch implementation. However, Theorem 7 in [1] argues that it is the average strategy profile that converges to a Nash equilibrium. Sampling the best-response network produces a deterministic pure strategy, while the average policy network produces a stochastic behavioural strategy. This is discussed in Section 4.2 of [2]. Also, it looks like DeepMind's implementation [3] samples the average policy network in evaluation mode.

Am I missing something?

References:
[1] Heinrich et al. (2015) "Fictitious Self-Play in Extensive-Form Games"
[2] Heinrich and Silver (2016) "Deep Reinforcement Learning from Self-Play in Imperfect Information Games"
[3] Lanctot et al. (2019) "OpenSpiel: A Framework for Reinforcement Learning in Games"
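For concreteness, the change being suggested amounts to something like the sketch below in evaluation mode (the average-policy accessor name is hypothetical; remove_illegal is the rlcard helper that masks illegal actions and renormalizes):

import numpy as np
from rlcard.utils.utils import remove_illegal

def eval_step_with_average_policy(agent, state):
    ''' Sample the average-policy network instead of acting greedily with the
        best-response network.  `average_policy_probs` is a hypothetical
        accessor -- substitute whatever your NFSP implementation exposes. '''
    probs = agent.average_policy_probs(state['obs'])        # hypothetical
    probs = remove_illegal(probs, state['legal_actions'])   # mask + renormalize
    return np.random.choice(len(probs), p=probs)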

torch or pytorch at setup.py

I'm facing an issue when trying to install torch; my workaround is to comment out the torch requirement in setup.py.

I saw this error when running pip install -e .

ERROR: Could not find a version that satisfies the requirement torch>=1.3 (from rlcard==0.1.6) (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2)
ERROR: No matching distribution found for torch>=1.3 (from rlcard==0.1.6)

blackjack env works fine without torch.

Setup Versions:
conda 4.7.11
Python 3.7.5

module 'torch.nn' has no attribute 'Flatten'

agent = dqn_agent_pytorch.DQNAgent("dqn {}".format(i), action_num=env.action_num, state_shape=env.state_shape, mlp_layers=[128,128])

When trying to initialize some DQN PyTorch agents I am getting the above error.
Am I doing something wrong here? And how can I solve the issue?
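A likely cause: torch.nn.Flatten was only added in PyTorch 1.2, so the error usually means an older torch is installed. Upgrading torch is the clean fix; otherwise a drop-in replacement module can be defined, roughly:

import torch.nn as nn

class Flatten(nn.Module):
    ''' Minimal stand-in for nn.Flatten on PyTorch < 1.2. '''
    def forward(self, x):
        return x.view(x.size(0), -1)   # flatten everything except the batch dim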

Edit:
In general, are there any guides on how to build my own games and how to use PyTorch for training?

Uno is not learning anything?

Hey there,

I installed rlcard via pip install rlcard. When trying the example uno.py I had to make some modifications, as the code installed with pip was not the most recent; I got the example to run after pulling in the most recent code samples.

Question
However, after a long training run I still get a very, very small reward:

timestep | 3939973
reward | 0.004

timestep | 3945224
reward | 0.036

timestep | 3949951
reward | -0.04

What parameters do you use for training?
How long do you train?
What am I missing?
Does anyone here have different results?

I did not change any params:



import tensorflow as tf

import rlcard
from rlcard.agents.dqn_agent import DQNAgent
from rlcard.agents.random_agent import RandomAgent

# Training and evaluation environments, as in the uno.py example
env = rlcard.make('uno')
eval_env = rlcard.make('uno')
memory_init_size = 1000   # placeholder; use the value from the example script

with tf.Session() as sess:

    # Initialize a global step
    global_step = tf.Variable(0, name='global_step', trainable=False)

    # Set up the agents: one DQN learner against three random opponents
    agent = DQNAgent(sess,
                     scope='dqn',
                     action_num=env.action_num,
                     replay_memory_size=20000,
                     replay_memory_init_size=memory_init_size,
                     state_shape=env.state_shape,
                     mlp_layers=[512, 512])
    random_agent = RandomAgent(action_num=eval_env.action_num)
    env.set_agents([agent, random_agent, random_agent, random_agent])
    eval_env.set_agents([agent, random_agent, random_agent, random_agent])
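For comparison, the training loop in the example scripts looks roughly like the sketch below (recalled from the rlcard examples, so the episode budget and evaluation interval are placeholders; evaluation averages payoffs over a batch of games against the random agents):

    # Continuing inside the tf.Session block above:
    sess.run(tf.global_variables_initializer())

    for episode in range(100000):                       # placeholder budget
        # Play one training game and feed the transitions to the DQN agent
        trajectories, _ = env.run(is_training=True)
        for ts in trajectories[0]:
            agent.feed(ts)

        # Periodically evaluate against the random agents
        if episode % 1000 == 0:
            payoffs = [eval_env.run(is_training=False)[1][0] for _ in range(1000)]
            print('episode', episode, 'average reward', sum(payoffs) / len(payoffs))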

i can not find the model save code

When I look for how to save the agent model, I cannot find the model-saving code, but the pretrained model leduc_holdem_nfsp exists.
saver = tf.train.Saver(tf.model_variables())
saver.restore(self.sess, tf.train.latest_checkpoint(check_point_path))
So where is saver.save?
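Saving is just the counterpart of the restore call above; a sketch, assuming sess and check_point_path come from the same training code:

import os
import tensorflow as tf

saver = tf.train.Saver()
save_path = saver.save(sess, os.path.join(check_point_path, 'model'))
print('Model saved to', save_path)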

Implement smaller versions of games

Human-sized games could be too complex for the algorithms. We will implement smaller versions of games like Dou Dizhu, Mahjong, and UNO to make them feasible for research. Thanks for the feedback from the anonymous reviewers.

how to get the perfect information?

Hi rlcard team, awesome work!

I'd like to know whether I can get perfect information for a game, and how. For example, can I get the card information of all three players in Doudizhu?

Thanks!
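A sketch of what I would try (whether get_perfect_information exists depends on the rlcard version, and the player attribute names may differ, so both branches below are assumptions to verify against your copy):

import rlcard

env = rlcard.make('doudizhu')
state, player_id = env.init_game()   # older API; newer versions: env.reset()

try:
    # Some versions expose a perfect-information view of the environment.
    print(env.get_perfect_information())
except AttributeError:
    # Fallback: read the hands directly off the underlying game object.
    print([player.current_hand for player in env.game.players])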

how to use the pretrained model

How do I use the pretrained models, such as the NFSP agents? I want to play Doudizhu with 3 players, all of them loading the pretrained model.
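For reference, pretrained models are loaded by their registered name through rlcard.models. A sketch using the Leduc Hold'em NFSP model mentioned in the issue above; the same pattern would apply to a Doudizhu model once one is registered:

import rlcard
from rlcard import models

env = rlcard.make('leduc-holdem')
model = models.load('leduc-holdem-nfsp')   # registered name (may use hyphens rather than underscores)
env.set_agents(model.agents)               # one pretrained agent per player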

doudizhu determine landlord

In the function determine_role of the Doudizhu game, you choose index 0 as the landlord by default. Why not add an action named "determine landlord", so we can train agents to decide which player should be the landlord? I am just confused.

bad performance

########## Evaluation ##########
Timestep: 629402 Average reward is 0.458

########## Evaluation ##########
Timestep: 1258220 Average reward is 0.46

########## Evaluation ##########
Timestep: 1888626 Average reward is 0.514

########## Evaluation ##########
Timestep: 2516620 Average reward is 0.506

########## Evaluation ##########
Timestep: 3144764 Average reward is 0.492

########## Evaluation ##########
Timestep: 3774566 Average reward is 0.468

########## Evaluation ##########
Timestep: 4402996 Average reward is 0.422

This is my doudizhu_nfsp_result log. The longer it trains, the worse the result. Why?

pip packaging issue

Your actual environments only depend on numpy and matplotlib. When you install rlcard via pip, it's because you're using the environments as part of a larger thing (in my case as a dependency of a package I'm going to release), not because you want to reproduce experiments with sample code.

The specific problem I have is that, as previously mentioned, I'm releasing a large library that depends on rlcard. Having that library in turn depend on tensorflow, tensorflow probability and sonnet is undesirable for me, as it will be for many people who'd like to use rlcard environments (the main use case), especially since you restrict TF to 1.14 or 1.15.

Can you remove those as requirements of RLCard in the PyPI release? Per the above, I think that removing the demo code from the PyPI release, or having people separately install an appropriate version of tensorflow etc., would be the usual approach in a situation like this.
