
hanabi-learning-environment's Introduction

This is not an officially supported Google product.

hanabi_learning_environment is a research platform for Hanabi experiments. The file rl_env.py provides an RL environment using an API similar to OpenAI Gym. A lower level game interface is provided in pyhanabi.py for non-RL methods like Monte Carlo tree search.

Getting started

Install the learning environment:

sudo apt-get install g++            # if you don't already have a CXX compiler
sudo apt-get install python-pip     # if you don't already have pip
pip install .                       # or pip install git+repo_url to install directly from github

Run the examples:

pip install numpy                   # game_example.py uses numpy
python examples/rl_env_example.py   # Runs RL episodes
python examples/game_example.py     # Plays a game using the lower level interface
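
In outline, an episode under rl_env looks like the following (a minimal sketch, assuming the make/reset/step API that rl_env_example.py uses; the observation keys 'current_player', 'player_observations', and 'legal_moves' are from memory and worth checking against rl_env.py):

import random
from hanabi_learning_environment import rl_env

env = rl_env.make('Hanabi-Full', num_players=2)
observations = env.reset()
done = False
while not done:
  # act as whichever player is to move, picking a random legal move
  current = observations['current_player']
  obs = observations['player_observations'][current]
  action = random.choice(obs['legal_moves'])
  observations, reward, done, unused_info = env.step(action)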

hanabi-learning-environment's People

Contributors

b-marks, ben-dyer, carlbalmer, derpson, elkhrt, findmyway, kenjitoyama, lanctot, ljarendse, neilburch, nolanbard, yayab


hanabi-learning-environment's Issues

Score at final state might be calculated incorrectly

The environment score seems to be 0 in the final state when the game is lost by running out of life tokens, which makes the cumulative reward -score on a loss; on a win this doesn't happen and the score stays equal to the sum of the fireworks.

This might be an error, as the score in the final state should arguably be the total of the fireworks placed regardless of whether the game is won or lost.

I've attached the code I used to test this and one of the outputs.

Error in Pip install

When I try pip install ., I run into

'setuptools.build_meta' has no attribute '__legacy__'

Following some recommendations online, I also tried pip install . --no-use-pep517, but to no avail: that leads to ImportError: No module named 'skbuild'.

Full installation error output:

Processing /home/aalok/code/hanabi/hanabi-learning-environment
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  ERROR: Command errored out with exit status 1:
   command: /usr/bin/python3 /home/aalok/.local/lib/python3.5/site-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /tmp/tmphjwdxubr
       cwd: /tmp/pip-req-build-6lvo9jzu
  Complete output (10 lines):
  Traceback (most recent call last):
    File "/home/aalok/.local/lib/python3.5/site-packages/pip/_vendor/pep517/_in_process.py", line 257, in <module>
      main()
    File "/home/aalok/.local/lib/python3.5/site-packages/pip/_vendor/pep517/_in_process.py", line 240, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
    File "/home/aalok/.local/lib/python3.5/site-packages/pip/_vendor/pep517/_in_process.py", line 85, in get_requires_for_build_wheel
      backend = _build_backend()
    File "/home/aalok/.local/lib/python3.5/site-packages/pip/_vendor/pep517/_in_process.py", line 76, in _build_backend
      obj = getattr(obj, path_part)
  AttributeError: module 'setuptools.build_meta' has no attribute '__legacy__'
  ----------------------------------------
ERROR: Command errored out with exit status 1: /usr/bin/python3 /home/aalok/.local/lib/python3.5/site-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /tmp/tmphjwdxubr Check the logs for full command output.

Error running /usr/bin/ranlib: vfork

I got the following error when running make:

$ make
Scanning dependencies of target hanabi
[ 7%] Building CXX object hanabi_lib/CMakeFiles/hanabi.dir/hanabi_card.cc.o
[ 14%] Building CXX object hanabi_lib/CMakeFiles/hanabi.dir/hanabi_game.cc.o
[ 21%] Building CXX object hanabi_lib/CMakeFiles/hanabi.dir/hanabi_hand.cc.o
[ 28%] Building CXX object hanabi_lib/CMakeFiles/hanabi.dir/hanabi_history_item.cc.o
[ 35%] Building CXX object hanabi_lib/CMakeFiles/hanabi.dir/hanabi_move.cc.o
[ 42%] Building CXX object hanabi_lib/CMakeFiles/hanabi.dir/hanabi_observation.cc.o
[ 50%] Building CXX object hanabi_lib/CMakeFiles/hanabi.dir/hanabi_state.cc.o
[ 57%] Building CXX object hanabi_lib/CMakeFiles/hanabi.dir/util.cc.o
[ 64%] Building CXX object hanabi_lib/CMakeFiles/hanabi.dir/canonical_encoders.cc.o
[ 71%] Linking CXX static library libhanabi.a
Error running /usr/bin/ranlib: vfork
hanabi_lib/CMakeFiles/hanabi.dir/build.make:302: recipe for target 'hanabi_lib/libhanabi.a' failed
make[2]: *** [hanabi_lib/libhanabi.a] Error 1
make[2]: *** Deleting file 'hanabi_lib/libhanabi.a'
CMakeFiles/Makefile2:159: recipe for target 'hanabi_lib/CMakeFiles/hanabi.dir/all' failed
make[1]: *** [hanabi_lib/CMakeFiles/hanabi.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2

Mac OS - lib failing to load

Hello,

I am trying to run the environment on macOS Mojave (version 10.14), and when trying to run rl_env_example.py I get

'NoneType' object has no attribute 'NewGame'

when trying to run lib.NewGame, meaning lib probably wasn't loaded.

Similarly, when running game_example.py, I get AssertionError: lib failed to load on line 122. The instructions in the README don't cover differences between OSes, so I'd appreciate some help on this issue.

Thank you!

module 'tensorflow' has no attribute 'contrib'

I'm trying to run the script hanabi-learning-environment-master/hanabi_learning_environment/agents/rainbow/run_experiment.py, but I get an error that says "AttributeError: module 'tensorflow' has no attribute 'contrib'" (I already have tensorflow 2.4.1 and all the necessary modules). I also found a Stack Overflow page that addresses this problem, but couldn't upgrade the TensorFlow 1.x code to TensorFlow 2.x: the same AttributeError occurs when I run train.py, because it also imports dqn.py, which uses the contrib attribute of tensorflow, which I guess is no longer available in TensorFlow 2.x.
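
(As a note on why upgrading in place fails: tf.contrib was removed outright in TensorFlow 2.x, so even the usual compatibility shim, sketched below, cannot restore it. Running the agents under a TensorFlow 1.x install is the more direct route.)

# restores most TF1 APIs under TF2, but not tf.contrib, which no longer
# exists, so the contrib imports in dqn.py still fail with this shim alone
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()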

[Feature Request] Start game from specific state

Hi!

We are a group of researchers at TU Berlin, currently working on MARL using Hanabi as a testbed. There are some ideas we would like to try, but they will require being able to initialize the environment from a specific state of our choosing rather than from a new game state. Ideally that would include arranging the cards in the deck as well. For a subset of the things we would like to try, just being able to save and load a state would be enough. Would you consider adding one of these features, so that we can use tree search methods?
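
A hypothetical sketch of the kind of interface we have in mind (only copy() exists today, as far as we can tell; the load/deck-arrangement half is the request):

from hanabi_learning_environment.pyhanabi import HanabiGame

game = HanabiGame({"players": 2})
state = game.new_initial_state()
# ... advance with state.apply_move(...) to a position of interest ...

snapshot = state.copy()            # saving: copy() already exists
# state = game.load_state(blob)    # hypothetical: restore a saved state
# game.new_initial_state(deck=...) # hypothetical: fix the deck order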

Question: Why no observation stacking?

In the paper you state that you did not use any observation stacking apart from the previous action by the player.

What is the reasoning for this? Did you try out configurations with stacking and they did not perform well?

Another possibility would have been to include the other players' actions since the current player's last action. Was something like this considered?

Windows OS pip install fails: error C2039: "accumulate" is not a member of "std"

C:\Users\happy\AppData\Local\Temp\pip-req-build-5pbxa9vm\hanabi_learning_environment\hanabi_lib\canonical_encoders.cc(30): error C2039: "accumulate": is not a member of "std"
D:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.31.31103\include\unordered_map(24): note: see declaration of "std"
C:\Users\happy\AppData\Local\Temp\pip-req-build-5pbxa9vm\hanabi_learning_environment\hanabi_lib\canonical_encoders.cc(30): error C3861: "accumulate": identifier not found
[6/13] Building CXX object hanabi_learning_environment\hanabi_lib\CMakeFiles\hanabi.dir\hanabi_hand.cc.obj

HanabiState.copy() truncates fireworks

I ran this straightforward script:

from hanabi_learning_environment.pyhanabi import HanabiGame
g = HanabiGame()
s1 = g.new_initial_state()
print(s1.fireworks())
s2 = s1.copy()
print(s2.fireworks())

And I got this output:
[0, 0, 0, 0, 0]
[0, 0, 0]

This means HanabiState.copy() is not copying the fireworks correctly. This issue presents itself in two different ways depending on the number of colors C in the game.

If C > 3, then HanabiState.copy() truncates the fireworks to a length of 3.
If C <= 3, then copying the state and calling s2.fireworks() throws the following error:

terminate called after throwing an instance of 'std::out_of_range'
what(): vector::_M_range_check: __n (which is C) >= this->size() (which is C)
Aborted (core dumped)

Where C is again the number of colors in the game parameters.

pip install error

Hi, when I try to install the env, this error shows up. How can I fix it?

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/pip/_vendor/pep517/_in_process.py", line 280, in <module>
    main()
  File "/usr/local/lib/python2.7/dist-packages/pip/_vendor/pep517/_in_process.py", line 263, in main
    json_out['return_val'] = hook(**hook_input['kwargs'])
  File "/usr/local/lib/python2.7/dist-packages/pip/_vendor/pep517/_in_process.py", line 205, in build_wheel
    metadata_directory)
  File "/usr/local/lib/python2.7/dist-packages/setuptools/build_meta.py", line 192, in build_wheel
    self.run_setup()
  File "/usr/local/lib/python2.7/dist-packages/setuptools/build_meta.py", line 234, in run_setup
    self).run_setup(setup_script=setup_script)
  File "/usr/local/lib/python2.7/dist-packages/setuptools/build_meta.py", line 141, in run_setup
    exec(compile(code, __file__, 'exec'), locals())
  File "setup.py", line 9, in <module>
    install_requires=['cffi']
  File "/tmp/pip-build-env-OA8BOj/overlay/lib/python2.7/site-packages/skbuild/setuptools_wrap.py", line 693, in setup
    return upstream_setup(*args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/setuptools/__init__.py", line 145, in setup
    return distutils.core.setup(**attrs)
  File "/usr/lib/python2.7/distutils/core.py", line 151, in setup
    dist.run_commands()
  File "/usr/lib/python2.7/distutils/dist.py", line 953, in run_commands
    self.run_command(cmd)
  File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command
    cmd_obj.run()
  File "/tmp/pip-build-env-OA8BOj/overlay/lib/python2.7/site-packages/skbuild/command/bdist_wheel.py", line 41, in run
    super(bdist_wheel, self).run(*args, **kwargs)
  File "/tmp/pip-build-env-OA8BOj/overlay/lib/python2.7/site-packages/wheel/bdist_wheel.py", line 353, in run
    self.write_wheelfile(distinfo_dir)
  File "/tmp/pip-build-env-OA8BOj/overlay/lib/python2.7/site-packages/skbuild/command/bdist_wheel.py", line 78, in write_wheelfile
    super(bdist_wheel, self).write_wheelfile(wheelfile_base, generator)
  File "/tmp/pip-build-env-OA8BOj/overlay/lib/python2.7/site-packages/wheel/bdist_wheel.py", line 393, in write_wheelfile
    BytesGenerator(buffer, maxheaderlen=0).flatten(msg)
  File "/usr/lib/python2.7/email/generator.py", line 83, in flatten
    self._write(msg)
  File "/usr/lib/python2.7/email/generator.py", line 115, in _write
    self._write_headers(msg)
  File "/usr/lib/python2.7/email/generator.py", line 145, in _write_headers
    print >> self._fp, v
TypeError: 'unicode' does not have the buffer interface
----------------------------------------
ERROR: Failed building wheel for hanabi-learning-environment

PyPI Release

Could this be released on PyPI so other libraries can depend on it more easily?

Error when trying to access certain card_knowledge functions

Hi!
I'm trying to read one attribute of the card knowledge but I get stuck when I call the line

observation.card_knowledge()[player][card_id].color_plausible(0)

where player and card_id are integers. I get the following error.

  File "/home/lorenzo/hanabi-learning-environment/pyhanabi.py", line 241, in color_plausible
    return lib.ColorIsPlausible(self._knowledge, color_index)
  File "/home/lorenzo/.local/lib/python2.7/site-packages/cffi/api.py", line 908, in __getattr__
    make_accessor(name)
  File "/home/lorenzo/.local/lib/python2.7/site-packages/cffi/api.py", line 903, in make_accessor
    raise AttributeError(name)
AttributeError: ColorIsPlausible

If I try to print just the variable I get

observation.card_knowledge()[player][card_id]
output: XX | RYGWB12345

This applies to the color_plausible and rank_plausible functions.

I am not very familiar with cffi, but I was able to trace the problem back to the cffi source code (api.py): the aforementioned functions were present neither in the accessors variable during the update_accessor method nor in the library __dict__ in the make_accessor method. I suspect this issue is caused by cffi not being pointed at the correct functions to bind, but I am too inexperienced with cffi to confirm this.
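
A sanity check along these lines might narrow it down (a sketch assuming pyhanabi's module-level lib plus the cdef_loaded/lib_loaded helpers that game_example.py uses). The cffi lib object only exposes functions declared in the header text passed to ffi.cdef(), so if ColorIsPlausible is missing from that header, the AttributeError fires even though the C symbol exists in the shared library:

from hanabi_learning_environment import pyhanabi

# both should be True; if cdef_loaded() is False, the declarations
# (including ColorIsPlausible) were never registered with cffi
print(pyhanabi.cdef_loaded(), pyhanabi.lib_loaded())
print(hasattr(pyhanabi.lib, "ColorIsPlausible"))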

Please let me know if there is any additional information I can provide to assist in solving this issue.
Thank you,
Daniel

ACHA agent?

Hi, I'm just curious why the ACHA agent is not included in this repo?

Issues running agents/rainbow/run_experiment.py (import fails and agent does not improve)

Hello,

I am trying to train the sample Rainbow agent by running the run_experiment.py script at hanabi-learning-environment/agents/rainbow, and I am having 2 issues:

  1. The script doesn't run because it finds no module named rl_env. The problem seems to be that rl_env.py is in the root directory of the project, whereas the script I am trying to run is two levels below it. I temporarily fixed it by adding hanabi-learning-environment to my PYTHONPATH bash variable (see the sketch after this list), but I believe a better fix might be to move the run_experiment script to the root (and change the necessary imports), which could be considered for a future version.

  2. After temporarily fixing the issue above, running the script for around 15 hours (without changing any of the default parameters) showed no improvement in the agent (final average per-episode return: 0.22). Is this the expected result?
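
For reference, the PYTHONPATH workaround from point 1, expressed as a hypothetical patch at the top of run_experiment.py instead of a bash variable:

# prepend the repo root (two levels up) to sys.path so rl_env resolves
import os
import sys
sys.path.insert(0,
                os.path.join(os.path.dirname(os.path.abspath(__file__)),
                             '..', '..'))

import rl_env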

Thank you!

Error compiling

I tried to compile following the README, however I got the following error:

-- The C compiler identification is GNU 7.3.0
-- The CXX compiler identification is unknown
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
CMake Error at CMakeLists.txt:2 (project):
  No CMAKE_CXX_COMPILER could be found.

Installing g++ with sudo apt-get install g++ resolved the issue.
Maybe adding it to the README would be good?

Coding style

Hi,

Thanks for a great repo. Can you provide the coding style used in this project?

API documentation

Hi!

I couldn't find documentation for the API. Does it exist?

Also, the README.md states the API is similar to OpenAI Gym. Would someone please share why OpenAI Gym was not sufficient for this project?

Thanks!

checkpointer._clean_up_old_checkpoints not working as intended/documented?

Hello,

The documentation of checkpoint.py states that the checkpointer will delete all but the last CHECKPOINT_DURATION checkpoint files. In the example, it says this would apply both to cpkt files and sentinel_checkpoint_complete files.

However, in my experiments this is not the case. It removes the old tf_ckpt* files, but not any of the others (including sentinel_checkpoint_complete and ckpt, but also actions_ckpt, add_count_ckpt, invalid_range_ckpt, legal_actions_ckpt etc).

Is this intended behavior? The comments at the start of the code seem to imply that it should remove all older files, but in practice only one of the many types of files is being removed, which doesn't free up that much space (and saving space seems to be the reason for implementing the clean-up in the first place).

If this is not intended, I can submit a fix in a pull request.
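
The fix I have in mind is roughly the following (a sketch, assuming every checkpoint artifact shares the '.<iteration>' suffix that the sentinel and *_ckpt files use):

import glob
import os

CHECKPOINT_DURATION = 4

def clean_up_old_checkpoints(checkpoint_dir, iteration_number):
  # delete every file belonging to the stale version, not just tf_ckpt ones
  stale_version = iteration_number - CHECKPOINT_DURATION
  if stale_version >= 0:
    for path in glob.glob(
        os.path.join(checkpoint_dir, '*.{}'.format(stale_version))):
      os.remove(path)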

UnicodeDecodeError when attempting to load checkpoint

Hello,

I have been trying to load checkpoints from a previous run of the Rainbow agent, and getting a UnicodeDecodeError.

Usage:

  • First, run a few rounds of training by calling, from /agents/rainbow:
    python train.py --base_dir ../../log
    This successfully trains the agent for the number of iterations set in run_experiment and creates a log directory at the root.

  • After the training finishes, attempt to run the same command again (in my understanding, this is the correct usage for loading checkpoints, as train.py calls initialize_checkpointing, which should either load a checkpoint if one exists or create a new one)

Error message:

/Users/rodrigocanaan/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
['/Users/rodrigocanaan/Dev/hanabi-learning-environment/agents/rainbow', '/Users/rodrigocanaan/Dev/hanabi-learning-environment', '/Users/rodrigocanaan/anaconda3/lib/python36.zip', '/Users/rodrigocanaan/anaconda3/lib/python3.6', '/Users/rodrigocanaan/anaconda3/lib/python3.6/lib-dynload', '/Users/rodrigocanaan/anaconda3/lib/python3.6/site-packages', '/Users/rodrigocanaan/anaconda3/lib/python3.6/site-packages/aeosa', '/Users/rodrigocanaan/Dev/spawningtool/src/spawningtool']
I0301 18:12:50.248378 4649448896 tf_logging.py:115] Creating RainbowAgent agent with the following parameters:
I0301 18:12:50.248818 4649448896 tf_logging.py:115] 	 gamma: 0.990000
I0301 18:12:50.249113 4649448896 tf_logging.py:115] 	 update_horizon: 1.000000
I0301 18:12:50.249352 4649448896 tf_logging.py:115] 	 min_replay_history: 500
I0301 18:12:50.249456 4649448896 tf_logging.py:115] 	 update_period: 4
I0301 18:12:50.249592 4649448896 tf_logging.py:115] 	 target_update_period: 500
I0301 18:12:50.249689 4649448896 tf_logging.py:115] 	 epsilon_train: 0.000000
I0301 18:12:50.249766 4649448896 tf_logging.py:115] 	 epsilon_eval: 0.000000
I0301 18:12:50.249860 4649448896 tf_logging.py:115] 	 epsilon_decay_period: 1000
I0301 18:12:50.249946 4649448896 tf_logging.py:115] 	 tf_device: /gpu:*
I0301 18:12:50.250036 4649448896 tf_logging.py:115] 	 use_staging: True
I0301 18:12:50.250484 4649448896 tf_logging.py:115] 	 optimizer: <tensorflow.python.training.rmsprop.RMSPropOptimizer object at 0x182d615710>
W0301 18:12:50.647318 4649448896 tf_logging.py:125] From /Users/rodrigocanaan/Dev/hanabi-learning-environment/agents/rainbow/rainbow_agent.py:232: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See `tf.nn.softmax_cross_entropy_with_logits_v2`.

2019-03-01 18:12:50.953091: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
I0301 18:12:51.199140 4649448896 tf_logging.py:115] 	 learning_rate: 0.000025
I0301 18:12:51.199301 4649448896 tf_logging.py:115] 	 optimizer_epsilon: 0.000031
Traceback (most recent call last):
  File "train.py", line 107, in <module>
    app.run(main)
  File "/Users/rodrigocanaan/anaconda3/lib/python3.6/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/Users/rodrigocanaan/anaconda3/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "train.py", line 104, in main
    launch_experiment()
  File "train.py", line 89, in launch_experiment
    FLAGS.checkpoint_file_prefix))
  File "/Users/rodrigocanaan/Dev/hanabi-learning-environment/agents/rainbow/run_experiment.py", line 220, in initialize_checkpointing
    checkpoint_dir, latest_checkpoint_version, dqn_dictionary):
  File "/Users/rodrigocanaan/Dev/hanabi-learning-environment/agents/rainbow/dqn_agent.py", line 519, in unbundle
    self._replay.load(checkpoint_dir, iteration_number)
  File "/Users/rodrigocanaan/Dev/hanabi-learning-environment/agents/rainbow/replay_memory.py", line 582, in load
    self.memory.load(checkpoint_dir, suffix)
  File "/Users/rodrigocanaan/Dev/hanabi-learning-environment/agents/rainbow/replay_memory.py", line 420, in load
    self.__dict__[attr] = np.load(infile, allow_pickle=False)
  File "/Users/rodrigocanaan/anaconda3/lib/python3.6/site-packages/numpy/lib/npyio.py", line 423, in load
    magic = fid.read(N)
  File "/Users/rodrigocanaan/anaconda3/lib/python3.6/gzip.py", line 276, in read
    return self._buffer.read(size)
  File "/Users/rodrigocanaan/anaconda3/lib/python3.6/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/Users/rodrigocanaan/anaconda3/lib/python3.6/gzip.py", line 463, in read
    if not self._read_gzip_header():
  File "/Users/rodrigocanaan/anaconda3/lib/python3.6/gzip.py", line 406, in _read_gzip_header
    magic = self._fp.read(2)
  File "/Users/rodrigocanaan/anaconda3/lib/python3.6/gzip.py", line 91, in read
    self.file.read(size-self._length+read)
  File "/Users/rodrigocanaan/anaconda3/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 132, in read
    pywrap_tensorflow.ReadFromStream(self._read_buf, length, status))
  File "/Users/rodrigocanaan/anaconda3/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 100, in _prepare_value
    return compat.as_str_any(val)
  File "/Users/rodrigocanaan/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/compat.py", line 107, in as_str_any
    return as_str(value)
  File "/Users/rodrigocanaan/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/compat.py", line 80, in as_text
    return bytes_or_text.decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

Thank you,
Rodrigo Canaan

Defining Non-canonical (Float) Observations

Hi, thanks for the great project! I noticed that if we want to use float elements in the observation vector, we need to use custom objects.
A snippet of the C++ end looks like this:

char* EncodeObservation(pyhanabi_observation_encoder_t* encoder,
                        pyhanabi_observation_t* observation) {
  REQUIRE(encoder != nullptr);
  REQUIRE(encoder->encoder != nullptr);
  REQUIRE(observation != nullptr);
  REQUIRE(observation->observation != nullptr);
  auto obs_enc = reinterpret_cast<hanabi_learning_env::ObservationEncoder*>(
      encoder->encoder);
  auto obs = reinterpret_cast<hanabi_learning_env::HanabiObservation*>(
      observation->observation);
  std::vector<int> encoding = obs_enc->Encode(*obs);
  std::string obs_str = "";
  for (int i = 0; i < encoding.size(); i++) {
    obs_str += (encoding[i] ? "1" : "0");
    if (i != encoding.size() - 1) {
      obs_str += ",";
    }
  }
  return strdup(obs_str.c_str());
}

And the Python end looks like this:

  def encode(self, observation):
    """Encode the observation as a sequence of bits."""
    c_encoding_str = lib.EncodeObservation(self._encoder,
                                           observation.observation())
    encoding_string = encode_ffi_string(c_encoding_str)
    lib.DeleteString(c_encoding_str)
    # Canonical observations are bit strings, so it is ok to encode using a
    # string. For float or double observations, make a custom object
    encoding = [int(x) for x in encoding_string.split(",")]
    return encoding

I understand the current implementation only deals with int observation elements: they are first converted to "0"/"1" strings and then decoded in Python with cffi. For floats, I tried to replace `obs_str += (encoding[i] ? "1" : "0");` with `obs_str += std::to_string(encoding[i]);` (assuming the contents of `encoding` are floats), but what the Python end decodes are not floats. I wonder if there are any examples demonstrating how to deal with float observations?
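
One detail worth noting: in the snippet above, obs_enc->Encode returns std::vector<int>, so std::to_string still stringifies integers unless the encoder's return type is changed to a float vector as well. Assuming the C++ side is changed to write one decimal string per element, the matching Python-side decode might look like this (a sketch, not the current API):

  def encode(self, observation):
    """Decode a comma-separated float observation from the C++ encoder."""
    c_encoding_str = lib.EncodeObservation(self._encoder,
                                           observation.observation())
    encoding_string = encode_ffi_string(c_encoding_str)
    lib.DeleteString(c_encoding_str)
    # parse floats instead of ints; assumes the C++ encoder now emits
    # std::to_string(value) for each element
    return [float(x) for x in encoding_string.split(",")]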

Confusion on Observation Bit-Strings

Hi,

I have recently been reading multiple research papers and projects that use the HLE (including the original HLE paper), but I can't seem to find any documentation of the bit-string representation of the observation space. I've contacted a few of the authors of the aforementioned projects and they don't seem to know either, which is surprising to me given the important role the HLE has played in recent RL advances.

The closest thing to documentation I have found is in PettingZoo (https://www.pettingzoo.ml/classic/hanabi), but even their documentation is wrong when you inspect the observations: for example, there are times when the fireworks indices contain 1s at the start of the game. I've looked through the codebase myself and tried to decipher the CanonicalObservationEncoder in canonical_encoders.cc and the HanabiObservation in hanabi_observation.cc, but I keep getting different results with each test I run.

Can you explain the bit-string representation of the observation space or point me to where I can find one?
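
In the meantime, one way to probe the encoding empirically (a sketch assuming pyhanabi's ObservationEncoder with the CANONICAL encoder type; flipping one known fact in the state and diffing the bit vectors reveals which indices encode what):

from hanabi_learning_environment import pyhanabi

game = pyhanabi.HanabiGame({"players": 2})
encoder = pyhanabi.ObservationEncoder(
    game, pyhanabi.ObservationEncoderType.CANONICAL)

state = game.new_initial_state()
while state.cur_player() == pyhanabi.CHANCE_PLAYER_ID:
  state.deal_random_card()  # deal the opening hands

bits = encoder.encode(state.observation(0))
print(encoder.shape(), len(bits))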

Thanks

Interoperable game record format / export / import

At some point, we'll want this learning environment to be able to inter-operate with other systems, e.g. online environments where it can find human players and other bots, exchange game records and the like.

One such online Hanabi-playing environment is https://github.com/Zamiell/hanabi-live

I've put up a PR noting how to export game records in its format and parse them at Hanabi-Live/hanabi-live#663

Have others defined or thought about game record formats? I've got several ideas based on parsing hanabi-live's JSON game format, which is documented at
https://github.com/Zamiell/hanabi-live/blob/master/misc/example_game_with_comments.json

observation stacking?

We're working to reproduce some of the results in the original paper. It states that the Rainbow agent "is feedforward and does not use any observation stacking outside of the last action, which is included in the current observation".

However, in the code the rainbow agent appears to stack the last 4 observations by default. Empirically (at least in early iterations) this doesn't seem to affect cumulative return much either way. Could someone clarify if obs stacking was used for the results in the paper?
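
If the stacking in question comes from the ObservationStacker in run_experiment.py, a comparison run without stacking might look like this (a sketch; the create_environment/create_obs_stacker helpers and the history_size default of 4 are from memory and worth verifying):

import run_experiment

environment = run_experiment.create_environment()
# history_size=1 keeps only the current observation, disabling stacking
obs_stacker = run_experiment.create_obs_stacker(environment, history_size=1)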

Visual studio 2019 is not recognised

Hello, I have Anaconda and VS2019 installed on my Windows machine. Unfortunately I cannot install hanabi-learning-environment; it fails for the following reason:

Building wheel for hanabi-learning-environment (PEP 517) ... |
..
..

Trying "Visual Studio 15 2017 Win64 v141" generator - failure
...
..
ERROR: Failed building wheel for hanabi-learning-environment
Failed to build hanabi-learning-environment
ERROR: Could not build wheels for hanabi-learning-environment which use PEP 517 and cannot be installed directly

Why is it doing this?
