werner-duvaud / muzero-general
MuZero
Home Page: https://github.com/werner-duvaud/muzero-general/wiki/MuZero-Documentation
License: MIT License
Suppose, for example, that you want to implement a chess game and give a reward of 1 to the winner and -1 to the loser at the end of the game.
In the current implementation this wouldn't really work, because you only know that a game is finished when somebody wins, and at that point it's no longer possible to give the loser their reward (in hindsight), unless you iterate over all players with a dummy observation and then give them their rewards. Without this, only the last player to move gets a reward.
I would propose changing the AbstractGame wrapper to include a get_reward method that returns a reward for each player and assigns it to the last step at which that player acted. This would only happen at the end of the game and shouldn't affect gym-like games, which give a reward at each step, I think.
But I'm not sure about this, so I thought I would open an issue to get feedback before actually changing it and making a pull request.
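The proposal above could be sketched roughly as follows. This is only an illustration under assumptions: `assign_final_rewards`, `players_to_move`, `step_rewards` and `final_rewards` are hypothetical names, not part of AbstractGame or the repository.

```python
from typing import Dict, List


def assign_final_rewards(
    players_to_move: List[int],
    step_rewards: List[float],
    final_rewards: Dict[int, float],
) -> List[float]:
    """Add each player's end-of-game reward to the last step that player acted on.

    players_to_move[i] is the (hypothetical) id of the player who acted at step i.
    """
    rewards = list(step_rewards)
    already_rewarded = set()
    # Walk the game backwards so the first hit for a player is their last move.
    for index in range(len(players_to_move) - 1, -1, -1):
        player = players_to_move[index]
        if player not in already_rewarded:
            rewards[index] += final_rewards.get(player, 0.0)
            already_rewarded.add(player)
    return rewards


# Player 0 wins on the fifth move; player 1 last acted on the fourth move.
movers = [0, 1, 0, 1, 0]
print(assign_final_rewards(movers, [0.0] * 5, {0: 1.0, 1: -1.0}))
```

Because per-step rewards are left untouched and the final rewards are only added on top, a gym-like game that passes an empty `final_rewards` dict would behave as before.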
For any new games which I could add, is there any min/max range for the reward to be followed? For the gym Atari games, I am not aware of any clipping.
Specifically
`def step(self, action):
    observation, reward, done = self.env.step(action)  # <-- does this reward need to be in some range?
    return observation, reward, done
`
Thanks!
When I run a freshly cloned version of this program to train connect4, all graphs in the Total_reward-section of tensorboard become flat lines (Total reward=10, Mean value=0, Episode length=11, MuZero reward=0, Opponent reward=0). The graphs in all of the other sections look ok I think. I get this result both locally and in Google colab.
It worked before 91afb1d so maybe something broke there?
Hi,
I would like to ask whether TensorFlow will also be supported, as stated in the initial README (d838835#diff-04c6e90faac2675aa89e2176d2eec7d8).
Thanks
Line 97 in 0918977
There is a bug in the connect 4 game logic, but it's small enough that I'm sure it hasn't affected training. If a player (parity says it must be player 2) wins on the last move of the game (move 42), it will be rewarded as a draw.
From connect4.py: reward = 1 if done and 0 < len(self.legal_actions()) else 0
No legal actions doesn't mean someone didn't just win.
Example:
a = self.Game()
mv = [2,1,1,1,3,1,4,1,6,5,7,2,3,4,5,6,7,7,6,5,4,3,2,7,6,5,4,3,2,2,3,4,5,6,7,7,6,5,4,3,2,1]
for m in mv:
    x = a.step(m-1)[1:]
    print(x)
a.render()
Last step prints (0, True) even though the second player won. https://connect4.gamesolver.org/?pos=211131416572345677654327654322345677654321
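To make the failure mode concrete, here is a minimal sketch that isolates the reward expression quoted above and contrasts it with one possible fix. The helper names and the `has_winner` flag are assumptions for illustration; the real connect4.py may compute these differently.

```python
def buggy_reward(done: bool, num_legal_actions: int) -> int:
    # Mirrors the quoted expression: a finished game with a full board
    # is always scored 0, even if the last move completed a line.
    return 1 if done and 0 < num_legal_actions else 0


def fixed_reward(has_winner: bool) -> int:
    # Reward depends only on whether the move just played won the game.
    return 1 if has_winner else 0


def is_done(has_winner: bool, num_legal_actions: int) -> bool:
    # The game ends on a win or when the board is full.
    return has_winner or num_legal_actions == 0


# Move 42: the board is full (no legal actions) and the mover has just won.
print(buggy_reward(done=True, num_legal_actions=0))  # scored as a draw
print(fixed_reward(has_winner=True))                 # scored as a win
```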
Hi,
I trained cartpole and tictactoe. I used TensorBoard to analyze the results and found that the learning rate for cartpole changed over the course of training, but the learning rate for tictactoe remained unchanged at every step. Could you tell me why the learning rate stays constant? Thanks
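One plausible explanation, assuming the trainer uses an exponential schedule of the form `lr_init * lr_decay_rate ** (step / lr_decay_steps)` (the parameter names come from the game configs; the exact formula in trainer.py is an assumption here): a game config with `lr_decay_rate = 1` yields a constant learning rate.

```python
def learning_rate(step: int, lr_init: float, lr_decay_rate: float, lr_decay_steps: int) -> float:
    """Exponential learning rate schedule (assumed form, see the note above)."""
    return lr_init * lr_decay_rate ** (step / lr_decay_steps)


# With a decay rate of 1 the schedule never changes.
for step in (0, 10_000, 50_000):
    print(step, learning_rate(step, lr_init=0.05, lr_decay_rate=1, lr_decay_steps=10_000))

# With a decay rate below 1 it shrinks every lr_decay_steps steps.
for step in (0, 10_000, 50_000):
    print(step, learning_rate(step, lr_init=0.05, lr_decay_rate=0.1, lr_decay_steps=10_000))
```

Comparing `lr_decay_rate` in the two game files would confirm or rule out this explanation.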
Is there a slack, discord, discourse or other form for communicating about this project? I'd love to help contribute to this project but my skills aren't very good in ML. I was wondering if I could assist with a CI/CD system or help with creating a grid search for hyperparameter searching.
Here
Line 370 in 18cad41
you assign value 0 to nodes that haven't yet been visited. However, since values are normalized to the range [0,1] a better estimate to assign to unexplored nodes would be 0.5. Otherwise it seems that the exploration-exploitation trade-off will lean towards exploitation (after the first arbitrary choice, that branch will likely have the maximum value in min_max_stats, and therefore have value 1, particularly when values and rewards are positive).
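The effect can be illustrated with a small sketch. `MinMaxStats` here is a simplified stand-in written for this example, and `unvisited_default` is a hypothetical parameter, not an option of the repository.

```python
class MinMaxStats:
    """Tracks the smallest and largest value seen so far in the search tree."""

    def __init__(self) -> None:
        self.minimum = float("inf")
        self.maximum = float("-inf")

    def update(self, value: float) -> None:
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def normalize(self, value: float) -> float:
        # Only rescale once two distinct values have been observed.
        if self.maximum > self.minimum:
            return (value - self.minimum) / (self.maximum - self.minimum)
        return value


def value_score(visit_count: int, raw_value: float, stats: MinMaxStats, unvisited_default: float) -> float:
    """Normalized value for visited children, a fixed default for unvisited ones."""
    if visit_count == 0:
        return unvisited_default
    return stats.normalize(raw_value)


stats = MinMaxStats()
stats.update(2.0)
stats.update(6.0)
print(value_score(3, 6.0, stats, unvisited_default=0.0))  # best visited child
print(value_score(0, 0.0, stats, unvisited_default=0.0))  # unvisited, current default
print(value_score(0, 0.0, stats, unvisited_default=0.5))  # unvisited, proposed default
```

With a default of 0, an unvisited child ties with the worst value seen so far; with 0.5 it sits in the middle of the normalized range, so the prior term has a better chance of pulling the search towards it.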
Hi Werner, I've really enjoyed tinkering with the codebase as I learn all aspects of MuZero. I see in the MuZero paper they describe how they mask the policy logits to allowable moves in the root node of the MCTS:
AlphaZero used the set of legal actions obtained from the simulator to mask the prior
produced by the network everywhere in the search tree. MuZero only masks legal actions at the root of the search tree where the environment can be queried, but does not perform any masking within the search tree. This is possible because the network rapidly learns not to predict actions that never occur in the trajectories it is trained on.
Do you think the MCTS should mask out illegal moves if it is at the root node (if the game being played supports it)? It might speed up the learning process for these types of games. If so, do you want me to send you a pull request for it?
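A minimal sketch of what root masking could look like, assuming the network returns one logit per action and the game exposes the legal actions at the root. The function name `masked_root_priors` is hypothetical.

```python
import math
from typing import Dict, List


def masked_root_priors(policy_logits: List[float], legal_actions: List[int]) -> Dict[int, float]:
    """Softmax over the legal actions only; illegal actions get no prior at all."""
    legal = sorted(set(legal_actions))
    # Subtract the largest legal logit for numerical stability.
    max_logit = max(policy_logits[a] for a in legal)
    exps = {a: math.exp(policy_logits[a] - max_logit) for a in legal}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


# Action 1 has the highest logit but is illegal, so it is dropped at the root.
print(masked_root_priors([0.0, 5.0, 0.0], legal_actions=[0, 2]))
```

Only the root would be expanded with these priors; deeper nodes would keep the unmasked network output, as in the paper excerpt quoted above.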
At this line (https://github.com/werner-duvaud/muzero-general/blob/master/models.py#L125), it should be
next_encoded_state - min_next_encoded_state
Hi,
Thanks for making this, it looks really nice to use! I'm curious: are there any extra modifications I need to make to my game file for competitive play between two agents?
Cheers,
Miles
I have a question about the calculation of ucb_score. In particular, value_score is calculated as:
value_score = min_max_stats.normalize(
child.reward + self.config.discount * child.value()
)
In two-player games like Tic-Tac-Toe the players alternate turns. The value of a given state for me is the exact negative of the value of that state for my opponent. I see this is represented in backpropagate():
Line 350 in b94cd65
My question is: Doesn't this information need to be taken into account when calculating ucb_score? That is, don't we have to negate child.value(), since the child is a state from the perspective of our opponent?
For that matter, I see expand always uses the unmodified reward from recurrent_inference, but backpropagate negates the value depending on the value of virtual_to_play?
Can you help me better understand how this works in muzero?
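The sign flip being asked about can be written down as a small sketch. It assumes that the child's value is stored from the point of view of the player to move at the child, and it leaves out the min-max normalization; `two_player_value_score` is a hypothetical helper, not the repository's function.

```python
def two_player_value_score(child_reward: float, child_value: float, discount: float, num_players: int) -> float:
    """Value term of the UCB score, seen from the parent's point of view.

    Assumption: child_value is expressed for the player to move at the child,
    who is the opponent in a strictly alternating two-player game.
    """
    signed_value = -child_value if num_players > 1 else child_value
    return child_reward + discount * signed_value


# A position that is good for the opponent (0.8) is bad for the parent.
print(two_player_value_score(0.0, 0.8, discount=1.0, num_players=2))
# In a single-player game the value is used as-is.
print(two_player_value_score(0.0, 0.8, discount=1.0, num_players=1))
```

Whether the reward also needs a sign change depends on whose perspective the reward returned by recurrent_inference is expressed in, which this sketch does not settle.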
How would I go about using this for a graph based game like risk?
Hi, just trying the code out. Selecting [2] Connect4, then [0] train, it logs this error in the terminal after a few seconds
Any ideas on how to avoid this error? Reduce the batch size? TensorBoard seems to keep running despite the error, although the terminal output is stuck on Last test reward: 10.00. Training step: 0/100000. Played games: 27. Loss: 0.00, with only the played games number changing.
2020-07-13 18:07:45,104.ERROR worker.py:987 -- Possible unhandled error from worker: ray::Trainer.continuous_update_weights() (pid=100153, ip=192.168.0.2)
File "python/ray/_raylet.pyx", line 446, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 400, in ray._raylet.execute_task.function_executor
File "/home/user/muzero-general/trainer.py", line 66, in continuous_update_weights
) = self.update_weights(batch)
File "/home/user/muzero-general/trainer.py", line 138, in update_weights
observation_batch
File "/home/user/muzero-general/models.py", line 541, in initial_inference
policy_logits, value = self.prediction(encoded_state)
File "/home/user/muzero-general/models.py", line 461, in prediction
policy, value = self.prediction_network(encoded_state)
File "/home/user/anaconda3/envs/muzero/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/user/muzero-general/models.py", line 370, in forward
out = block(out)
File "/home/user/anaconda3/envs/muzero/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/user/muzero-general/models.py", line 204, in forward
out = self.conv1(x)
File "/home/user/anaconda3/envs/muzero/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/user/anaconda3/envs/muzero/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 353, in forward
return self._conv_forward(input, self.weight)
File "/home/user/anaconda3/envs/muzero/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 350, in _conv_forward
self.padding, self.dilation, self.groups)
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 3.82 GiB total capacity; 191.73 MiB already allocated; 19.31 MiB free; 210.00 MiB reserved in total by PyTorch)
https://github.com/tensortrade-org/tensortrade
see this here
@notadamking what do you think?
Maybe this line should move to before line 47?
I have changed some params:
### Game
self.observation_shape = 6 * 7 # Dimensions of the game observation
self.action_space = [i for i in range(7)] # Fixed list of all possible actions
self.players = [i for i in range(2)] # List of players
self.stacked_observations = 2 # Number of previous observations to add to the current observation
### Self-Play
# Number of simultaneous threads self-playing to feed the replay buffer
self.num_actors = 8
# Maximum number of moves if game is not finished before
self.max_moves = 50
# Number of future moves self-simulated
self.num_simulations = 30
# Chronological discount of the reward
self.discount = 0.997
# Number of seconds to wait after each played game to adjust the self play / training ratio to avoid over/underfitting
self.self_play_delay = 0
# Root prior exploration noise
self.root_dirichlet_alpha = 0.25
self.root_exploration_fraction = 0.25
# UCB formula
self.pb_c_base = 19652
self.pb_c_init = 1.25
### Network
self.encoding_size = 32
self.hidden_layers = [64]
# Value and reward are scaled (with almost sqrt) and encoded on a vector with a range of -support_size to support_size
self.support_size = 10
### Training
# Path to store the model weights
self.results_path = "./pretrained"
# Total number of training steps (i.e. weight updates, one per batch)
self.training_steps = 1000000
# Number of parts of games to train on at each training step
self.batch_size = 512
# Number of game moves to keep for every batch element
self.num_unroll_steps = 10
# Number of training steps before using the model for self-playing
self.checkpoint_interval = 10
# Number of self-play games to keep in the replay buffer
self.window_size = 1000
# Number of steps in the future to take into account for calculating the target value
self.td_steps = 10
# Number of seconds to wait after each training to adjust the self play / training ratio to avoid over/underfitting
self.training_delay = 0
# Train on GPU if available
self.training_device = "cuda" if torch.cuda.is_available() else "cpu"
self.weight_decay = 1e-4 # L2 weights regularization
self.momentum = 0.9
# Exponential learning rate schedule
self.lr_init = 0.05 # Initial learning rate
self.lr_decay_rate = 1
self.lr_decay_steps = 10000
### Test
self.test_episodes = 2 # Number of games played to evaluate the network
There is an error during training, maybe because I installed gnome-tweak-tool while training was running. The error appeared after I installed this software.
This is me playing against MuZero (MuZero always moves first):
My environment is:
Ubuntu 18.04
i7-7700 16G
RTX 2080 8G
How can I train a strong MuZero for connect4?
Hi! Could you provide a reference for the average pooling layer inserted in the residual networks after the tower and before the value, reward and policy heads? Can't seem to find any sign of it in the papers...
I am running your code with the game connect4. It has already run for more than 250k steps, but the reward value is declining and approaching the bottom.
Hi there,
Great work on the repository.
I'm keen to try running it on Google Colab, but I run into an error in muzero.py.
Has anyone had any luck getting it to run on Google Colab?
Thanks!
Welcome to MuZero! Here's a list of games:
0. cartpole
1. connect4
2. gomoku
3. lunarlander
4. tictactoe
Enter a number to choose the game: 0
0. Train
1. Load pretrained model
2. Render some self play games
3. Play against MuZero
4. Exit
Enter a number to choose an action: 0
2020-03-26 19:58:46,830 INFO resource_spec.py:212 -- Starting Ray with 6.64 GiB memory available for workers and up to 3.34 GiB for objects. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).
2020-03-26 19:58:47,246 INFO services.py:1123 -- View the Ray dashboard at localhost:8265
2020-03-26 19:58:50,509 WARNING worker.py:1072 -- The dashboard on node 2fdf91b82296 failed with the following error:
Traceback (most recent call last):
File "/usr/lib/python3.6/asyncio/base_events.py", line 1062, in create_server
sock.bind(sa)
OSError: [Errno 99] Cannot assign requested address
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/ray/dashboard/dashboard.py", line 920, in <module>
dashboard.run()
File "/usr/local/lib/python3.6/dist-packages/ray/dashboard/dashboard.py", line 368, in run
aiohttp.web.run_app(self.app, host=self.host, port=self.port)
File "/usr/local/lib/python3.6/dist-packages/aiohttp/web.py", line 433, in run_app
reuse_port=reuse_port))
File "/usr/lib/python3.6/asyncio/base_events.py", line 484, in run_until_complete
return future.result()
File "/usr/local/lib/python3.6/dist-packages/aiohttp/web.py", line 359, in _run_app
await site.start()
File "/usr/local/lib/python3.6/dist-packages/aiohttp/web_runner.py", line 104, in start
reuse_port=self._reuse_port)
File "/usr/lib/python3.6/asyncio/base_events.py", line 1066, in create_server
% (sa, err.strerror.lower()))
OSError: [Errno 99] error while attempting to bind on address ('::1', 8265, 0, 0): cannot assign requested address
Training...
Run tensorboard --logdir ./results and go to http://localhost:6006/ to see in real time the training performance.
2020-03-26 19:58:52,250 WARNING worker.py:1072 -- Failed to unpickle actor class 'Trainer' for actor ID 45b95b1c0100. Traceback:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/ray/function_manager.py", line 494, in _load_actor_class_from_gcs
actor_class = pickle.loads(pickled_class)
File "/usr/local/lib/python3.6/dist-packages/ray/cloudpickle/cloudpickle.py", line 1136, in subimport
__import__(name)
ModuleNotFoundError: No module named 'models'
(pid=630) 2020-03-26 19:58:52,240 ERROR function_manager.py:496 -- Failed to load actor class Trainer.
(pid=630) Traceback (most recent call last):
(pid=630) File "/usr/local/lib/python3.6/dist-packages/ray/function_manager.py", line 494, in _load_actor_class_from_gcs
(pid=630) actor_class = pickle.loads(pickled_class)
(pid=630) File "/usr/local/lib/python3.6/dist-packages/ray/cloudpickle/cloudpickle.py", line 1136, in subimport
(pid=630) __import__(name)
(pid=630) ModuleNotFoundError: No module named 'models'
2020-03-26 19:58:52,733 WARNING worker.py:1072 -- WARNING: 6 PYTHON workers have been started. This could be a result of using a large number of actors, or it could be a consequence of using nested tasks (see https://github.com/ray-project/ray/issues/3644) for some a discussion of workarounds.
---------------------------------------------------------------------------
RayTaskError(ModuleNotFoundError) Traceback (most recent call last)
<ipython-input-6-b332aadb407c> in <module>()
226 choice = int(choice)
227 if choice == 0:
--> 228 muzero.train()
229 elif choice == 1:
230 path = input("Enter a path to the model.weights: ")
1 frames
/usr/local/lib/python3.6/dist-packages/ray/worker.py in get(object_ids, timeout)
1500 worker.core_worker.dump_object_store_memory_usage()
1501 if isinstance(value, RayTaskError):
-> 1502 raise value.as_instanceof_cause()
1503 else:
1504 raise value
RayTaskError(ModuleNotFoundError): ray::IDLE (pid=631, ip=172.28.0.2)
File "python/ray/_raylet.pyx", line 430, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 433, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 434, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 306, in ray._raylet.deserialize_args
File "/usr/local/lib/python3.6/dist-packages/ray/serialization.py", line 323, in deserialize_objects
self._deserialize_object(data, metadata, object_id))
File "/usr/local/lib/python3.6/dist-packages/ray/serialization.py", line 271, in _deserialize_object
return self._deserialize_pickle5_data(data)
File "/usr/local/lib/python3.6/dist-packages/ray/serialization.py", line 262, in _deserialize_pickle5_data
obj = pickle.loads(in_band)
ModuleNotFoundError: No module named 'games'
---------------------------------------------------------------------------
NOTE: If your import is failing due to a missing package, you can
manually install dependencies using either !pip or !apt.
To view examples of installing some common dependencies, click the
"Open Examples" button below.
---------------------------------------------------------------------------
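The `ModuleNotFoundError` for `models` and `games` suggests that the Ray worker processes cannot import the repository's modules. One thing to try, sketched below under that assumption, is to put the clone on both `sys.path` and `PYTHONPATH` before Ray is started, so that the driver and any worker processes that inherit its environment can resolve the imports. The helper name and the Colab path are hypothetical.

```python
import os
import sys


def expose_repo_to_workers(repo_dir: str) -> str:
    """Make repo_dir importable in this process and in child processes."""
    repo_dir = os.path.abspath(repo_dir)
    # For the notebook (driver) process itself.
    if repo_dir not in sys.path:
        sys.path.insert(0, repo_dir)
    # For processes started later that inherit this environment.
    existing = os.environ.get("PYTHONPATH", "")
    parts = [p for p in existing.split(os.pathsep) if p]
    if repo_dir not in parts:
        os.environ["PYTHONPATH"] = os.pathsep.join([repo_dir] + parts)
    return repo_dir


# Hypothetical location of the clone in Colab; call this before ray.init().
# expose_repo_to_workers("/content/muzero-general")
```

Running the script from inside the cloned directory (rather than pasting muzero.py into a notebook cell) may have the same effect.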
Cheers for the wonderful code! I am using it for an Atari game, but the worker threads use too much memory, resulting in an out-of-memory error.
RayTaskError(RayOutOfMemoryError): ray::SharedStorage (pid=2986, ip=172.28.0.2)
File "python/ray/_raylet.pyx", line 431, in ray._raylet.execute_task
File "/usr/local/lib/python3.6/dist-packages/ray/memory_monitor.py", line 120, in raise_if_low_memory
self.error_threshold))
ray.memory_monitor.RayOutOfMemoryError: More than 95% of the memory on node 8ff9db8651eb is used (12.13 / 12.72 GB). The top 10 memory consumers are:
PID MEM COMMAND
3039 4.53GiB ray::SelfPlay.continuous_self_play()
3054 4.5GiB ray::SelfPlay.continuous_self_play()
2985 1.26GiB ray::Trainer.continuous_update_weights()
126 0.38GiB /usr/bin/python3 -m ipykernel_launcher -f /root/.local/share/jupyter/runtime/kernel-4404f6dc-8f78-4a
2986 0.14GiB ray::SharedStorage
3026 0.08GiB ray::ReplayBuffer
2966 0.08GiB /usr/bin/python3 -u /usr/local/lib/python3.6/dist-packages/ray/dashboard/dashboard.py --host=localho
26 0.07GiB /usr/bin/python2 /usr/local/bin/jupyter-notebook --ip="172.28.0.2" --port=9000 --FileContentsManager
2960 0.04GiB /usr/local/lib/python3.6/dist-packages/ray/core/src/ray/thirdparty/redis/src/redis-server *:44319
3048 0.04GiB ray::IDLE
In addition, up to 0.24 GiB of shared memory is currently being used by the Ray object store. You can set the object store size with the object_store_memory parameter when starting Ray.
Hi,
I'm trying to write a Gomoku game with MuZero. I'm learning from Connect4 since it's also a two player game. However I noticed the following code:
def get_observation(self):
    if self.player == 1:
        return self.board
    else:
        return -self.board
Why are you returning the negated board for one of the players, instead of letting both players learn from the same board?
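One common reason for this pattern is to present the board from the perspective of the player to move, so that "my pieces" are always encoded as +1 and "the opponent's pieces" as -1, and a single network can play both sides. A minimal sketch, assuming the board stores 1 for player 1, -1 for the other player and 0 for empty cells (plain lists are used here instead of the repository's array type):

```python
from typing import List


def get_observation(board: List[List[int]], player: int) -> List[List[int]]:
    """Return the board as seen by `player`: own pieces +1, opponent pieces -1."""
    sign = 1 if player == 1 else -1
    return [[sign * cell for cell in row] for row in board]


board = [[1, -1],
         [0, 0]]
print(get_observation(board, player=1))   # unchanged for player 1
print(get_observation(board, player=-1))  # colours swapped for the other player
```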
Training the model for 10h (RTX6000) on Connect4.
Is it ok that only the policy loss goes down over time, while others go up? If I understand correctly, lowering the learning rate might help? What other hyperparameters would be useful? What would be a quicker way to select hyperparameters?
P.S. Another related question, is whether maybe there's some idea how long it would take to get somewhat good? The performance is not good at the moment. Maybe I should embed a performance test (say 20 matches against a normal algorithmic opponent such as negamax) every several epochs or so?
Please support Windows, thanks.
I see in
Line 321 in 633c658
Line 336 in 633c658
Line 436 in 633c658
I think the way you transform the value/reward differs slightly from the original paper at this line:
Line 153 in fe791e8
From the referenced paper (https://arxiv.org/abs/1805.11593), the transformation function should be
$h(x) = \operatorname{sign}(x)\left(\sqrt{|x| + 1} - 1\right) + \epsilon x$ (with $\epsilon = 0.001$ here).
So instead of
x = torch.sign(x) * (torch.sqrt(torch.abs(x) + 1) - 1 + 0.001 * x)
the correct formula should be
x = torch.sign(x) * (torch.sqrt(torch.abs(x) + 1) - 1) + .001 * x
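The two expressions agree for non-negative inputs and differ only for negative ones, where the 0.001 * x term ends up with the wrong sign if it sits inside the parentheses. A plain-Python sketch of both (scalar versions written for illustration, not the repository's torch code):

```python
import math


def sign(x: float) -> float:
    return (x > 0) - (x < 0)


def h_current(x: float) -> float:
    # Epsilon term inside the parentheses, as in the line quoted first.
    return sign(x) * (math.sqrt(abs(x) + 1) - 1 + 0.001 * x)


def h_paper(x: float) -> float:
    # Epsilon term outside the parentheses, as in the corrected line.
    return sign(x) * (math.sqrt(abs(x) + 1) - 1) + 0.001 * x


for x in (3.0, -3.0):
    print(x, h_current(x), h_paper(x))
```

For x = -3 the first form gives about -0.997 and the second about -1.003, so only the second is an odd function, as the formula in the paper is.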
Line 137 in ecca75c
Since you are predicting a distribution, the first reward should be scalar_to_support(0).
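For readers unfamiliar with the encoding, here is a simplified sketch of a scalar-to-support conversion. It assumes integer bins from -support_size to support_size and omits the invertible scaling that the repository applies first, so it is an illustration rather than the project's `scalar_to_support`.

```python
import math
from typing import List


def scalar_to_support(x: float, support_size: int) -> List[float]:
    """Encode x as weights on the two nearest integer bins in [-support_size, support_size]."""
    x = max(-support_size, min(support_size, x))
    low = math.floor(x)
    upper_weight = x - low
    support = [0.0] * (2 * support_size + 1)
    support[low + support_size] = 1.0 - upper_weight
    if upper_weight > 0:
        support[low + support_size + 1] = upper_weight
    return support


print(scalar_to_support(0, 2))     # all mass on the middle bin
print(scalar_to_support(1.25, 2))  # mass split between bins 1 and 2
```

A reward of 0 therefore corresponds to a one-hot vector on the middle bin, not to a vector of zeros.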
Per #26 (comment) and #19 (comment) the code only supports two-player games with strict alternating actions. What would it take to encode the "next player" for the available actions into the game state? I'm thinking of games with rules where the state of the board may require an action from a player out of turn order (e.g., in GIPF, making a line of 4 pieces compels the owner of those pieces to remove them from the board regardless of whose turn the line(s) appeared on). For such a case, the action space could encode "this action compels removal from the board", but it breaks down when the opponent needs to make that decision before their turn and having the current player "decide" for them is…not ideal.
The above mentioned issues were closed without comment as to why they were closed. Resolved? Discussion over? No links to the code and I don't see anything in the game class about supporting determining the next player.
For some reason the playback is really slow (even if I remove the "press enter to continue" prompt).
Also, I'm curious why you didn't implement the Trainable interface from ray. It would have enabled a more standardized experience (e.g. the ability to run with tune or manually, automatic checkpointing, etc.).
My lunar lander is not training that well yet. Did you manage to get it to work with at least 150-200 reward consistently (e.g. say 100 episodes without dipping below 150)? If so, how long did you train and on what hardware?
Welcome to MuZero! Here's a list of games:
0. breakout
1. cartpole
2. connect4
3. gomoku
4. lunarlander
5. tictactoe
Enter a number to choose the game: 3
0. Train
1. Load pretrained model
2. Render some self play games
3. Play against MuZero
4. Exit
Enter a number to choose an action: 0
2020-04-22 15:45:12,693 INFO resource_spec.py:212 -- Starting Ray with 8.01 GiB memory available for workers and up to 4.01 GiB for objects. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).
2020-04-22 15:45:13,024 INFO services.py:1093 -- View the Ray dashboard at localhost:8265
Training...
Run tensorboard --logdir ./results and go to http://localhost:6006/ to see in real time the training performance.
(pid=3934) /home/zbf/Documents/muzero/muzero-general/replay_buffer.py:132: RuntimeWarning: invalid value encountered in true_divide
(pid=3934) game_probs /= numpy.sum(game_probs)
2020-04-22 15:49:02,252 ERROR worker.py:1003 -- Possible unhandled error from worker: ray::ReplayBuffer.get_batch() (pid=3934, ip=192.168.1.12)
File "python/ray/_raylet.pyx", line 643, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 623, in function_executor
File "/home/zbf/Documents/muzero/muzero-general/replay_buffer.py", line 74, in get_batch
game_id, game_history, game_prob = self.sample_game(self.buffer)
File "/home/zbf/Documents/muzero/muzero-general/replay_buffer.py", line 133, in sample_game
game_index = numpy.random.choice(len(self.buffer), p=game_probs)
File "mtrand.pyx", line 922, in numpy.random.mtrand.RandomState.choice
ValueError: probabilities contain NaN
2020-04-22 15:49:02,252 ERROR worker.py:1003 -- Possible unhandled error from worker: ray::Trainer.continuous_update_weights() (pid=3932, ip=192.168.1.12)
File "python/ray/_raylet.pyx", line 643, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 623, in function_executor
File "/home/zbf/Documents/muzero/muzero-general/trainer.py", line 56, in continuous_update_weights
index_batch, batch = ray.get(replay_buffer.get_batch.remote(self.model.get_weights()))
ray.exceptions.RayTaskError(ValueError): ray::ReplayBuffer.get_batch() (pid=3934, ip=192.168.1.12)
File "python/ray/_raylet.pyx", line 643, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 623, in function_executor
File "/home/zbf/Documents/muzero/muzero-general/replay_buffer.py", line 74, in get_batch
game_id, game_history, game_prob = self.sample_game(self.buffer)
File "/home/zbf/Documents/muzero/muzero-general/replay_buffer.py", line 133, in sample_game
game_index = numpy.random.choice(len(self.buffer), p=game_probs)
File "mtrand.pyx", line 922, in numpy.random.mtrand.RandomState.choice
ValueError: probabilities contain NaN
Last test reward: 0.00. Training step: 8/10. Played games: 29. Loss: 26.00
I did not change any code.
My environment is:
Ubuntu 18.04
i7-7700 16G
RTX 2080 8G
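The RuntimeWarning on `game_probs /= numpy.sum(game_probs)` suggests that the priorities sum to zero or already contain NaN by the time they are normalized. The root cause is not visible in the log, but a defensive normalization along these lines would at least keep sampling alive. This is a pure-Python sketch; `safe_game_probs` is a hypothetical helper, not code from replay_buffer.py.

```python
import math
from typing import List


def safe_game_probs(priorities: List[float]) -> List[float]:
    """Normalize priorities to probabilities, falling back to uniform when invalid."""
    if not priorities:
        return []
    total = sum(priorities)
    invalid = (
        not math.isfinite(total)
        or total <= 0
        or any(p < 0 for p in priorities)
    )
    if invalid:
        # NaN, infinite, negative or all-zero priorities: sample uniformly.
        return [1.0 / len(priorities)] * len(priorities)
    return [p / total for p in priorities]


print(safe_game_probs([1.0, 3.0]))
print(safe_game_probs([0.0, 0.0]))
print(safe_game_probs([float("nan"), 1.0]))
```

A fallback like this hides the symptom only; a loss that has already become NaN still points to an instability upstream.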
When I'm running the latest code I always get an error when exiting the process. Please see attached screen output below:
I deleted the old repo and did a fresh checkout, but I am still seeing this.
So I did some research and found this: https://github.com/ray-project/ray/issues/5042
and this: https://github.com/ray-project/ray/issues/6239
Hope it can help you.
Welcome to MuZero! Here's a list of games:
0. cartpole
1. connect4
2. gomoku
3. lunarlander
Enter a number to choose the game: 2
0. Train
1. Load pretrained model
2. Render some self play games
3. Play against MuZero
4. Exit
Enter a number to choose an action: 0
2020-02-25 15:21:47,620 INFO resource_spec.py:212 -- Starting Ray with 3.91 GiB memory available for workers and up to 1.97 GiB for objects. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).
2020-02-25 15:21:47,998 INFO services.py:1093 -- View the Ray dashboard at localhost:8265
Training...
Run tensorboard --logdir ./results and go to http://localhost:6006/ to see in real time the training performance.
Done test reward: 1.00. Training step: 11/10. Played games: 1. Loss: 33.26
0. Train
1. Load pretrained model
2. Render some self play games
3. Play against MuZero
4. Exit
Enter a number to choose an action: 4
Exception ignored in: <function ActorHandle.__del__ at 0x1115289e0>
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/ray/actor.py", line 655, in __del__
AttributeError: 'NoneType' object has no attribute 'get_global_worker'
Exception ignored in: <function ActorHandle.__del__ at 0x1115289e0>
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/ray/actor.py", line 655, in __del__
AttributeError: 'NoneType' object has no attribute 'get_global_worker'
Exception ignored in: <function ActorHandle.__del__ at 0x1115289e0>
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/ray/actor.py", line 655, in __del__
AttributeError: 'NoneType' object has no attribute 'get_global_worker'
Exception ignored in: <function ActorHandle.__del__ at 0x1115289e0>
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/ray/actor.py", line 655, in __del__
AttributeError: 'NoneType' object has no attribute 'get_global_worker'
Exception ignored in: <function ActorHandle.__del__ at 0x1115289e0>
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/ray/actor.py", line 655, in __del__
AttributeError: 'NoneType' object has no attribute 'get_global_worker'
Exception ignored in: <function ActorHandle.__del__ at 0x1115289e0>
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/ray/actor.py", line 655, in __del__
AttributeError: 'NoneType' object has no attribute 'get_global_worker'
Thank you very much for a comprehensive implementation.
I ran Breakout with the current configuration, except that I changed the number of actors from 350 to 4 because I ran
into memory problems with Ray. I am using the same setup as the one the code was tested on, except for a GTX 1060.
The code reports one line but never updates it. In TensorBoard I see progress, but with zero rewards.
Any advice?
Is it possible to extract not only the best immediate action, but the best action sequence over the planning horizon?
So a vector of actions with length self.config.num_simulations
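One way to read a plan out of a search tree is to follow the most-visited child from the root until the tree runs out (the principal variation). The sketch below uses a stripped-down `Node` class invented for this example. Note that the resulting sequence is as long as the explored depth along that path, which is generally shorter than num_simulations.

```python
from typing import Dict, List


class Node:
    """Minimal stand-in for an MCTS node: a visit count and children by action."""

    def __init__(self, visit_count: int = 0) -> None:
        self.visit_count = visit_count
        self.children: Dict[int, "Node"] = {}


def principal_variation(root: Node) -> List[int]:
    """Follow the most-visited child at each level and collect the actions."""
    actions: List[int] = []
    node = root
    while node.children:
        action, child = max(node.children.items(), key=lambda item: item[1].visit_count)
        if child.visit_count == 0:
            break  # nothing below this point was actually explored
        actions.append(action)
        node = child
    return actions


root = Node(10)
root.children = {0: Node(3), 1: Node(7)}
root.children[1].children = {0: Node(1), 2: Node(4)}
print(principal_variation(root))
```

Actions deep in this sequence rest on very few simulations, so they are much less reliable than the first one.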
After the release of AlphaZero, I trained Gomoku with an AlphaZero implementation, and I didn't need much computation to get a good model.
The code I used is this one.
I trained a simple game (6x6 board, 4 in a row wins) on a CPU (i7, 2.5 GHz). It took only about 10 hours, and the code is single-threaded.
I trained a simple game (8x8 board, 5 in a row wins) on a CPU (i7, 2.5 GHz). It took only about 50 hours, and the code is single-threaded.
In MuZero, I copied games/gomoku.py to games/gobang.py and changed some parameters.
Changes:
board size from 11 to 6;
number in a row from 5 to 4;
number of simulations changed to 400;
training steps changed to 2000;
batch size changed to 512;
played games per training step ratio set to 1;
For the full code, see gobang.py.
I trained for about 47 hours and the result is still very poor. I trained on an i7-7700 (3.6 GHz, 16 GB RAM) with an RTX 2080 (8 GB), and the MuZero implementation is multi-threaded.
What went wrong? Can you help me?
My final goal is a 15x15 board, but I want to train a simple one first. If it works, I will then train a more complex one; after all, the complex one requires a lot of computation.
Forgive my poor English; some of the content comes from Google Translate.
I often get this error after training for a few hours. It has happened in all the games I've tried (but I've only tried two-player games). The error message below is from tictactoe. If this only happens to me, maybe it could have something to do with my low self played games per training step ratio. Training continues but self-play stops after the error.
2020-04-27 19:00:35,290.ERROR worker.py:1011 -- Possible unhandled error from worker: ray::ReplayBuffer.get_batch() (pid=10953, ip=192.168.0.113)
File "python/ray/_raylet.pyx", line 452, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 407, in ray._raylet.execute_task.function_executor
File "/home/gustav/Desktop/muzero-general/replay_buffer.py", line 74, in get_batch
game_id, game_history, game_prob = self.sample_game(self.buffer)
File "/home/gustav/Desktop/muzero-general/replay_buffer.py", line 133, in sample_game
game_index = numpy.random.choice(len(self.buffer), p=game_probs)
File "mtrand.pyx", line 920, in numpy.random.mtrand.RandomState.choice
ValueError: probabilities contain NaN
2020-04-27 19:00:35,290 ERROR worker.py:1011 -- Possible unhandled error from worker: ray::Trainer.continuous_update_weights() (pid=10952, ip=192.168.0.113)
File "python/ray/_raylet.pyx", line 452, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 407, in ray._raylet.execute_task.function_executor
File "/home/gustav/Desktop/muzero-general/trainer.py", line 56, in continuous_update_weights
index_batch, batch = ray.get(replay_buffer.get_batch.remote(self.model.get_weights()))
ray.exceptions.RayTaskError(ValueError): ray::ReplayBuffer.get_batch() (pid=10953, ip=192.168.0.113)
File "python/ray/_raylet.pyx", line 452, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 407, in ray._raylet.execute_task.function_executor
File "/home/gustav/Desktop/muzero-general/replay_buffer.py", line 74, in get_batch
game_id, game_history, game_prob = self.sample_game(self.buffer)
File "/home/gustav/Desktop/muzero-general/replay_buffer.py", line 133, in sample_game
game_index = numpy.random.choice(len(self.buffer), p=game_probs)
File "mtrand.pyx", line 920, in numpy.random.mtrand.RandomState.choice
ValueError: probabilities contain NaN
Last test reward: 20.00. Training step: 87849/100000. Played games: 25959. Loss: nan
In
Line 384 in 98cb784
the value to be backpropagated along the search path incorporates rewards from all nodes with the same sign, which seems in contradiction with
Line 380 in 98cb784
and the way value targets are being computed in
muzero-general/replay_buffer.py
Line 206 in 98cb784
Is this intentional? Or maybe this line should be corrected to something like:
value = node.reward + self.config.discount * value if node.to_play == to_play else -node.reward + self.config.discount * value
After about 40mins (it varies) of lunar lander training I get this error:
Exception ignored in: <function ActorHandle.__del__ at 0x7fcaa87c17b8>1791. Loss: 589.988
Traceback (most recent call last):
File "/home/andriy/miniconda3/envs/muzero/lib/python3.7/site-packages/ray/actor.py", line 652, in __del__
AttributeError: 'NoneType' object has no attribute 'get_global_worker'
[The same "Exception ignored" message and traceback repeat 13 more times.]
This is not necessarily a bug, but as far as I can see, your code omits the hidden-state gradient scaling when training the dynamics function, which the MuZero paper suggests. Here is a snippet of the paper's pseudocode:
for action in actions:
    value, reward, policy_logits, hidden_state = network.recurrent_inference(hidden_state, action)
    predictions.append((1.0 / len(actions), value, reward, policy_logits))
    hidden_state = scale_gradient(hidden_state, 0.5)
Is there any specific reason why you left it out? Actually, when I read the paper I did not understand why this scaling "ensures that the total gradient applied to the dynamics function stays constant", as the authors state. I would be grateful if you could share your insight into the effectiveness of this operation.
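One possible reading of that remark can be checked with plain arithmetic. This is an interpretation, not something the paper spells out. In the paper's pseudocode, `scale_gradient(tensor, scale)` is `tensor * scale + stop_gradient(tensor) * (1 - scale)`, so the forward value is unchanged and the backward gradient is multiplied by `scale`. The loss at unroll step j therefore reaches the dynamics call at step i through j - i scalings. The sketch below counts only these explicit factors and ignores the Jacobians of the network itself. The function name `dynamics_gradient_weights` is hypothetical.

```python
def dynamics_gradient_weights(num_unroll_steps, scale=0.5):
    """Sum of explicit scale factors reaching each dynamics call.

    The loss at step j contributes scale ** (j - i) to the dynamics call at
    step i <= j. Network Jacobians are deliberately ignored in this sketch.
    """
    return [
        sum(scale ** (j - i) for j in range(i, num_unroll_steps))
        for i in range(num_unroll_steps)
    ]


with_scaling = dynamics_gradient_weights(5)           # scale = 0.5
without_scaling = dynamics_gradient_weights(5, 1.0)   # no scaling
long_unroll = dynamics_gradient_weights(50)
```

Without scaling, the first dynamics call receives a weight equal to the number of unroll steps. With a factor of 0.5 the weight is a geometric series that stays below 2 however long the unroll is.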
In
Line 262 in 283e353
This is also a limitation in the way actions are encoded: for example, my understanding is that castling in chess is encoded as two separate, consecutive moves made by the same player, but this would break the MCTS logic as it stands here. Any idea how this was handled by the original authors?
muzero-general/replay_buffer.py
Line 79 in 4d54162
Shouldn't this be:
position_probs = numpy.array(game_history.priorities[:-1]) / sum(game_history.priorities[:-1])
It makes no sense to learn from the last step, because we would be training against illegal moves (and a zero reward, as per my previous issue).
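The suggested change can be sketched without the rest of the replay buffer. This is an illustration under assumptions rather than the repository's implementation: `sample_position` is a hypothetical helper, `priorities` is a plain list with one entry per stored position, and the standard-library `random` module stands in for `numpy.random.choice`.

```python
import math
import random


def sample_position(priorities, rng=random):
    """Sample a position index in proportion to its priority, skipping the last step."""
    candidates = priorities[:-1]  # the terminal position is never sampled
    total = sum(candidates)
    if not candidates or not math.isfinite(total) or total <= 0:
        raise ValueError("need finite, positive priorities for non-terminal positions")
    position_probs = [p / total for p in candidates]
    index = rng.choices(range(len(candidates)), weights=position_probs, k=1)[0]
    return index, position_probs[index]


rng = random.Random(0)
samples = [sample_position([1.0, 3.0, 100.0], rng) for _ in range(200)]
```

The finiteness check also rejects NaN priorities before they reach the sampler, instead of failing inside the random choice.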
Hi, thanks a lot for this well-organized code.
I tried your code on CartPole and it works fine. However, I cannot get any results with other environments (e.g. Breakout), especially those with an image as input. In these cases, the agent keeps repeating the same actions throughout every episode. Do you have any suggestions for which hyperparameters I should start tuning?
Hi,
I cloned your repo and tried to use it with a fresh environment. Unfortunately, it seems that it does not work properly, as I get the following error when trying to train or play against muzero:
(muzero) C:\Users\...\Desktop\RLUnity\python\muzero-general>python muzero.py
Welcome to MuZero! Here's a list of games:
0. breakout
1. cartpole
2. connect4
3. gomoku
4. gridworld
5. lunarlander
6. tictactoe
7. twentyone
Enter a number to choose the game: 0
0. Train
1. Load pretrained model
2. Diagnose model
3. Render some self play games
4. Play against MuZero
5. Test the game manually
6. Exit
Enter a number to choose an action: 4
Testing...
2020-07-03 13:48:15,208 INFO resource_spec.py:212 -- Starting Ray with 6.49 GiB memory available for workers and up to 3.26 GiB for objects. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).
2020-07-03 13:48:15,637 INFO services.py:1165 -- View the Ray dashboard at localhost:8265
E0703 13:48:20.680589 13144 11520 raylet_client.cc:69] Retrying to connect to socket for pathname tcp://127.0.0.1:60437 (num_attempts = 1, num_retries = 10)
E0703 13:48:23.682262 13144 11520 raylet_client.cc:69] Retrying to connect to socket for pathname tcp://127.0.0.1:60437 (num_attempts = 2, num_retries = 10)
E0703 13:48:26.682521 13144 11520 raylet_client.cc:69] Retrying to connect to socket for pathname tcp://127.0.0.1:60437 (num_attempts = 3, num_retries = 10)
E0703 13:48:29.685930 13144 11520 raylet_client.cc:69] Retrying to connect to socket for pathname tcp://127.0.0.1:60437 (num_attempts = 4, num_retries = 10)
E0703 13:48:32.687705 13144 11520 raylet_client.cc:69] Retrying to connect to socket for pathname tcp://127.0.0.1:60437 (num_attempts = 5, num_retries = 10)
E0703 13:48:35.690719 13144 11520 raylet_client.cc:69] Retrying to connect to socket for pathname tcp://127.0.0.1:60437 (num_attempts = 6, num_retries = 10)
E0703 13:48:38.692248 13144 11520 raylet_client.cc:69] Retrying to connect to socket for pathname tcp://127.0.0.1:60437 (num_attempts = 7, num_retries = 10)
E0703 13:48:41.694942 13144 11520 raylet_client.cc:69] Retrying to connect to socket for pathname tcp://127.0.0.1:60437 (num_attempts = 8, num_retries = 10)
E0703 13:48:44.696983 13144 11520 raylet_client.cc:69] Retrying to connect to socket for pathname tcp://127.0.0.1:60437 (num_attempts = 9, num_retries = 10)
F0703 13:48:45.697768 13144 11520 raylet_client.cc:78] Could not connect to socket tcp://127.0.0.1:60437
*** Check failure stack trace: ***
@ 00007FFDB9633A8C public: __cdecl google::LogMessage::~LogMessage(void) __ptr64
@ 00007FFDB94A8954 public: virtual __cdecl google::NullStreamFatal::~NullStreamFatal(void) __ptr64
@ 00007FFDB94E351B public: void __cdecl google::NullStreamFatal::`vbase destructor'(void) __ptr64
@ 00007FFDB94E5B5E public: void __cdecl google::NullStreamFatal::`vbase destructor'(void) __ptr64
@ 00007FFDB93F3B98 public: class google::LogMessageVoidify & __ptr64 __cdecl google::LogMessageVoidify::operator=(class google::LogMessageVoidify const & __ptr64) __ptr64
@ 00007FFDB93F1C00 public: class google::LogMessageVoidify & __ptr64 __cdecl google::LogMessageVoidify::operator=(class google::LogMessageVoidify const & __ptr64) __ptr64
@ 00007FFDB93F00ED public: class google::LogMessageVoidify & __ptr64 __cdecl google::LogMessageVoidify::operator=(class google::LogMessageVoidify const & __ptr64) __ptr64
@ 00007FFDB93EF9C3 public: class google::LogMessageVoidify & __ptr64 __cdecl google::LogMessageVoidify::operator=(class google::LogMessageVoidify const & __ptr64) __ptr64
@ 00007FFDB936F179 public: virtual __cdecl google::LogSink::~LogSink(void) __ptr64
@ 00007FFDF395F9CF _PyObject_FastCallKeywords
@ 00007FFDF395F7DA _PyObject_FastCallKeywords
@ 00007FFDF3967939 _PyMethodDef_RawFastCallKeywords
@ 00007FFDF3968322 _PyEval_EvalFrameDefault
@ 00007FFDF3951286 _PyEval_EvalCodeWithName
@ 00007FFDF3967907 _PyMethodDef_RawFastCallKeywords
@ 00007FFDF3968A69 _PyEval_EvalFrameDefault
@ 00007FFDF3A523A3 _PyStack_UnpackDict
@ 00007FFDF39AB431 PyErr_NoMemory
@ 00007FFDF3968322 _PyEval_EvalFrameDefault
@ 00007FFDF3951286 _PyEval_EvalCodeWithName
@ 00007FFDF3967907 _PyMethodDef_RawFastCallKeywords
@ 00007FFDF3968A69 _PyEval_EvalFrameDefault
@ 00007FFDF3951286 _PyEval_EvalCodeWithName
@ 00007FFDF3932A93 PyEval_EvalCodeEx
@ 00007FFDF39329F1 PyEval_EvalCode
@ 00007FFDF393299B PyArena_Free
@ 00007FFDF3AC614D PyRun_FileExFlags
@ 00007FFDF3AC6974 PyRun_SimpleFileExFlags
@ 00007FFDF3AC601B PyRun_AnyFileExFlags
@ 00007FFDF3A11AAF _Py_UnixMain
@ 00007FFDF3A11B57 _Py_UnixMain
@ 00007FFDF3980D5A PyErr_NoMemory
I also got an error when trying to run breakout. I needed to install gym[atari] in addition to the requirements.
(muzero) C:\Users\...\Desktop\RLUnity\python\muzero-general>python muzero.py
Welcome to MuZero! Here's a list of games:
0. breakout
1. cartpole
2. connect4
3. gomoku
4. gridworld
5. lunarlander
6. tictactoe
7. twentyone
Enter a number to choose the game: 0
breakout is not a supported game name, try "cartpole" or refer to the documentation for adding a new game.
Traceback (most recent call last):
File "C:\Users\...\Desktop\RLUnity\python\muzero-general\games\breakout.py", line 11, in <module>
import cv2
ModuleNotFoundError: No module named 'cv2'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "muzero.py", line 286, in <module>
muzero = MuZero(games[choice])
File "muzero.py", line 48, in __init__
raise err
File "muzero.py", line 39, in __init__
game_module = importlib.import_module("games." + self.game_name)
File "C:\Users\...\.conda\envs\muzero\lib\importlib\__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "C:\Users\...\Desktop\RLUnity\python\muzero-general\games\breakout.py", line 13, in <module>
raise ModuleNotFoundError('Please run "pip install gym[atari]"')
ModuleNotFoundError: Please run "pip install gym[atari]"
The diagram at the bottom of the page in How MuZero works is cut off.
Below all used dialogue from python muzero.py
I have trained a lunarlander model.
When I load the pretrained model, the following error occurs:
RuntimeError: Error(s) in loading state_dict for MuZeroFullyConnectedNetwork: Missing key(s) in state_dict: "representation_network.0.weight", "representation_network.0.bias", "dynamics_encoded_state_network.0.weight", "dynamics_encoded_state_network.0.bias", "dynamics_encoded_state_network.2.weight", "dynamics_encoded_state_network.2.bias", "dynamics_reward_network.0.weight", "dynamics_reward_network.0.bias", "dynamics_reward_network.2.weight", "dynamics_reward_network.2.bias", "prediction_policy_network.0.weight", "prediction_policy_network.0.bias", "prediction_policy_network.2.weight", "prediction_policy_network.2.bias", "prediction_value_network.0.weight", "prediction_value_network.0.bias", "prediction_value_network.2.weight", "prediction_value_network.2.bias". Unexpected key(s) in state_dict: "representation_network.layers.0.weight", "representation_network.layers.0.bias", "dynamics_encoded_state_network.layers.0.weight", "dynamics_encoded_state_network.layers.0.bias", "dynamics_encoded_state_network.layers.2.weight", "dynamics_encoded_state_network.layers.2.bias", "dynamics_reward_network.layers.0.weight", "dynamics_reward_network.layers.0.bias", "dynamics_reward_network.layers.2.weight", "dynamics_reward_network.layers.2.bias", "prediction_policy_network.layers.0.weight", "prediction_policy_network.layers.0.bias", "prediction_value_network.layers.0.weight", "prediction_value_network.layers.0.bias", "prediction_value_network.layers.2.weight", "prediction_value_network.layers.2.bias".
How can I fix it? Thanks!
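The key names in the error differ only by an extra `layers` component, which suggests the checkpoint was saved by a different version of the code than the one loading it. One possible workaround is to rename the keys before loading. The helper below is a hypothetical sketch that works on any mapping of key names to values. The two key lists in the error do not match one-to-one (for example, there is no `prediction_policy_network.layers.2` among the unexpected keys), so renaming alone may not be enough; checking out the commit that produced the checkpoint, or retraining, may still be necessary.

```python
def strip_layers_component(state_dict):
    """Rename 'net.layers.0.weight' style keys to 'net.0.weight'."""
    return {
        key.replace(".layers.", ".", 1): value
        for key, value in state_dict.items()
    }


# Placeholder values; in practice these would be the saved tensors.
saved = {
    "representation_network.layers.0.weight": "w",
    "representation_network.layers.0.bias": "b",
}
renamed = strip_layers_component(saved)
```

The renamed mapping could then be passed to the model's `load_state_dict` call in place of the original one.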
In Appendix G of the muzero paper, they define the priority of a sample as p_i = | nu_i - z_i |, and write "nu is the search value and z the observed n-step return." (I'll use "nu" in place of ν for clarity when comparing with v)
However, in self_play.py, line 335, you seem to calculate priority as | v_i - nu_i |.
There are three related but distinct quantities here:
v - the output of the value head for a position
nu (ν) - the value estimate returned by MCTS for a position
z - the bootstrapped value of a position calculated using the next k observed rewards and the nu value k steps in the future, and discounting appropriately.
See Section 3 of the paper for definitions of these three terms.
It seems to me that the code currently does not match the priority calculation in the paper. Is this intentional? The | v - nu | formulation in the code has the advantage that priorities can be quickly updated whenever a position is used in training, because a new v is obtained. The | nu - z | version in the paper is not amenable to priority updates, because that would seem to require, at a minimum, re-running MCTS to re-estimate nu and, at most, re-playing the game for k rounds to re-estimate z, both of which are products of the current weights.
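The two priority definitions can be compared side by side on a toy trajectory. This is a minimal sketch under stated assumptions, not code from the repository or the paper: the function names are hypothetical, the game is single-player (no sign flips), and `rewards[t]` is taken to be the reward observed after acting at position t.

```python
def n_step_return(rewards, search_values, index, td_steps, discount):
    """z: discounted sum of the next td_steps rewards plus a bootstrapped search value."""
    bootstrap_index = index + td_steps
    value = 0.0
    if bootstrap_index < len(search_values):
        value = search_values[bootstrap_index] * discount ** td_steps
    for i, reward in enumerate(rewards[index:bootstrap_index]):
        value += reward * discount ** i
    return value


def compare_priorities(predicted_values, search_values, rewards, td_steps, discount):
    """Return (|nu - z|, |v - nu|) for every position of one game."""
    paper_style, issue_style = [], []
    for t in range(len(search_values)):
        z = n_step_return(rewards, search_values, t, td_steps, discount)
        paper_style.append(abs(search_values[t] - z))               # |nu - z|
        issue_style.append(abs(predicted_values[t] - search_values[t]))  # |v - nu|
    return paper_style, issue_style


paper_style, issue_style = compare_priorities(
    predicted_values=[0.0, 0.0, 0.0],  # v: value head outputs
    search_values=[2.0, 1.5, 1.0],     # nu: MCTS root values
    rewards=[1.0, 1.0, 1.0],
    td_steps=2,
    discount=1.0,
)
```

Only the second list depends on the value head output v, which is why it can be refreshed cheaply each time a position is used in training.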
Hello!
Thanks for sharing your implementation. However, the loss went to NaN on the latest commit so, following #40, I used this tree: https://github.com/werner-duvaud/muzero-general/tree/f5dd3d2a3fd2e0c354731112b23c2d4c55811914. However, when I check nvidia-smi, it shows that only around 100mb of my GPU is used:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00 Driver Version: 440.64.00 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... On | 00000000:02:00.0 On | N/A |
| 25% 60C P2 94W / 250W | 1553MiB / 11177MiB | 30% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 3234 G /usr/lib/xorg/Xorg 265MiB |
| 0 15010 G ...uest-channel-token=17125473231410870467 115MiB |
| 0 15854 G ...AAAAAAAAAAAACAAAAAAAAAA= --shared-files 404MiB |
| 0 18818 C ray::Trainer.continuous_update_weights() 741MiB |
| 0 27228 G ...charm-community-2018.1.4/jre64/bin/java 11MiB |
+-----------------------------------------------------------------------------+
and the program is heavily reliant on CPU. I noticed in the latest commit, this was fixed. Any ideas on how I can get the model to run on GPU?
Thanks,
Weichen
Training Connect4 with default settings results in many errors.
python muzero.py
Branch master, commit "c046c03 Fix backpropagate"
Python 3.7, Ubuntu 20.04 LTS, RTX6000 GPU (24GB), 8 CPU, 32GB RAM
https://asciinema.org/a/QR0bM7MH4TR2ulgGddZ4luJ1n
Loss: nan
Warning : Extreme values (nan) in game priorities. Could be underfitting or overfitting.
2020-07-14 20:24:39,425.ERROR worker.py:987 -- Possible unhandled error from worker: ray::SelfPlay.continuous_self_play() (pid=10473, ip=45.79.123.77)
File "python/ray/_raylet.pyx", line 446, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 400, in ray._raylet.execute_task.function_executor
File "/root/muzero-general/self_play.py", line 48, in continuous_self_play
0,
File "/root/muzero-general/self_play.py", line 142, in play_game
False if temperature == 0 else True,
File "/root/muzero-general/self_play.py", line 312, in run
action, node = self.select_child(node, min_max_stats)
File "/root/muzero-general/self_play.py", line 359, in select_child
for action, child in node.children.items()
File "mtrand.pyx", line 907, in numpy.random.mtrand.RandomState.choice
ValueError: 'a' cannot be empty unless no samples are taken