I tried to train Gomoku, but I got an error (muzero-general, closed)

coder-free commented on May 19, 2024

I tried to train Gomoku, but I got an error.


Comments (4)

coder-free commented on May 19, 2024

I found that my NVIDIA driver was missing. Maybe an automatic Ubuntu kernel update removed it. I am not sure whether this is related to the error.
I will reinstall the driver and try again.


werner-duvaud commented on May 19, 2024

Hi,

Thank you for reporting.
It should be good now.

By the way, note that we have not really tested Gomoku, so the hyperparameters are probably not well tuned.


coder-free commented on May 19, 2024

@werner-duvaud Hi, I trained Gomoku after changing some parameters. I don't quite understand the relationship between training steps and played games. When training finished, I got this result:

Last test reward: 1.00. Training step: 50000/50000. Played games: 82. Loss: 3.93

The training parameters I set are:

```python
### Training
self.results_path = os.path.join(os.path.dirname(__file__), "../results", os.path.basename(__file__)[:-3], datetime.datetime.now().strftime("%Y-%m-%d--%H-%M-%S"))  # Path to store the model weights and TensorBoard logs
self.training_steps = 50000  # Total number of training steps (ie weights update according to a batch)
self.batch_size = 512  # Number of parts of games to train on at each training step
self.checkpoint_interval = 10  # Number of training steps before using the model for self-playing
self.value_loss_weight = 0.25  # Scale the value loss to avoid overfitting of the value function, paper recommends 0.25 (See paper appendix Reanalyze)
self.training_device = "cuda" if torch.cuda.is_available() else "cpu"  # Train on GPU if available

self.optimizer = "SGD"  # "Adam" or "SGD". Paper uses SGD
self.weight_decay = 1e-4  # L2 weights regularization
self.momentum = 0.9  # Used only if optimizer is SGD
```

I think it is impossible for 82 games to provide 50,000 × 512 distinct parts of games.
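A quick back-of-the-envelope check supports this. The sketch below assumes every game runs the full 121 moves that an 11 × 11 board allows and that each batch element is one stored position; real Gomoku games usually end sooner, so the true reuse is even higher.

```python
# Rough sample-reuse estimate for the run reported above.
# Assumption: every game lasts the maximum 121 moves (one move per cell on an
# 11 x 11 board), which overestimates how many distinct positions were stored.
training_steps = 50_000
batch_size = 512
played_games = 82
max_moves_per_game = 121

samples_drawn = training_steps * batch_size        # positions sampled in total
max_positions = played_games * max_moves_per_game  # upper bound on stored positions
reuse_factor = samples_drawn / max_positions       # average draws per position

print(f"samples drawn:       {samples_drawn:,}")
print(f"positions available: <= {max_positions:,}")
print(f"average reuse:       >= {reuse_factor:,.0f}x")
```

Under these assumptions the run drew 25,600,000 samples from at most 9,922 positions, so each position was used roughly 2,580 times on average.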

The self-play parameters I set are:

```python
### Self-Play
self.num_actors = 4  # Number of simultaneous threads self-playing to feed the replay buffer
self.max_moves = 121  # Maximum number of moves if game is not finished before
self.num_simulations = 121  # Number of future moves self-simulated
self.discount = 1  # Chronological discount of the reward
self.temperature_threshold = 80  # Number of moves before dropping temperature to 0 (ie playing according to the max)
```

The board size is 11 × 11.
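One way to read the `temperature_threshold` comment: before the threshold, moves are sampled from the MCTS visit counts; from the threshold onward, the most-visited move is played. This is a hypothetical illustration: `select_action` and `temperature_for_move` are not muzero-general's API, and the temperature of 1 before the threshold is an assumption.

```python
import random
from typing import Optional, Sequence


def temperature_for_move(move_number: int, threshold: Optional[int]) -> float:
    """Assumed schedule: temperature 1 before `threshold`, 0 from it onward."""
    if threshold is not None and move_number >= threshold:
        return 0.0
    return 1.0


def select_action(
    actions: Sequence[int],
    visit_counts: Sequence[int],
    temperature: float,
    rng: random.Random,
) -> int:
    """Pick a move from MCTS visit counts (hypothetical helper)."""
    if temperature == 0:
        # Greedy: play the most-visited move ("playing according to the max").
        best = max(range(len(actions)), key=lambda i: visit_counts[i])
        return actions[best]
    # Otherwise sample each move with probability proportional to N(a) ** (1 / T).
    weights = [n ** (1.0 / temperature) for n in visit_counts]
    return rng.choices(actions, weights=weights, k=1)[0]


rng = random.Random(0)
actions = [0, 1, 2]
visits = [10, 50, 40]
# Move 100 is past the threshold of 80, so the most-visited action is chosen.
print(select_action(actions, visits, temperature_for_move(100, 80), rng))  # → 1
```

With `temperature_threshold = 80` and up to 121 moves per game, the first 80 moves would be sampled for exploration and any remaining moves played greedily.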


werner-duvaud commented on May 19, 2024

We use a replay buffer: the game moves are stored in it, and we sample from it to generate batches, one batch per training step.
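A minimal sketch of that loop, assuming a simple FIFO buffer with uniform sampling; the repository's actual replay buffer may differ, and the class and method names here are hypothetical. It makes the key point explicit: the number of training steps does not depend on the number of stored games, so a small buffer is simply sampled many times.

```python
import random
from typing import Any, List, Tuple

# Hypothetical record type: one stored position (observation, training target).
Position = Tuple[Any, Any]


class ReplayBuffer:
    """Minimal FIFO replay buffer: self-play saves games, training samples batches."""

    def __init__(self, capacity_games: int, seed: int = 0) -> None:
        self.capacity_games = capacity_games
        self.games: List[List[Position]] = []
        self.rng = random.Random(seed)

    def save_game(self, game: List[Position]) -> None:
        self.games.append(game)
        if len(self.games) > self.capacity_games:
            self.games.pop(0)  # evict the oldest game

    def sample_batch(self, batch_size: int) -> List[Position]:
        # Uniform sampling with replacement: pick a game, then a position in it.
        # The same position can appear in many batches (and even twice in one).
        return [self.rng.choice(self.rng.choice(self.games)) for _ in range(batch_size)]


def train(buffer: ReplayBuffer, training_steps: int, batch_size: int) -> int:
    """Draw one batch per training step; return the total number of samples drawn."""
    samples_drawn = 0
    for _ in range(training_steps):
        batch = buffer.sample_batch(batch_size)
        # A real trainer would compute the loss and update the weights here.
        samples_drawn += len(batch)
    return samples_drawn


buffer = ReplayBuffer(capacity_games=3000)
for g in range(2):  # only two short games have been played so far
    buffer.save_game([(f"game{g}-move{m}", None) for m in range(5)])
print(train(buffer, training_steps=100, batch_size=8))  # → 800 samples from 10 positions
```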

As you mentioned, the network has been trained many times on the same data. You can increase the number of actors or adjust the `ratio` setting to control this.

muzero-general/games/gomoku.py, line 106 in 9b49e16:

```python
self.ratio = None  # Desired self played games per training step ratio. Equivalent to a synchronous version, training can take much longer. Set it to None to disable it
```
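To see what `ratio` changes, here is a toy, single-threaded simulation; it is not the repository's implementation, and the timing model and all names are assumptions for illustration. Each tick the trainer may take one step, and every `ticks_per_game` ticks each actor finishes a game. With `ratio=None` the trainer never waits; with a ratio, it pauses until enough games exist, which is why the comment warns that training can take much longer.

```python
from typing import Optional, Tuple


def simulate(
    training_steps: int,
    num_actors: int,
    ticks_per_game: int,
    ratio: Optional[float],
) -> Tuple[int, int]:
    """Return (games_played, ticks_elapsed) when training finishes.

    Toy timing model (an assumption): one training step takes one tick, and
    every actor finishes one game every `ticks_per_game` ticks.
    """
    games_played = 0
    step = 0
    tick = 0
    while step < training_steps:
        tick += 1
        if tick % ticks_per_game == 0:
            games_played += num_actors  # all actors finish a game this tick
        # Take a step unless it would drop games per training step below `ratio`.
        if ratio is None or games_played >= ratio * (step + 1):
            step += 1
    return games_played, tick


print(simulate(100, num_actors=4, ticks_per_game=50, ratio=None))  # → (8, 100)
print(simulate(100, num_actors=4, ticks_per_game=50, ratio=0.25))  # → (28, 353)
```

Without a ratio, the trainer finishes 100 steps on only 8 games; with `ratio=0.25` it waits for at least 25 games (28, since 4 actors finish together) and takes about 3.5 times as long. Raising `num_actors` produces more games per tick and so reduces the waiting.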
