
Comments (4)

coder-free commented on May 19, 2024

I found that my NVIDIA driver is missing. Maybe an automatic Ubuntu kernel update removed it. I am not sure whether this is related to the error.
I will reinstall the driver and retry.

from muzero-general.

werner-duvaud commented on May 19, 2024

Hi,

Thank you for reporting.
It should be good now.

By the way, note that we have not really tested Gomoku, so the hyperparameters are probably not well tuned.


coder-free commented on May 19, 2024

@werner-duvaud Hi, I trained Gomoku after changing some parameters. I don't quite understand the relationship between training steps and played games. When training finished I got this result:

Last test reward: 1.00. Training step: 50000/50000. Played games: 82. Loss: 3.93

The training parameters I set are:

### Training
self.results_path = os.path.join(os.path.dirname(__file__), "../results", os.path.basename(__file__)[:-3], datetime.datetime.now().strftime("%Y-%m-%d--%H-%M-%S"))  # Path to store the model weights and TensorBoard logs
self.training_steps = 50000  # Total number of training steps (ie weights update according to a batch)
self.batch_size = 512  # Number of parts of games to train on at each training step
self.checkpoint_interval = 10  # Number of training steps before using the model for self-playing
self.value_loss_weight = 0.25  # Scale the value loss to avoid overfitting of the value function, paper recommends 0.25 (See paper appendix Reanalyze)
self.training_device = "cuda" if torch.cuda.is_available() else "cpu"  # Train on GPU if available

self.optimizer = "SGD"  # "Adam" or "SGD". Paper uses SGD
self.weight_decay = 1e-4  # L2 weights regularization
self.momentum = 0.9  # Used only if optimizer is SGD

I think it is impossible for 82 games to produce 50000 * 512 distinct parts of games.

The self-play parameters I set are:

### Self-Play
self.num_actors = 4  # Number of simultaneous threads self-playing to feed the replay buffer
self.max_moves = 121  # Maximum number of moves if game is not finished before
self.num_simulations = 121  # Number of future moves self-simulated
self.discount = 1  # Chronological discount of the reward
self.temperature_threshold = 80  # Number of moves before dropping temperature to 0 (ie playing according to the max)

Board size is 11 x 11.
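A quick back-of-the-envelope check of the numbers above, as a minimal sketch. It assumes each element of a batch corresponds to one stored position and that every game runs to the full max_moves, which gives an upper bound on the amount of distinct data:

```python
# Figures quoted in this thread.
training_steps = 50_000
batch_size = 512
played_games = 82
max_moves = 121  # 11 x 11 board, so a game has at most 121 moves

# Upper bound on the number of distinct positions generated by self-play
# (assumes every game lasted the full max_moves).
max_positions = played_games * max_moves

# Total number of batch elements drawn over the whole training run.
samples_drawn = training_steps * batch_size

# Lower bound on how many times each position was used on average.
min_average_reuse = samples_drawn / max_positions

print(max_positions)             # 9922
print(samples_drawn)             # 25600000
print(round(min_average_reuse))  # 2580
```

So under these assumptions each position would have been trained on at least about 2580 times on average.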


werner-duvaud commented on May 19, 2024

We use a replay buffer. The game moves are stored in it, and we sample from it to generate batches: one batch per training step.

As you mentioned, the network has trained many times on the same data. You can increase the number of actors or adjust the ratio to control this.
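To illustrate the idea, here is a minimal sketch of sampling batches from a small replay buffer. It is not the muzero-general implementation: the names (`sample_batch`, `buffer`) are hypothetical, the buffer holds toy (game, move) pairs, and sampling is assumed to be uniform with replacement.

```python
import random
from collections import Counter


def sample_batch(buffer, batch_size, rng):
    """Draw batch_size stored positions uniformly, with replacement."""
    return [rng.choice(buffer) for _ in range(batch_size)]


rng = random.Random(0)

# Hypothetical buffer: 3 games of 5 moves each, stored as (game, move) pairs.
buffer = [(game, move) for game in range(3) for move in range(5)]

# Count how often each stored position ends up in a batch.
counts = Counter()
training_steps = 100
batch_size = 8
for _ in range(training_steps):
    counts.update(sample_batch(buffer, batch_size, rng))

total = sum(counts.values())
print(total)        # 800 samples drawn in total
print(len(buffer))  # 15 distinct positions available
```

With 800 samples drawn from only 15 stored positions, the same positions are necessarily reused many times, which is the situation described above.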

self.ratio = None # Desired self played games per training step ratio. Equivalent to a synchronous version, training can take much longer. Set it to None to disable it
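As a rough sketch of how such a ratio could throttle training, the trainer could pause whenever self-play has fallen behind. This is an assumption for illustration only, not the code used in muzero-general; `trainer_should_wait` is a hypothetical helper.

```python
def trainer_should_wait(played_games, training_step, ratio):
    """Return True if training should pause until more games are played.

    Hypothetical helper: `ratio` is the desired number of self-played
    games per training step, and None disables the check.
    """
    if ratio is None:
        return False
    return played_games < ratio * training_step


# With the final numbers from this thread (82 games, 50000 steps):
print(trainer_should_wait(82, 50_000, None))   # False: check disabled
print(trainer_should_wait(82, 50_000, 0.01))   # True: 500 games wanted
print(trainer_should_wait(82, 50_000, 0.001))  # False: only 50 games wanted
```

A larger ratio forces more fresh games per training step, at the cost of a longer wall-clock training time.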

