
Comments (19)

benediamond commented on September 7, 2024

Hi @yhyu13, glad to have you on board.

I have managed to train 1 generation so far on a separate fork I have created, in which I have tried certain modifications:

  • I represent the board as a 12x8x8 one-hot array of bits, instead of a 2x8x8 array of characters (i.e. integers, e.g. 'Q' --> 81). This seems closer to standard machine-learning practice, though it's good that the original method works too. (See the sketch after this list.)
  • I eliminate material counts in both triggering resignations and scoring endgames (to avoid using human heuristics, as in AlphaGo Zero).
  • Meanwhile, to improve the quality of self-play, and to eliminate draws when possible, I use the syzygy endgame tablebases to guide self-play when there are 5 or fewer pieces on the board (tablebases are not "heuristics" but rather mathematically proven optimal play).
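
As a rough illustration of the first point, here is a minimal sketch of the 12x8x8 one-hot encoding, assuming python-chess and numpy (the names are mine, not necessarily those in the fork):

import chess
import numpy as np

# One 8x8 plane per (piece type, color): 6 piece types x 2 colors = 12 planes.
PIECE_TYPES = [chess.PAWN, chess.KNIGHT, chess.BISHOP, chess.ROOK, chess.QUEEN, chess.KING]

def one_hot_board(board):
    planes = np.zeros((12, 8, 8), dtype=np.uint8)
    for square, piece in board.piece_map().items():
        plane = PIECE_TYPES.index(piece.piece_type) + (0 if piece.color == chess.WHITE else 6)
        planes[plane, chess.square_rank(square), chess.square_file(square)] = 1
    return planes

print(one_hot_board(chess.Board()).sum())  # 32: one bit per piece in the starting position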

@Zeta36 and @yhyu13, it appears I have veered off course a bit with these experiments, unfortunately, and you are already trying different things. I am very excited about joining the distributed effort, but first I would like to learn a bit more about whether the above methods can succeed. Once I have satisfied my curiosity, I will contribute my machine (NVIDIA GeForce GTX 1080) to the cause. Please let me know if you'd like to learn more about what I've tried.


musketeerchess commented on September 7, 2024

Hi friends, I'm not a good programmer. I invented chess variants, and I'm looking for someone to help me create a program based on this project that plays my chess variant, Musketeer Chess.
The main objective is to follow the progress of this self-learning program; the second objective, if it ends up playing well enough, is to obtain an evaluation of the fairy chess pieces compared to the classic chess pieces. This is a job offer.


Zeta36 commented on September 7, 2024

Hi, @brianprichardson.

  1. I'm really glad to hear that!! I know you made some modifications to the code in your fork. Could you please apply those same changes to this repo so I can merge them? (Including the model parameters you are using: simulations per move, etc.)

  2. I'd also love it if you could share the weights of your best model so far!!

  3. Can you also tell me how many generations (changes of the best model) you have gotten already?

Thank you for your help and collaboration with this project!!


Zeta36 commented on September 7, 2024

Now, how can we distribute it (at least self play and evaluation) like LeelaZero and Fishtest?

This is going to be really easy because of the way the project works. The only thing we have to do is use an external server (on the internet) from which we can all read/write the best model.

So the training pipeline will be exactly the same, except that the best model will not be loaded/saved from/to our local machines but from/to that server. You'll even be able to keep the progress you have made so far.

This is what I'm going to do: let me copy your current fork into a new branch here, and I'll develop the changes needed. Then I will let you know so we can try to train in a distributed way. I'm sure @yhyu13 will also help us with this; he also has a good GPU.
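
For illustration only, a minimal sketch of that idea, assuming a hypothetical HTTP server and the requests library (this is not the project's actual distributed code, just the shape of it):

import requests

SERVER = "http://example.com/chess-zero"  # hypothetical shared server

def download_best_model(dest_path="best_model_weights.h5"):
    # Every worker fetches the current best-model weights from the shared server
    # instead of reading them from the local data directory.
    resp = requests.get(SERVER + "/best_model/weights")
    resp.raise_for_status()
    with open(dest_path, "wb") as f:
        f.write(resp.content)
    return dest_path

def upload_next_generation(src_path):
    # The optimizer uploads newly trained weights so that any machine running
    # the evaluator can pick them up and challenge the best model.
    with open(src_path, "rb") as f:
        resp = requests.post(SERVER + "/next_generation/weights", files={"weights": f})
    resp.raise_for_status()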


brianprichardson commented on September 7, 2024

I might have created a fork unintentionally, as I am still quite new to GitHub. I am running the 11/26 99f... commit. The only changes I made were to the GPU memory settings listed above.

I did experiment a bit with various GPU memory limits, as I could not get all three workers running at the same time. Even now, with 24GB, there is some swapping. So I just created a bunch of self-play games, then ran opt for a while, and then eval. Once there was some data, I ran all three workers for about a day. There are 65 models in next_generation, but I think only 2 new best models.

Attaching log file which might help you.
main.log

PS: I have not yet done my homework and studied all of the code, and most of my hardware is currently running LeelaZero.


yhyu13 commented on September 7, 2024

@Zeta36

It seems like you have partnered with @benediamond; that's great. I have trained 4 generations of models off and on over the last week. Are you also doing distributed learning? I would like to know how, and what your plan is. XD


Zeta36 commented on September 7, 2024

Hi again, @yhyu13 :).

Yes, we are working together on the project. I recently added a new step to the training pipeline: a supervised learning (SL) process. This SL step is based on PGN files (human chess games you can download from the internet) and acts as a kind of pre-training prior to the self-play training step. DeepMind did something similar in its first version of AlphaGo, to keep the policy from starting out totally random in the self-play process.

So the idea would be to train first using the "sl" worker instead of the "self" one. I mean: you run three workers at the same time as before, but instead of using:

python src/chess_zero/run.py self --type distributed
python src/chess_zero/run.py opt --type distributed
python src/chess_zero/run.py eval --type distributed

We will use until convergence:

python src/chess_zero/run.py sl --type distributed
python src/chess_zero/run.py opt --type distributed
python src/chess_zero/run.py eval --type distributed

After convergence you would stop the "sl" worker and start the "self" one (we'd need to use various PGN files with millions of moves and wait for the model to converge).
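
For anyone curious what the SL worker consumes, here is a minimal sketch, assuming python-chess, of turning a PGN file into (position, move, result) examples (the function name is mine, not the project's):

import chess.pgn

def examples_from_pgn(pgn_path):
    # Yield (FEN before the move, move played, game result) for every move of every game.
    with open(pgn_path) as handle:
        while True:
            game = chess.pgn.read_game(handle)
            if game is None:  # end of file
                break
            result = game.headers.get("Result", "*")
            board = game.board()
            for move in game.mainline_moves():
                yield board.fen(), move.uci(), result
                board.push(move)

# The human move becomes the policy target and the result becomes the value target, e.g.:
# for fen, move, result in examples_from_pgn("human_games.pgn"):
#     ...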

The '--type distributed' parameter is the only thing you need to add to work in a distributed way ;).

Our hope is that, by pre-training the policy with SL, we will then be able to generate good game data right away in a self-play manner. In the game of Go this step is not necessary because of the rules of the game, but in chess the rules are more complex, and it seems a totally random starting policy is unlikely to generate enough quality data, or would require a huge amount of time (and maybe thousands of GPUs).

Regards.


Zeta36 commented on September 7, 2024

@benediamond, did you get convergence in the optimization worker? I tried your one-hot version, but the model seemed not to converge (at least in the version I tried some days ago).


benediamond commented on September 7, 2024

@Zeta36 reaching a model checkpoint appears to take about 20 minutes. From total scratch I have losses
loss: 7.8623 - policy_out_loss: 7.1453 - value_out_loss: 0.3209.
By the time of the checkpoint, I have
loss: 0.4104 - policy_out_loss: 0.0029 - value_out_loss: 0.0029.
How does this compare to your experience?

It's strange that this doesn't work, but I will be happy to switch back if using ord() is better.


Zeta36 commented on September 7, 2024

@benediamond, did you reach a loss that near 0?? But are you constantly generating new play data with the self-play worker, or do you stop self-play and then start the optimization? It sounds a lot like an overfitting issue.

Moreover, are you constantly evaluating the best model using the eval worker? How many times has the best model been replaced by the evaluator worker?


benediamond commented on September 7, 2024

I am generating new self-play data, but my optimizer works much faster. So I generate many models on the same batch of self-play data.

Yes, it sounds like overfitting. Do you think I should reduce the learning rate? How did your optimizer perform?


Zeta36 commented on September 7, 2024

@benediamond, my optimizer performs much more slowly, but as you know I have no GPU, only a CPU.


Zeta36 commented on September 7, 2024

@benediamond, have you tried playing against your best model? I mean you yourself playing, with the play_gui option. If your model is not overfitting, then it should play more or less fine.


benediamond commented on September 7, 2024

@Zeta36 you mean, I myself play against it? I haven't tried, but I can. But I don't expect it to be good. My evaluator has replaced the best model only once so far, but I only began training last night (I have been making various changes...)


Zeta36 commented on September 7, 2024

It's really very strange that the evaluator changed the best model only once, @benediamond. It is not possible for you to reach a loss near 0 and yet have your next-generation model unable to beat the random one (the first best model is random). You've got to have some kind of bug in there.


benediamond commented on September 7, 2024

@Zeta36 Yes. I think the more parameters, the more chance of overfitting, and I didn't realize this when I implemented the one-hot feature. I have now reduced the learning rate tenfold.
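
As a concrete illustration of that kind of change, here is a toy sketch assuming Keras (the stand-in model and optimizer settings are mine; the project's real network differs) of dropping the learning rate tenfold on an already-compiled model:

from keras import backend as K
from keras.layers import Dense
from keras.models import Sequential

# Tiny stand-in model; the real network is a much larger residual net.
model = Sequential([Dense(8, activation="relu", input_shape=(12 * 8 * 8,)),
                    Dense(1, activation="tanh")])
model.compile(optimizer="sgd", loss="mean_squared_error")

old_lr = K.get_value(model.optimizer.lr)
K.set_value(model.optimizer.lr, old_lr / 10.0)  # tenfold reduction
print(old_lr, "->", K.get_value(model.optimizer.lr))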

On the other hand, I only began training this present version last night. And it looks like I will soon get a 2nd model... I will keep testing.

I am also experimenting with another feature:

  • Instead of using whole games for training, we can randomly generate positions with only 5 (or 6, 7, 8) pieces on the board. Because the tablebase takes over at <= 5 pieces, the overall quality of play will be very high, so we can very quickly generate large amounts of high-quality training data. (See the sketch below.)
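
A rough sketch of that idea, assuming python-chess with its chess.syzygy module and a local directory of downloaded Syzygy tables (the placement logic here is mine, for illustration only):

import random
import chess
import chess.syzygy

def random_low_piece_position(num_pieces=5):
    # Drop both kings plus a few random pieces on random squares and retry until legal.
    extras = [chess.Piece(random.choice([chess.QUEEN, chess.ROOK, chess.BISHOP,
                                         chess.KNIGHT, chess.PAWN]),
                          random.choice([chess.WHITE, chess.BLACK]))
              for _ in range(num_pieces - 2)]
    while True:
        board = chess.Board(None)  # start from an empty board
        squares = random.sample(list(chess.SQUARES), num_pieces)
        board.set_piece_at(squares[0], chess.Piece(chess.KING, chess.WHITE))
        board.set_piece_at(squares[1], chess.Piece(chess.KING, chess.BLACK))
        for sq, piece in zip(squares[2:], extras):
            board.set_piece_at(sq, piece)
        if board.is_valid():
            return board

with chess.syzygy.open_tablebase("./syzygy") as tablebase:  # path to the table files
    board = random_low_piece_position()
    wdl = tablebase.probe_wdl(board)  # 2 = win, 0 = draw, -2 = loss, from the side to move
    print(board.fen(), wdl)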

I've also done something rather pointless but perhaps useful:

  • I have made white the "primary" side, as is typical in chess. This is merely a matter of renaming variables, and makes no difference to the machine model.

In any case, @Zeta36, I am interested in exploring these features. I will soon implement / copy in your supervised learning feature. If the results look very strong, then perhaps with your permission I may push my fork into a sub-branch of your repository. Please let me know.


Zeta36 commented on September 7, 2024

Of course you can :).

Let's keep working.


ilpadrinohack commented on September 7, 2024

What do you mean by "convergence of the model"? ...Just so I know when I have to switch from "sl" to "self".


coughlnj commented on September 7, 2024

@Zeta36
I'm trying to understand this project as AlphaZero is very interesting to me.
Is the following accurate as a high-level overview of the workers?

self_play.py plays matches of the current best model against itself and saves them to a file location. Then optimize.py loads those matches, converts the moves to a suitable format, and trains the next_gen model. Next, evaluate.py plays matches between the best_model and the next_gen model; if next_gen wins at some rate X, it replaces best_model.

Thanks,
coughlnj

