Comments (19)
Hi @yhyu13, glad to have you on board.
I have managed to train 1 generation so far on a separate fork I have created, in which I have tried certain modifications:
- I represent the board by a 12x8x8 array of bits (one-hot), instead of a 2x8x8 array of characters (i.e., integers, e.g. 'Q' --> 81). This appears closer to general machine learning principles, though it's good that the original method is working.
- I eliminate material counts in both triggering resignations and scoring endgames (to avoid using human heuristics, as in AlphaGo Zero).
- Meanwhile, to improve the quality of self-play, and to eliminate draws when possible, I use the syzygy endgame tablebases to guide self-play when there are 5 or fewer pieces on the board (tablebases are not "heuristics" but rather mathematically proven optimal play).
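For concreteness, here is a minimal sketch of the one-hot encoding from the first point, in pure Python (the plane ordering and the FEN-based interface are my own illustrative choices, not necessarily the fork's actual layout):

```python
# Twelve planes: six piece types for White, then six for Black.
# This ordering is an assumption for illustration only.
PIECE_ORDER = "PNBRQKpnbrqk"

def one_hot_board(fen_board):
    """Encode the board field of a FEN string as a 12x8x8 nested list of bits."""
    planes = [[[0] * 8 for _ in range(8)] for _ in range(12)]
    for rank, row in enumerate(fen_board.split("/")):
        file = 0
        for ch in row:
            if ch.isdigit():
                file += int(ch)  # a digit in FEN means that many empty squares
            else:
                planes[PIECE_ORDER.index(ch)][rank][file] = 1
                file += 1
    return planes

START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"
planes = one_hot_board(START)  # 32 bits set in the initial position
```

Each square then contributes at most one set bit across the twelve planes, rather than an arbitrary integer code like ord('Q').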
@Zeta36 and @yhyu13, it appears I have veered off course a bit in these experiments, unfortunately, and you are already trying different things. I am very excited about joining the distributed effort, but first I would like to learn a bit more about whether the above methods can succeed. Once I have satisfied my curiosity, I will contribute my machine (NVIDIA GeForce GTX 1080) to the cause. Please let me know if you'd like to learn more about what I've tried.
from chess-alpha-zero.
Hi friends, I'm not a good programmer. I invented chess variants, and I'm looking for someone to help me create a program based on this project to play my chess variant, named Musketeer Chess.
The main objective is to see the progress of this self-learning program, and the second objective, if it plays well enough, is to try to get an evaluation of the fairy chess pieces compared to the classic chess pieces. This is a job offer.
Hi, @brianprichardson.
-
I'm really glad to hear that! I know you made some modifications to the code in your fork. Could you please redo those same changes in this repo so I can merge them? (Include the model parameters you are using: simulations per move, etc.)
-
I'd also love it if you could share with me the weights of your best model so far!
-
Can you also tell me how many generations (changes of the best model) you have gotten already?
Thank you for your help and collaboration on this project!
Now, how can we distribute it (at least self play and evaluation) like LeelaZero and Fishtest?
This is going to be really easy because of the way the project works. The only thing we have to do is use some external server (on the internet) from which we all read/write the best model.
So the training pipeline will be exactly the same, except that the best model is not going to be loaded/saved from/to our local machines but from/to a server. You'll even be able to keep the progress you already have.
This is what I'm going to do: let me copy your current fork into a new branch here, and I'll develop the changes needed. Then I will let you know so we can try to train in a distributed way. I'm sure @yhyu13 will also help us with this; he also has a good GPU.
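As a rough sketch, the read/write against a shared server could look like this (the server URL and endpoint names are hypothetical placeholders, not anything the project actually defines):

```python
import urllib.request

# Hypothetical endpoint; the real project would define its own server URL.
MODEL_SERVER = "http://example.com/chess-zero/best_model"

def fetch_best_model(dest_path="best_model_weights.h5"):
    """Download the current best model's weights instead of reading a local file."""
    urllib.request.urlretrieve(MODEL_SERVER + "/weights", dest_path)
    return dest_path

def upload_best_model(src_path="best_model_weights.h5"):
    """Upload newly promoted weights so every worker sees the same best model."""
    with open(src_path, "rb") as f:
        req = urllib.request.Request(MODEL_SERVER + "/weights",
                                     data=f.read(), method="PUT")
        urllib.request.urlopen(req)
```

The workers would then call these in place of their local load/save, leaving the rest of the pipeline untouched.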
I might have created a fork unintentionally, as I am still quite new to GitHub. I am running the 11/26 99f... commit. The only changes I made were to the GPU memory limits listed above.
I experimented a bit with various GPU memory limits, as I could not get all three workers running at the same time. Even now with 24 GB there is some swapping. So I just created a bunch of self-play games, then ran opt for a while, and then eval. Once there was some data, I ran all three workers for about a day. There are 65 models in next_generation, but I think only 2 new best models.
Attaching log file which might help you.
main.log
PS: I have not done my homework and studied all of the code enough yet and most of my hardware is currently running LeelaZero.
It seems you have partnered with @benediamond; that's great. I trained 4 generations of models off and on last week. Are you also doing distributed learning? I would like to know how, and what your plan is. XD
Hi again, @yhyu13 :).
Yes, we are working together on the project. I recently added a new step to the training pipeline: a supervised learning (SL) process. This SL is based on PGN files (human chess games you can download from the internet), and it acts as a pre-training step prior to the self-play training. DeepMind did something similar in the first version of AlphaGo, to help the policy not start totally random in the self-play process.
So the idea would be to train first using the "sl" worker instead of the "self" one. I mean: you run three workers at the same time, as before, but instead of using:
python src/chess_zero/run.py self --type distributed
python src/chess_zero/run.py opt --type distributed
python src/chess_zero/run.py eval --type distributed
We will use until convergence:
python src/chess_zero/run.py sl --type distributed
python src/chess_zero/run.py opt --type distributed
python src/chess_zero/run.py eval --type distributed
After convergence, you would stop the "sl" worker and start the "self" one (we'd need to use various PGN files with millions of moves and wait for the model to converge).
The '--type distributed' parameter is the only thing you need to add to work in a distributed way ;).
Our hope is that by pre-training the policy with SL, we will immediately be able to generate good play data in a self-play manner. In the game of Go this step is not necessary because of the rules of the game, but in chess the rules are more complex, and it seems a totally random starting policy is unlikely to generate enough quality data, or that it would require a huge amount of time (and maybe thousands of GPUs).
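To make the SL step concrete, here is a minimal sketch of extracting a move list from one PGN game's movetext (the sample game is made up; real PGN parsing with comments, variations, and NAGs needs a proper library such as python-chess):

```python
import re

# Toy example of one game's movetext; an SL worker would read thousands of
# these from downloaded PGN files and turn each move into a training target.
MOVETEXT = "1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 1/2-1/2"

def san_moves(movetext):
    """Strip move numbers and the result token, returning the SAN move list."""
    result_tokens = {"1-0", "0-1", "1/2-1/2", "*"}
    return [t for t in movetext.split()
            if not re.match(r"^\d+\.$", t) and t not in result_tokens]

moves = san_moves(MOVETEXT)  # ['e4', 'e5', 'Nf3', 'Nc6', 'Bb5', 'a6']
```

Each SAN move, replayed on a board, yields one (position, move) pair for the policy head, with the game result as the value target.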
Regards.
@benediamond, did you get convergence in the optimization worker? I tried your one-hot version, but the model did not seem to converge (at least in the version I tried some days ago).
@Zeta36 reaching a model checkpoint appears to take about 20 minutes. From total scratch I have losses
loss: 7.8623 - policy_out_loss: 7.1453 - value_out_loss: 0.3209
By the time of the checkpoint, I have
loss: 0.4104 - policy_out_loss: 0.0029 - value_out_loss: 0.0029
How does this compare to your experience?
It's strange that this doesn't work, but I will be happy to switch back if using ord() is better.
@benediamond, did you reach a loss so near 0?? But are you constantly generating new play data with the self-play worker, or do you stop self-play and then start the optimization? It sounds a lot like an overfitting issue.
Moreover, are you constantly evaluating the best model using the eval worker? How many times has the best model been changed by the evaluator worker?
I am generating new self-play data, but my optimizer works much faster, so I generate many models on the same batch of self-play data.
Yes, it sounds like overfitting. Do you think I should reduce the learning rate? How did your optimizer perform?
@benediamond, my optimizer performs much more slowly, but as you know, I have no GPU, only a CPU.
@benediamond, have you tried playing against your best model? I mean you yourself playing, with the play_gui option. If your model is not overfitting, then it should work more or less fine.
@Zeta36 you mean, I myself play against it? I haven't tried, but I can. But I don't expect it to be good. My evaluator has replaced the best model only once so far, but I only began training last night (I have been making various changes...)
It's really very strange that the evaluator changed the best model only once, @benediamond. It is not possible for you to reach a loss near 0 while your next-generation model cannot beat the random one (the first best model is random). You've got to have some kind of bug there.
@Zeta36 Yes. I think the more parameters, the more chance of overfitting, and I didn't realize this when I implemented the one-hot feature. I have now reduced the learning rate tenfold.
On the other hand, I only began training this present version last night. And it looks like I will soon get a 2nd model... I will keep testing.
I am also experimenting with another feature:
- Instead of using whole games for training, we can randomly generate positions with only 5 (or 6, 7, 8) pieces on the board. Because the tablebase takes over at <= 5 pieces, the overall quality of play will be very high. Thus we very quickly generate large amounts of high-quality training data.
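A minimal sketch of that position-sampling idea (the piece-selection logic is my own illustration; legality checks, e.g. with python-chess's Board.is_valid(), and the actual tablebase probe would still be required before using a sampled position):

```python
import random

# Non-king pieces of both colors; kings are placed explicitly below.
PIECES = "PNBRQpnbrq"

def random_sparse_position(n_pieces=5, seed=None):
    """Sample piece placements for a sparse position: both kings plus
    n_pieces - 2 random other pieces on distinct squares. The result still
    needs a legality check (checks, pawn ranks, king adjacency) before use."""
    rng = random.Random(seed)
    squares = rng.sample(range(64), n_pieces)  # distinct squares
    placement = {squares[0]: "K", squares[1]: "k"}
    for sq in squares[2:]:
        placement[sq] = rng.choice(PIECES)
    return placement

pos = random_sparse_position(5, seed=42)  # e.g. {12: 'K', 37: 'k', ...}
```

With 5 or fewer pieces, each legal sampled position can then be scored exactly by the syzygy tablebases, giving perfect-quality labels with no self-play cost.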
I've also done something rather pointless but perhaps useful:
- I have made white the "primary" side, as is typical in chess. This is merely a matter of renaming variables, and makes no difference to the machine model.
In any case, @Zeta36, I am interested in exploring these features. I will soon implement / copy in your supervised learning feature. If the results look very strong, then perhaps with your permission I may push my fork into a sub-branch of your repository. Please let me know.
Of course you can :).
Let's keep working.
What do you mean by "convergence of the model"? ...just to know when I have to switch from "sl" to "self".
@Zeta36
I'm trying to understand this project as AlphaZero is very interesting to me.
Is the following true as a high level overview of the workers?
self_play.py is playing matches between the same model and saving them to a file location. Then optimize.py is loading the matches, converting the moves to a suitable format, and training the next_gen model. Next evaluate.py is playing matches between the best_model and the next_gen model. If the next_gen wins at X rate it replaces best_model.
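In pseudocode, my understanding of the loop is something like this (the helper functions here are toy stand-ins with made-up logic, not the project's actual API):

```python
import random

def self_play_game(model):
    # Stand-in: a real game would run MCTS with the model on both sides.
    return {"winner": random.choice(["next", "best", "draw"])}

def train_on(games, start_from):
    # Stand-in: a "model" is just a generation counter in this sketch.
    return start_from + 1

def evaluate(next_gen, best_model, n_games=100):
    # Stand-in: fraction of evaluation games the candidate wins.
    wins = sum(self_play_game(None)["winner"] == "next" for _ in range(n_games))
    return wins / n_games

def training_pipeline(best_model=0, iterations=3, win_rate_threshold=0.55):
    for _ in range(iterations):
        games = [self_play_game(best_model) for _ in range(10)]  # self_play.py
        next_gen = train_on(games, start_from=best_model)        # optimize.py
        if evaluate(next_gen, best_model) >= win_rate_threshold: # evaluate.py
            best_model = next_gen                                # promotion
    return best_model
```

Is that the right shape, with the three workers just running these stages concurrently against shared storage?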
Thanks,
coughlnj