
Comments (15)

yhyu13 commented on September 7, 2024

@Zeta36

I've created about 3.5 GB of game-play data over this Thanksgiving break, but I don't know where to find the evaluation log. I will upload the best model to Google Drive so you can give it a try. I hope we are able to get a decent chess algorithm.

https://drive.google.com/drive/folders/1KNTggmQhp4E4MqZiCPhFqvPfff6MrMYz?usp=sharing

The architecture is 256 filters and 7 layers; everything else remains the same.

Let me know if you have any trouble.

Regards!


yhyu13 commented on September 7, 2024

@Zeta36

I have to apologize for not noticing this earlier. Since my computer (a gaming laptop) is not a dedicated server, I can't promise that these programs will run without throwing an error (the most common one is a CUDA core dump) and aborting in the middle. According to my schedule, I can run them for this week. If I get any good results, I will open a new issue and let you know immediately.

Regards


Zeta36 commented on September 7, 2024

If you go here: https://lichess.org/analysis/standard and paste the final position of the game (for example: rn1qkbnr/4pp2/pp1p3p/3p2p1/3P1Pb1/8/PPPKPBPP/RN3B1R b kq - 1 11) into the FEN input field of the page, you will see a nice-looking board showing the state of that (self-)game when it finished.

The game result tells us who won and whether it was a normal ending (checkmate, stalemate, etc.) or a game cut off by resignation. Resignation occurs when one player has a big material advantage over the other (more than 13 points of difference in standard piece values, where the queen is worth 10, the rook 5.5, etc.).
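To make the rule concrete, here is a minimal sketch of that resignation check using python-chess (the queen = 10 and rook = 5.5 values come from the sentence above; the remaining piece values and the exact threshold handling are illustrative assumptions, not necessarily the project's code):

import chess

# Assumed piece values: queen 10 and rook 5.5 as stated above, the rest guessed.
PIECE_VALUES = {chess.PAWN: 1.0, chess.KNIGHT: 3.0, chess.BISHOP: 3.5,
                chess.ROOK: 5.5, chess.QUEEN: 10.0, chess.KING: 0.0}

def should_resign(board: chess.Board, threshold: float = 13.0) -> bool:
    # Positive score means White is ahead in material, negative means Black is.
    score = sum(PIECE_VALUES[p.piece_type] * (1 if p.color == chess.WHITE else -1)
                for p in board.piece_map().values())
    # The side to move resigns if it is behind by more than the threshold.
    losing_margin = -score if board.turn == chess.WHITE else score
    return losing_margin > threshold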

> EDIT: I believe it's running the normal model even though I set it to be mini

If you want to be sure about this, you can debug and set a breakpoint in the self-play worker at line 32. There you have the config class being used. Inspect the value of the "simulation_num_per_move" property: if it is 10, you are running the mini toy model; if it is 100, you are running the normal one.

Anyway, if you ran "python run.py self --type mini" you should be in the mini version. You can also debug the manager.py file and see which config it loads.
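A quick way to double-check (a hypothetical snippet; the exact attribute path depends on how the loaded config object is structured):

# Hypothetical sanity check: inspect the loaded play config.
# 10 -> mini toy profile, 100 -> normal profile.
print(config.play.simulation_num_per_move)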

> I am looking forward to seeing a visualization of the game result.

Thank you very much, @yhyu13. I know I need to get a good GPU, but right now I'm short of money.

Regards!!


yhyu13 commented on September 7, 2024

@Zeta36

Thanks for your detailed reply; the chess board visualization is beautiful. I ran a profiler on your algorithm and found the result astonishing:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   17.461   17.461 player_chess.py:82(search_moves)
        1    0.001    0.001   17.460   17.460 {method 'run_until_complete' of 'uvloop.loop.Loop' objects}
      302    0.001    0.000   16.942    0.056 player_chess.py:98(start_search_my_move)
  690/218    0.007    0.000   16.938    0.078 player_chess.py:106(search_my_move)
      233    0.113    0.000   16.667    0.072 player_chess.py:228(select_action_q_and_u)
      233    1.550    0.007   16.387    0.070 player_chess.py:232(<listcomp>)
  1894057    2.973    0.000    8.742    0.000 __init__.py:490(from_uci)
  1893824    0.698    0.000    6.097    0.000 __init__.py:3273(__contains__)
  1894057    0.973    0.000    5.405    0.000 __init__.py:1551(is_legal)
  4742482    4.768    0.000    4.768    0.000 {method 'index' of 'list' objects}
  1894057    1.805    0.000    4.216    0.000 __init__.py:1503(is_pseudo_legal)
  1958549    0.753    0.000    0.753    0.000 __init__.py:615(piece_type_at)
   158450    0.567    0.000    0.751    0.000 __init__.py:1256(generate_pseudo_legal_moves)
  1902996    0.655    0.000    0.655    0.000 __init__.py:425(__init__)
  1899679    0.650    0.000    0.650    0.000 __init__.py:449(__bool__)
       11    0.000    0.000    0.517    0.047 player_chess.py:173(prediction_worker)
        8    0.000    0.000    0.516    0.065 api_chess.py:9(predict)
        8    0.000    0.000    0.516    0.065 training.py:1879(predict_on_batch)
        8    0.000    0.000    0.515    0.064 tensorflow_backend.py:2338(__call__)
        8    0.000    0.000    0.515    0.064 session.py:781(run)
        8    0.000    0.000    0.515    0.064 session.py:1036(_run)
        8    0.000    0.000    0.512    0.064 session.py:1258(_do_run)
        8    0.000    0.000    0.512    0.064 session.py:1321(_do_call)
        8    0.000    0.000    0.512    0.064 session.py:1290(_run_fn)
        8    0.505    0.063    0.505    0.063 {built-in method _pywrap_tensorflow_internal.TF_Run}
  4743887    0.349    0.000    0.349    0.000 {built-in method builtins.len}
    14912    0.084    0.000    0.205    0.000 __init__.py:3078(generate_castling_moves)
  1004964    0.168    0.000    0.190    0.000 __init__.py:214(scan_reversed)
  1911511    0.171    0.000    0.171    0.000 __init__.py:1554(is_variant_end)
      233    0.001    0.000    0.161    0.001 chess_env.py:37(step)
      233    0.013    0.000    0.140    0.001 __init__.py:1759(can_claim_threefold_repetition)
      766    0.002    0.000    0.129    0.000 chess_env.py:135(replace_tags)
      766    0.002    0.000    0.125    0.000 __init__.py:2008(fen)
      766    0.003    0.000    0.123    0.000 __init__.py:2252(epd)
       99    0.121    0.001    0.121    0.001 {method 'dirichlet' of 'mtrand.RandomState' objects}
      666    0.001    0.000    0.114    0.000 player_chess.py:224(counter_key)
      766    0.032    0.000    0.096    0.000 __init__.py:719(board_fen)
    91774    0.083    0.000    0.083    0.000 __init__.py:1385(attacks_mask)
    20282    0.010    0.000    0.078    0.000 {built-in method builtins.any}
    14912    0.009    0.000    0.069    0.000 __init__.py:3068(_attacked_for_king)

The bottleneck is that you check whether a move is legal on the fly. That works for Reversi; I tried it in Go and it's too expensive, not to mention chess with over 8,000 move labels.

In Go I employed the strategy where an illegal move results in leaf_v = -1. Does that make sense in your setup?
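For illustration, a rough sketch of that idea (hypothetical code, not taken from either repo): instead of filtering to legal moves before expansion, the search immediately backs up a value of -1 whenever it tries to expand an illegal move.

# Hypothetical sketch of the "illegal move => leaf_v = -1" strategy described above.
# `env` and `model` stand in for the project's environment and network wrappers.
def expand_and_evaluate(env, action, model):
    if action not in env.board.legal_moves:   # python-chess legality check
        return -1.0                            # penalize the illegal branch immediately
    child = env.copy()                         # assumed copy helper on the environment
    child.board.push(action)                   # apply the move with python-chess
    policy, value = model.predict(child.observation())  # assumed model API
    return value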


Zeta36 commented on September 7, 2024

@yhyu13, I really appreciate your interest in the project. Please don't hesitate to open a pull request for any change you want to make to the project, or if you want to become a collaborator, just tell me :).

Regards!


Zeta36 commented on September 7, 2024

By the way, @yhyu13, all those huge call counts, the biggest ones in your profile, are due to internal calls of the python-chess library. It seems it's very expensive to check the legal moves in a board state:

1894057 2.973 0.000 8.742 0.000 __init__.py:490(from_uci)
1893824 0.698 0.000 6.097 0.000 __init__.py:3273(__contains__)
1894057 0.973 0.000 5.405 0.000 __init__.py:1551(is_legal)
4742482 4.768 0.000 4.768 0.000 {method 'index' of 'list' objects}
1894057 1.805 0.000 4.216 0.000 __init__.py:1503(is_pseudo_legal)
1958549 0.753 0.000 0.753 0.000 __init__.py:615(piece_type_at)
158450 0.567 0.000 0.751 0.000 __init__.py:1256(generate_pseudo_legal_moves)
1902996 0.655 0.000 0.655 0.000 __init__.py:425(__init__)
1899679 0.650 0.000 0.650 0.000 __init__.py:449(__bool__)

and it seems it's also expensive to check whether the game is over:

14912    0.084    0.000    0.205    0.000 __init__.py:3078(generate_castling_moves)

1004964 0.168 0.000 0.190 0.000 __init__.py:214(scan_reversed)
1911511 0.171 0.000 0.171 0.000 __init__.py:1554(is_variant_end)

I think this cost is due to the rules of chess being much more complex than those of Go or Reversi. In chess a game can end in many ways (checkmate, stalemate, etc.), and you have to check a lot of rules to determine the legal moves in a board state.

I don't know whether python-chess is a poorly optimized library or whether this is an intrinsic computational cost of the rules of chess.

Can you try to train with that cost anyway? I mean, is it impossible to train the model this way?


yhyu13 commented on September 7, 2024

@Zeta36

EDIT: Please review the pull request

It seems the bottleneck is single-CPU performance. Given that generating data is about 15 times slower, I believe we will lose patience before it actually learns anything. I will see if it's possible to work around the legal-move check. Done!

This part can be optimized as follows:

import chess
import numpy as np

def __init__(self, *args, **kwargs):
    ...
    # Build a Move -> label-index lookup once, so the hot path below does an
    # O(1) dict lookup instead of parsing UCI strings and scanning the list.
    self.move_lookup = {chess.Move.from_uci(mov): i
                        for i, mov in enumerate(self.config.labels)}

def select_action_q_and_u(self, *args, **kwargs):
    ...
    # Mark which of the action labels are legal in the current position.
    legal_moves = [self.move_lookup[move] for move in env.board.legal_moves]
    logger.debug(legal_moves)
    legal_labels = np.zeros(len(self.config.labels))
    legal_labels[legal_moves] = 1
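The gain comes from doing the chess.Move.from_uci() parsing once at construction time and replacing the list.index() scans, which are linear over the ~8,000-label list and sit near the top of the profile above, with O(1) dictionary lookups.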

Here is the new profiling result (10x faster; 100 simulations in 1.357 s):

*** PROFILER RESULTS ***
expand_and_evaluate (src/chess_zero/agent/player_chess.py:157)
function called 100 times

         0 function calls in 0.000 seconds

   Ordered by: cumulative time, internal time, call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        0    0.000             0.000          profile:0(profiler)



*** PROFILER RESULTS ***
search_moves (src/chess_zero/agent/player_chess.py:84)
function called 1 times

         555528 function calls (554819 primitive calls) in 1.357 seconds

   Ordered by: cumulative time, internal time, call count
   List reduced from 473 to 40 due to restriction <40>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.357    1.357 player_chess.py:84(search_moves)
        1    0.001    0.001    1.356    1.356 {method 'run_until_complete' of 'uvloop.loop.Loop' objects}
       11    0.000    0.000    0.856    0.078 player_chess.py:180(prediction_worker)
       10    0.000    0.000    0.855    0.086 api_chess.py:9(predict)
       10    0.000    0.000    0.855    0.086 training.py:1879(predict_on_batch)
       10    0.000    0.000    0.854    0.085 tensorflow_backend.py:2338(__call__)
       10    0.000    0.000    0.853    0.085 session.py:781(run)
       10    0.000    0.000    0.853    0.085 session.py:1036(_run)
       10    0.000    0.000    0.850    0.085 session.py:1258(_do_run)
       10    0.000    0.000    0.849    0.085 session.py:1321(_do_call)
       10    0.000    0.000    0.849    0.085 session.py:1290(_run_fn)
       10    0.815    0.082    0.815    0.082 {built-in method _pywrap_tensorflow_internal.TF_Run}
      333    0.001    0.000    0.498    0.001 player_chess.py:101(start_search_my_move)
  762/249    0.004    0.000    0.495    0.002 player_chess.py:109(search_my_move)
      224    0.019    0.000    0.229    0.001 player_chess.py:235(select_action_q_and_u)
      224    0.001    0.000    0.160    0.001 chess_env.py:37(step)
      224    0.012    0.000    0.139    0.001 __init__.py:1759(can_claim_threefold_repetition)
      748    0.002    0.000    0.129    0.000 chess_env.py:135(replace_tags)
      748    0.002    0.000    0.125    0.000 __init__.py:2008(fen)
      748    0.003    0.000    0.123    0.000 __init__.py:2252(epd)
       99    0.121    0.001    0.121    0.001 {method 'dirichlet' of 'mtrand.RandomState' objects}
      648    0.001    0.000    0.114    0.000 player_chess.py:231(counter_key)
      748    0.033    0.000    0.097    0.000 __init__.py:719(board_fen)
     5064    0.030    0.000    0.068    0.000 __init__.py:1802(push)
    10014    0.008    0.000    0.067    0.000 __init__.py:3034(generate_legal_moves)
    47872    0.029    0.000    0.051    0.000 __init__.py:607(piece_at)
    10344    0.020    0.000    0.048    0.000 __init__.py:1256(generate_pseudo_legal_moves)
      200    0.001    0.000    0.045    0.000 player_chess.py:157(expand_and_evaluate)
      224    0.005    0.000    0.043    0.000 player_chess.py:239(<listcomp>)
       10    0.000    0.000    0.034    0.003 session.py:1338(_extend_graph)
        1    0.033    0.033    0.033    0.033 {built-in method _pywrap_tensorflow_internal.TF_ExtendGraph}
      100    0.000    0.000    0.024    0.000 chess_env.py:125(black_and_white_plane)
      748    0.009    0.000    0.022    0.000 __init__.py:1971(castling_xfen)
    63288    0.022    0.000    0.022    0.000 __init__.py:615(piece_type_at)
     5631    0.003    0.000    0.020    0.000 {built-in method builtins.any}
    43166    0.013    0.000    0.015    0.000 __init__.py:214(scan_reversed)
     5101    0.006    0.000    0.015    0.000 __init__.py:3148(_transposition_key)
    10128    0.011    0.000    0.015    0.000 __init__.py:646(_remove_piece_at)
      224    0.000    0.000    0.013    0.000 __init__.py:2650(push_uci)
     4840    0.010    0.000    0.011    0.000 __init__.py:1918(pop)


Zeta36 commented on September 7, 2024

Thank you very much for your effort, @yhyu13. I really appreciate it.

I'm looking forward to seeing whether our approach can produce a good chess player (maybe not a master, but at least an amateur).

I'm going to add you as a collaborator so you can push anything you want without asking me for a pull request.

Regards!!


Zeta36 commented on September 7, 2024

Hello, @yhyu13 .

Today I made a new version of the Reversi Zero project, this time adapted to the game Connect 4: https://github.com/Zeta36/connect4-alpha-zero

I'm really in love with @mokemokechicken's implementation. He built it (and DeepMind designed it) in a way that lets me apply it easily to any new environment I can imagine.

Moreover, Connect 4 is an easier game and I could train the model without a GPU. The results are amazing: the model learns to play well in only 3 generations, in a couple of hours (just with an Intel i5 CPU).

It's a pity I don't have a powerful enough machine to check whether the chess version can learn to play well.


yhyu13 commented on September 7, 2024

@Zeta36

Good work! It reminds me of another GitHub project I came across, called mini-alphaGo, that plays Connect 4. It looks way messier than your implementation. You've got to thank @mokemokechicken a lot; the software framework he/she put together is neat, with Keras as a model wrapper.

To my knowledge, backgammon, chess, Go, and draughts are the four biggest abstract strategy board games. The first three have all been "solved" in the sense that computer programs play better than any human. I haven't heard anything comparable about draughts, though. It would be great if you could move on to that game after Connect 4. A quick search led me to https://github.com/codeofcarson/Checkers. Take a look if you are interested.

I will be away during the Thanksgiving break, but I set up a script that runs/restarts your chess zero automatically. I apologize for not setting up a server so you can look at the results; that should be possible after I get back home.
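For anyone who wants to do the same, a minimal restart loop could look like this (illustration only; the actual script may differ):

# Keep the self-play worker alive across crashes (e.g. CUDA core dumps).
import subprocess
import time

while True:
    proc = subprocess.run(["python", "run.py", "self"])
    print("self-play exited with code %d, restarting in 10 s" % proc.returncode)
    time.sleep(10)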

Regards.


Zeta36 commented on September 7, 2024

I played against your best model but the results were not good :(.

C:\Users\Samu\Anaconda3\python.exe "ML 2017/chess-alpha-zero-git/src/chess_zero/run.py" play_gui
2017-11-26 23:08:15,087@chess_zero.manager INFO # config type: normal
Using TensorFlow backend.
2017-11-26 23:08:46,180@chess_zero.agent.model_chess DEBUG # loading model from ML 2017\chess-alpha-zero-git\data\model\model_best_config.json
2017-11-26 23:08:52,563@chess_zero.agent.model_chess DEBUG # loaded model digest = 4aa6e5358d339f13f388d5e6eb00827bafd43d3073f248b634b041f6f8cc9513
2017-11-26 23:08:52,620@asyncio DEBUG # Using selector: SelectSelector
2017-11-26 23:09:04,007@chess_zero.agent.player_chess DEBUG # continue thinking: policy move=(2, 63), value move=(1, 72)
IA moves to: g2g3

r n b q k b n r
p p p p p p p p
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . P .
P P P P P P . P
R N B Q K B N R

Board fen = rnbqkbnr/pppppppp/8/8/8/6P1/PPPPPP1P/RNBQKBNR b KQkq - 0 1

Enter your movement in UCI format(a1a2, b2b6,...): e7e5
You move to: e7e5

r n b q k b n r
p p p p . p p p
. . . . . . . .
. . . . p . . .
. . . . . . . .
. . . . . . P .
P P P P P P . P
R N B Q K B N R

Board fen = rnbqkbnr/pppp1ppp/8/4p3/8/6P1/PPPPPP1P/RNBQKBNR w KQkq - 0 2
IA moves to: g1h3

r n b q k b n r
p p p p . p p p
. . . . . . . .
. . . . p . . .
. . . . . . . .
. . . . . . P N
P P P P P P . P
R N B Q K B . R

Board fen = rnbqkbnr/pppp1ppp/8/4p3/8/6PN/PPPPPP1P/RNBQKB1R b KQkq - 1 2

Enter your movement in UCI format(a1a2, b2b6,...): g8f6
You move to: g8f6

r n b q k b . r
p p p p . p p p
. . . . . n . .
. . . . p . . .
. . . . . . . .
. . . . . . P N
P P P P P P . P
R N B Q K B . R

Board fen = rnbqkb1r/pppp1ppp/5n2/4p3/8/6PN/PPPPPP1P/RNBQKB1R w KQkq - 2 3
2017-11-26 23:10:14,659@chess_zero.agent.player_chess DEBUG # continue thinking: policy move=(1, 200), value move=(0, 400)
2017-11-26 23:10:27,093@chess_zero.agent.player_chess DEBUG # continue thinking: policy move=(1, 200), value move=(0, 8)
IA moves to: d2d4

r n b q k b . r
p p p p . p p p
. . . . . n . .
. . . . p . . .
. . . P . . . .
. . . . . . P N
P P P . P P . P
R N B Q K B . R

Board fen = rnbqkb1r/pppp1ppp/5n2/4p3/3P4/6PN/PPP1PP1P/RNBQKB1R b KQkq - 0 3

Enter your movement in UCI format(a1a2, b2b6,...): e5d4
You move to: e5d4

r n b q k b . r
p p p p . p p p
. . . . . n . .
. . . . . . . .
. . . p . . . .
. . . . . . P N
P P P . P P . P
R N B Q K B . R

Board fen = rnbqkb1r/pppp1ppp/5n2/8/3p4/6PN/PPP1PP1P/RNBQKB1R w KQkq - 0 4
2017-11-26 23:11:09,755@chess_zero.agent.player_chess DEBUG # continue thinking: policy move=(2, 192), value move=(0, 192)
2017-11-26 23:11:21,378@chess_zero.agent.player_chess DEBUG # continue thinking: policy move=(2, 192), value move=(1, 65)
IA moves to: d1d4

r n b q k b . r
p p p p . p p p
. . . . . n . .
. . . . . . . .
. . . Q . . . .
. . . . . . P N
P P P . P P . P
R N B . K B . R

Board fen = rnbqkb1r/pppp1ppp/5n2/8/3Q4/6PN/PPP1PP1P/RNB1KB1R b KQkq - 0 4

Enter your movement in UCI format(a1a2, b2b6,...): b8c6
You move to: b8c6

r . b q k b . r
p p p p . p p p
. . n . . n . .
. . . . . . . .
. . . Q . . . .
. . . . . . P N
P P P . P P . P
R N B . K B . R

Board fen = r1bqkb1r/pppp1ppp/2n2n2/8/3Q4/6PN/PPP1PP1P/RNB1KB1R w KQkq - 1 5
2017-11-26 23:12:14,997@chess_zero.agent.player_chess DEBUG # continue thinking: policy move=(6, 215), value move=(7, 216)
IA moves to: d4e4

r . b q k b . r
p p p p . p p p
. . n . . n . .
. . . . . . . .
. . . . Q . . .
. . . . . . P N
P P P . P P . P
R N B . K B . R

Board fen = r1bqkb1r/pppp1ppp/2n2n2/8/4Q3/6PN/PPP1PP1P/RNB1KB1R b KQkq - 2 5

Enter your movement in UCI format(a1a2, b2b6,...): f6e4
You move to: f6e4

r . b q k b . r
p p p p . p p p
. . n . . . . .
. . . . . . . .
. . . . n . . .
. . . . . . P N
P P P . P P . P
R N B . K B . R

Board fen = r1bqkb1r/pppp1ppp/2n5/8/4n3/6PN/PPP1PP1P/RNB1KB1R w KQkq - 0 6
IA moves to: c1e3

r . b q k b . r
p p p p . p p p
. . n . . . . .
. . . . . . . .
. . . . n . . .
. . . . B . P N
P P P . P P . P
R N . . K B . R

Board fen = r1bqkb1r/pppp1ppp/2n5/8/4n3/4B1PN/PPP1PP1P/RN2KB1R b KQkq - 1 6

Enter your movement in UCI format(a1a2, b2b6,...): e4f6
You move to: e4f6

r . b q k b . r
p p p p . p p p
. . n . . n . .
. . . . . . . .
. . . . . . . .
. . . . B . P N
P P P . P P . P
R N . . K B . R

Board fen = r1bqkb1r/pppp1ppp/2n2n2/8/8/4B1PN/PPP1PP1P/RN2KB1R w KQkq - 2 7
2017-11-26 23:14:12,397@chess_zero.agent.player_chess DEBUG # continue thinking: policy move=(6, 269), value move=(1, 65)
2017-11-26 23:14:24,565@chess_zero.agent.player_chess DEBUG # continue thinking: policy move=(1, 65), value move=(0, 72)
2017-11-26 23:14:35,581@chess_zero.agent.player_chess DEBUG # continue thinking: policy move=(1, 65), value move=(2, 63)
2017-11-26 23:14:46,832@chess_zero.agent.player_chess DEBUG # continue thinking: policy move=(1, 65), value move=(0, 255)
IA moves to: e1d1

r . b q k b . r
p p p p . p p p
. . n . . n . .
. . . . . . . .
. . . . . . . .
. . . . B . P N
P P P . P P . P
R N . K . B . R

Board fen = r1bqkb1r/pppp1ppp/2n2n2/8/8/4B1PN/PPP1PP1P/RN1K1B1R b kq - 3 7

Enter your movement in UCI format(a1a2, b2b6,...): d7d6
You move to: d7d6

r . b q k b . r
p p p . . p p p
. . n p . n . .
. . . . . . . .
. . . . . . . .
. . . . B . P N
P P P . P P . P
R N . K . B . R

Board fen = r1bqkb1r/ppp2ppp/2np1n2/8/8/4B1PN/PPP1PP1P/RN1K1B1R w kq - 0 8
IA moves to: b1c3

r . b q k b . r
p p p . . p p p
. . n p . n . .
. . . . . . . .
. . . . . . . .
. . N . B . P N
P P P . P P . P
R . . K . B . R

Board fen = r1bqkb1r/ppp2ppp/2np1n2/8/8/2N1B1PN/PPP1PP1P/R2K1B1R b kq - 1 8

Enter your movement in UCI format(a1a2, b2b6,...): c8h3
You move to: c8h3

r . . q k b . r
p p p . . p p p
. . n p . n . .
. . . . . . . .
. . . . . . . .
. . N . B . P b
P P P . P P . P
R . . K . B . R

Board fen = r2qkb1r/ppp2ppp/2np1n2/8/8/2N1B1Pb/PPP1PP1P/R2K1B1R w kq - 0 9
2017-11-26 23:16:58,419@chess_zero.agent.player_chess DEBUG # continue thinking: policy move=(1, 145), value move=(7, 270)
2017-11-26 23:17:10,792@chess_zero.agent.player_chess DEBUG # continue thinking: policy move=(1, 145), value move=(7, 0)
2017-11-26 23:17:21,747@chess_zero.agent.player_chess DEBUG # continue thinking: policy move=(7, 0), value move=(0, 191)
2017-11-26 23:17:32,833@chess_zero.agent.player_chess DEBUG # continue thinking: policy move=(0, 146), value move=(1, 72)
IA moves to: b2b4

r . . q k b . r
p p p . . p p p
. . n p . n . .
. . . . . . . .
. P . . . . . .
. . N . B . P b
P . P . P P . P
R . . K . B . R

Board fen = r2qkb1r/ppp2ppp/2np1n2/8/1P6/2N1B1Pb/P1P1PP1P/R2K1B1R b kq - 0 9

Enter your movement in UCI format(a1a2, b2b6,...):

You can check the result yourself by playing against it with the "play_gui" option.

Here you can see the last state of the board: https://www.chess.com/dynboard?fen=r2qkb1r/ppp2ppp/2np1n2/8/1P6/2N1B1Pb/P1P1PP1P/R2K1B1R%20b%20kq%20b3%200%209&board=green&piece=neo&size=3

The NN plays White. At first it looked more or less fine when the model recaptured the pawn early on, but then the NN loses its queen :( (although it seemed to want to escape).

I don't know if this is a good enough result for the time you have spent on training so far. What do you think?

@yhyu13, what loss did the optimization worker show you? And how many times did the evaluator worker replace its best model after the tournaments? I mean, you said you ran the self-play worker for a long time, but can you tell me what the other two simultaneous workers showed in their consoles during all that time?


yhyu13 commented on September 7, 2024

@Zeta36

I just noticed that the self-play pipeline is manual. 😂 I am trying to train it with opt mode, but 1) the total loss blows up to NaN while neither the policy loss nor the value loss does; I assume the total loss includes weight decay, though I haven't found where it is explicitly declared here; and 2) is there no stopping criterion for training? here
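For what it's worth, here is a rough sketch of what I mean by the total loss including weight decay (an assumed composition, not necessarily how this repo builds it): if an L2 weight-decay term is folded into the reported total, the total can diverge through the weight term even when the policy and value terms look fine.

import tensorflow as tf

def total_loss(policy_loss, value_loss, weights, c=1e-4):
    # Assumed AlphaZero-style objective: policy loss + value loss + c * ||theta||^2.
    l2 = tf.add_n([tf.nn.l2_loss(w) for w in weights])
    return policy_loss + value_loss + c * l2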

I am not sure I understand how the self-play pipeline is supposed to run at this moment.

Regards


Zeta36 commented on September 7, 2024

@yhyu13, I'm afraid you did not follow the correct way to train the model :P.

You have to run the three workers "at the same time": self-play, opt, and eval. It's easy: you just run the run.py script three times, in three different consoles (terminals), for example:

python run.py self
python run.py opt
python run.py eval

You will have to delete all the self-play data generated until now :( and start from scratch.

Self-play will start with a random best model and will generate games. After a while (a fixed number of games), the optimization process (opt) will start and eventually create a next-generation (ng) model that the evaluator worker (eval) will detect. The evaluator then makes the best model and the ng model play against each other. If the ng model wins more than 55% of the games, it becomes the new best model, and so on.

Indeed, the AlphaGo Zero idea is pretty similar to an evolutionary algorithm with selection by tournament.
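A minimal sketch of that evaluation gate (illustration only, with an assumed play_game helper that returns 1 when the next-generation model wins a game):

def evaluate_candidate(best_model, ng_model, n_games=100, threshold=0.55):
    # play_game is an assumed helper returning 1 if ng_model beats best_model, else 0.
    wins = sum(play_game(ng_model, best_model) for _ in range(n_games))
    if wins / n_games > threshold:
        return ng_model    # promote: ng becomes the new best model
    return best_model      # otherwise keep the current best and keep training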

Regards, friend.


Zeta36 commented on September 7, 2024

Perfect!! Thank you very much for your help :).


dklausa commented on September 7, 2024

@yhyu13

Just some input as an avid game player. Funny you should mention draughts as the one game among the big 4 not yet "solved". Actually, to the best of my knowledge it was the first of those in which humans became thoroughly outclassed by a program, namely Chinook. At least with Kasparov and Deep Blue it was close. Good backgammon players can still win often against eXtreme Gammon, due to the luck element. Perhaps the greatest distance between the best human and the best program in a prominent board game is in Othello, with the program WZebra. Also, none of them are technically solved, which implies that a complete game tree is documented, with proof of best move in every possible situation.
