
Comments (6)

rosenthj commented on August 17, 2024

The nets should be getting better, but it is non-trivial. Every net that makes it to master passed statistical tests on OpenBench (http://chess.grantnet.us/index/), meaning it wins a head-to-head against the prior master net. Unfortunately, this doesn't guarantee it is stronger in general, but based on my tests it is mostly true. Note that when I did the switch, I think Winter got significantly worse in regular (not Chess960) play: version v0.9.9 was something like 60 Elo stronger than v0.9.10 in regular chess in my rough tests. I have started an OpenBench regression test comparing v1.09 with v1.0, which can be found here: http://chess.grantnet.us/test/29557/
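For intuition, the statistical test such frameworks run on head-to-head results is typically a sequential probability ratio test (SPRT). Below is a minimal sketch of the classic BayesElo-model SPRT; the draw-Elo constant, Elo hypotheses, and error bounds are illustrative assumptions, not Winter's actual OpenBench configuration.

```python
import math

def bayeselo_probs(elo: float, drawelo: float = 200.0):
    """Win/draw/loss probabilities under the BayesElo trinomial model."""
    p_win = 1.0 / (1.0 + 10.0 ** ((-elo + drawelo) / 400.0))
    p_loss = 1.0 / (1.0 + 10.0 ** ((elo + drawelo) / 400.0))
    return p_win, 1.0 - p_win - p_loss, p_loss

def sprt_llr(wins, draws, losses, elo0=0.0, elo1=5.0):
    """Log-likelihood ratio of H1 (new net is elo1 stronger) vs H0 (elo0)."""
    w0, d0, l0 = bayeselo_probs(elo0)
    w1, d1, l1 = bayeselo_probs(elo1)
    return (wins * math.log(w1 / w0)
            + draws * math.log(d1 / d0)
            + losses * math.log(l1 / l0))

# Stop and accept the new net once the LLR crosses the upper bound,
# reject once it crosses the lower bound; otherwise keep playing games.
alpha = beta = 0.05
lower, upper = math.log(beta / (1 - alpha)), math.log((1 - beta) / alpha)
print(sprt_llr(900, 1200, 800), (lower, upper))
```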

There are so many newer nets because I essentially reworked my entire evaluation approach. The nets are no longer based on the same features or data as in Winter versions up to v0.9.9.

In this new approach, Winter initially relied on some 350k CCRL Fischer random games to train a network. This is an extremely small dataset relative to what top engines like Stockfish use (Stockfish has billions of positions in its training data). I therefore cannot train networks of the size Stockfish uses without massive overfitting issues. I am generating new double Fischer random (DFRC) games from Winter self-play to train stronger networks. In the long term I would like to drop the CCRL games and train nets exclusively on the self-play games.

Hopefully, double Fischer random games will result in more diverse middlegame and endgame positions, requiring less data to generalize. Furthermore, I like having the guarantee that Winter cannot memorize openings, so I intend to remove games from the standard starting position from the training set.

Winter's net architecture is changing from version to version, but at the moment it is 772x224x3. The 772 inputs consist of the bitboards (64 binary values each) for the 6 piece types of each respective side, plus 4 additional inputs encoding the castling rights for each side. The 3 outputs correspond to the probabilities of the side to move winning, drawing, and losing, respectively. In contrast, if I recall correctly, the Stockfish architecture has an input dimension more than 100 times larger and multiple intermediate layers, but only a single output value.
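To make the input layout concrete, here is a minimal sketch of the 772-feature encoding using python-chess; the ordering of the piece planes and castling bits is my assumption and may differ from Winter's actual indexing.

```python
import chess
import numpy as np

def encode(board: chess.Board) -> np.ndarray:
    """Encode a position as the 772 binary inputs described above."""
    x = np.zeros(772, dtype=np.float32)
    # 12 piece planes: 6 piece types x 2 sides x 64 squares = 768 inputs.
    for color in (chess.WHITE, chess.BLACK):
        for piece_type in range(chess.PAWN, chess.KING + 1):
            plane = (0 if color == chess.WHITE else 6) + (piece_type - 1)
            for square in board.pieces(piece_type, color):
                x[plane * 64 + square] = 1.0
    # 4 castling inputs: kingside/queenside rights for each side.
    x[768] = float(board.has_kingside_castling_rights(chess.WHITE))
    x[769] = float(board.has_queenside_castling_rights(chess.WHITE))
    x[770] = float(board.has_kingside_castling_rights(chess.BLACK))
    x[771] = float(board.has_queenside_castling_rights(chess.BLACK))
    return x

# The 224-unit hidden layer then feeds the 3-way win/draw/loss output,
# e.g. a softmax over three logits.
```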


tissatussa commented on August 17, 2024

Thanks for this explanation.

Once I read that self-play games are not optimal for training an NN... why not play against a set of other engines? Their styles may differ; that could be an advantage!?

And what about the classic eval: does Winter still use it and consider both evals to decide on a best move? How?

I like having the guarantee that Winter cannot memorize openings, ...

But when an engine prefers to answer 1.e4 with 1...e6 (the French Defence), you could train an NN on French games with the known best replies, like the Exchange, Advance, Tarrasch, and Winawer variations, all having their own structures and ideas... what's your opinion on this reasoning?
Btw, I recall engines are not allowed to use an opening book at tournaments. Indeed, when letting engines play in CuteChess I always disable the books, because I think it's unfair.


tissatussa commented on August 17, 2024

Another question which comes to my mind, being an outsider (I'm not programming any engine myself yet): what about consulting several (small) NNs and deciding on a best move by combining/comparing the outcomes, and then maybe also judging the classic eval... did you ever experiment with that? I always wonder: how can we distinguish their strength/style? Do tests exist for NNs to see their best move (at max depth/time) in a FEN position? Maybe my thoughts are too wild for you :-)


rosenthj commented on August 17, 2024

Once I read that self-play games are not optimal for training an NN...

I am not familiar with this.

why not play against a set of other engines? Their styles may differ; that could be an advantage!?

It might be better. There are a few games against Cheng in older parts of the dataset. There are a couple of reasons why I am not generally doing it. TCEC has originality constraints, and I have some further idealistic views on the matter. One of the bigger reasons is purely practical: if I rely on games against other engines, I have to keep finding and adding engines around Winter's level.

Relying on self-play against prior Winter versions means I have a steady pool of opponents and can do some light regression testing. At the moment, v1.05 is actually doing reasonably well against v1.09. That version has a somewhat different network architecture, which may be why it is performing better than some of the later versions in this direct matchup.

And what about the classic eval: does Winter still use it and consider both evals to decide on a best move? How?

Not at this time. I would like to try adding some features as network inputs, as was the case in previous Winter releases. One of the main issues to solve there is that I want to allow other people to train Winter nets without the complicated reliance on Winter binaries that existed previously.

But when an engine prefers to answer 1.e4 with 1...e6 (the French Defence), you could train an NN on French games with the known best replies, like the Exchange, Advance, Tarrasch, and Winawer variations, all having their own structures and ideas... what's your opinion on this reasoning?

The hope is that structures that occur from the regular start position are not unique to it. On the other hand, DFRC games definitely have some structures which are not common in regular chess. This means Winter is probably a bit weaker overall in regular chess than it could be, but in some positions it will be better.

Btw, I recall engines are not allowed to use an opening book at tournaments. Indeed, when letting engines play in CuteChess I always disable the books, because I think it's unfair.

That is generally correct. Large neural networks like those in Leela can to some degree memorize openings, as positions are repeatedly encountered in games and thus end up in the training dataset. Such implicit "opening books" cannot be removed without altering the training data, which is what I am doing in Winter.
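As a concrete illustration of that data alteration (my sketch, not Winter's actual tooling), one could drop every standard-start game from a PGN training set with python-chess; this assumes variant games carry a FEN header, as DFRC PGNs normally do.

```python
import chess
import chess.pgn

def keep_game(game: chess.pgn.Game) -> bool:
    # Standard-start games either omit the FEN header or set it to the
    # classical starting position; keep only games from other starts.
    fen = game.headers.get("FEN")
    return fen is not None and fen != chess.STARTING_FEN

with open("training_games.pgn") as src, open("filtered.pgn", "w") as dst:
    while (game := chess.pgn.read_game(src)) is not None:
        if keep_game(game):
            print(game, file=dst, end="\n\n")
```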

what about consulting several (small) NNs and deciding on a best move by combining/comparing the outcomes

What advantage would that have over a single larger network?

and then maybe also judging the classic eval...

Yes, I think that is an avenue that may be worth exploring.

did you ever experiment with that?

It can be argued I did that with the mixture models I used in Winter versions before I switched to neural networks in 2019.

I always wonder: how can we distinguish their strength/style? Do tests exist for NNs to see their best move (at max depth/time) in a FEN position? Maybe my thoughts are too wild for you :-)

I am not an expert on strength/style. There are more knowledgeable people over at the Computer Chess Club.

I am not exactly sure what you are asking regarding the "tests for NNs to see their best move". There are tons of test datasets of positions where engines are tested on finding the best move in some amount of time.


tissatussa commented on August 17, 2024

tests for NNs to see their best move

I mean, I can imagine a (web) interface to input a FEN and choose e.g. 3 NNs to see which best move they show: in many positions several moves are OK and their evals differ only slightly, but their styles can be different: aggressive, tending to sacrifice material, or defensive, tending toward closed positions, etc. It might be interesting and fun to see how different engines/NNs approach a certain (puzzle) position. I have worked with test suite .epd files with their 'bm' and 'am' solutions.
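Scoring an engine against such a suite is straightforward to script. Here is a hedged sketch using python-chess's UCI engine interface; the engine path and time limit are placeholders, and real suites sometimes carry extra opcodes this ignores.

```python
import chess
import chess.engine

def run_suite(epd_path: str, engine_path: str, seconds: float = 1.0):
    """Count how many 'bm'/'am' positions of an EPD suite an engine solves."""
    solved = total = 0
    with chess.engine.SimpleEngine.popen_uci(engine_path) as engine, \
         open(epd_path) as suite:
        for line in suite:
            line = line.strip()
            if not line:
                continue
            board = chess.Board()
            ops = board.set_epd(line)  # parses the FEN and the opcodes
            result = engine.play(board, chess.engine.Limit(time=seconds))
            ok = True
            if "bm" in ops:                      # must find a best move
                ok = result.move in ops["bm"]
            if "am" in ops:                      # must avoid a move
                ok = ok and result.move not in ops["am"]
            solved += ok
            total += 1
    print(f"{solved}/{total} solved")
```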


tissatussa commented on August 17, 2024

Another idea: when several moves have an almost equal eval (e.g. using MultiPV), then choose the one which results in the "most harmonious" position... this may be vague, because how do you determine harmony? Never mind...
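For what it's worth, the mechanical part of this idea is easy to prototype with python-chess's MultiPV analysis; the "harmony" tiebreak below (preferring the move that leaves more of the mover's pieces defended) is purely illustrative, and the score window and depth are arbitrary.

```python
import chess
import chess.engine

def harmonious_move(board: chess.Board,
                    engine: chess.engine.SimpleEngine,
                    window_cp: int = 15) -> chess.Move:
    # Collect the top 3 moves and keep those within window_cp of the best.
    infos = engine.analyse(board, chess.engine.Limit(depth=18), multipv=3)
    best = infos[0]["score"].relative.score(mate_score=100000)
    candidates = [info["pv"][0] for info in infos
                  if info["score"].relative.score(mate_score=100000)
                  >= best - window_cp]

    def defended_count(move: chess.Move) -> int:
        # Toy "harmony" metric: defended pieces of the side that moved.
        b = board.copy(stack=False)
        b.push(move)
        mover = not b.turn
        return sum(b.is_attacked_by(mover, sq)
                   for sq in chess.SquareSet(b.occupied_co[mover]))

    return max(candidates, key=defended_count)
```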

