
stockfish's People

Contributors

31m059, ajithcj, ceebo, elbertoone, fauziakram, firefather, glinscott, joergoster, locutus2, lucasart, mcostalba, miguel-l, mjz1977, nightlyking, nodchip, noobpwnftw, pb00068, r-peleg, rocky640, sfisgod, snicolet, sopel97, stefano80, syzygy1, tttak, unaiic, vizvezdenec, vondele, voyagerone, zamar


stockfish's Issues

Gensfen TB bug

When doing gensfen from nnue-learn, search does not use tablebases (TB) even though a TB path is supplied.

It is caused by TB::Cardinality being left unset, so it defaults to 0.

A temporary fix is to call Tablebases::rank_root_moves() from inside Learner::search(), which sets TB::Cardinality to the correct value. I'm not sure if this is the way to go.
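
A minimal sketch of that workaround, assuming the standard Stockfish signature Tablebases::rank_root_moves(Position&, Search::RootMoves&); the placement inside Learner::search() is only illustrative:

// Near the top of Learner::search(): ranking the root moves has the side
// effect of setting TB::Cardinality from the SyzygyProbeLimit / SyzygyPath
// options, so later probes are no longer skipped.
Search::RootMoves rootMoves;
for (const auto& m : MoveList<LEGAL>(pos))
    rootMoves.emplace_back(m);

Tablebases::rank_root_moves(pos, rootMoves);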

segfault / assertion during Initialization

Running the master optimized code I get this.

learn command , learn from ./tutu.txt.bin 
base dir        : 
target dir      : ./
loop              : 100
eval_limit        : 32000
save_only_once    : false
no_shuffle        : false
Loss Function     : ELMO_METHOD(WCSC27)
mini-batch size   : 1000000
nn_batch_size     : 1000
nn_options        : 
learning rate     : 1 , 0 , 0
eta_epoch         : 0 , 0
use_draw_games_in_training : 1
use_draw_games_in_validation : 0
skip_duplicated_positions_in_training : 1
scheduling        : newbob with decay = 0.5, 2 trials
discount rate     : 0
reduction_gameply : 1
LAMBDA            : 0
LAMBDA2           : 0.33
LAMBDA_LIMIT      : 32000
mirror_percentage : 50
eval_save_interval  : 250000000 sfens
loss_output_interval: 1000000 sfens
init..
init_training..
Initializing NN training for Features=HalfKP(Friend)[41024->256x2],Network=AffineTransform[1<-32](ClippedReLU[32](AffineTransform[32<-32](ClippedReLU[32](AffineTransform[32<-512](InputSlice[512(0:512)])))))
Segmentation fault (core dumped)

Which turns into this when debug is activated:

Initializing NN training for Features=HalfKP(Friend)[41024->256x2],Network=AffineTransform[1<-32](ClippedReLU[32](AffineTransform[32<-32](ClippedReLU[32](AffineTransform[32<-512](InputSlice[512(0:512)])))))
stockfish: nnue/evaluate_nnue_learner.cpp:73: void Eval::NNUE::InitializeTraining(double, uint64_t, double, uint64_t, double): Assertion `feature_transformer' failed.

I think this is because an evalFile must be loaded before starting a learning process?

  // Load the evaluation function file
  bool load_eval_file(const std::string& evalFile) {

    Initialize();

seems to be the only place from which

Detail::Initialize(feature_transformer);

gets called.
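
If so, a defensive check could fail with a clear message instead of tripping the bare assert. A rough sketch (the names come from the assertion message; the error text and exit behaviour are my own suggestion, and it needs <iostream> and <cstdlib>):

// Hypothetical guard at the top of Eval::NNUE::InitializeTraining():
// feature_transformer is only allocated once Initialize() has run,
// i.e. after an evalFile has been loaded.
if (!feature_transformer) {
    std::cerr << "InitializeTraining: no NNUE network loaded; "
                 "load an evalFile (or initialize a fresh net) before 'learn'." << std::endl;
    std::exit(EXIT_FAILURE);
}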

Does it also work for a workflow that starts without a net?

What would the correct UCI inputs be in this case?

Thanks

Crash under Linux

I launch Stockfish in a terminal and, BEFORE it loads the network (which happens, for example, on the command "ucinewgame"),
I issue commands like "go infinite" or "go nodes 100": the program immediately crashes.
We should add a check to such commands that the network has actually been loaded.
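
A minimal sketch of such a guard in the UCI "go" handler; the eval_file_loaded flag is a hypothetical name for whatever "is the network loaded?" check the code can expose:

// In the UCI loop, before starting a search on "go":
else if (token == "go")
{
    if (!Eval::NNUE::eval_file_loaded)   // hypothetical flag name
        sync_cout << "info string No NNUE network loaded; send 'ucinewgame' or set EvalFile first." << sync_endl;
    else
        go(pos, is, states);
}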

Load a pgn to generate data with those lines...

Could we have an option to load a pgn (maybe 5 moves per game) so that we can direct the "gensfen" command to generate data along those lines?
This would speed up the learning process and avoid wasting time on useless lines.

Consider column-major storage in AffineTransform, like in FeatureTransformer

The https://hxim.github.io/Stockfish-Evaluation-Guide/?p=nnue tool shows that the activations in the hidden layers are also sparse (in the biggest layer, fewer than 1/4 are non-zero in the positions I tried). Therefore, switching to column-major (possibly tiled) storage of the matrix in the AffineTransform class, as in FeatureTransformer (or something more elaborate), should be a major (yes, the word play is intentional) speedup in the "non-SIMD" (really compiler-generated SIMD) case, and is relatively simple to implement.
Besides, this format would permit SIMD using scalar instructions (since the entire column is multiplied by the same value).
It is also possible that this means the initialization is not optimal and many "neurons" end up not affecting anything.
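
A rough sketch of the idea (not the fork's actual code, and the exact integer types are assumptions): with column-major weights, each non-zero input activation updates one whole column, so zero activations are skipped entirely and the inner loop is a plain scalar-times-vector update that the compiler can auto-vectorize.

#include <algorithm>
#include <cstddef>
#include <cstdint>

void propagate_column_major(const std::uint8_t* input,       // clipped activations, one per input dimension
                            const std::int8_t*  weights_cm,  // column-major: all outputs for input 0, then input 1, ...
                            const std::int32_t* biases,
                            std::int32_t*       output,
                            std::size_t in_dim, std::size_t out_dim)
{
    std::copy(biases, biases + out_dim, output);
    for (std::size_t i = 0; i < in_dim; ++i)
    {
        if (input[i] == 0)                        // sparse activations: skip the whole column
            continue;
        const std::int8_t* col = weights_cm + i * out_dim;
        for (std::size_t j = 0; j < out_dim; ++j)
            output[j] += int(input[i]) * int(col[j]);
    }
}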

HalfKP-KK and similar feature sets do not work correctly

Good evening. Sorry for writing in Japanese.

The other day, official-stockfish merged the "Remove EvalList" change,
and I am trying to adapt the various features I created earlier to it.

PP looked a bit difficult, so I set it aside; apart from that, I thought I had largely finished the feature code.
https://github.com/tttak/Stockfish/commits/features_20200830

"test nnue test_features" passed for all features,
but when I compare the output of the eval command before and after the rework, the results do not match for some architectures.

Concretely, single FeatureSets that subdivide HalfKP, such as HalfKPE4 and HalfKP_PieceCount, are OK,
but composite FeatureSets such as HalfKP-KK and HalfKP-Mobility-Pawn are not.

After some investigation, it seems that when NNUE was merged into official-stockfish,
HalfKP was the only feature kept in mind, and parts of the code unnecessary for it were removed.

For example, in nnue_feature_transformer.h the code for the case of multiple kRefreshTriggers was removed.
tttak@49350b3 restores part of it.
(The various features I created earlier were based on the pre-merge code, so they worked as expected.)

Therefore, I suspect that k-p_256x2-32-32.h and halfkp-cr-ep_256x2-32-32.h in this repository probably no longer work correctly either.

I tried restoring the multiple-kRefreshTriggers code mentioned above, but that alone does not seem to be enough, and the results of the eval command still do not match.
If anything related to this comes to mind, I would appreciate your advice.
(I could dig through the code myself, but I am asking in case you can give me a hint.)

Gensfen move randomisation as UCI options

It would really help to be able to use options such as random_move_like_apery etc. via UCI. That would allow generating fens with cutechess-cli, which would make it much easier for fishtest-like platforms to distribute game generation, and it would further enable us to use the games for other ML projects as well.

Stockfish nnue with learn fails to compile on Ubuntu 18.04 due to a missing declaration

learn/learner.cpp:959:22: error: ‘INT_MIN’ was not declared in this scope
  959 | int search_depth2 = INT_MIN;
      |                     ^~~~~~~
learn/learner.cpp:87:1: note: ‘INT_MIN’ is defined in header ‘<climits>’; did you forget to ‘#include <climits>’?
   86 | #include "../nnue/evaluate_nnue_learner.h"
  +++ |+#include <climits>
   87 | #include <shared_mutex>
...
make[1]: *** [: learner.o] Error 1

The fix is to do just what it says: insert a new line in learn/learner.cpp after line 86: #include <climits> (thanks nodchip).
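
Applied to the includes shown in the compiler note, the fixed block would look roughly like this:

#include "../nnue/evaluate_nnue_learner.h"
#include <climits>        // provides INT_MIN
#include <shared_mutex>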

I'm not sure if this is specific to Ubuntu 18.04 and gcc 10.x, but if we use something defined in a header, we should include that header.

Please see if it makes sense to add this include to learner.cpp, so Ubuntu users don't have to keep applying this fix for the current release and possibly future releases. Thanks in advance.

Change the default values of write_out_draw_game_in_training_data_generation, use_draw_games_in_training and use_draw_games_in_validation

It is believed that setting write_out_draw_game_in_training_data_generation, use_draw_games_in_training and use_draw_games_in_validation to 1 is better than 0. We will change the default values of these three options. Before we change them, we need the following experiments:

  1. Generate training data with write_out_draw_game_in_training_data_generation = 0.
  2. Train a net on the data from 1., with use_draw_games_in_training = 0 and use_draw_games_in_validation = 0.
  3. Generate training data with write_out_draw_game_in_training_data_generation = 1.
  4. Train a net on the data from 3., with use_draw_games_in_training = 1 and use_draw_games_in_validation = 1.
  5. Compare the Elo between 2. and 4.

Any help is welcome.

"README.md" and "Readme.md"

Good evening.
I am planning to send a Pull Request about convert_bin,
but currently both "README.md" and "Readme.md" exist,
and git on Windows does not seem to handle this well.
I suspect the two files were not created intentionally;
would it be possible to delete one of them?

Loading the net and doing some stuff after 'isready'.

This is somewhat abusing the UCI protocol.

The command 'isready' is meant for synchronizing the GUI and the engine.
After an 'isready', all the engine must do is answer with 'readyok'. 'isready' is usually sent after one or more 'setoption' commands, after a 'ucinewgame' (mandatory), and may also be sent whenever the GUI wants to know whether the engine is still 'alive'.

A better solution would be to load the NN after the 'ucinewgame' command, where we also initialize some search state with Search::clear(). This would also allow profile builds.
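
A sketch of what that could look like in the UCI loop (Eval::NNUE::init() is a placeholder name for whatever function loads and verifies the network in this fork):

else if (token == "ucinewgame")
{
    Eval::NNUE::init();   // load/verify the NNUE network here, once per game
    Search::clear();      // existing search re-initialization
}
else if (token == "isready")
    sync_cout << "readyok" << sync_endl;   // stays a pure synchronization ping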

Trained network file not getting saved in Linux

No files are created after the training phase.

$./stockfish-nnue-learn-use-blas
Stockfish 220620 64 BMI2 by T. Romstad, M. Costalba, J. Kiiski, G. Linscott
setoption name SkipLoadingEval value true
setoption name Threads value 6
setoption name EvalSaveDir value evalsave
isready
readyok
learn targetdir trainingdata loop 100 batchsize 1000000 eta 1.0 lambda 0.5 eval_limit 32000 nn_batch_size 1000 newbob_decay 0.5 eval_save_interval 10000000 loss_output_interval 1000000 mirror_percentage 50 validation_set_file_name validationdata/generated_kifu.bin
[...]
finalize..all threads are joined.
Check Sum = 0
save_eval() start. folder = evalsave/final
save_eval() finished. folder = evalsave/final
quit

Correct folder structure exists:

./trainingdata/generated_kifu.bin
./validationdata/generated_kifu.bin
./evalsave

Also tried
setoption name EvalSaveDir value .
setoption name EvalSaveDir value ./evalsave
setoption name EvalSaveDir value evalsave/nn.bin
setoption name EvalSaveDir value /complete/path/evalsave
and training with the executable stockfish-nnue-learn (without blas).

The same commands with copied folder structure and files on a Windows system worked fine and a network file was created as expected in folder evalsave.

The Linux build was compiled for the bmi2 architecture, with c++14 replaced by c++17 and LDFLAGS += -static commented out in two places in the Makefile.

Decoupling gensfen/learn/convert declarations.

I want to look tomorrow, among other things, into splitting learn.h and also removing locally made declarations for various things (like the Learner namespace in uci.cpp). The last remaining issue is that the use_raw_nnue_eval option used for gensfen and learn is in the ugly state of being a global flag that can only be set through two commands. What is the consensus on the way we should handle use_raw_nnue_eval? I think we already decided that UCI options are best for global switches; now, should this be done as in cfish (a 3-state Use NNUE option), or should there be a dedicated option that takes higher priority over Use NNUE?

Write reference documents for commands and options

We need reference documents for the commands and options. The contents would be the list of commands in Stockfish NNUE and their options. Each option needs a description of its name, value type, effects, available values and default value.

Crash when no net can be loaded but SkipLoadingEval=false.

While trying to reproduce a certain crash, it happened more than once that I forgot to set SkipLoadingEval=true. I think the correct way of dealing with this is to report an error and exit gracefully instead of just closing and confusing the user.

Allow "gensfen" to use fixed nodes

So far gensfen only allows us to use a fixed depth, which results in very weak endgames compared to the opening and middlegame. 2 million nodes (about 1 second on 1 core) reaches about depth 20 from the starting position, and at times over depth 40 in endgames. If we limit by depth, we cut a lot of nodes, because in the endgame many lines can be pruned and we therefore reach high depths much more quickly.
Overall, using nodes instead of depth shouldn't slow us down, since SF gains a lot of speed in the endgame as well.

learn exe doesn't use all training data even without reject and with loop 1

Hi Nodchip,

I can't find the right settings for eval_save_interval and loss_output_interval to make the training phase use all the generated sfens.

Some examples with 1b sfens in a training .bin file:

  • learn "256x2" loop 1 + eval_save_interval 250m + loss_output_interval 50m => writes several nn.bin in evalsave subdirectories until 700m sfens, then continues to calculate until 950m, then I see the "end of file" message and neither a LOSS value nor a new nn.bin in the evalsave subdirectories or evalsave/final

  • learn "384x2" loop 1 + eval_save_interval 200m + loss_output_interval 50m => writes several nn.bin in evalsave subdirectories until 750m sfens, then continues to calculate until 950m, then I see the "end of file" message and neither a LOSS value nor a new nn.bin in the evalsave subdirectories or evalsave/final

No help on the discord channel...

Build error

The latest code doesn't build with the latest MinGW 8.1.0 on Windows.
The problem is the line
#include
in main.cpp

Issues with training a new net on the simplified formula.

Seems like I can't get the latest binaries working right with my older data. I've tried adjusting eta from 1.0 to 0.1, lambda from 1.0 to 0.5 and 0.3, and nn_batch_size from 1000 to 10000.
Loss stays high and the nets play horribly.

I think Additional_pylon has the same issues.

JJosh luckily seems to be making progress with the simplified formula, though.

Below is an example command I've used:

learn targetdir train loop 100 batchsize 1000000 use_draw_in_training 1 use_draw_in_validation 1 eta 0.1 lambda 0.3 eval_limit 32000 nn_batch_size 1000 newbob_decay 0.5 eval_save_interval 250000000 loss_output_interval 1000000 mirror_percentage 50 validation_set_file_name val\vald10.bin

The training data I've used is the same with which I managed to create my first strong net.

Anyone having similar issues?
I also still have to test the "stockfish-nnue-2020-06-27-1" binaries.

Wrong release comment

The 0719 release says to find a network on the #sf-nnue-dev channel, but that channel doesn't exist.

Final net not being saved

There are reports on discord that the best net after training has finished does not get saved into the final folder.

In bool LearnerThink::save() in learner.cpp, we pass the directory with the best NN for restoring before saving into the final folder.

					cout << "restoring parameters from " << best_nn_directory << endl;
					Eval::NNUE::RestoreParameters(best_nn_directory);

However, it looks like this string doesn't get used in RestoreParameters() in evaluate_nnue_learner.cpp.

// Reread the evaluation function parameters for learning from the file
void RestoreParameters(const std::string& dir_name) {
  const std::string file_name = NNUE::fileName;
  std::ifstream stream(file_name, std::ios::binary);
  bool result = ReadParameters(stream);
  assert(result);

  SendMessages({{"reset"}});
}

Do we need to change file_name into
const std::string file_name = Path::Combine(dir_name, NNUE::savedfileName); ?

This probably needs to be tested before getting applied, though.
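
For reference, the function with that change applied would look roughly as follows (untested, as noted above):

// Reread the evaluation function parameters for learning from the given directory
void RestoreParameters(const std::string& dir_name) {
  const std::string file_name = Path::Combine(dir_name, NNUE::savedfileName);
  std::ifstream stream(file_name, std::ios::binary);
  bool result = ReadParameters(stream);
  assert(result);

  SendMessages({{"reset"}});
}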

Assertion fail on reading Option "EvalDir"

There's no option "EvalDir". It is read near the end of learner.cpp and cast to std::string. ucioption doesn't know this option -> doesn't know it's a string -> fails the assertion in Option::operator std::string().
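
One possible fix is to register the option in UCI::init() in ucioption.cpp so that the std::string cast is legal; a sketch (the default value "eval" is an assumption):

o["EvalDir"] << Option("eval");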

Is there a guide on how to learn from PGNs?

I tried the pgn_to_plain and pgn-extract path and got quite confused. My PGN looks like this:

[Date "2020.07.14"]
[Round "?"]
[White "Stockfish.bmi2.halfkp_256x2-32-32.nnue"]
[Black "Stockfish.bmi2.halfkp_256x2-32-32.nnue"]
[Result "1-0"]
[BlackElo "2000"]
[Time "22:00:30"]
[WhiteElo "2000"]
[TimeControl "300+0"]
[Termination "normal"]
[PlyCount "43"]
[WhiteType "program"]
[BlackType "program"]

1. f4 d5 2. Nh3 Bxh3 3. gxh3 e6 4. e3 Qd6 5. Bg2 f5 6. d3 Nf6 7. O-O Nbd7
8. Nc3 Be7 9. Kh1 Rg8 10. a4 g5 11. e4 dxe4 12. fxg5 Nd5 13. dxe4 Nxc3 14.
Qh5+ Kd8 15. Qxh7 Rf8 16. bxc3 f4 17. Ba3 Nc5 18. Rad1 b6 19. e5 Qxd1 20.
Rxd1+ Kc8 21. Qxe7 Rf5 22. Qe8# 1-0

I think it's well-formed enough to be converted to training data.

Two kings on the same square

Data generated with the native Stockfish from this repository.

The board looks like:
r . . . r . k .
p p . . . p . p
. . p . n . p .
. . . . . . . n
P . . P P . . P
. b . . . . P .
P B . . . P . .
. . . R R . K .
Result 0.0
Kings on squares 6 and 62.

In the next saved position both kings are on square 63 and everything crashes, for the next 40 plies.

The next position I can parse is:
. . B . . . . k
. . . . . . . .
. . . . . . p .
. . . . . p . .
P R . . . . n .
. . . . . . P .
r . . . . . . .
. . . . . . . K
Result 0.5

I guess this must be a different game because the result is different? But it didn't start at the beginning, it starts here, so maybe this is another bug?

Then many, many positions are fine, and later on in the same file I have both kings on square 63 again...

I have generated many other files successfully with no issues, but this one has a lot of problems...

Nonsense PV blundering pieces at reasonable time control?

https://www.chess.com/computer-chess-championship#event=postccc14-lc0-cpu-vs-stockfishnnue&game=67

(pvs can also be obtained from this tournament pgn: https://cccfiles.chess.com/archive/tournament-143593.pgn and it's game 67 out of them)

In two positions, on black's 25th move and just after it, Stockfish NNUE suggests strange moves in its PV, both before and after it makes the move.

Black's PV first suggests 25... Kg7, not the move ultimately played. There are already some significant blunders in this PV, but they are perhaps of less stark concern than the PV shown after the played move is made.

2r2k1r/4bp2/3p3p/3RpPp1/ppq1P1P1/3QB3/PPP4P/1K4R1 b - - 2 25

After 25... Qxd3 is played, the PV shows 26. b3 as the reply, allowing the queen to be captured the very next move, followed by 26... Bf6, a completely empty threat that does nothing and still leaves the queen hanging.

2r2k1r/4bp2/3p3p/3RpPp1/pp2P1P1/3qB3/PPP4P/1K4R1 w - - 0 26
image

There were no significant dips in node counts/depth during the game. This is the first and only time I've noticed a potential bug of hugely significant blunders in the reported PVs, although the PVs seem to change very rapidly due to the volatility of NNUE evaluations and search's ability to resolve some lines while other places become equally volatile. I do fear there could be another reason for this, though.

Note that NNUE did win this game, but the faulty PV reporting makes me suspect there is a bug lurking somewhere.

Compression of the FeatureTransformer to 1/3 of its size

I figured out how to efficiently compress the FeatureTransformer weights in halfkp_256x2-32-32.
It exploits the fact that the weights for pieces with kings on neighbouring squares are similar.
It then encodes the differences with a simple variable-length code.
nn-97f742aaefcd.nnue is compressed from 21022697 bytes to 6749336 bytes with this method.
If someone thinks this is useful for something, there is a demo HTML page here:
https://hxim.github.io/Stockfish-Evaluation-Guide/compress_nnue_demo.html
And source code here:
https://github.com/hxim/Stockfish-Evaluation-Guide/blob/master/compress_nnue_demo.html
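
For reference, a minimal sketch of the idea in C++ (not the demo's actual code; it assumes the weights for consecutive king squares are stored contiguously): delta-encode each weight against the corresponding weight for the previous king square and emit the usually small differences with a zigzag + LEB128-style variable-length code.

#include <cstddef>
#include <cstdint>
#include <vector>

static void write_varint(std::vector<std::uint8_t>& out, std::int32_t v)
{
    std::uint32_t u = (std::uint32_t(v) << 1) ^ std::uint32_t(v >> 31);  // zigzag: small |v| -> small u
    do {
        std::uint8_t byte = u & 0x7F;
        u >>= 7;
        if (u) byte |= 0x80;   // continuation bit
        out.push_back(byte);
    } while (u);
}

std::vector<std::uint8_t> compress_deltas(const std::int16_t* weights,
                                          std::size_t per_king,    // weights per king square
                                          std::size_t num_kings)   // e.g. 64
{
    std::vector<std::uint8_t> out;
    for (std::size_t k = 0; k < num_kings; ++k)
        for (std::size_t i = 0; i < per_king; ++i)
        {
            const std::int32_t prev = k ? weights[(k - 1) * per_king + i] : 0;
            write_varint(out, std::int32_t(weights[k * per_king + i]) - prev);
        }
    return out;
}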

crash in gensfen

inp:

uci
setoption name Use NNUE value false
isready
gensfen depth 6 loop 10000000 use_draw_in_training_data_generation 1 eval_limit 32000 output_file_name training_data\training_data.bin use_raw_nnue_eval 0
quit

results in SIGSEGV

#0  0x00005555555d02b7 in Eval::NNUE::FeatureTransformer::UpdateAccumulator (this=0x7fff9ebf7040, pos=...) at nnue/nnue_feature_transformer.h:308
#1  0x00005555555cf991 in Eval::NNUE::FeatureTransformer::UpdateAccumulatorIfPossible (this=0x7fff9ebf7040, pos=...) at nnue/nnue_feature_transformer.h:89
#2  0x00005555555cf9f7 in Eval::NNUE::FeatureTransformer::Transform (this=0x7fff9ebf7040, pos=..., output=0x7fff7ebdd8c0 "", refresh=false) at nnue/nnue_feature_transformer.h:97
#3  0x00005555555cf284 in Eval::NNUE::ComputeScore (pos=..., refresh=false) at nnue/evaluate_nnue.cpp:171
#4  0x00005555555cf489 in Eval::NNUE::evaluate (pos=...) at nnue/evaluate_nnue.cpp:204
#5  0x0000555555572136 in Eval::evaluate (pos=...) at evaluate.cpp:952
#6  0x000055555559cf93 in (anonymous namespace)::qsearch<(<unnamed>::NodeType)1>(Position &, Search::Stack *, Value, Value, Depth) (pos=..., ss=0x7fff7ebe1fd0, alpha=-32001, beta=VALUE_INFINITE, 
    depth=0) at search.cpp:1487
#7  0x000055555559a38b in (anonymous namespace)::search<(<unnamed>::NodeType)1>(Position &, Search::Stack *, Value, Value, Depth, bool) (pos=..., ss=0x7fff7ebe1fd0, alpha=-32001, beta=VALUE_INFINITE, 
    depth=0, cutNode=false) at search.cpp:590
#8  0x000055555559c6b9 in (anonymous namespace)::search<(<unnamed>::NodeType)1>(Position &, Search::Stack *, Value, Value, Depth, bool) (pos=..., ss=0x7fff7ebe1f98, alpha=-32001, beta=VALUE_INFINITE, 
    depth=1, cutNode=false) at search.cpp:1285
#9  0x0000555555599c88 in Learner::search (pos=..., depth_=6, multiPV=1, nodesLimit=0) at search.cpp:2231
#10 0x00005555556015d1 in Learner::MultiThinkGenSfen::thread_worker (this=0x7fffffffce50, thread_id=0) at learn/gensfen.cpp:833
#11 0x00005555556301b2 in MultiThink::<lambda()>::operator()(void) const (__closure=0x555555831088) at learn/multi_think.cpp:49
#12 0x0000555555630c70 in std::__invoke_impl<void, MultiThink::go_think()::<lambda()> >(std::__invoke_other, MultiThink::<lambda()> &&) (__f=...) at /usr/include/c++/9/bits/invoke.h:60
#13 0x0000555555630c25 in std::__invoke<MultiThink::go_think()::<lambda()> >(MultiThink::<lambda()> &&) (__fn=...) at /usr/include/c++/9/bits/invoke.h:95
#14 0x0000555555630bd2 in std::thread::_Invoker<std::tuple<MultiThink::go_think()::<lambda()> > >::_M_invoke<0>(std::_Index_tuple<0>) (this=0x555555831088) at /usr/include/c++/9/thread:244
#15 0x0000555555630ba8 in std::thread::_Invoker<std::tuple<MultiThink::go_think()::<lambda()> > >::operator()(void) (this=0x555555831088) at /usr/include/c++/9/thread:251
#16 0x0000555555630b8c in std::thread::_State_impl<std::thread::_Invoker<std::tuple<MultiThink::go_think()::<lambda()> > > >::_M_run(void) (this=0x555555831080) at /usr/include/c++/9/thread:195
#17 0x00007ffff5ce5cb4 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#18 0x00007ffff5df9609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#19 0x00007ffff5991103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

It runs fine with 'Use NNUE=true'.

openblas and other backends

More a suggestion than an issue. I'm aware of a high-performance library that specifically targets small matrix multiplications and convolutions, and that is highly optimized for the kind of work I expect is being done in this fork:

https://github.com/hfp/libxsmm

This almost certainly outperforms the openblas parts of the code. Obviously I don't know how it compares to the hand-coded intrinsics.

Training generator generates same training data each run (Not random)

The PRNG is seeded with a constant. This causes the .bin files to be identical at the same depth for consecutive runs. Many people, like Sergio, are generating data on multiple machines. You can verify the problem by setting Threads to 1 so that Stockfish is deterministic: generate a training bin file and rename it, repeat, and binary-compare the two runs. They will be identical.

The fix is to modify multi_think.h around line 19 with

    std::random_device seed_it;
    MultiThink() : prng(seed_it()) // 21120903 was the old constant seed

There is a second constant seed in nnue_test_command.cpp, around line 32, that should be modified the same way. Both jjosh and Sergio are implementing this change (Sergio is generating on multiple machines). Nets created prior to this change were likely over-training on the same positions.

Regards,
Mike

Significant interleaving of positions in gensfen with more than 1 thread.

While this is not an issue in itself, it causes problems for certain training-data compression software. I believe this is the right place to at least have a discussion on the topic.

Due to the nature of the training data, most generators produce a list of entries that follow lines of actual chess games. This can be exploited for compression by storing position deltas whenever possible (for example in the form of a move between neighbouring positions), and the gains are on the order of 10-15x smaller files. I believe that good compression of training data will become crucial in the near future, as the amount of data generated and exchanged keeps increasing.

The current behaviour of gensfen uses the same output file for every thread and effectively interleaves the generated entries, so they no longer form chains of positions that compress well. One possible solution would be on the compression-software side and would involve scanning entries in a sliding window, searching for sequences that correspond to single games. I deem this solution unsound for several reasons:

  1. It works around a very narrow (but due to the popularity of gensfen very pronounced) issue. This problem is not present when converting annotated games nor when generating with one thread.
  2. It provides no benefit for naturally unstructured training data such as entries generated from opening books.
  3. It significantly complicates the implementation of compression software and incurs performance overhead.
  4. It is strictly worse than solving it here as the window length would have to be limited in practice so not all sequences would be found. Moreover it would introduce a tradeoff between compression speed and compression ratio (depending on window size) while not providing better compression than is possible with the right input data.

For the above reasons I would propose to address the matter at its core - here. Right now I see 2 potential solutions:

  1. Use one file per thread. This should be straightforward and would even remove many critical sections (see the sketch after this list).
  2. Ensure that only full games (or at least large buffer chunks) are persisted at once.
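
A sketch of option 1, assuming gensfen's worker threads know their thread_id and the base output_file_name; each thread writes "<name>_<id>.bin" and no longer shares a writer (and its critical section) with the other threads:

#include <cstddef>
#include <fstream>
#include <string>

std::ofstream open_thread_output(const std::string& output_file_name, std::size_t thread_id)
{
    const std::string per_thread_name = output_file_name + "_" + std::to_string(thread_id) + ".bin";
    return std::ofstream(per_thread_name, std::ios::binary);   // one stream per thread
}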

Use std::filesystem to create directories

misc.cpp contains a utility function to create a directory on several platforms. We could replace that utility function with std::filesystem for portability and maintainability.
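
A sketch of the replacement (requires C++17); std::filesystem handles the platform differences and also creates intermediate directories:

#include <filesystem>
#include <string>
#include <system_error>

bool make_directory(const std::string& path)
{
    std::error_code ec;
    std::filesystem::create_directories(path, ec);   // no-op if it already exists
    return !ec;
}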

Training Formula

return sigmoid(value / PawnValueEg / 4.0 * log(10.0));

This function does not seem to correspond to actual Stockfish games: by my calculations it maps 900cp to roughly a 90% winrate, while SF pretty much always wins once it hits +500 (or else someone is filing a bug ticket).
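
For reference, here is that calculation spelled out, assuming PawnValueEg = 208 (Stockfish's endgame pawn value; treat the exact constant as an assumption):

#include <cmath>
#include <cstdio>

int main() {
    const double PawnValueEg = 208.0;                              // assumed constant
    const double value = 900.0;                                    // +900cp
    const double x = value / PawnValueEg / 4.0 * std::log(10.0);   // ~= 2.49
    const double winrate = 1.0 / (1.0 + std::exp(-x));             // sigmoid ~= 0.92
    std::printf("%.3f\n", winrate);                                // prints 0.923
    return 0;
}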

Also, it would be nice to have a way to use the actual winrate in the training data, as from lc0, or once SF and similar NNUE engines have a winrate output option. It should be much more accurate to use a real winrate instead of converting to centipawns and back again.

Enable PGO optimization for both eval for profile-learn

I recently added a new makefile target, profile-learn, that enables PGO for the gensfen command (#86) and gives a nice speedup.
However, it only optimizes gensfen for the default eval, not both.
This is because the command used during PGO is gensfen depth 3 loop 100000, which uses the default eval only.
Ideally one would want a build optimized for both evals.

convert_bin crashes the learn exe

If I try to convert my plain-text file into a training bin file with this command:

stockfish.bmi2.halfkp_256x2-32-32.nnue-learn.2020-07-19.exe
Stockfish+NNUE 190720 64 BMI2 by T. Romstad, M. Costalba, J. Kiiski, G. Linscott, H. Noda, Y. Nasu, M. Isozaki
learn convert_bin output_file_name reinforcing.bin reinforcing.txt
learn command , learn from reinforcing.txt ,
base dir :
target dir :
Error! ./eval/nn.bin not found or wrong format
convert_bin..
convert reinforcing.txt ...

=> the exe crashes and the reinforcing.bin file weighs 0 bytes...

=> but if I first convert a small sample of pre-generated positions (produced by gensfen), and only after that retry converting my plain-text file into a training bin file, it works:

stockfish.bmi2.halfkp_256x2-32-32.nnue-learn.2020-07-19.exe
Stockfish+NNUE 190720 64 BMI2 by T. Romstad, M. Costalba, J. Kiiski, G. Linscott, H. Noda, Y. Nasu, M. Isozaki
learn convert_bin output_file_name init.bin init.txt
learn command , learn from init.txt ,
base dir :
target dir :
Error! ./eval/nn.bin not found or wrong format
convert_bin..
convert init.txt ... done10 parsed 0 is filtered
all done
learn convert_bin output_file_name reinforcing.bin reinforcing.txt
learn command , learn from reinforcing.txt ,
base dir :
target dir :
convert_bin..
convert reinforcing.txt ... done1427786 parsed 0 is filtered
all done

=> and my training .bin file weighs 50 MB!

This bug occurs on several computers here.

Change the default value of detect_draw_by_insufficient_mating_material

The detect_draw_by_insufficient_mating_material option detects draws earlier. We will change its default value. Before we change it, we need the following experiments:

  1. Generate training data with detect_draw_by_insufficient_mating_material = 0.
  2. Train a net on the data from 1.
  3. Generate training data with detect_draw_by_insufficient_mating_material = 1.
  4. Train a net on the data from 3.
  5. Compare the Elo between 2. and 4.

Any help is welcome.

Potential null PackedSfenValue entries in gensfen output.

I have received one big .bin file (~125M entries) with about 80 entries (roughly uniformly distributed) that are just null bytes. Such entries crash learning. Initially I thought this might be a shuffle issue, but since it is not present in any other shuffled files, it is most likely coming from gensfen somehow. I've been looking for a potential source, but the only place where such entries could arise is https://github.com/nodchip/Stockfish/blob/master/src/learn/gensfen.cpp#L933, and they are filled in functions that cannot fail. Since the file was shuffled, I have no way of knowing whether these entries were consecutive or appeared sporadically. I'm debating a temporary defensive solution that would remove null entries just before writing to disk. If anyone has any idea how these could arise, that would be great.
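
A sketch of that defensive filter (PackedSfenValue is left as a template parameter here so the snippet is self-contained; in the fork it is the fixed-size training entry struct): drop all-zero records from the write buffer just before it goes to disk.

#include <algorithm>
#include <vector>

template <typename PackedSfenValue>
void drop_null_entries(std::vector<PackedSfenValue>& buffer)
{
    auto is_null = [](const PackedSfenValue& e) {
        const unsigned char* p = reinterpret_cast<const unsigned char*>(&e);
        return std::all_of(p, p + sizeof(PackedSfenValue),
                           [](unsigned char b) { return b == 0; });
    };
    buffer.erase(std::remove_if(buffer.begin(), buffer.end(), is_null), buffer.end());
}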

Options["Training"] is read before it can be assigned through uci if Threads option precedes Training

Found by linox. I'm posting this verbatim from discord.

There is a bug in the use of the option Options["Training"]. The default value is false; when the user sends the command
setoption name training value true, it does not take effect because the value is read in Search::init() at

training = Options["Training"];

To fix this issue, the training = Options["Training"]; assignment can instead be done in void MainThread::search().
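
A sketch of that placement (only the relevant line is shown; the rest of the function is unchanged):

void MainThread::search() {

  training = Options["Training"];   // read the option here, after any setoption commands have been processed

  // ... rest of MainThread::search() unchanged ...
}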

Halfmove count wraps at 63

To reproduce: import fens using "learn convert_bin", then rename the output file and reverse the conversion with "learn convert_plain". Check the output file and look at fens that had a halfmove count greater than 63: 64 becomes 0, 65 becomes 1, etc.

Looks like an overflow: wrapping at 64 suggests the counter is stored in a 6-bit field.

Adding this issue to document the behavior.

Towards a better training data storage format.

There are defects in the current .bin format that cannot be addressed without breaking backwards compatibility.

For example:
#75 (this is actually regarding the 50-move rule).
Similarly, the half-move counter wraps after 255.

This could be addressed with simple changes and a new, say, ".bin2" format. But if we're going to the lengths of making a better format, we might as well improve it in other ways while we have the occasion to make breaking changes. I have created a format that doesn't have the above issues and also allows for much smaller files (around 15x smaller than .bin) [provided that the concerns from #91 get addressed]. Therefore I propose adding the ability to use that format for serialization and deserialization. To make this change as simple as possible, I could create a header-only library that provides writer and reader classes with a simple API similar to fstream and a generic interface that can easily be called with Stockfish's chess primitives. I would like to hear your thoughts on this and whether I should pursue it.

I also want this issue to serve as a place to report further fundamental problems with the .bin format that cannot be addressed in a backwards-compatible way, and other potential ways of improving on it in a second iteration.

Naming inherited from shogi. Unification of nomenclature.

The most common term that irks me is "sfen", which, I presume, means "Shogi FEN"; that makes no sense in chess circles, nor in the way it's commonly used to describe a training data entry. (Even something as plain as Training Data Entry (TDE) would be less confusing, in my opinion.)

I've also seen a few comments in the code that refer to UCI as USI.

Should we try to change names that currently have no relation to the ones used in chess? Looking at this from a future perspective, it would be weird if the only reason something is named as it is were that "it originated from shogi".

What other names in the codebase are currently borrowed? What could they be renamed to?
