
neurobenzene: An improved benzene project for playing and solving Hex with the help of deep neural networks

This repository is built upon benzene-vanilla-cmake.

Prerequisites

Dependent libraries from vanilla benzene:

boost, berkeley-db.

On Ubuntu, install via

sudo apt-get install libboost-all-dev 
sudo apt-get install libdb-dev 

On Mac, install the equivalents with Homebrew, e.g., brew install boost berkeley-db.

As of November 25th, 2018, this repository uses the C API from TensorFlow 1.12, so it is no longer necessary to compile TensorFlow from source.

However, you need to set up the C API first by following these instructions:

  • GPU (Linux only). Specifically, the prebuilt TensorFlow C API 1.12 requires CUDA 9.0 and cuDNN 7.1.4; make sure both are installed on your system.

    cd tensorflow_c_gpu/
    source setup_tf.sh
  • CPU (Mac or Linux)

    cd tensorflow_c_cpu/
    source setup_tf.sh

In tensorflow_c_cpu or tensorflow_c_gpu there is a test source file, hello_tf.c. To verify that your TensorFlow C API has been set up correctly, try

    gcc hello_tf.c -o test_hello -ltensorflow -ltensorflow_framework
    ./test_hello

If the setup is correct, it should display

Hello from TensorFlow C library version 1.12.0

How to build?

After everything is ready, build the project by:

$ mkdir build
$ cd build
$ cmake ../
$ make

On Mac, there might be linking errors for db or boost; in that case, revise your ~/.bash_profile (macOS), e.g.,

# berkeley-db
export CPATH=$CPATH:/usr/local/Cellar/berkeley-db/6.2.23/include
export DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH:/usr/local/Cellar/berkeley-db/6.2.23/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/Cellar/berkeley-db/6.2.23/lib

#boost
export CPATH=$CPATH:/usr/local/Cellar/boost/1.64.0_1/include
export DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH:/usr/local/Cellar/boost/1.64.0_1/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/Cellar/boost/1.64.0_1/lib

Running some tests

Try ./build/src/mohex/mohex3HNN

 showboard
= 
 
  2a78b48864b73c1f
  a  b  c  d  e  f  g  h  i  j  k  
 1\.  .  .  .  .  .  .  .  .  .  .\1
  2\.  .  .  .  .  .  .  .  .  .  .\2
   3\.  .  .  .  .  .  .  .  .  .  .\3
    4\.  .  .  .  .  .  .  .  .  .  .\4
     5\.  .  .  .  .  .  .  .  .  .  .\5
      6\.  .  .  .  .  .  .  .  .  .  .\6
       7\.  .  .  .  .  .  .  .  .  .  .\7
        8\.  .  .  .  .  .  .  .  .  .  .\8
         9\.  .  .  .  .  .  .  .  .  .  .\9
         10\.  .  .  .  .  .  .  .  .  .  .\10
          11\.  .  .  .  .  .  .  .  .  .  .\11
              a  b  c  d  e  f  g  h  i  j  k  

nn_ls
= Loaded neural net: /home/chenyang/cg/neurobenzene/share/nn/11x11train4.txt.out-post.ckpt-1.const.pb
Use nn_load nn_model for new model, note nn_model should be a full path or just model name if it is available in share/nn/
List available nn models at share/nn/
13x13train30.txt.out-post.ckpt-2.const.pb
11x11train4.txt.out-post.ckpt-1.const.pb

play b g3
= 

showboard
= 
 
  9af481a52c1456bb
  a  b  c  d  e  f  g  h  i  j  k  
 1\.  .  .  .  .  .  .  .  .  .  .\1
  2\.  .  .  .  .  .  .  .  .  .  .\2
   3\.  .  .  .  .  .  B  .  .  .  .\3
    4\.  .  .  .  .  .  .  .  .  .  .\4
     5\.  .  .  .  .  .  .  .  .  .  .\5
      6\.  .  .  .  .  .  .  .  .  .  .\6
       7\.  .  .  .  .  .  .  .  .  .  .\7
        8\.  .  .  .  .  .  .  .  .  .  .\8
         9\.  .  .  .  .  .  .  .  .  .  .\9
         10\.  .  .  .  .  .  .  .  .  .  .\10
          11\.  .  .  .  .  .  .  .  .  .  .\11
              a  b  c  d  e  f  g  h  i  j  k  

genmove w
63079 info: Best move cannot be determined, must search state.
63079 info: --- MoHexPlayer::Search() ---
63079 info: Color: white
63079 info: MaxGames: 99999999
63079 info: NumberThreads: 1
63079 info: MaxNodes: 10416666 (999999936 bytes)
63079 info: TimeForMove: 10
63079 info: Time for PreSearch: 0.387516s
63079 info: 
  9af481a52c1456bb
  a  b  c  d  e  f  g  h  i  j  k  
 1\.  .  .  *  *  *  *  *  *  *  .\1
  2\.  .  *  *  *  *  *  *  *  *  .\2
   3\.  *  *  *  *  *  B  *  *  *  .\3
    4\.  *  *  *  *  *  *  *  *  *  .\4
     5\.  *  *  *  *  *  *  *  .  *  .\5
      6\.  *  *  *  *  *  *  *  *  *  .\6
       7\.  *  .  *  *  *  *  *  *  *  .\7
        8\.  *  *  *  *  *  *  *  *  *  .\8
         9\.  *  *  *  *  *  *  *  *  *  .\9
         10\.  *  *  *  *  *  *  *  *  .  .\10
          11\.  *  *  *  *  *  *  *  .  .  .\11
              a  b  c  d  e  f  g  h  i  j  k  
63079 info: no subtree to reuse since old position not exists
63079 info: No subtree to reuse.
63079 info: Creating thread 0
63079 info: StartSearch()[0]
 0:05 | 0.424 | 8110 | 3.4 | e7 g6 g5 ^ f6 f5 e6 e5 i4
SgUctSearch: move cannot change anymore
63079 info: 
Elapsed Time   9.35722s
Count          14802
GamesPlayed    14802
Nodes          61741
Knowledge      35 (0.2%)
Expansions     656.0
Time           8.80
GameLen        120.0 dev=0.0 min=120.0 max=120.0
InTree         3.6 dev=1.6 min=0.0 max=11.0
KnowDepth      2.0 dev=1.0 min=0.0 max=4.0
Aborted        0%
Games/s        1681.9
Score          0.42
Sequence       e7 g6 g5 f6 f5 e6 e5 i4 h5
moveselect dithering threshold: 0	 num stones in current state:1
63079 info: 
= e7

quit
= 

Train the neural net

The subdirectory simplefeature3 contains a relatively independent Python project used to train the neural net.

Commands and Parameters

Newly added commands:

nn_load <path_to_nn_model> # if no parameter follows, display the currently loaded model
nn_evaluate # evaluate the current board state with a single forward pass

param_mohex moveselect_ditherthreshold <int>
# select the move by softmax over visit counts when the number of stones on the board is below the threshold
param_mohex use_playout_const <float, [0,1]> # weight of the random-playout result in the backup; 1 disables the value net
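
These commands, together with standard GTP commands such as play and genmove, can also be sent to the engine programmatically over a pipe. The following is a minimal, illustrative Python sketch of a GTP driver; it is not part of this repository and only assumes the engine binary built above.

    import subprocess

    class GtpEngine:
        """Minimal GTP driver (illustrative sketch, not part of the repository)."""

        def __init__(self, cmd):
            self.proc = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                                         stdout=subprocess.PIPE, text=True)

        def send(self, command):
            # A GTP response starts with "= " (or "? " on error) and is
            # terminated by an empty line.
            self.proc.stdin.write(command + "\n")
            self.proc.stdin.flush()
            lines = []
            while True:
                line = self.proc.stdout.readline()
                if line.strip() == "":
                    break
                lines.append(line.rstrip("\n"))
            return "\n".join(lines)

    engine = GtpEngine(["./build/src/mohex/mohex3HNN"])
    print(engine.send("nn_load"))      # show the currently loaded model
    print(engine.send("play b g3"))
    print(engine.send("genmove w"))    # e.g. "= e7"
    engine.send("quit")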

Display default parameters:

param_mohex
= 
[bool] backup_ice_info 1
[bool] extend_unstable_search 1
[bool] lock_free 0
[bool] lazy_delete 1
[bool] perform_pre_search 1
[bool] prior_pruning 1
[bool] ponder 0
[bool] reuse_subtree 1
[bool] search_singleton 0
[bool] use_livegfx 0
[bool] use_parallel_solver 0
[bool] use_rave 1
[bool] use_root_data 1
[bool] use_time_management 0
[bool] weight_rave_updates 0
[bool] virtual_loss 1
[string] bias_term 0
[string] expand_threshold 10
[string] fillin_map_bits 16
[string] first_play_urgency 0.35
[string] playout_global_gamma_cap 0.157
[string] knowledge_threshold "256"
[string] number_playouts_per_visit 1
[string] max_games 99999999
[string] max_memory 1999999872
[string] max_nodes 10416666
[string] max_time 10
[string] move_select count
[string] num_threads 1
[string] progressive_bias 2.47
[string] vcm_gamma 282
[string] randomize_rave_frequency 30
[string] rave_weight_final 830
[string] rave_weight_initial 2.12
[string] time_control_override -1
[string] uct_bias_constant 0.22
[string] moveselect_ditherthreshold 0
[string] use_playout_const 0

New commands

nn_load
# if no argument is given, display the loaded nn; otherwise load the nn given either
# 1) a model name in share/nn/ (relative path), or 2) an absolute path to the nn model

nn_ls # list the available nn models in share/nn/

nn_evaluate # run a single nn forward pass and display the p, q, v values

mohex-self-play n filetosave
# self-play n games, then save the games to filetosave

param_mohex root_dirichlet_prior 0.06
# the argument is a float; 0 means no Dirichlet noise at all;
# if root_dirichlet_prior is non-zero, then
# p(s,a) = 0.75*p(s,a) + 0.25*noise, where noise is sampled from Dir(root_dirichlet_prior)
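
For intuition, the mixing described above corresponds to roughly the following NumPy sketch (illustrative only; the engine implements this internally in C++):

    import numpy as np

    def mix_root_prior(p, alpha):
        # p: 1-D array of prior probabilities over the legal root moves (sums to 1)
        # alpha: the root_dirichlet_prior parameter; 0 disables the noise
        if alpha == 0:
            return p
        noise = np.random.dirichlet([alpha] * len(p))
        return 0.75 * p + 0.25 * noise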

MoHex-self-play game format

When a mohex-self-play command finishes, the played games are saved to disk in the following format:

  • Each line is one game
  • A game consists of a sequence of black and white moves, where black starts first
  • Each move is followed by the probability distribution from which it was drawn
  • At the end of each line is the game value, w.r.t. the player to move after the last move

One example game:

B[b3] W[j4][j4:0.37;i5:0.06;h6:0.02;g7:0.05;f8:0.12;e9:0.12;d10:0.22;] B[i5][k4:0.08;i5:0.85;] W[i4][k2:0.01;l2:0.04;j3:0.01;k3:0.02;l3:0.25;i4:0.30;k4:0.10;j5:0.04;k5:0.15;k6:0.05;] B[h5][h5:0.95;] W[h4][h4:0.95;] B[g5][f5:0.02;g5:0.95;] W[g4][g4:0.98;] B[f5][e5:0.09;f5:0.89;] W[f4][f4:0.98;] B[e5][d5:0.35;e5:0.63;] W[e4][e4:0.96;] B[d5][c5:0.17;d5:0.79;] W[d4][d4:0.95;] B[c5][b5:0.19;c5:0.80;] W[c4][c4:0.96;] B[a5][a5:0.97;] W[b5][b5:0.97;] B[a6][a6:0.97;] W[b6][b6:0.97;] B[a7][a7:0.98;] W[b7][b7:0.96;] B[a8][a8:0.98;] W[b8][b8:0.98;] B[a9][a9:0.98;] W[b9][b9:0.97;] B[a10][a10:0.98;] W[b10][b10:0.91;b11:0.04;b12:0.03;] B[a11][a11:0.97;] W[b12][b11:0.02;b12:0.98;] B[b11][c10:0.14;b11:0.83;] W[c11][c11:0.54;d11:0.42;] B[c10][c10:0.95;] W[d10][d10:0.67;e10:0.19;f10:0.05;] B[d8][d8:0.68;d9:0.08;f9:0.03;e10:0.17;] W[d9][c2:0.01;g8:0.02;h8:0.07;d9:0.67;f9:0.03;g9:0.02;h9:0.03;f10:0.06;g10:0.06;] B[c9][c9:0.97;] W[h8][g8:0.04;h8:0.25;i8:0.12;h9:0.21;f10:0.22;g10:0.04;e11:0.01;f11:0.06;] B[i8][j7:0.02;g8:0.02;i8:0.91;f9:0.01;] W[i7][i7:0.73;h9:0.09;i9:0.14;] B[j7][k6:0.16;j7:0.71;g8:0.01;f9:0.08;] W[h9][j6:0.39;h9:0.57;] B[e10][j6:0.26;i9:0.23;j9:0.06;e10:0.42;] W[d12][c2:0.13;e8:0.02;f8:0.30;e11:0.10;d12:0.41;] B[j6][j6:0.91;i9:0.03;f11:0.03;] W[i9][i9:0.66;i10:0.30;] B[j9][k8:0.10;j9:0.85;f11:0.02;] W[j8][j8:0.96;] B[l7][l7:0.98;] W[k7][j5:0.01;k7:0.81;k8:0.04;f11:0.12;] B[l6][k6:0.04;l6:0.92;] W[k6][k2:0.05;l3:0.05;k4:0.05;k6:0.77;k8:0.04;f11:0.01;] B[l5][l5:0.96;] W[k8][j5:0.05;k5:0.06;k8:0.62;f11:0.23;] B[l8][l8:0.96;] W[k9][j5:0.09;k5:0.10;k9:0.75;f11:0.03;] B[l9][l9:0.95;k11:0.01;] W[k11][j5:0.04;k5:0.08;k10:0.13;k11:0.72;] B[k10][k10:0.92;f11:0.05;k12:0.01;] W[j11][j5:0.01;k5:0.10;j11:0.86;] B[j10][j10:0.96;k12:0.01;] W[j5][j5:0.57;k5:0.11;i11:0.29;] B[k5][k5:0.96;] W[i6][i6:0.96;i11:0.01;] B[l3][l3:0.51;f11:0.19;h11:0.21;i11:0.08;] W[i11][l2:0.06;i11:0.93;] B[k12][f8:0.10;m10:0.20;e11:0.15;f11:0.11;g11:0.09;h11:0.11;k12:0.22;] W[l10][l10:1.00;] B[h11][f8:0.04;g10:0.03;e11:0.15;f11:0.27;g11:0.19;h11:0.31;] W[g12][c2:0.02;d2:0.03;g12:0.95;] B[e12][f8:0.08;g10:0.01;g11:0.05;e12:0.56;f12:0.02;h12:0.25;] W[f11][f11:0.98;g11:0.01;] B[g10][f8:0.07;g10:0.54;h12:0.39;] W[e11][e11:1.00;] -1.000000
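
For convenience, here is a small Python sketch (not part of the repository; the helper name is my own) that parses one such line into the move sequence and the final game value:

    import re

    # Each move is "B[cell]" or "W[cell]", optionally followed by the probability
    # distribution "[move:prob;move:prob;...]" from which it was drawn.
    MOVE_RE = re.compile(r"([BW])\[([a-z]\d+)\](\[[^\]]*\])?")

    def parse_game(line):
        # The last whitespace-separated field is the game value.
        value = float(line.split()[-1])
        moves = []
        for color, cell, dist in MOVE_RE.findall(line):
            probs = {}
            if dist:
                for item in dist.strip("[]").split(";"):
                    if item:
                        move, prob = item.split(":")
                        probs[move] = float(prob)
            moves.append((color, cell, probs))
        return moves, value

Applied to the example line above, parse_game would return a move list starting with ('B', 'b3', {}) and the game value -1.0.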

Closed loop

A closed-loop shell script does the following:

  1. Call mohex-self-play to produce a set of games
  2. Refine the neural net on those games by running ../simplefeature3/neuralnet/resnet.py
  3. Freeze the trained neural net via ../simplefeature3/freeze_graph/freeze_graph_main.py
  4. Use nn_load to load the new nn model
  5. Go back to step 1

Step 1 is rather time- and computation-intensive; for example, the 20-block version of AlphaGo Zero generated over 4 million games. How fast is mohex-self-play on a machine with a single GTX 1080 GPU? With 1000 simulations per move, a move takes about 1 s (slower when using expand_threshold 0), so a 60-move game takes about a minute. One million games therefore take roughly 10^6 minutes, i.e., 10^6/(60*24) ≈ 700 days.

Closed loop scripts are at closedloop/
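
For orientation, the loop could be orchestrated roughly as follows. This is a pseudocode-level Python sketch; the command-line arguments passed to resnet.py and freeze_graph_main.py below are placeholders, and the real invocations are in the closedloop/ scripts.

    import subprocess

    def run_gtp(engine_cmd, commands):
        # Feed a list of GTP commands to the engine and let it run to completion.
        subprocess.run(engine_cmd, input="\n".join(commands + ["quit"]) + "\n", text=True)

    model = "11x11train4.txt.out-post.ckpt-1.const.pb"
    for iteration in range(10):
        games_file = "selfplay_%d.txt" % iteration
        # 1. self-play with the current model
        run_gtp(["./build/src/mohex/mohex3HNN"],
                ["nn_load %s" % model,
                 "mohex-self-play 1000 %s" % games_file])
        # 2. refine the neural net on the new games (placeholder arguments)
        subprocess.run(["python", "../simplefeature3/neuralnet/resnet.py", games_file])
        # 3. freeze the trained checkpoint into a .pb model (placeholder arguments)
        subprocess.run(["python", "../simplefeature3/freeze_graph/freeze_graph_main.py"])
        # 4. the next iteration loads the newly frozen model via nn_load
        model = "new_model_%d.const.pb" % iteration  # placeholder name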

MCTS

In-tree formula

$$score(s,a) = (1-w)\Big(Q(s,a) + c\sqrt{\frac{\log{N(s)}}{N(s,a)}}\Big) + wR(s,a)+c_{pb}\frac{p(s,a)}{\sqrt{N(s,a)+1}}$$

If use_rave is false, then

$$score(s,a) = Q(s,a) + c_{pb}\frac{p(s,a)\sqrt{N(s)}}{N(s,a)+1}$$

Playout: use param_mohex use_playout_const to control the weight of the random-playout result in the backup; the default is 0. Setting it to 1 disables the value net.
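
A plain-Python restatement of the in-tree formula (a readability sketch, not the engine's C++ implementation; here w is the RAVE weight, c the UCT exploration constant, and c_pb the progressive-bias constant):

    import math

    def in_tree_score(q, rave, prior, n_parent, n, w, c, c_pb):
        # q       : Q(s,a), mean value of action a at state s
        # rave    : R(s,a), the RAVE estimate
        # prior   : p(s,a), the neural-net prior probability
        # n_parent: N(s), visit count of the parent state
        # n       : N(s,a), visit count of the action (assumed > 0 here)
        uct = q + c * math.sqrt(math.log(n_parent) / n)
        return (1 - w) * uct + w * rave + c_pb * prior / math.sqrt(n + 1)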

Further Reading

See

Authors

This project is developed by:

  • Chao Gao - main developer
  • Jakub Pawlewicz

License

This project is licensed under version 3 of the GNU Lesser General Public License, as published by the Free Software Foundation.

Contributors

jakubpawlewicz, cgao3, enz, kenjyoung, chenyangh, dpankratz
