The blackbird from zackattack614

TeacherPolicy not used

https://github.com/jordan-singer/BlackBird/blob/master/src/Network.py#L219

The teacherPolicy variable is not used anywhere.

Policy Head Softmax Applied Twice

The softmax function is applied twice in the network's policy head; it should only be applied once. Also note that the output size of the policy is hard-coded to 9, rather than a variable size representing the shape of the board.
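To illustrate why the double softmax matters, here is a minimal sketch in plain Python (not the project's TensorFlow code). The logits and the `board_shape` name are made up for the example; the policy size is derived from the board shape rather than hard-coded.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical board shape; the policy size follows from it.
board_shape = (3, 3)
policy_size = board_shape[0] * board_shape[1]

# Hypothetical raw outputs of the policy head: one strongly preferred move.
logits = [4.0] + [0.0] * (policy_size - 1)

once = softmax(logits)   # intended behaviour
twice = softmax(once)    # the bug: softmax applied to probabilities
```

Both results sum to 1, but `twice` is much closer to uniform than `once`, so the second softmax washes out the network's preference.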

Create python API for exposing supported python operations.

Learning Rate Annealing

The current training setup uses a constant learning rate. This rate should decrease over the course of training, following some annealing schedule.
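One possible schedule is exponential decay. The sketch below is only an illustration in plain Python; the function name, parameter names, and the numbers in the example are assumptions, not something the project defines.

```python
def annealed_learning_rate(initial_rate, decay_rate, decay_steps, step):
    """Exponential decay: the rate is multiplied by decay_rate every decay_steps steps."""
    return initial_rate * decay_rate ** (step / decay_steps)

# Hypothetical settings: start at 0.01 and halve every 1000 training steps.
schedule = [annealed_learning_rate(0.01, 0.5, 1000, s) for s in (0, 1000, 2000)]
```

A step-wise or cosine schedule would fit the same interface; which one works best here would need to be measured.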

Self-Play ELO Rating System

BlackBird needs a rating system so that performance across training sessions can be measured.
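As a starting point, the standard Elo update could look like the sketch below. The function names and the K-factor of 32 are assumptions chosen for illustration; `score_a` is 1 for a win, 0.5 for a draw, and 0 for a loss.

```python
def expected_score(rating_a, rating_b):
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update_elo(rating_a, rating_b, score_a, k=32.0):
    """Return the new (rating_a, rating_b) after one game."""
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta

# Two equally rated networks; the first one wins.
new_a, new_b = update_elo(1500.0, 1500.0, score_a=1.0)
```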

MCTS shouldn't be backing up values from unexplored children

How the code is now:

Expand until you find an un-expanded branch node.
Run the NN to get the policy for that node.
For each child, run the net and get the expected value of that child.
Backup the values of each of those children.

The problem here is that we first choose a node and then update its value to be the average of its children's values. This doesn't make sense, since an intelligent network would never have chosen all of those children.

Instead, we should only be backing up the value of the move we thought was realistic to make.
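A rough sketch of that idea, assuming a simplified `Node` class (not the project's own) that tracks a visit count `N` and a total value `W`: only the selected node's value is propagated up its path, and unexplored siblings are left untouched.

```python
class Node:
    """Simplified stand-in for an MCTS node (hypothetical, not BlackBird's class)."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.N = 0      # visit count
        self.W = 0.0    # total backed-up value

    @property
    def Q(self):
        return self.W / self.N if self.N else 0.0

    def add_child(self):
        child = Node(parent=self)
        self.children.append(child)
        return child

def backup(leaf, value):
    """Propagate one evaluation from the selected leaf up to the root."""
    # A two-player implementation would usually flip the sign of the value
    # at each ply; that detail is omitted from this sketch.
    node = leaf
    while node is not None:
        node.N += 1
        node.W += value
        node = node.parent

root = Node()
b, c, d = (root.add_child() for _ in range(3))
backup(c, 0.5)  # only the move we actually selected is backed up
```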

Push Games to SQLite3 Database

Games generated by local clients should be inserted into the SQLite3 database by default. A toggle in parameters.yaml should exist for this feature.
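A minimal sketch of the toggle, assuming a hypothetical `save_games_to_db` key and a hypothetical `Games` table; the `params` dict stands in for the parsed contents of parameters.yaml, and an in-memory database is used so the example has no side effects.

```python
import json
import sqlite3

# Stand-in for the parsed parameters.yaml; the key name is an assumption.
params = {"save_games_to_db": True}

def save_game(connection, moves, result, enabled=True):
    """Insert one finished game; do nothing if the feature is toggled off."""
    if not enabled:
        return False
    connection.execute(
        "CREATE TABLE IF NOT EXISTS Games ("
        "id INTEGER PRIMARY KEY, moves TEXT NOT NULL, result INTEGER NOT NULL)"
    )
    connection.execute(
        "INSERT INTO Games (moves, result) VALUES (?, ?)",
        (json.dumps(moves), result),
    )
    connection.commit()
    return True

connection = sqlite3.connect(":memory:")
saved = save_game(connection, [4, 0, 8], 1, enabled=params["save_games_to_db"])
row_count = connection.execute("SELECT COUNT(*) FROM Games").fetchone()[0]
```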

Compile graph from JS generated JSON through python API.

Expose python APIs for stateless training commands.

Network Architecture GUI

An end user should be able to modify the architecture of their neural network via a GUI.

U values need to be updated on the fly

Consider the following scenario:

A has children b,c,d

We explore A. Expand the children and explore c.
A.N = 1
c.N = 1
b.N = 0
d.N = 0

The U values for b and d haven't been updated to account for the change in c.N.

Also, sum(child.N for child in parent.children) = parent.N.
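One way to avoid stale values is to compute U when it is read instead of storing it. The sketch below assumes the AlphaZero-style PUCT term `c_puct * P * sqrt(parent.N) / (1 + N)`; the project's actual formula and the `Node` fields used here (`P`, `c_puct`) are assumptions.

```python
import math

class Node:
    """Simplified node whose U value is derived on access (hypothetical class)."""

    def __init__(self, prior, parent=None, c_puct=1.0):
        self.P = prior
        self.parent = parent
        self.c_puct = c_puct
        self.N = 0

    @property
    def U(self):
        # Uses the parent's current visit count, so a visit to any sibling
        # is reflected the next time U is read.
        parent_visits = self.parent.N if self.parent else 0
        return self.c_puct * self.P * math.sqrt(parent_visits) / (1 + self.N)

# The scenario above: A has children b, c, d; we explore A, then c.
A = Node(prior=1.0)
b, c, d = (Node(prior=1 / 3, parent=A) for _ in range(3))
A.N += 1
c.N += 1
```

After the visit to c, `b.U` and `d.U` are both larger than `c.U` without any explicit update step.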

Card

History Panes

BoardState arrays should include historical game state data. This will affect the shape of the neural network input, and how data is serialized.

The gamestate classes shouldn't know about policies

#64

Serialization of policy and evaluation shouldn't be handled by the gamestate class.

Network Codenames

After each training session, a network is created. This network should have a codename associated with it in the form <adjective>_<noun>, e.g. "pretty_paperclip."
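A small sketch of how such a codename could be generated. The word lists and the function name are placeholders; a real implementation would probably load larger lists and check for collisions with existing networks.

```python
import random

# Placeholder word lists; none of the words contain an underscore.
ADJECTIVES = ["pretty", "brave", "quiet", "clever"]
NOUNS = ["paperclip", "lantern", "sparrow", "compass"]

def make_codename(rng=random):
    """Return a codename of the form <adjective>_<noun>."""
    return f"{rng.choice(ADJECTIVES)}_{rng.choice(NOUNS)}"

codename = make_codename(random.Random(0))
```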

Chess BoardState

A chess BoardState class needs to exist for BlackBird to learn the game.

The class should inherit from GameState, and override all functions.

Migrate to tf.data

The BlackBird.TrainingExample class should be removed, and replaced with a tf.data.Dataset. The Network.train() method should use a Dataset object.

Serialize Game States in Protocol Buffers

To ensure that game states are as compact as possible before they are transferred over the wire to a central repository, they should be serialized as Protocol Buffers. Game states are currently serialized as JSON, which is much less efficient.

Deserialize Game States from Protocol Buffers

Given a game state written in a protobuf, the corresponding BoardState object should be able to deserialize and return a full game state to train on.

Typo?

Should this read example.State.Player?

https://github.com/jordan-singer/BlackBird/blob/c7b6d3558b4af5b042a3fe3f781158b53b88d921/src/blackbird.py#L48

Create Electron app with graph generator

MCTS.getBestMove doesn't re-sample the full branch

To sample the best branch for exploration + exploitation, the relative expected values of all of the nodes need to be compared after every update.

This code

    selected_node = self.root

is only called once. Once it is set to the root, all of the subsequent playouts dive deeper into the same branch:

    while current_playouts < self.max_playouts:
        while any(selected_node.children):
            children_QU = [child.Q + child.U for child in selected_node.children]
            selected_node = selected_node.children[np.argmax(children_QU)]

It should instead start from the root again and recheck the values to make sure that it is exploring the optimal path, and not something it discovered to suck.
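A sketch of the suggested fix, using a stripped-down `Node` and a hypothetical `evaluate_and_backup` callback in place of the real network evaluation: selection restarts at the root on every playout, so updated Q and U values are taken into account.

```python
class Node:
    """Stripped-down node with fixed Q and U fields (hypothetical class)."""

    def __init__(self, Q=0.0, U=0.0):
        self.Q = Q
        self.U = U
        self.children = []

def select_leaf(root):
    """Walk down from the root, always taking the child with the highest Q + U."""
    node = root
    while node.children:
        node = max(node.children, key=lambda child: child.Q + child.U)
    return node

def run_playouts(root, max_playouts, evaluate_and_backup):
    visited = []
    for _ in range(max_playouts):
        leaf = select_leaf(root)  # re-select from the root every playout
        evaluate_and_backup(leaf)
        visited.append(leaf)
    return visited

def disappoint(leaf):
    # Toy callback: every visit makes the leaf look worse.
    leaf.Q -= 1.0

root = Node()
a, b = Node(Q=0.5), Node(Q=0.0)
root.children = [a, b]
visited = run_playouts(root, 3, disappoint)
```

With the toy callback, the search alternates between the two children instead of staying on the first one it picked.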

Load Training Statistics to SQLite3 DB

The win/loss/draw counts vs random, old, and standard MCTS should be logged in the TrainingStatisticsFact table.

Dirichlet Noise Applied During Evaluation

Dirichlet noise should only be applied in self-play in order to aid in exploration in training, not during network evaluation or official play.
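A sketch of gating the noise on a self-play flag, in plain Python. The function names, the `self_play` flag, and the values `alpha=0.3` and `epsilon=0.25` are assumptions for illustration; the Dirichlet sample is drawn by normalizing independent gamma draws.

```python
import random

def dirichlet_sample(alpha, size, rng):
    """Draw from a symmetric Dirichlet by normalizing gamma variates."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]

def root_priors(priors, self_play, rng, alpha=0.3, epsilon=0.25):
    """Mix Dirichlet noise into the root priors during self-play only."""
    if not self_play:
        return list(priors)  # evaluation and official play: no noise
    noise = dirichlet_sample(alpha, len(priors), rng)
    return [(1 - epsilon) * p + epsilon * n for p, n in zip(priors, noise)]

priors = [0.5, 0.3, 0.2]
rng = random.Random(0)
evaluation_priors = root_priors(priors, self_play=False, rng=rng)
training_priors = root_priors(priors, self_play=True, rng=rng)
```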

Loss Not Appropriately Defined

Loss, as defined here, is just the first element of a column vector. It should use reduce_sum over the vector, not just return one element of that vector.

https://github.com/jordan-singer/BlackBird/blob/ec37781c312623d3863a8b6adbc8841280c1e5df/src/network.py#L85

We stop updating MCTS values when we find an end game

The MCTS algorithm doesn't back up the number of plays if we hit an end game state. This occasionally results in almost no exploration: we can iterate to a game end that the AI thinks is good value (regardless of whether it is), and then we keep going down that branch and quit.

The simulations should not stop just because we stumbled upon an end game.
