blackbird's People

Contributors

kpwelsh, zackattack614

blackbird's Issues

Policy Head Softmax Applied Twice

The softmax function is applied twice in the network's policy head; it should only be applied once. Also note that the output size of the policy is hard-coded to 9, rather than a variable size representing the shape of the board.
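
A minimal sketch of why the second softmax matters, in plain Python rather than the project's actual network code. The names `softmax`, `policy_head`, and `board_size` are hypothetical; the point is only that re-applying softmax to probabilities flattens the distribution, and that the output length can follow the board instead of a constant 9.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def policy_head(logits, board_size):
    """Apply softmax exactly once; the length is checked against the board size."""
    if len(logits) != board_size:
        raise ValueError("expected %d logits, got %d" % (board_size, len(logits)))
    return softmax(logits)

logits = [2.0, 0.0, -2.0]
once = policy_head(logits, board_size=3)
twice = softmax(once)  # the bug: softmax applied to values that are already probabilities

print(once)   # sharply peaked on the first move
print(twice)  # noticeably flatter, so the policy loses most of its signal
```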

Learning Rate Annealing

The current training setup uses a single constant learning rate when minimizing the network's loss. This rate should decrease over the course of training according to some schedule (step decay or exponential decay, for example).
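
One possible schedule is exponential decay, sketched below in plain Python. The function name and its parameters are hypothetical and not part of BlackBird; the formula is the usual `initial_rate * decay_rate ** (step / decay_steps)`.

```python
def annealed_learning_rate(initial_rate, decay_rate, decay_steps, step):
    """Exponentially decayed learning rate for a given training step.

    After every `decay_steps` steps the rate is multiplied by `decay_rate`.
    """
    if decay_steps <= 0:
        raise ValueError("decay_steps must be positive")
    return initial_rate * decay_rate ** (step / decay_steps)

# Hypothetical values, chosen only for illustration.
for step in (0, 1000, 2000):
    print(step, annealed_learning_rate(0.01, 0.5, 1000, step))
```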

MCTS shouldn't be backing up values from unexplored children

How the code is now:

  1. Expand until you find an un-expanded branch node.
  2. Run the NN to get the policy for that node.
  3. For each child, run the net and get the expected value of that child.
  4. Back up the values of each of those children.

The problem here is that we first chose a node, then updated the value of that node to be the average of its children's values. This doesn't make sense, since an intelligent network would never have chosen all of those children.

Instead, we should only be backing up the value of the move we thought was realistic to make.
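
A rough sketch of backing up a single evaluation along the selected path only. The `Node` fields (`N`, `W`, `Q`) and the sign flip at each level are assumptions (a two-player, zero-sum game with alternating turns), not the repository's actual implementation.

```python
class Node:
    """Hypothetical minimal MCTS node."""
    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.N = 0      # visit count
        self.W = 0.0    # sum of backed-up values
        self.Q = 0.0    # mean value, W / N
        if parent is not None:
            parent.children.append(self)

def backup(leaf, value):
    """Propagate one evaluation from the selected leaf up to the root.

    Siblings of the nodes on the path are not touched.
    """
    node = leaf
    while node is not None:
        node.N += 1
        node.W += value
        node.Q = node.W / node.N
        value = -value  # assumption: the parent is the opponent's position
        node = node.parent

root = Node()
chosen, ignored = Node(root), Node(root)
backup(chosen, 1.0)
```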

Push Games to SQLite3 Database

Games generated by local clients should be inserted into the SQLite3 database by default. A toggle in parameters.yaml should exist for this feature.
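
A minimal sketch of what the insert could look like with the standard-library `sqlite3` module. The table name `games`, its schema, the JSON encoding, and the `enabled` flag (standing in for whatever key ends up in parameters.yaml) are all assumptions.

```python
import json
import sqlite3

def save_game(conn, states, enabled=True):
    """Insert one game into the database, or do nothing if the toggle is off.

    Returns the new row id, or None when saving is disabled.
    """
    if not enabled:
        return None
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS games ("
            "id INTEGER PRIMARY KEY, states TEXT NOT NULL)"
        )
        cursor = conn.execute(
            "INSERT INTO games (states) VALUES (?)", (json.dumps(states),)
        )
    return cursor.lastrowid

# In-memory database for illustration; a real client would open blackbird.db.
demo_conn = sqlite3.connect(":memory:")
demo_id = save_game(demo_conn, [[0] * 9, [1] + [0] * 8])
```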

U values need to be updated on the fly

Consider the following scenario:

A has children b,c,d

We explore A. Expand the children and explore c.
A.N = 1
c.N = 1
b.N = 0
d.N = 0

The U values for b and d haven't been updated to account for the change in c.N.

Also, sum(child.N for child in parent.children) = parent.N.
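
One way to avoid stale values is to compute U on demand instead of storing it. The sketch below assumes the PUCT-style exploration term used in AlphaGo Zero, `c_puct * P * sqrt(parent.N) / (1 + N)`; the attribute names `P` and `c_puct` are hypothetical and may not match the repository.

```python
import math

class Node:
    """Hypothetical node whose U is derived from current visit counts."""
    def __init__(self, prior, parent=None, c_puct=1.0):
        self.P = prior          # prior probability from the policy head
        self.parent = parent
        self.c_puct = c_puct    # exploration constant (assumed)
        self.N = 0
        self.children = []
        if parent is not None:
            parent.children.append(self)

    @property
    def U(self):
        """Recomputed on every access, so it always reflects the latest N values."""
        parent_visits = self.parent.N if self.parent is not None else 0
        return self.c_puct * self.P * math.sqrt(parent_visits) / (1 + self.N)

A = Node(prior=1.0)
b, c, d = Node(0.25, A), Node(0.5, A), Node(0.25, A)
A.N, c.N = 1, 1
u_before = b.U
A.N = 4          # more visits elsewhere in the subtree
u_after = b.U    # b.U changes even though b itself was never visited
```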

History Panes

BoardState arrays should include historical game state data. This will affect the shape of the neural network input, and how data is serialized.
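
As a rough illustration of how history could be stacked into the input, the helper below keeps the most recent positions and pads with empty boards early in the game. The function name, the list-of-lists board representation, and the padding choice are assumptions.

```python
def stack_history(history, num_panes, empty_board):
    """Return the last `num_panes` boards, oldest first, padded with empty boards."""
    if num_panes <= 0:
        raise ValueError("num_panes must be positive")
    recent = history[-num_panes:]
    padding = [list(empty_board) for _ in range(num_panes - len(recent))]
    return padding + recent

empty = [0] * 9
game = [[1] + [0] * 8, [1, 2] + [0] * 7]
panes = stack_history(game, num_panes=3, empty_board=empty)
# The network input would then have num_panes boards instead of one.
```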

Network Codenames

After each training session, a network is created. This network should have a codename associated with it in the form <adjective>_<noun>, e.g. "pretty_paperclip."
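
A small sketch of a codename generator. The word lists and the function name are made up for illustration; a real implementation would likely draw from larger lists and check for collisions with existing networks.

```python
import random

# Hypothetical word lists; any adjectives and nouns would do.
ADJECTIVES = ["pretty", "brave", "quiet", "clever"]
NOUNS = ["paperclip", "lantern", "sparrow", "teacup"]

def network_codename(rng=random):
    """Return a codename of the form <adjective>_<noun>."""
    return "%s_%s" % (rng.choice(ADJECTIVES), rng.choice(NOUNS))

print(network_codename())
```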

Chess BoardState

A chess BoardState class needs to exist for BlackBird to learn the game.

The class should inherit from GameState, and override all functions.

Migrate to tf.data

The BlackBird.TrainingExample class should be removed, and replaced with a tf.data.Dataset. The Network.train() method should use a Dataset object.

Serialize Game States in Protocol Buffers

To ensure that game states are as compact as possible before they are transferred over the wire to a central repository, game states should be serialized as Protocol Buffers. They are currently serialized as JSON, which is much less efficient.

MCTS.getBestMove doesn't re-sample the full branch

To sample the best branch for exploration + exploitation, the relative expected values of all of the nodes need to be compared after every update.

This code

    selected_node = self.root

is only called once. Once it is set to the root, all of the subsequent playouts dive deeper into the same branch:

    while current_playouts < self.max_playouts:
        while any(selected_node.children):
            children_QU = [child.Q + child.U for child in selected_node.children]
            selected_node = selected_node.children[np.argmax(children_QU)]

It should instead start from the root again and recheck the values to make sure that it is exploring the optimal path, and not something it discovered to suck.
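
A sketch of the suggested fix, with selection restarted at the root on every playout. The `Node` class and the `evaluate_and_backup` callback are hypothetical stand-ins for the real MCTS code; only `Q`, `U`, and `children` come from the snippet above.

```python
class Node:
    """Hypothetical stand-in for the MCTS node; only Q, U, and children are used."""
    def __init__(self, name, Q=0.0, U=0.0):
        self.name = name
        self.Q = Q
        self.U = U
        self.children = []

def select_leaf(root):
    """Walk from the root to a leaf, taking the child with the highest Q + U."""
    node = root
    while node.children:
        node = max(node.children, key=lambda child: child.Q + child.U)
    return node

def run_playouts(root, max_playouts, evaluate_and_backup):
    """Re-select from the root on every playout so updated values are respected."""
    visited = []
    for _ in range(max_playouts):
        leaf = select_leaf(root)      # always restart at the root
        evaluate_and_backup(leaf)     # may change Q/U anywhere on the path
        visited.append(leaf.name)
    return visited
```

With this structure, a branch whose value drops after a playout can lose out to a sibling on the very next playout.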

We stop updating MCTS values when we find an end game

The MCTS algorithm doesn't back up the number of plays if we hit an end game state. This occasionally results in almost no exploration: we can iterate to a game end that the AI thinks is good value (regardless of whether it is), and then we continue to go down that branch and quit.

The simulations should not stop just because we stumbled upon an end game.
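
One way to structure a playout so that terminal states are still counted is sketched below. Every callback (`select_leaf`, `is_terminal`, `terminal_value`, `evaluate`, `backup`) is a hypothetical placeholder for the corresponding piece of the real search.

```python
def run_playout(root, select_leaf, is_terminal, terminal_value, evaluate, backup):
    """Run one playout; terminal leaves are backed up like any other leaf."""
    leaf = select_leaf(root)
    if is_terminal(leaf):
        value = terminal_value(leaf)  # known game result, no network call needed
    else:
        value = evaluate(leaf)        # network estimate for a non-terminal leaf
    backup(leaf, value)               # runs in both cases, so visit counts keep growing
    return value
```

Because the backup happens either way, repeated visits to a finished game raise its visit count and lower its exploration bonus, which lets other branches be selected.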

Train from SQLite3 db

BlackBird's network should be able to train from game states stored in the blackbird.db file.

Reuse of variables in nested generators scares me

    children_probs = [
        (child.N ** (1/self.temperature))
        / sum([child.N ** (1/self.temperature) for child in self.root.children])
        for child in self.root.children]

It's not clear in this expression which iterator child comes from when evaluating child.N.
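
A possible rewrite computes the weights once and normalizes them in a second step, so no name is reused across nested comprehensions. The function name and its arguments are hypothetical; the arithmetic is the same as in the expression above.

```python
def visit_probabilities(visit_counts, temperature):
    """Turn visit counts into move probabilities using the given temperature.

    Raises ZeroDivisionError if every count is zero, as the original would.
    """
    exponent = 1.0 / temperature
    weights = [n ** exponent for n in visit_counts]
    total = sum(weights)
    return [w / total for w in weights]

# Usage with plain counts; in the MCTS class this would be
# [child.N for child in self.root.children] and self.temperature.
probs = visit_probabilities([1, 3], temperature=1.0)
```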

Publish Training Games to Cloud

Training games that are generated on a client computer should be able to be published for a centralized server to train the next network on.

Repeated State Pane

A BoardState's Board member object should have a constant pane holding the number of times the current position has been seen in the game's history.

This is helpful, for example, in informing BlackBird how close it is to a threefold repetition in chess.
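
A sketch of how the repetition count could be tracked and turned into a constant pane. Boards are assumed to be flat lists that can be converted to tuples for hashing; the function names are hypothetical.

```python
from collections import Counter

def repetition_counts(history):
    """For each position in the game, return how many times it has occurred so far."""
    seen = Counter()
    counts = []
    for board in history:
        key = tuple(board)   # lists are not hashable, tuples are
        seen[key] += 1
        counts.append(seen[key])
    return counts

def repetition_pane(count, board_size):
    """A pane of the same size as the board, filled with the repetition count."""
    return [count] * board_size

a, b = [1, 0, 0, 0], [0, 1, 0, 0]
counts = repetition_counts([a, b, a, b, a])
```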

It is still randomly training rewards

We iterate over the entire state history to generate rewards, not just the states from the game that was just played.
This happens after every game, so each pass over the full history adds more or less random rewards to the list.
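
A sketch of labelling only the states of the game that just finished. The data layout (a list of `(state, player_to_move)` pairs), the reward convention (+1 for the winner's positions, -1 for the loser's, 0 for a draw), and the function name are assumptions.

```python
def assign_rewards(game_states, winner):
    """Label the states of ONE finished game with that game's outcome.

    game_states: list of (state, player_to_move) pairs from a single game.
    winner: the winning player's id, or None for a draw.
    """
    examples = []
    for state, player in game_states:
        if winner is None:
            reward = 0.0
        elif player == winner:
            reward = 1.0
        else:
            reward = -1.0
        examples.append((state, reward))
    return examples

# Each game is labelled on its own, then appended to the training set.
training_examples = []
finished_games = [
    ([("s0", 1), ("s1", 2), ("s2", 1)], 1),   # player 1 won
    ([("t0", 1), ("t1", 2)], None),           # draw
]
for states, winner in finished_games:
    training_examples.extend(assign_rewards(states, winner))
```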
