Code Monkey home page Code Monkey logo

supergo's Introduction

SuperGo

A student implementation of AlphaGo Zero paper with documentation.

Ongoing project.

TODO (in order of priority)

  • Do something about the process leaking
  • File of constants that match the paper constants ?
  • OGS / KGS API ?
  • Use logging instead of prints ?

CURRENTLY DOING

  • Optimizations
  • Clean code, create install script, write documentation
  • Trying to see if it learns something on my computer

DONE

  • Statistics (branch statistics)
  • Game that are longer than the threshold of moves are now used
  • MCTS
    • Tree search
    • Dirichlet noise to prior probabilities in the rootnode
    • Adaptative temperature (either take max or proportionally)
    • Sample random rotation or reflection in the dihedral group
    • Multithreading of search
    • Batch size evaluation to save computation
  • Dihedral group of board for more training samples
  • Learning without MCTS doesnt seem to work
  • Resume training
  • GTP on trained models (human.py, to plug with Sabaki)
  • Learning rate annealing (see this)
  • Better display for game (viewer.py, converting self-play games into GTP and then using Sabaki)
  • Make the 3 components (self-play, training, evaluation) asynchronous
  • Multiprocessing of games for self-play and evaluation
  • Models and training without MCTS
  • Evaluation
  • Tromp Taylor scoring
  • Dataset ring buffer of self-play games
  • Loading saved models
  • Database for self-play games

LONG TERM PLAN ?

  • Compile my own version of Sabaki to watch games automatically while traning
  • Resignation ?
  • Training on a big computer / server once everything is ready ?

Resources

Statistics, check branch stats

For a 10 layers deep Resnet

9x9 board

soon

19x19 board

Differences with the official paper

  • No resignation
  • PyTorch instead of Tensorflow
  • Python instead of (probably) C++ / C

supergo's People

Contributors

dylandjian avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

supergo's Issues

where was the problem

Hi,
I am wondering, did you figure out where was the problem with your program? Why was the agent playing so poorly?

What is the performance now?

I want to know what level can this AI program achieve after several days' training.
As I do not own a GPU, so it's hard for me to train it...

Can't self-play

Traceback (most recent call last):
File "/usr/lib/python3.6/multiprocessing/queues.py", line 234, in _feed
obj = _ForkingPickler.dumps(obj)
File "/usr/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
TypeError: can't pickle pachi_py.cypachi.PyPachiBoard objects

Environnement virtualenv, python3.6, ubuntu 18.04, pytorch with cuda 9.

Is there a step that I missed ?

There might a bug in the implementation of PolicyNet

According to the AlphaGo Zero cheat sheet from this article

image

In the Policy Head, the input tensor will be convoluted with two filters to 2 channels (2x19x19), and then use FC decoder to output a 19x19 + 1 vector.

The code of PolicyNet will be

class PolicyNet(nn.Module):
    def __init__(self, inplanes, outplanes):
        super(PolicyNet, self).__init__()
        self.outplanes = outplanes
        # convoluted to 2 planes
        self.conv = nn.Conv2d(inplanes, 2, kernel_size=1)
        self.bn = nn.BatchNorm2d(1)
        self.logsoftmax = nn.LogSoftmax(dim=1)
        # NxN = 19x19 = outplanes -1
        # The FC will decode input from 2x19x19 to 19x19 + 1
        self.fc_input_size = 2*(outplanes-1)
        self.fc = nn.Linear(self.fc_input_size, outplanes)
        self.af1 = nn.ReLU()
        
    def forward(self, x):
        x = self.af1(self.bn(self.conv(x)))
        x = x.view(-1, self.fc_input_size)
        x = self.fc(x)
        probas = self.logsoftmax(x).exp()
        return probas

What version of python do you have?

What version of python do you have?

In version 3.8 , I got error.

Traceback (most recent call last):
  File "main.py", line 46, in <module>
    main()
  File "/home/user/anaconda3/envs/PyTorch/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/user/anaconda3/envs/PyTorch/lib/python3.8/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/user/anaconda3/envs/PyTorch/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/user/anaconda3/envs/PyTorch/lib/python3.8/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "main.py", line 27, in main
    pool = MyPool(2)
  File "/home/user/anaconda3/envs/PyTorch/lib/python3.8/multiprocessing/pool.py", line 212, in __init__
    self._repopulate_pool()
  File "/home/user/anaconda3/envs/PyTorch/lib/python3.8/multiprocessing/pool.py", line 303, in _repopulate_pool
    return self._repopulate_pool_static(self._ctx, self.Process,
  File "/home/user/anaconda3/envs/PyTorch/lib/python3.8/multiprocessing/pool.py", line 319, in _repopulate_pool_static
    w = Process(ctx, target=worker,
  File "/home/user/anaconda3/envs/PyTorch/lib/python3.8/multiprocessing/process.py", line 82, in __init__
    assert group is None, 'group argument must be None 

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.