dylandjian / supergo



A student implementation of Alpha Go Zero

Python 100.00%

alphago-zero alphago reinforcement-learning pytorch mcts python3 machine-learning

supergo's Introduction

SuperGo

A student implementation of the AlphaGo Zero paper, with documentation.

Ongoing project.

TODO (in order of priority)

Do something about the process leaking
File of constants that match the paper constants ?
OGS / KGS API ?
Use logging instead of prints ?

CURRENTLY DOING

Optimizations
Clean code, create install script, write documentation
Trying to see if it learns something on my computer

DONE

Statistics (branch statistics)
Games that are longer than the move threshold are now used
MCTS
- Tree search
- Dirichlet noise added to the prior probabilities in the root node
- Adaptive temperature (either take the max or sample proportionally)
- Sample random rotation or reflection in the dihedral group
- Multithreading of search
- Batch size evaluation to save computation
Dihedral group of board for more training samples
Learning without MCTS doesn't seem to work
Resume training
GTP on trained models (human.py, to plug with Sabaki)
Learning rate annealing (see this)
Better display for game (viewer.py, converting self-play games into GTP and then using Sabaki)
Make the 3 components (self-play, training, evaluation) asynchronous
Multiprocessing of games for self-play and evaluation
Models and training without MCTS
Evaluation
Tromp Taylor scoring
Dataset ring buffer of self-play games
Loading saved models
Database for self-play games
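
Two of the MCTS items above (Dirichlet noise on the root priors and temperature-based move selection) can be sketched in a few lines. This is a minimal standard-library illustration, not the code from this repository: the function names are hypothetical, and the defaults alpha=0.03 and epsilon=0.25 are the values reported in the AlphaGo Zero paper for 19x19 Go, which this project may or may not use.

```python
import random


def add_dirichlet_noise(priors, alpha=0.03, epsilon=0.25, rng=random):
    """Return (1 - epsilon) * priors + epsilon * Dirichlet(alpha) noise.

    Hypothetical helper; alpha and epsilon default to the paper's values.
    """
    # A Dirichlet sample is a set of Gamma(alpha, 1) draws normalized to sum to 1.
    gammas = [rng.gammavariate(alpha, 1.0) for _ in priors]
    total = sum(gammas)
    if total == 0.0:
        # Extremely unlikely underflow: fall back to uniform noise.
        noise = [1.0 / len(priors)] * len(priors)
    else:
        noise = [g / total for g in gammas]
    return [(1 - epsilon) * p + epsilon * n for p, n in zip(priors, noise)]


def select_move(visit_counts, temperature, rng=random):
    """Pick a move index from the root visit counts (hypothetical helper)."""
    if temperature == 0:
        # Greedy play: take the most visited move.
        return max(range(len(visit_counts)), key=visit_counts.__getitem__)
    # Exploratory play: sample proportionally to N(a) ** (1 / temperature).
    # Assumes at least one move has a non-zero visit count.
    weights = [count ** (1.0 / temperature) for count in visit_counts]
    return rng.choices(range(len(visit_counts)), weights=weights)[0]
```

In the paper, the noise is applied only at the root, and the temperature is 1 for the opening moves of a self-play game and close to 0 afterwards; evaluation games play greedily throughout.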

LONG TERM PLAN ?

Compile my own version of Sabaki to watch games automatically while training
Resignation ?
Training on a big computer / server once everything is ready ?

Resources

The article for this code
Official AlphaGo Zero paper
Custom environment implementation using pachi_py following the implementation that was originally made on OpenAI Gym
Using PyTorch for the neural networks
Using Sabaki for the GUI
General scheme, cool design
Monte Carlo tree search explanation
Nice tree search implementation

Statistics, check branch stats

For a 10-layer-deep ResNet

9x9 board

soon

19x19 board

Differences with the official paper

No resignation
PyTorch instead of TensorFlow
Python instead of (probably) C++ / C

supergo's Issues

What are the differences/similarities with minigo?

https://github.com/tensorflow/minigo is another open source implementation of AlphaGo Zero. Could you compare minigo and SuperGo? What are the strengths and weaknesses of each?

Which PyTorch version does SuperGo rely on?

Which PyTorch version does SuperGo rely on? I tried PyTorch 0.2 and 0.4, and both produced many errors.

What is the performance now?

I would like to know what level this AI program can reach after several days of training.
I do not own a GPU, so it is hard for me to train it myself...

There might be a bug in the implementation of PolicyNet

According to the AlphaGo Zero cheat sheet from this article

In the policy head, the input tensor is convolved with two filters down to 2 channels (2x19x19), and a fully connected layer then decodes it into a vector of 19x19 + 1 values.

The code for PolicyNet would then be:

import torch.nn as nn

class PolicyNet(nn.Module):
    def __init__(self, inplanes, outplanes):
        super(PolicyNet, self).__init__()
        self.outplanes = outplanes
        # 1x1 convolution down to 2 planes
        self.conv = nn.Conv2d(inplanes, 2, kernel_size=1)
        # BatchNorm2d must match the 2 channels produced by the convolution
        self.bn = nn.BatchNorm2d(2)
        self.logsoftmax = nn.LogSoftmax(dim=1)
        # NxN = 19x19 = outplanes - 1
        # The FC layer decodes the 2x19x19 input to 19x19 + 1 outputs
        self.fc_input_size = 2 * (outplanes - 1)
        self.fc = nn.Linear(self.fc_input_size, outplanes)
        self.af1 = nn.ReLU()

    def forward(self, x):
        x = self.af1(self.bn(self.conv(x)))
        x = x.view(-1, self.fc_input_size)
        x = self.fc(x)
        # exp(log_softmax) yields move probabilities that sum to 1
        probas = self.logsoftmax(x).exp()
        return probas
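
The sizes involved in this policy head can be checked with plain arithmetic. The helper below is a hypothetical sketch (its name and parameters are not part of the repository) that lists the per-sample shape at each stage, assuming a 1x1 convolution with two filters followed by a fully connected layer. Any batch-norm layer placed right after the convolution needs as many features as the convolution has filters.

```python
def policy_head_shapes(board_size, in_planes, conv_filters=2):
    """Return per-sample tensor shapes through the policy head (batch dim omitted).

    Hypothetical helper for illustration only.
    """
    points = board_size * board_size
    return {
        "input": (in_planes, board_size, board_size),
        # A 1x1 convolution changes the channel count, not the spatial size.
        "after_conv1x1": (conv_filters, board_size, board_size),
        # Flattened input of the fully connected layer.
        "flattened": (conv_filters * points,),
        # One output per intersection, plus one for the pass move.
        "output": (points + 1,),
    }
```

For a 19x19 board, the fully connected layer therefore maps 2 * 361 = 722 inputs to 362 outputs, and for a 9x9 board it maps 162 inputs to 82 outputs.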

gobang version

TensorFlow + Ubuntu
Criticism and advice are welcome

Can't self-play

Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/queues.py", line 234, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/usr/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
TypeError: can't pickle pachi_py.cypachi.PyPachiBoard objects

Environment: virtualenv, Python 3.6, Ubuntu 18.04, PyTorch with CUDA 9.

Is there a step that I missed?
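
The traceback shows an object wrapping a native board being pushed through a multiprocessing queue, which pickles everything it sends. A common general workaround, sketched below, is to serialize only plain Python state and rebuild the native object on the receiving side. The Board class here is a hypothetical stand-in (a threading.Lock plays the role of the unpicklable native handle); it is not the pachi_py API, and whether this repository handles the problem this way is not confirmed.

```python
import pickle
import threading


class Board:
    """Hypothetical board wrapper holding an unpicklable native handle."""

    def __init__(self, size, moves=()):
        self.size = size
        self.moves = []
        # Stand-in for the C-level board object, which cannot be pickled.
        self._handle = threading.Lock()
        for move in moves:
            self.play(move)

    def play(self, move):
        # A real wrapper would also forward the move to the native board.
        self.moves.append(move)

    def __getstate__(self):
        # Serialize only plain Python data and leave the native handle out.
        return {"size": self.size, "moves": list(self.moves)}

    def __setstate__(self, state):
        # Rebuild the native handle by replaying the recorded moves.
        self.__init__(state["size"], state["moves"])


def roundtrip(board):
    """Pickle and unpickle a board, as a multiprocessing queue would."""
    return pickle.loads(pickle.dumps(board))
```

An alternative with the same effect is to send only the move list (or another plain representation) between processes and construct the board inside each worker.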

What version of Python do you have?

With version 3.8, I get an error.

Traceback (most recent call last):
  File "main.py", line 46, in <module>
    main()
  File "/home/user/anaconda3/envs/PyTorch/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/user/anaconda3/envs/PyTorch/lib/python3.8/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/user/anaconda3/envs/PyTorch/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/user/anaconda3/envs/PyTorch/lib/python3.8/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "main.py", line 27, in main
    pool = MyPool(2)
  File "/home/user/anaconda3/envs/PyTorch/lib/python3.8/multiprocessing/pool.py", line 212, in __init__
    self._repopulate_pool()
  File "/home/user/anaconda3/envs/PyTorch/lib/python3.8/multiprocessing/pool.py", line 303, in _repopulate_pool
    return self._repopulate_pool_static(self._ctx, self.Process,
  File "/home/user/anaconda3/envs/PyTorch/lib/python3.8/multiprocessing/pool.py", line 319, in _repopulate_pool_static
    w = Process(ctx, target=worker,
  File "/home/user/anaconda3/envs/PyTorch/lib/python3.8/multiprocessing/process.py", line 82, in __init__
    assert group is None, 'group argument must be None
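
This traceback is consistent with a Pool subclass that overrides the Process class attribute, a recipe often used to get non-daemonic workers. From Python 3.8 onwards, Pool calls Process(ctx, ...), so the context ends up in the group argument and the assertion fails. Assuming that is what MyPool does (its source is not shown here), one possible replacement is to pass a custom context instead. The class below reuses the MyPool name only for illustration.

```python
import multiprocessing
import multiprocessing.pool


class NoDaemonProcess(multiprocessing.Process):
    """Process whose daemon flag always reads False, so it may start children."""

    @property
    def daemon(self):
        return False

    @daemon.setter
    def daemon(self, value):
        # Pool sets daemon=True on its workers; ignore that request.
        pass


class NoDaemonContext(type(multiprocessing.get_context())):
    """The default multiprocessing context, but handing out NoDaemonProcess."""

    Process = NoDaemonProcess


class MyPool(multiprocessing.pool.Pool):
    """Pool with non-daemonic workers (hypothetical replacement for MyPool)."""

    def __init__(self, *args, **kwargs):
        # Route process creation through the custom context
        # instead of overriding Pool.Process directly.
        kwargs["context"] = NoDaemonContext()
        super().__init__(*args, **kwargs)
```

With this definition, a call such as MyPool(2) should no longer trip the assertion, and the workers can create pools of their own.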

Where was the problem?

Hi,
I am wondering whether you figured out what the problem with your program was. Why was the agent playing so poorly?