
agogo's Introduction

agogo

A reimplementation of AlphaGo in Go (specifically AlphaZero)

About

The algorithm is composed of:

  • a Monte-Carlo Tree Search (MCTS) implemented in the mcts package;
  • a Dual Neural Network (DNN) implemented in the dualnet package.
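The tic-tac-toe configuration in the Training section sets PUCT: 1.0. In AlphaZero, the tree search uses the DNN's policy output as a prior and picks the move that maximizes Q(s,a) + U(s,a), where U(s,a) = c_puct · P(s,a) · sqrt(N(s)) / (1 + N(s,a)). Below is a minimal sketch of that selection rule as described in the AlphaZero literature. It is not taken from the mcts package, and the edge type and function names are hypothetical.

```go
package main

import "math"

// edge holds the statistics an AlphaZero-style search keeps for one move
// (hypothetical type, not the mcts package's node layout).
type edge struct {
	Prior  float64 // P(s,a): prior probability from the policy head
	Visits int     // N(s,a): number of times this move was explored
	Value  float64 // W(s,a): sum of the values backed up through this move
}

// puctScore returns Q(s,a) + U(s,a), with
// U(s,a) = cPUCT * P(s,a) * sqrt(N(s)) / (1 + N(s,a)).
func puctScore(e edge, parentVisits int, cPUCT float64) float64 {
	q := 0.0
	if e.Visits > 0 {
		q = e.Value / float64(e.Visits) // mean value Q(s,a)
	}
	u := cPUCT * e.Prior * math.Sqrt(float64(parentVisits)) / (1 + float64(e.Visits))
	return q + u
}

// selectMove returns the index of the edge with the highest PUCT score.
// N(s) is taken as the sum of the children's visit counts.
func selectMove(edges []edge, cPUCT float64) int {
	parentVisits := 0
	for _, e := range edges {
		parentVisits += e.Visits
	}
	best, bestScore := -1, math.Inf(-1)
	for i, e := range edges {
		if s := puctScore(e, parentVisits, cPUCT); s > bestScore {
			best, bestScore = i, s
		}
	}
	return best
}
```

A larger cPUCT weights the prior and the exploration bonus more heavily relative to the observed mean value Q.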

The algorithm is wrapped in a top-level structure, AZ (for AlphaZero), and can be applied to any game that fulfills a specified contract.

The contract specifies the description of a game state.

In this package, the contract is a Go interface declared in the game package: State.

Description of some concepts/ubiquitous language

  • In the agogo package, each player of the game is an Agent, and two Agents play a game against each other in an Arena.

  • The game package is loosely coupled with the AlphaZero algorithm and describes a game's behavior (not what a game is). The behavior is expressed as a set of functions that operate on a State of the game. A State is an interface that represents the current game state as well as the allowed interactions. An interaction is a PlayerMove made by a Player. The implementer is responsible for coding the game's rules by creating an object that fulfills the State contract and implements the allowed moves.
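To make the idea of a behavioral contract concrete, here is a self-contained sketch of a toy tic-tac-toe that satisfies a heavily reduced, hypothetical State interface. The real game.State interface declares more methods than this; the Player, PlayerMove, State and ticTacToe types below are local stand-ins written for illustration, not the game package's definitions.

```go
package main

// Player is a hypothetical stand-in for the game package's player type.
type Player int

const (
	None   Player = iota // empty cell
	Cross                // first player
	Nought               // second player
)

// PlayerMove pairs a player with the index of the cell it plays.
type PlayerMove struct {
	Player Player
	Single int
}

// State is a hypothetical, reduced contract: the board, whose turn it is,
// whether a move is legal, and how a move changes the state.
type State interface {
	Board() []Player
	ToMove() Player
	Check(m PlayerMove) bool
	Apply(m PlayerMove) State
}

// ticTacToe is a toy implementation of the reduced contract.
type ticTacToe struct {
	board [9]Player
	next  Player
}

func newTicTacToe() *ticTacToe { return &ticTacToe{next: Cross} }

func (g *ticTacToe) Board() []Player { return g.board[:] }
func (g *ticTacToe) ToMove() Player  { return g.next }

// Check encodes the rules: right player, cell in range and empty.
func (g *ticTacToe) Check(m PlayerMove) bool {
	return m.Player == g.next && m.Single >= 0 && m.Single < len(g.board) && g.board[m.Single] == None
}

// Apply returns a new state with the move played; an illegal move
// leaves the state unchanged.
func (g *ticTacToe) Apply(m PlayerMove) State {
	if !g.Check(m) {
		return g
	}
	next := *g // copies the board array, so g itself is not modified
	next.board[m.Single] = m.Player
	if m.Player == Cross {
		next.next = Nought
	} else {
		next.next = Cross
	}
	return &next
}
```

This sketch returns a fresh copy from Apply; an implementation could equally mutate the receiver in place, as long as callers use the returned State.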

Training process

Applying the algorithm to a game

This package is designed to be extensible: you can train AlphaZero on any board game that respects the contract of the game package, then save the model and use it as a player.

The steps to train the algorithm are:

  • Creating a structure that fulfills the State interface (i.e. a game).
  • Creating a configuration for the AZ's internal MCTS and NN.
  • Creating an AZ structure based on the game and the configuration.
  • Executing the learning process (by calling the Learn method).
  • Saving the trained model (by calling the Save method).
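The Learn call in the training example passes four numbers, commented there as epochs, episodes, NN iterations and games. The sketch below shows how such arguments typically drive an AlphaZero-style loop: self-play episodes produce examples, the network trains on them, and an arena match decides whether the candidate replaces the current best network when its win rate exceeds a threshold such as UpdateThreshold. The learner type, its hooks and the example type are hypothetical; this is not agogo's implementation of Learn.

```go
package main

// example is a hypothetical training example: encoded board, MCTS policy, game outcome.
type example struct {
	Features []float32
	Policy   []float32
	Value    float32
}

// learner groups hypothetical hooks for each phase of the loop.
type learner struct {
	SelfPlay        func() []example                  // one self-play episode
	Train           func(examples []example, iters int) // NN training
	ArenaWinRate    func(games int) float64           // candidate vs. current best
	Promote         func()                            // candidate becomes the best network
	UpdateThreshold float64
}

// learn runs the loop and returns how many times the candidate was promoted.
// Argument order follows the comment in the training example.
func (l *learner) learn(epochs, episodes, nnIters, arenaGames int) int {
	promotions := 0
	for e := 0; e < epochs; e++ {
		var examples []example
		for ep := 0; ep < episodes; ep++ {
			examples = append(examples, l.SelfPlay()...)
		}
		l.Train(examples, nnIters)
		if l.ArenaWinRate(arenaGames) > l.UpdateThreshold {
			l.Promote()
			promotions++
		}
	}
	return promotions
}
```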

The steps to play against the algorithm are:

  • Creating an AZ object.
  • Loading the trained model (by calling the Load method).
  • Switching the agents to inference mode via the SwitchToInference method.
  • Getting the AI's move by calling the Search method, then applying the move to the game manually.

Examples

Four board games are implemented so far. Each of them is defined as a subpackage of game:

tic-tac-toe

Tic-tac-toe is an m,n,k-game where m = n = k = 3.
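In an m,n,k-game, a player wins by placing k stones in a row (horizontally, vertically or diagonally) on an m×n board. A minimal, self-contained sketch of that win condition follows; it assumes the board is stored row-major with m rows and n columns, 0 marks an empty cell, and the function name is hypothetical rather than part of the mnk package.

```go
package main

// hasKInARow reports whether player p (p != 0) has at least k consecutive
// stones in a row, column or diagonal of an m×n board stored row-major:
// cell (r, c) is at index r*n + c, and 0 marks an empty cell.
func hasKInARow(board []int, m, n, k, p int) bool {
	// right, down, down-right, down-left
	dirs := [4][2]int{{0, 1}, {1, 0}, {1, 1}, {1, -1}}
	for r := 0; r < m; r++ {
		for c := 0; c < n; c++ {
			for _, d := range dirs {
				count := 0
				rr, cc := r, c
				// walk from (r, c) in direction d while the cells belong to p
				for count < k && rr >= 0 && rr < m && cc >= 0 && cc < n && board[rr*n+cc] == p {
					count++
					rr += d[0]
					cc += d[1]
				}
				if count == k {
					return true
				}
			}
		}
	}
	return false
}
```

With m = n = k = 3 this is exactly the tic-tac-toe rule; a larger board such as 15×15 with k = 5 gives gomoku.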

Training

Here is sample code that trains AlphaZero to play the game. The result is saved in the file example.model.

package main

import (
	"log"
	"time"

	"github.com/gorgonia/agogo"
	dual "github.com/gorgonia/agogo/dualnet"
	"github.com/gorgonia/agogo/game"
	"github.com/gorgonia/agogo/game/mnk"
	"github.com/gorgonia/agogo/mcts"
)

// encodeBoard is a GameEncoder (https://pkg.go.dev/github.com/gorgonia/agogo#GameEncoder) for tic-tac-toe.
// It returns two planes: the board (empty cells set to 0.001), followed by
// a plane filled with 1 or -1 depending on which player moves next.
func encodeBoard(a game.State) []float32 {
	board := agogo.EncodeTwoPlayerBoard(a.Board(), nil)
	for i := range board {
		if board[i] == 0 {
			board[i] = 0.001
		}
	}
	playerLayer := make([]float32, len(a.Board()))
	next := a.ToMove()
	if next == game.Player(game.Black) {
		for i := range playerLayer {
			playerLayer[i] = 1
		}
	} else if next == game.Player(game.White) {
		// vecf32.Scale(board, -1)
		for i := range playerLayer {
			playerLayer[i] = -1
		}
	}
	return append(board, playerLayer...)
}

func main() {
	// Create the AlphaZero configuration (neural network and MCTS)
	conf := agogo.Config{
		Name:            "Tic Tac Toe",
		NNConf:          dual.DefaultConf(3, 3, 10),
		MCTSConf:        mcts.DefaultConfig(3),
		UpdateThreshold: 0.52,
	}
	conf.NNConf.BatchSize = 100
	conf.NNConf.Features = 2 // write a better encoding of the board, and increase features (and that allows you to increase K as well)
	conf.NNConf.K = 3
	conf.NNConf.SharedLayers = 3
	// Replace the default MCTS configuration with tuned settings
	conf.MCTSConf = mcts.Config{
		PUCT:           1.0,
		M:              3,
		N:              3,
		Timeout:        100 * time.Millisecond,
		PassPreference: mcts.DontPreferPass,
		Budget:         1000,
		DumbPass:       true,
		RandomCount:    0,
	}

	conf.Encoder = encodeBoard

	// Create a new game
	g := mnk.TicTacToe()
	// Create the AlphaZero structure
	a := agogo.New(g, conf)
	// Launch the learning process
	err := a.Learn(5, 50, 100, 100) // 5 epochs, 50 episodes, 100 NN iters, 100 games.
	if err != nil {
		log.Println(err)
	}
	// Save the model
	if err := a.Save("example.model"); err != nil {
		log.Fatal(err)
	}
}
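The encoder above emits two planes of len(a.Board()) values each, which appears to be why conf.NNConf.Features is set to 2. The self-contained sketch below reproduces that layout without the agogo packages so it can be inspected in isolation. The colour type, the assumption that EncodeTwoPlayerBoard maps the two players to 1 and -1 and empty cells to 0, and the planesMatch helper are all hypothetical.

```go
package main

// colour is a hypothetical cell value: 1 for the first player,
// -1 for the second, 0 for an empty cell.
type colour int8

// encodeTwoPlanes mirrors the layout of encodeBoard under the assumptions
// stated above: plane 1 is the board with empty cells set to 0.001,
// plane 2 is filled with 1 if the first player moves next, else -1.
func encodeTwoPlanes(board []colour, firstPlayerToMove bool) []float32 {
	out := make([]float32, 0, 2*len(board))
	for _, c := range board {
		v := float32(c)
		if v == 0 {
			v = 0.001
		}
		out = append(out, v)
	}
	turn := float32(-1)
	if firstPlayerToMove {
		turn = 1
	}
	for range board {
		out = append(out, turn)
	}
	return out
}

// planesMatch reports whether an encoding fills exactly `features`
// planes of a rows×cols board.
func planesMatch(encoded []float32, features, rows, cols int) bool {
	return len(encoded) == features*rows*cols
}
```

A check like planesMatch is a cheap way to catch an encoder that no longer agrees with the configured number of features before training starts.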

Inference

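package main

import (
	"fmt"
	"log"

	"github.com/gorgonia/agogo"
	dual "github.com/gorgonia/agogo/dualnet"
	"github.com/gorgonia/agogo/game"
	"github.com/gorgonia/agogo/game/mnk"
	"github.com/gorgonia/agogo/mcts"
)

// encodeBoard must be the same encoder as the one used for training.
func encodeBoard(a game.State) []float32 {
	board := agogo.EncodeTwoPlayerBoard(a.Board(), nil)
	for i := range board {
		if board[i] == 0 {
			board[i] = 0.001
		}
	}
	playerLayer := make([]float32, len(a.Board()))
	next := a.ToMove()
	if next == game.Player(game.Black) {
		for i := range playerLayer {
			playerLayer[i] = 1
		}
	} else if next == game.Player(game.White) {
		// vecf32.Scale(board, -1)
		for i := range playerLayer {
			playerLayer[i] = -1
		}
	}
	return append(board, playerLayer...)
}

func main() {
	conf := agogo.Config{
		Name:     "Tic Tac Toe",
		NNConf:   dual.DefaultConf(3, 3, 10),
		MCTSConf: mcts.DefaultConfig(3),
	}
	// The network settings should match the ones used for training,
	// otherwise the saved weights may not fit the network.
	conf.NNConf.BatchSize = 100
	conf.NNConf.Features = 2
	conf.NNConf.K = 3
	conf.NNConf.SharedLayers = 3
	conf.Encoder = encodeBoard

	g := mnk.TicTacToe()
	a := agogo.New(g, conf)
	if err := a.Load("example.model"); err != nil {
		log.Fatal(err)
	}
	a.A.Player = mnk.Cross
	a.B.Player = mnk.Nought
	if err := a.B.SwitchToInference(g); err != nil {
		log.Fatal(err)
	}
	if err := a.A.SwitchToInference(g); err != nil {
		log.Fatal(err)
	}
	// Put X in the center
	stateAfterFirstPlay := g.Apply(game.PlayerMove{
		Player: mnk.Cross,
		Single: 4,
	})
	fmt.Println(stateAfterFirstPlay)
	// ⎢ · · · ⎥
	// ⎢ · X · ⎥
	// ⎢ · · · ⎥

	// Ask agent B (Nought) what to play next
	move := a.B.Search(stateAfterFirstPlay)
	fmt.Println(move)
	// e.g. 1
	stateAfterSecondPlay := stateAfterFirstPlay.Apply(game.PlayerMove{
		Player: mnk.Nought,
		Single: move,
	})
	fmt.Println(stateAfterSecondPlay)
	// ⎢ · O · ⎥
	// ⎢ · X · ⎥
	// ⎢ · · · ⎥
}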
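The moves in the inference example are plain cell indices: Single: 4 puts X in the center, and the move 1 returned by Search lands in the top-middle cell of the printed board. Assuming cells are numbered row-major from the top-left corner, which the printed boards are consistent with, a pair of hypothetical helpers converts between indices and coordinates:

```go
package main

// cellToRowCol converts a move index (the Single field of a PlayerMove)
// into a 0-based row and column, assuming row-major numbering
// from the top-left corner of a board with `cols` columns.
func cellToRowCol(single, cols int) (row, col int) {
	return single / cols, single % cols
}

// rowColToCell is the inverse mapping.
func rowColToCell(row, col, cols int) int {
	return row*cols + col
}
```

For a 3×3 board, cellToRowCol(4, 3) gives (1, 1), the center, and cellToRowCol(1, 3) gives (0, 1), the top-middle cell.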

Misc

A Funny Thing Happened On The Way To Reimplementing AlphaGo - A talk by @chewxy (one of the authors) about this specific implementation

Credits

Original implementation credits to

agogo's People

Contributors

carleeto, chewxy, owulveryck

agogo's Issues

when it's ready

Watched the preso, would love to see the code. Opening this issue for the code. :-)

Mancala / Kalah

"Four board games are implemented so far."

Could you please add the Mancala/Kalah game as well?

Can't run tic-tac-toe

When I try to run cmd/tictactoe/main.go, I get a panic:

go: downloading github.com/golang/freetype v0.0.0-20170609003504-e2365dfdc4a0
go: downloading golang.org/x/image v0.0.0-20201208152932-35266b937fa6
go: downloading gorgonia.org/gorgonia v0.9.17-0.20210124090702-531c6df2c434
go: downloading gorgonia.org/tensor v0.9.18
go: downloading github.com/chewxy/math32 v1.0.6
go: downloading gorgonia.org/vecf32 v0.9.0
go: downloading github.com/awalterschulze/gographviz v2.0.3+incompatible
go: downloading github.com/apache/arrow/go/arrow v0.0.0-20210105145422-88aaea5262db
go: downloading github.com/chewxy/hm v1.0.0
go: downloading go4.org/unsafe/assume-no-moving-gc v0.0.0-20201222180813-1025295fd063
go: downloading github.com/google/flatbuffers v1.12.0
go: downloading gonum.org/v1/gonum v0.8.2
go: downloading gorgonia.org/vecf64 v0.9.0
go: downloading github.com/leesper/go_rng v0.0.0-20190531154944-a612b043e353
go: downloading github.com/xtgo/set v1.0.0
go: downloading gorgonia.org/dawson v1.2.0
go: downloading github.com/gogo/protobuf v1.3.1
go: downloading github.com/golang/protobuf v1.4.3
go: downloading golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1
go: downloading google.golang.org/protobuf v1.25.0
panic: Something in this program imports go4.org/unsafe/assume-no-moving-gc to declare that it assumes a non-moving garbage collector, but your version of go4.org/unsafe/assume-no-moving-gc hasn't been updated to assert that it's safe against the go1.18 runtime. If you want to risk it, run with environment variable ASSUME_NO_MOVING_GC_UNSAFE_RISK_IT_WITH=go1.18 set. Notably, if go1.18 adds a moving garbage collector, this program is unsafe to use.

goroutine 1 [running]:
go4.org/unsafe/assume-no-moving-gc.init.0()
/home/haze/go/pkg/mod/go4.org/unsafe/assume-no-moving-gc@v0.0.0-20201222180813-1025295fd063/untested.go:24 +0x1f4
exit status 2

Cannot train tic-tac-toe with more than 14 episodes

This is a strange bug. I am using this code:

func encodeBoard(a game.State) []float32 {
	board := EncodeTwoPlayerBoard(a.Board(), nil)
	for i := range board {
		if board[i] == 0 {
			board[i] = 0.001
		}
	}
	playerLayer := make([]float32, len(a.Board()))
	next := a.ToMove()
	if next == game.Player(game.Black) {
		for i := range playerLayer {
			playerLayer[i] = 1
		}
	} else if next == game.Player(game.White) {
		// vecf32.Scale(board, -1)
		for i := range playerLayer {
			playerLayer[i] = -1
		}
	}
	retVal := append(board, playerLayer...)
	return retVal
}

func TestAZ(t *testing.T) {
	conf := Config{
		Name:            "Tic Tac Toe",
		NNConf:          dual.DefaultConf(3, 3, 10),
		MCTSConf:        mcts.DefaultConfig(3),
		UpdateThreshold: 0.52,
	}
	conf.NNConf.BatchSize = 100
	conf.NNConf.Features = 2 // write a better encoding of the board, and increase features (and that allows you to increase K as well)
	conf.NNConf.K = 3
	conf.NNConf.SharedLayers = 3
	conf.MCTSConf = mcts.Config{
		PUCT:           1.0,
		M:              3,
		N:              3,
		Timeout:        100 * time.Millisecond,
		PassPreference: mcts.DontPreferPass,
		Budget:         1000,
		DumbPass:       true,
		RandomCount:    0,
	}

	conf.Encoder = encodeBoard

	g := mnk.TicTacToe()
	a := New(g, conf)

	//err := a.Learn(1, 20, 100, 100)
	err := a.Learn(1, 14, 100, 100)
	if err != nil {
		t.Fatal(err)
	}
}

With err := a.Learn(1, 14, 100, 100), the test passes, but with err := a.Learn(1, 15, 100, 100), the test fails with this error:

2021/01/18 09:24:40 Self Play for epoch 0. Player A 0xc000368850, Player B 0xc0003688c0
2021/01/18 09:24:40 Using Dummy
2021/01/18 09:24:40 Set up selfplay: Switch To inference for A. A.NN 0xc0003409c0 (*dual.Dual)
2021/01/18 09:24:40 Set up selfplay: Switch To inference for B. B.NN 0xc000340a90 (*dual.Dual)
2021/01/18 09:24:40     Episode 0
2021/01/18 09:24:40     Episode 1
2021/01/18 09:24:41     Episode 2
2021/01/18 09:24:41     Episode 3
2021/01/18 09:24:42     Episode 4
2021/01/18 09:24:43     Episode 5
2021/01/18 09:24:44     Episode 6
2021/01/18 09:24:45     Episode 7
2021/01/18 09:24:45     Episode 8
2021/01/18 09:24:46     Episode 9
2021/01/18 09:24:47     Episode 10
2021/01/18 09:24:48     Episode 11
2021/01/18 09:24:48     Episode 12
2021/01/18 09:24:49     Episode 13
2021/01/18 09:24:50     Episode 14
    agogo_test.go:69: Train fail: PC: 246: PC 246. Failed to execute instruction Aᵀ{0, 2, 3, 1} [CPU144]        CPU144  false   true    false: Failed to carry op.Do(): Dimension mismatch. Expected 2, got 4

Train fail: shuffle batch failed - matX: Not yet implemented: native matrix for colmajor or unpacked matrices

I am trying to run the simple example of tic-tac-toe as is:

package agogo

import (
	"log"
	"time"

	dual "github.com/gorgonia/agogo/dualnet"
	"github.com/gorgonia/agogo/encoding/mjpeg"
	"github.com/gorgonia/agogo/game"
	"github.com/gorgonia/agogo/game/mnk"
	"github.com/gorgonia/agogo/mcts"

	_ "net/http/pprof"
)

func encodeBoard(a game.State) []float32 {
	board := EncodeTwoPlayerBoard(a.Board(), nil)
	for i := range board {
		if board[i] == 0 {
			board[i] = 0.001
		}
	}
	playerLayer := make([]float32, len(a.Board()))
	next := a.ToMove()
	if next == game.Player(game.Black) {
		for i := range playerLayer {
			playerLayer[i] = 1
		}
	} else if next == game.Player(game.White) {
		// vecf32.Scale(board, -1)
		for i := range playerLayer {
			playerLayer[i] = -1
		}
	}
	retVal := append(board, playerLayer...)
	return retVal
}

func ExampleAZ() {
	conf := Config{
		Name:            "Tic Tac Toe",
		NNConf:          dual.DefaultConf(3, 3, 10),
		MCTSConf:        mcts.DefaultConfig(3),
		UpdateThreshold: 0.52,
	}
	conf.NNConf.BatchSize = 100
	conf.NNConf.Features = 2 // write a better encoding of the board, and increase features (and that allows you to increase K as well)
	conf.NNConf.K = 3
	conf.NNConf.SharedLayers = 3
	conf.MCTSConf = mcts.Config{
		PUCT:           1.0,
		M:              3,
		N:              3,
		Timeout:        100 * time.Millisecond,
		PassPreference: mcts.DontPreferPass,
		Budget:         1000,
		DumbPass:       true,
		RandomCount:    0,
	}

	conf.Encoder = encodeBoard
	outEnc := mjpeg.NewEncoder(300, 300)
	conf.OutputEncoder = outEnc

	g := mnk.TicTacToe()
	a := New(g, conf)

	err := a.Learn(1, 1, 10, 1) // 1 epoch, 1 episode, 10 NN iters, 1 game.
	if err != nil {
		log.Fatal(err)
	}
	// output:
}

Running the test fails with this error.

❯ go test -run=^Example
2021/01/16 17:22:18 Self Play for epoch 0. Player A 0xc00043e070, Player B 0xc00043e2a0
2021/01/16 17:22:18 Using Dummy
2021/01/16 17:22:18 Set up selfplay: Switch To inference for A. A.NN 0xc0000c9380 (*dual.Dual)
2021/01/16 17:22:18 Set up selfplay: Switch To inference for B. B.NN 0xc0000c9450 (*dual.Dual)
2021/01/16 17:22:18     Episode 0
2021/01/16 17:22:19 Train fail: shuffle batch failed - matX: Not yet implemented: native matrix for colmajor or unpacked matrices
exit status 1
FAIL    github.com/gorgonia/agogo       1.229s

This error is triggered from:

agogo/dualnet/meta.go

Lines 71 to 73 in 05cf5f1

if matXs, err = native.MatrixF32(Xs); err != nil {
	return errors.Wrapf(err, "shuffle batch failed - matX")
}

It looks like the tensor library is faulty here.

I will investigate. In the meantime, any hints are welcome.

As a workaround, disabling the shuffleBatch method in the dualnet package makes training work.

9x9 board

Go can be played on a 9x9 board. Would a board this size still cost $70,000 to train?

Wrong model architecture in residual network?

I wonder whether there is a misconfiguration in the model architecture, specifically in this function: https://github.com/gorgonia/agogo/blob/master/dualnet/ermahagerdmonards.go#L67
Based on my understanding of the paper (link), page 8 of 18 says:

Each residual block applies the following modules sequentially to its input:
(1) A convolution of 256 filters of kernel size 3 × 3 with stride 1
(2) Batch normalization
(3) A rectifier nonlinearity
(4) A convolution of 256 filters of kernel size 3 × 3 with stride 1
(5) Batch normalization
(6) A skip connection that adds the input to the block
(7) A rectifier nonlinearity

Point 6 means that the addition should take the block's own input, and that the modules should be applied in that order. I wonder whether this is a correct implementation:

func (m *maebe) share(input *G.Node, filterCount, layer int) (*G.Node, batchNormOp, batchNormOp) {
	layer1, l1Op := m.res(input, filterCount, fmt.Sprintf("Layer1 of Shared Layer %d", layer))
	layer2, l2Op := m.res(layer1, filterCount, fmt.Sprintf("Layer2 of Shared Layer %d", layer))
	added := m.do(func() (*G.Node, error) { return G.Add(input, layer2) })
	retVal := m.rectify(added)
	return retVal, l1Op, l2Op
}
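For comparison, here is a minimal sketch of the block ordering quoted from the paper, written on flattened slices with the convolutions and batch normalizations passed in as functions. Whether share matches it depends on what m.res does, which is not shown here: if m.res ends with a rectifier, share applies a ReLU after the second batch normalization and before the addition. The layer type and both functions are hypothetical illustrations, not dualnet code.

```go
package main

// layer stands in for a convolution or batch-normalization step on a
// flattened feature map; real code would operate on tensors.
type layer func([]float64) []float64

// relu is the rectifier nonlinearity.
func relu(x []float64) []float64 {
	out := make([]float64, len(x))
	for i, v := range x {
		if v > 0 {
			out[i] = v
		}
	}
	return out
}

// residualBlock follows the quoted ordering: conv, BN, ReLU, conv, BN,
// add the block input, ReLU. No rectifier sits between step 5 and step 6.
func residualBlock(x []float64, conv1, bn1, conv2, bn2 layer) []float64 {
	h := relu(bn1(conv1(x))) // steps 1-3
	h = bn2(conv2(h))        // steps 4-5
	sum := make([]float64, len(x))
	for i := range x {
		sum[i] = x[i] + h[i] // step 6: skip connection
	}
	return relu(sum) // step 7
}
```

Placing a ReLU before the addition would clip negative values coming out of the second batch normalization, so the block could only ever increase its input; the paper's ordering lets the residual branch subtract as well as add.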
