ScalphaGoZero

ScalphaGoZero is an independent implementation of DeepMind's AlphaGo Zero in Scala, using Deeplearning4J (DL4J) to run neural networks. You can either run experiments with models built in DL4J directly or import prebuilt Keras models.

ScalphaGoZero is mainly an engineering effort to demonstrate how complex and successful systems in machine learning are not bound to Python anymore. With access to powerful tools like ND4J for advanced maths, DL4J for neural networks, and the mature infrastructure of the JVM, languages like Scala can offer a viable alternative for data scientists.

This project is a Scala port of the AlphaGo Zero module found in Deep Learning and the Game of Go.

Getting started

Here's how to download the library, install it and run its main application:

git clone https://github.com/maxpumperla/ScalphaGoZero
cd ScalphaGoZero
sbt run

This application will set up two opponents, simulate 5 games between them using the AlphaGo Zero methodology and train one of the opponents with the experience data gained from the games. For more extensive experiments you should build out this demo.

To use Keras model import you need to generate the resources first:

cd src/test/python
pip install tensorflow keras
python generate_h5_resources.py

The generated, serialized Keras models are put into src/main/resources and are picked up by the KerasModel class, as demonstrated in our tests.

Core Concepts

Quite a few concepts are needed to build an AlphaGo Zero system. ScalphaGoZero is intended as a software-developer-friendly approach with clean abstractions to help users get started. Many of the concepts used here can be reused for other games; only the basics are specific to the game of Go.

  • Basics: To let a computer play a game you need to code the basics of the game, including what a Go board, a player, a move or the current game state is. Notably, the Zobrist hashing technique is implemented in the Go board class to speed up computation.
  • Encoders: Game states and moves need to be translated into something a neural network can use for training and predictions, namely tensors. We use ND4J for this encoding step. AlphaGo Zero needs a specific ZeroEncoder, but many other encoders are feasible and can be implemented by the user.
  • Agents: A Go-playing agent knows how to play a game, by selecting the next move, and handles game state information internally. For AlphaGo Zero you need a ZeroAgent, but other agents with simpler methodology can also lead to decent results.
  • Models: To select a move, agents need machine learning models to predict the value of the current position (value function) and how well a next move would probably work (policy function). In AlphaGo Zero both of these components are integrated into one deep neural network, with a so-called policy head and value head. We implemented this model in DL4J. To start with, you might want to work with simpler models. Any model that takes encoded states and outputs the right shape can be used within this framework.
  • Scoring: To play actual games, agents need the ability to estimate scores at the end of a game to decide who won and reinforce the signals leading to victory (and weaken those leading to defeat). This includes territory estimation and reporting game results.
  • Experience: When opponents play many games against each other, they generate game play data, or experience, that can be used for training the agents. We use experience collectors to store this information.
  • Simulation: The last piece needed to run your own AlphaGo Zero is to create a simulation between two ZeroAgent instances. The simulation stores experience data and lets your agents learn from it, so they become better players over time.
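
As an illustration of the Zobrist hashing mentioned under Basics, here is a minimal sketch of the idea. It is not the project's actual board class: `Stone`, `Point`, `ZobristTable` and `HashedBoard` are hypothetical names chosen for this example. Every (point, stone) pair gets a fixed random 64-bit code, and the board hash is the XOR of the codes of all stones on the board. Placing or removing a stone then updates the hash with a single XOR instead of rehashing the whole board.

```scala
import scala.util.Random

// Hypothetical, simplified types; the project's own classes may look different.
sealed trait Stone
case object Black extends Stone
case object White extends Stone

final case class Point(row: Int, col: Int)

// Holds one fixed random 64-bit code per (point, stone) pair.
final class ZobristTable(size: Int, seed: Long = 42L) {
  private val rng = new Random(seed)
  private val codes: Map[(Point, Stone), Long] = (for {
    r <- 1 to size
    c <- 1 to size
    s <- Seq[Stone](Black, White)
  } yield (Point(r, c), s) -> rng.nextLong()).toMap

  def code(p: Point, s: Stone): Long = codes((p, s))
}

// Immutable board that carries its hash along with its stones.
final case class HashedBoard(
    table: ZobristTable,
    stones: Map[Point, Stone] = Map.empty,
    hash: Long = 0L
) {
  // Placing a stone XORs its code into the hash.
  def place(p: Point, s: Stone): HashedBoard = {
    require(!stones.contains(p), s"$p is already occupied")
    copy(stones = stones + (p -> s), hash = hash ^ table.code(p, s))
  }

  // Removing a stone XORs the same code again, which cancels it out.
  def remove(p: Point): HashedBoard = stones.get(p) match {
    case Some(s) => copy(stones = stones - p, hash = hash ^ table.code(p, s))
    case None    => this
  }
}
```

Because XOR is commutative and self-inverse, two boards with the same stones have the same hash regardless of move order, so positions can be compared by their hash first, which is much cheaper than comparing full stone maps.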

Contribute

ScalphaGoZero can be improved in many ways. Here are a few examples:

  • Experience collectors build one large ND4J array, which won't work for large experiments. This should be refactored into an iterator that only provides you with the next batch needed for training.
  • Test coverage can be vastly improved. The basics are covered, but there are potentially many edge cases still missing.
  • Running a larger experiment and storing the weights somewhere freely accessible to users would be beneficial to get started and see reasonable results from the start.
  • Building a demo with a user interface would be nice. Agents could be wrapped in an HTTP server, for instance, and connected to a web interface so humans can play against their bots.
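
The batching refactoring suggested in the first item could take roughly the following shape. This is only a sketch under stated assumptions: `Experience` and `BatchIterator` are hypothetical names, and plain Scala arrays stand in for the ND4J arrays the project uses. The iterator pulls records from its source only when a batch is requested, so the full data set never has to be held in one array.

```scala
// Hypothetical experience record; the project stores ND4J arrays instead.
final case class Experience(state: Array[Float], visitCounts: Array[Float], reward: Float)

// Lazily groups a stream of experience records into batches of at most batchSize.
final class BatchIterator(source: Iterator[Experience], batchSize: Int)
    extends Iterator[Seq[Experience]] {
  require(batchSize > 0, "batchSize must be positive")

  override def hasNext: Boolean = source.hasNext

  override def next(): Seq[Experience] = {
    if (!source.hasNext) throw new NoSuchElementException("no more experience data")
    val batch = Vector.newBuilder[Experience]
    var taken = 0
    // Pull only as many records as this batch needs; the last batch may be smaller.
    while (taken < batchSize && source.hasNext) {
      batch += source.next()
      taken += 1
    }
    batch.result()
  }
}
```

A training loop could then iterate with `new BatchIterator(records, 32).foreach(...)`. The standard library's `Iterator.grouped` gives the same behaviour in one call; the explicit class is shown here only to make the lazy pulling visible.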

Contributors

maxpumperla, guizmaii, sderosiaux, barrybecker4