Java-based AlphaZero reinforcement learning. The generic base module allows implementing any adversarial board game. Example implementation for Tic Tac Toe.

License: Apache License 2.0

alpha-zero alphazero alphago-zero alphagozero reinforcement-learning deeplearning4j tic-tac-toe connect-four monte-carlo-tree-search mcts self-learning alphago

alpha-zero-learning's Introduction

Java Alpha Zero Reinforcement Learning with deeplearning4j

The AlphaZero learning provided here is a Java implementation of the AlphaZero algorithm using the deeplearning4j library.

Introduction

There are already several Alpha[Go] Zero related projects on GitHub, in Python and also in C++. The Java implementation here lets you reuse your existing Java game logic with AlphaZero reinforcement learning. This approach might also be simpler for developers more familiar with Java than with Python.

Alpha[Go] Zero Algorithm

For nearly 20 years after IBM's Deep Blue defeated Kasparov in chess in 1997, 19x19 Go remained a domain where computers were far from achieving human-level play. This changed definitively with AlphaGo's victory over Lee Sedol in the Google DeepMind Challenge Match, where AlphaGo beat one of the world's best Go players 4:1 over five games. There is a very interesting movie about the event available on YouTube: AlphaGo - The Movie. Besides documenting the DeepMind effort, it also gives insights into the role and philosophy of the game of Go in Asian countries.

The AlphaGo algorithm was further adapted and improved with AlphaGo Zero ("Mastering the game of Go without human knowledge") and AlphaZero ("Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm").

Why There Aren't Thousands of Strong Artificial Go Intelligences Now?

It is not enough to know the algorithm and to have an implementation ready. A rough calculation of Computer-Go zero performance estimates that the training needed to generate as many positions as AlphaGo Zero could take around 1700 years on a single machine with standard hardware.

Efforts like leela-zero try to replicate the training process in a public and distributed manner. Similarly, minigo attempted to reproduce the learning progress.

It's easier to adapt the algorithm to less complex board games like Connect Four, Gomoku, Othello and others. With such games it's more realistic to perform the training necessary to obtain a strong artificial intelligence.

Using Java Alpha Zero

The goal of Java alpha-zero-learning is to enable AlphaZero learning for less complex games. The implementation does not support distributed learning. It is designed to run on a single machine, optionally using graphics cards for the neural net updates.

Generic release build

You can use the existing Java alpha-zero-learning with the generic published release builds. With ch.evolutionsoft.rl.alphazero.tictactoe-1.1.1-jar-with-dependencies you can directly repeat the training for the Tic Tac Toe prototype. See also the submodule tic-tac-toe/README.md for more information.
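
A minimal usage sketch, assuming the jar-with-dependencies declares its main class in the manifest (as Maven assembly builds typically do), would be to start the Tic Tac Toe training directly from the command line:

java -jar ch.evolutionsoft.rl.alphazero.tictactoe-1.1.1-jar-with-dependencies.jar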

ch.evolutionsoft.rl.alphazero.adversary-learning-1.1.1-jar-with-dependencies lets you reuse the general part of the implementation for other board games. The submodule alpha-zero-adversary-learning/README.md contains hints for implementing a new board game.

Rebuild for your hardware

With deeplearning4j you can use CUDA to perform the neural net computations on a GPU. You configure it by replacing the following two dependencies with the CUDA dependencies matching your installed CUDA version; an illustrative replacement is sketched after the block below.

<dependency>
	<groupId>org.nd4j</groupId>
	<artifactId>nd4j-native-platform</artifactId>
</dependency>
<dependency>
	<groupId>org.nd4j</groupId>
	<artifactId>nd4j-native</artifactId>
</dependency>
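
As an illustrative sketch only (the exact artifacts depend on your installed CUDA toolkit; CUDA 10.2 is assumed here, and the version matches the 1.0.0-beta7 used elsewhere in this project), the replacement could look like this:

<dependency>
	<groupId>org.nd4j</groupId>
	<artifactId>nd4j-cuda-10.2-platform</artifactId>
	<version>1.0.0-beta7</version>
</dependency>
<dependency>
	<groupId>org.nd4j</groupId>
	<artifactId>nd4j-cuda-10.2</artifactId>
	<version>1.0.0-beta7</version>
</dependency>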

Even without a GPU there is an AVX/AVX2 performance improvement on newer CPUs. Use the nd4j-native dependency with the AVX classifier matching your operating system to enable it. The logs will show a warning when you run the generic build on an AVX/AVX2-capable CPU:

<dependency>
	<groupId>org.nd4j</groupId>
	<artifactId>nd4j-native</artifactId>
	<version>1.0.0-beta7</version>
	<classifier>windows-x86_64-avx2</classifier>
</dependency>

Implement new games

Refer to the submodule alpha-zero-adversary-learning/README.md to see what's necessary for a new game implementation.
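
As a purely hypothetical sketch (the class and method names below are invented for illustration and are not the actual API of the adversary-learning module; refer to its README for the real abstract base class), a new game implementation mainly has to encode the board position as ND4J planes, enumerate the valid moves and expose the neural net input:

import java.util.ArrayList;
import java.util.List;

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

// Hypothetical Connect Four example; names do not match the real base class.
public class ConnectFourSketch {

  static final int ROWS = 6;
  static final int COLUMNS = 7;

  // One plane per player plus one plane marking the player to move.
  private final INDArray board = Nd4j.zeros(3, ROWS, COLUMNS);

  // Valid moves are the columns whose top cell is still empty.
  public List<Integer> getValidMoveIndices() {
    List<Integer> validMoves = new ArrayList<>();
    for (int column = 0; column < COLUMNS; column++) {
      double topCellOccupied =
          board.getDouble(0, 0, column) + board.getDouble(1, 0, column);
      if (topCellOccupied == 0.0) {
        validMoves.add(column);
      }
    }
    return validMoves;
  }

  // The neural net input is simply a copy of the stacked board planes.
  public INDArray getNeuralNetInput() {
    return board.dup();
  }
}

Terminal-position detection, move execution and board symmetries are omitted here; the submodule README describes the methods the real base class actually requires.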

Mvn build and packaging

The Maven (mvn) builds for each submodule take several minutes, and many different system architecture dependencies are packaged into the jar-with-dependencies files. This is related to deeplearning4j and leads to larger distribution packages.
