
checkers-reinforcement-learning's Introduction

Synopsis

This code defines a checkers board and can officiate a game of checkers between two AIs. The rules are those of traditional American Checkers (English Draughts), with one modification from the rules I played with as a child: a win can only be achieved by capturing all of the opponent's pieces. Two AIs are defined to compete, one based on reinforcement learning and one on alpha-beta pruning.

Alpha-Beta Pruning AI

This AI uses a traditional alpha-beta pruning algorithm (with specified depth), where the value of a potential future board configuration is infinity for a winning game, -infinity for a losing game, and given by the following equation for any other board configuration:

*(equation image missing: a weighted sum of the deltas ΔU_s, ΔK_s, ΔU_o, ΔK_o defined below)*

In the above equation, U and K represent the number of uncrowned pieces and the number of kings a player has, and the subscripts s and o indicate whether the player is itself or its opponent, respectively. The delta (Δ) refers to the change in the value following it between the current board configuration and the potential board configuration the AI is considering.
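The delta-based evaluation above can be sketched as a small Python function. The coefficients here (1 per uncrowned piece, 2 per king) are placeholder assumptions for illustration; the missing equation image defines the project's actual weights.

```python
def board_value(old_counts, new_counts):
    """Illustrative board evaluation: the change in material between the
    current board and a candidate future board.

    Each counts tuple is (U_s, K_s, U_o, K_o): own uncrowned pieces, own
    kings, opponent uncrowned pieces, opponent kings. The weights used
    here (1 for uncrowned pieces, 2 for kings) are assumptions, not the
    repository's actual coefficients.
    """
    dU_s, dK_s, dU_o, dK_o = (n - o for n, o in zip(new_counts, old_counts))
    return (dU_s + 2 * dK_s) - (dU_o + 2 * dK_o)
```

For terminal boards the AI short-circuits this evaluation and uses +infinity (win) or -infinity (loss) instead.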

Reinforcement Learning AI

The reinforcement learning AI is based on the ideas of Q-learning. Since creating and training a Q-table over every possible board configuration would take immense computational power, a set of characteristics was chosen to define a set of states such that each board configuration belongs to exactly one state.

State Characteristics

The characteristics of the states are as follows:

  1. Number of own uncrowned pieces
  2. Number of opponent uncrowned pieces
  3. Number of own kings
  4. Number of opponent kings
  5. Number of own pieces (crowned and uncrowned) on left and right edges of board
  6. Integer value of own vertical center of mass
  7. Integer value of opponent vertical center of mass

For characteristics 6 and 7, the center of mass is calculated with an uncrowned piece having the same mass as a king.
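The seven characteristics above can be sketched as a function that maps a board to a state tuple. The piece representation used here (lists of `(row, col, is_king)` triples) is an assumption for illustration, not the repository's `Board` API.

```python
def board_state(own_pieces, opp_pieces, board_width=8):
    """Map a board to the 7-tuple state described above.

    `own_pieces` and `opp_pieces` are lists of (row, col, is_king)
    triples; this representation is assumed for illustration.
    """
    own_u = sum(1 for _, _, king in own_pieces if not king)
    own_k = sum(1 for _, _, king in own_pieces if king)
    opp_u = sum(1 for _, _, king in opp_pieces if not king)
    opp_k = sum(1 for _, _, king in opp_pieces if king)
    # Characteristic 5: own pieces (crowned or not) on the edge columns.
    edges = sum(1 for _, col, _ in own_pieces if col in (0, board_width - 1))
    # Characteristics 6-7: integer vertical center of mass, with kings
    # and uncrowned pieces weighted equally.
    own_com = int(sum(r for r, _, _ in own_pieces) / len(own_pieces)) if own_pieces else 0
    opp_com = int(sum(r for r, _, _ in opp_pieces) / len(opp_pieces)) if opp_pieces else 0
    return (own_u, opp_u, own_k, opp_k, edges, own_com, opp_com)
```

Because many boards collapse to the same tuple, the state space stays small enough to learn over.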

Dynamically Discovering Transitions

Since knowing the state of a board is not enough information to determine its possible transitions to other states, initializing a traditional Q-table is not possible, and even if it were, it would require a very large amount of memory. With this in mind, I instead dynamically create/discover the set of possible transitions between states during training, storing them as keys in a dictionary.
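The lazy discovery described above amounts to inserting a `(state, next_state)` key with a default value the first time the pair is encountered, rather than pre-allocating a table over all state pairs. A minimal sketch (the helper name and default value are assumptions):

```python
# Learned transition values, keyed by (state, next_state) tuples.
# Keys are only created when a transition is actually observed.
transitions = {}

def ensure_transition(state, next_state, initial_value=1.0):
    """Register a newly discovered state transition (assumed helper).

    Returns the current learned value for the transition, creating it
    with `initial_value` if it has never been seen before.
    """
    key = (state, next_state)
    if key not in transitions:
        transitions[key] = initial_value
    return transitions[key]
```

Only transitions that actually occur during training consume memory, which is what makes the tabular approach feasible here.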

Updating Transition Values

There are two formulas used to update the value of a transition between two states. In these formulas, T(state1, state2) is the learned scalar value for transitioning from state1 to state2, R is the scalar reward function between two states (the same as described for the alpha-beta pruning AI), alpha is the learning rate, and lambda is the discount rate.

Directly preceding the AI's turn, if it has already moved at least once this game and the game board is not terminal, then the following value update is used:

T(state1, state2) ← T(state1, state2) + α·[R(state1, state2) + λ·max over state' of T(state2, state') − T(state1, state2)]

If the game board is terminal, then the AI will be notified and will apply the following value update:

T(state1, state2) ← T(state1, state2) + α·R(state1, state2)
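Both updates can be sketched in one function. This is an illustrative implementation under the definitions above, with assumed names: `T` is the transitions dictionary, `reward` is the scalar reward R between the two states, and `successors` lists the states reachable from `s2` (used for the max term of the non-terminal update).

```python
def update_transition(T, s1, s2, reward, successors, alpha=0.5, lam=0.9,
                      terminal=False):
    """Apply the two value updates described above (sketch; names assumed).

    T maps (state, state) pairs to learned scalar values. `successors`
    is the list of states reachable from s2; unseen onward transitions
    contribute a value of 0 to the max term.
    """
    key = (s1, s2)
    if terminal:
        # Terminal update: move the value toward the final reward only.
        T[key] = T[key] + alpha * reward
    else:
        # Non-terminal update: standard Q-learning step with a max over
        # the onward transitions out of s2.
        best_next = max((T.get((s2, s3), 0.0) for s3 in successors),
                        default=0.0)
        T[key] = T[key] + alpha * (reward + lam * best_next - T[key])
    return T[key]
```

The terminal branch matches the update visible in the traceback quoted in the issue below (`transitions[transition] += learning_rate * reward_function(...)`).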

Picking Moves

When it is the reinforcement learning AI's turn, it looks at the list of possible moves it could make, and if their respective state transitions are not yet known, it adds them to the dictionary of transitions. Then it chooses to do one of two things, based on a specified probability. The first option is to choose a move at random; this is used during training so that transitions other than the current highest-valued transition will still be trained. If the learning AI does not move randomly, it chooses the move whose state transition has the highest value associated with it.
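This selection policy is epsilon-greedy, and can be sketched as follows. The helper names (`state_after`, `random_move_prob`, the default transition value) are assumptions for illustration, not the repository's API.

```python
import random

def pick_move(moves, state_after, transitions, cur_state,
              random_move_prob=0.1, default_value=1.0):
    """Epsilon-greedy move selection (sketch; helper names assumed).

    `moves` is the list of legal moves and `state_after(m)` returns the
    state the board would be in after move m. Unseen transitions are
    first added to the dictionary with a default value.
    """
    # Discover any not-yet-known transitions before choosing.
    for m in moves:
        transitions.setdefault((cur_state, state_after(m)), default_value)
    if random.random() < random_move_prob:
        # Exploration: train transitions other than the current best.
        return random.choice(moves)
    # Exploitation: the move whose transition has the highest value.
    return max(moves, key=lambda m: transitions[(cur_state, state_after(m))])
```

During evaluation the random-move probability is set to 0 (as the training script does via `set_random_move_probability(0)`), so the AI always exploits.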

checkers-reinforcement-learning's People

Contributors

dexw25, samragusa


checkers-reinforcement-learning's Issues

Initial Transitions for Q-Learning Agent is empty

When I run AI.py, the initial transition data is (None, None), which causes a crash:

```
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
in ()
      1 from Board import *
----> 2 from AI import *

/home/semih/ozu/rl/homework1/Checkers-Reinforcement-Learning/AI.py in ()
    554 validation_info = []
    555 for j in range(NUM_TRAINING_ROUNDS):
--> 556     training_info.extend(play_n_games(PLAYER1, PLAYER2, NUM_GAMES_TO_TRAIN, TRAINING_MOVE_LIMIT))
    557 PLAYER1.print_transition_information(PLAYER1.get_transitions_information())
    558 PLAYER1.set_random_move_probability(0)

/home/semih/ozu/rl/homework1/Checkers-Reinforcement-Learning/AI.py in play_n_games(player1, player2, num_games, move_limit)
    457 outcome_counter[j][5] = piece_counter[3]
    458
--> 459 player1.game_completed()
    460 player2.game_completed()
    461 #game_board.print_board()

/home/semih/ozu/rl/homework1/Checkers-Reinforcement-Learning/AI.py in game_completed(self)
    169 transition = (self.pre_last_move_state, self.post_last_move_state)
    170
--> 171 self.transitions[transition] = self.transitions[transition] + self.learning_rate * reward_function(transition[0], cur_state)
    172
    173 self.pre_last_move_state = None

KeyError: (None, None)
```
