
Quoridor AI based on Monte Carlo Tree Search

Quoridor is an abstract strategy game played on a board of 81 (9x9) square spaces where the objective is to get your pawn to the opposite side of the board.

This AI agent for Quoridor is based on Monte Carlo tree search (MCTS). Pure MCTS resulted in poor performance, which improved significantly after applying some heuristics. I added heuristics to the selection, expansion, and simulation phases of the tree search (and also to post-processing after the search). Some of them are described in the "Some Heuristics Included" section below. For all the heuristics and their implementation details, refer to the comments in the source code (search for the word "heuristic").
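
As a rough orientation, one MCTS iteration consists of four phases (selection, expansion, simulation, backpropagation). The minimal sketch below, in plain JavaScript, shows where such heuristics can plug in; every name in it is illustrative rather than this repository's actual API.

```js
// Sketch of one MCTS iteration showing where the heuristics plug in.
// The phase functions are passed in as callbacks; all names and signatures
// here are illustrative, not the actual identifiers used in this repository.
function mctsIteration(root, { selectChild, expand, rollout, backpropagate }) {
  // 1. Selection: walk down the tree, e.g. by maximizing a UCT score.
  //    (Heuristic: only "probable" walls appear as children.)
  let node = root;
  while (node.children && node.children.length > 0) {
    node = selectChild(node);
  }
  // 2. Expansion: add children for the heuristically pruned set of moves.
  if (node.numVisits > 0) {
    expand(node);
    if (node.children && node.children.length > 0) node = node.children[0];
  }
  // 3. Simulation: biased random playout from this position
  //    (see the rollout heuristic described below).
  const winner = rollout(node.state);
  // 4. Backpropagation: update win/visit counts along the path to the root.
  backpropagate(node, winner);
}
```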

You can play against this AI on the website (or web app) https://gorisanson.github.io/quoridor-ai/.


The number of rollouts per move for each AI level on the website is as follows.

| Level   | Rollouts per move |
| ------- | ----------------- |
| Novice  | 2,500             |
| Average | 7,500             |
| Good    | 20,000            |
| Strong  | 60,000            |
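
For reference, the same mapping written as a plain configuration object (the object name and keys are illustrative, not identifiers from this repository):

```js
// Rollouts per move for each AI level on the website (from the table above).
const ROLLOUTS_PER_LEVEL = {
  Novice: 2500,
  Average: 7500,
  Good: 20000,
  Strong: 60000,
};
```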

Some Heuristics Included in the latest version (v0.3)

  • The branching factor of Quoridor is high because of the large number of possible positions to place a wall. (There are 64 (8x8) possible positions if there are no walls on the board.) So I used a heuristic that considers only "probable" next walls to lower the branching factor. I used this heuristic in the selection/expansion phase and the rollout phase of MCTS. The "probable" next walls are the following.

    1. Near the pawns (to disturb the opponent's pawn or support the player's pawn)
    2. Near the already placed walls
    3. Leftmost or rightmost horizontal walls
  • In the rollout phase of MCTS, move the pawn along one of the shortest paths with a certain probability A. With the remaining probability 1 - A, place a ("probable" next) wall at random if there are walls left; otherwise, move the pawn backwards to penalize having no walls left. (I set A = 0.7; see the sketch after this list.)

  • In the selection/expansion phase of MCTS, if the opponent has no walls left, the player moves their pawn only along one of the shortest paths or places a wall only to interrupt the opponent's path (not to support their own pawn). This heuristic lowers the branching factor in the endgame.

  • Some common openings are included. This heuristic is only effective for fewer than 6 plies.
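
The following is a minimal sketch of the rollout move choice described above, assuming hypothetical accessors (shortestPathMove, probableWallMoves, backwardMove, wallsLeft) that are not this repository's actual API:

```js
// Biased rollout policy: follow a shortest path with probability A (0.7),
// otherwise place a random "probable" wall, and step backwards as a penalty
// once no walls are left. All parameter names are illustrative.
function chooseRolloutMove({ shortestPathMove, probableWallMoves, backwardMove, wallsLeft }, a = 0.7) {
  if (Math.random() < a) {
    // With probability A, move the pawn along one of the shortest paths.
    return shortestPathMove;
  }
  if (wallsLeft > 0 && probableWallMoves.length > 0) {
    // Otherwise place a random "probable" wall (near the pawns, near already
    // placed walls, or a leftmost/rightmost horizontal wall).
    const i = Math.floor(Math.random() * probableWallMoves.length);
    return probableWallMoves[i];
  }
  // No walls left: move the pawn backwards to penalize exhausting the walls.
  return backwardMove;
}
```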

Previous Works

There is some previous work done by others. (I matched my agent against the previous agents that are available in order to compare; you can see the results in the "Results" section below.)

  • Victor Massagué Respall, Joseph Alexander Brown and Hamna Aslam. Monte Carlo Tree Search for Quoridor. 2018. - This paper led me to apply MCTS to my agent. Although I couldn't find the source code of the AI agent from this paper, just knowing in advance that the approach was viable was very encouraging.

  • Daniel Borowski's Quoridor AI - It was hard for me to figure out his AI algorithm from his source code, but I found a comment of his on Reddit saying: "I "sort of" implemented minimax with a depth of ~2 (hehe)."

  • Martijn van Steenbergen's SmartBrain 4 - This agent uses negamax with depth 4 plus some heuristics. (His implementation also includes SmartBrain 1, SmartBrain 2 and SmartBrain 3, which use negamax with depth 1, 2 and 3 respectively, but SmartBrain 4 is obviously the strongest.)

Results

The following tables compare my 60k agent (Strong) to other agents. The agents from the "2.5k agent (Novice)" to the "180k agent" are all my own AI agents; the only difference between them is the number of rollouts. For the 60k agent, half of the games were played as the light-colored pawn and the other half as the dark-colored pawn (the light-colored pawn moves first). Against Daniel's Quoridor AI, however, the games were played as the light-colored pawn only, since his AI only takes the dark-colored pawn. The matches against Daniel's Quoridor AI and Martijn's SmartBrain 4 were played manually: each move suggested by my 60k agent was entered into their interface, and their replies were entered back into mine.

uctConst is the exploration parameter used in UCT (Upper Confidence Bound 1 applied to trees).
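
For reference, a common form of the UCT score is sketched below; the exact placement of the constant in this repository's implementation may differ, and the variable names are illustrative.

```js
// A common form of the UCT (UCB1 applied to trees) score used during selection:
// empirical win rate plus an exploration bonus scaled by uctConst.
function uctScore(childWins, childVisits, parentVisits, uctConst) {
  if (childVisits === 0) return Infinity; // always try unvisited children first
  const exploitation = childWins / childVisits;
  const exploration = uctConst * Math.sqrt(Math.log(parentVisits) / childVisits);
  return exploitation + exploration; // selection picks the child with the max score
}
```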

Result of 60k v0.2 with uctConst = 0.5

| Opponent | Games played (as the light-colored / as the dark-colored for 60k) | Wins as the light-colored for 60k | Wins as the dark-colored for 60k | Win percentage for 60k | Lower confidence bound (95%) | Upper confidence bound (95%) |
| --- | --- | --- | --- | --- | --- | --- |
| 2.5k agent (Novice) v0.2 | 100 (50/50) | 46 | 38 | 84% | 75.3% | 90.6% |
| 7.5k agent (Average) v0.2 | 100 (50/50) | 39 | 36 | 75% | 65.3% | 83.1% |
| 20k agent (Good) v0.2 | 100 (50/50) | 31 | 33 | 64% | 53.8% | 73.4% |
| 120k agent v0.2 | 100 (50/50) | 26 | 25 | 51% | 40.8% | 61.1% |
| 180k agent v0.2 | 100 (50/50) | 23 | 20 | 43% | 33.1% | 53.3% |
| Daniel's Quoridor AI | 20 (20/0) | 15 | 0 | 75% | 50.9% | 91.3% |
| Martijn's SmartBrain 4 | 10 (5/5) | 4 | 4 | 80% | 44.4% | 97.5% |

When I played against them myself, I thought Martijn's SmartBrain 4 was stronger than Daniel's Quoridor AI. Interestingly, though, the 60k agent seems to play better against Martijn's SmartBrain 4. In some matches, the play of Daniel's Quoridor AI somehow made the 60k agent exhaust its walls too quickly and lose the game.

Martijn's implementation of Quoridor allows the diagonal jump even if there is no wall or board edge behind the pawn to be jumped. (The original Quoridor rule allows the diagonal jump only if there is a wall or the board edge behind the opponent's pawn to be jumped.) In one match against Martijn's SmartBrain 4, the 60k agent won even though this out-of-rule diagonal jump was played twice by SmartBrain 4; that match is included in the statistics of the table above. In another match, the out-of-rule diagonal jump was played by SmartBrain 4 when neither player had any walls left and the result would have been decided by whether the jump was accepted or not, so I nullified that match.

Result of 60k v0.3 with uctConst = 0.2

| Opponent | Games played (as the light-colored / as the dark-colored for 60k) | Wins as the light-colored for 60k | Wins as the dark-colored for 60k | Win percentage for 60k | Lower confidence bound (95%) | Upper confidence bound (95%) |
| --- | --- | --- | --- | --- | --- | --- |
| 2.5k agent (Novice) v0.3 | 100 (50/50) | 48 | 50 | 98% | 93.0% | 99.8% |
| 7.5k agent (Average) v0.3 | 100 (50/50) | 43 | 39 | 82% | 73.1% | 89.0% |
| 20k agent (Good) v0.3 | 100 (50/50) | 32 | 26 | 58% | 47.7% | 67.8% |
| 60k agent (Strong) v0.2 with uctConst = 0.5 | 100 (50/50) | 37 | 30 | 67% | 56.9% | 76.1% |
| Daniel's Quoridor AI | 20 (20/0) | 18 | 0 | 90% | 68.3% | 98.8% |
| Martijn's SmartBrain 4 | 4 (2/2) | 2 | 2 | 100% | 39.8% | 100% |

The 60k v0.3 agent won 67 out of 100 games against the 60k v0.2 agent.

In one match against Martijn's SmartBrain 4, the 60k agent won even though the out-of-rule diagonal jump was played once by SmartBrain 4. That match is included in the statistics of the table above.

References

  • Victor Massagué Respall, Joseph Alexander Brown and Hamna Aslam. Monte Carlo Tree Search for Quoridor. 2018.

  • Levente Kocsis and Csaba Szepesvári. Bandit Based Monte-Carlo Planning. 2006.

  • Peter Auer, Nicolò Cesa-Bianchi and Paul Fischer. Finite-time Analysis of the Multiarmed Bandit Problem. 2002.

quoridor-ai's Issues

play from given board state

Hey, mate! I came across your work on an MCTS implementation for Quoridor and thought it was very impressive. I'm trying to become a Quoridor expert this year, and I think an AI tool could be very helpful in aiding my studies. Is there any easy way to have the AI play from a given game state instead of from the beginning of the game? The use case is to find the optimal way to play endgames (where the search space is a lot smaller). My idea was to feed a particular game state to the AI (probably not in the browser) and let it compute from there, but I'm not sure how to achieve that. Any advice is appreciated.

switch the game to human vs human mode in local

Hello, I find your implementation very impressive. I want to ask if there is any way to switch to the 2-player version on a local computer (meaning the game becomes human vs. human, not human vs. AI). Thanks!
