kaufmannlukas / ds-ultimate-tic-tac-toe
XOXO² - Use Reinforcement Learning to train an agent to play Ultimate Tic-Tac-Toe (UTTT).
Checklist of files that are updated / not yet updated:
Create an overview of what we have done, why, and in which order.
Example PPO:
Example MCTS:
Example model selection:
- We researched the following models and algorithms: ...
Game._Player()
@property
=> built-in decorator for a function => "decorates the function"; e.g. current_player
=> the decorated function can then be accessed like an attribute instead of being called as a function
`|` = or (element-wise "or" operator)
finished_games
=> the second part of that line of code builds a two-dimensional array out of the one-dimensional won/not-won array
Status quo:
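A minimal sketch of both notes. The Game/current_player names mirror the notes above; the contents of finished_games are made up for illustration:

```python
import numpy as np

class Game:
    def __init__(self):
        self._player = 1  # internal attribute

    @property
    def current_player(self):
        # @property lets callers write game.current_player
        # without parentheses, as if it were a plain attribute
        return self._player

game = Game()
print(game.current_player)  # accessed like a variable, not called

# second note: reshape the flat won/not-won array into a 3x3 grid
finished_games = np.array([1, 0, 0, 0, 1, 0, 0, 0, 1])
grid = finished_games.reshape(3, 3)
```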
We run and let MCTS "learn", but each new MCTS run starts again from zero.
Idea:
Reuse previous iterations: memorise the playouts, values, and nodes from earlier searches (up to a certain move depth).
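One way to sketch this idea (hypothetical Node interface — the real fields in mcts.py may differ): after each real move, keep the subtree under that move as the next search root instead of discarding the whole tree.

```python
class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.children = {}   # move -> Node
        self.visits = 0
        self.value = 0.0

def advance_root(root, move):
    """Carry over the statistics gathered under `move`
    instead of starting the next search from zero."""
    child = root.children.get(move)
    if child is None:
        return Node()        # unexplored move: fall back to a fresh tree
    child.parent = None      # detach so the old tree can be garbage-collected
    return child

# usage: after each real move, advance the root into the played subtree
root = Node()
root.children[4] = Node(parent=root)
root.children[4].visits = 120
root = advance_root(root, 4)
```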
Approach:
Web Interface?
Outsource it to UX/UI team?
REPOS on how to create a UTTT interface (with React):
Questions:
How could we approach the INTERFACE?
Did she work with PPO?
XOXO²
Create/Edit README file in GitHub - explain project / whole process ( = guideline for presentation)
Work on documentation => see documentation task
Run 2 MCTS agents with different value systems:
Create new function to find out the max/deepest depth of a memory - similar to count_nodes and count_leaves
deepest path in mcts.py
Create table with all playouts - use EDA notebook
Research info in our Model sheet:
https://docs.google.com/spreadsheets/d/1d6w3fd5od51H21_R5_ACFNvpBmRK5gTZIEclcFiervY/edit#gid=1847487093
For more advanced AI algorithms, it's often better to keep the game environment and the AI agent separate. This separation allows for more flexibility and easier integration of different agents.
Here's a recommended structure for the implementation, including:
Game Environment (UTTT):
Create a UTTT class that encapsulates the game rules, state representation, and methods for interacting with the game. This class should include functions to:
class UTTT:
    def __init__(self):
        # Initialize game state, rules, and attributes
        ...

    def get_current_state(self):
        # Return the current game state
        ...

    def get_valid_actions(self):
        # Return a list of valid actions in the current state
        ...

    def perform_action(self, action):
        # Update the game state based on the selected action
        ...

    def is_game_over(self):
        # Check if the game is over and determine the outcome
        ...
Agent:
Create an agent class that can interact with the game environment to make decisions. You can implement different agents, including a random baseline agent, an MCTS-based agent, or more advanced AI models.
class RandomAgent:
    def __init__(self, num_actions):
        # Initialize the agent with relevant parameters
        ...

    def select_action(self, state):
        # Implement action selection logic (e.g., random choice)
        ...
Policy:
If you plan to implement more advanced algorithms, such as MCTS or deep reinforcement learning, you might use a policy class to represent the agent's strategy.
class Policy:
    def __init__(self, model):
        # Initialize the policy with a model (e.g., neural network)
        ...

    def select_action(self, state):
        # Implement action selection logic based on the policy model
        ...
Interactions and Game Loop:
Finally, in your main script or game loop, you can set up the game environment, instantiate the agent, and manage interactions. The game loop should involve the following steps:
# Initialize the game environment
game = UTTT()

# Initialize the agent
agent = RandomAgent(game.get_num_actions())  # Or use a more advanced agent if desired

while not game.is_game_over():
    # Get the current state from the game
    state = game.get_current_state()

    # The agent selects an action based on the state
    action = agent.select_action(state)

    # Update the game state based on the action
    game.perform_action(action)

# Determine the outcome and handle it accordingly
outcome = game.get_game_outcome()
This happens after we have defined the base structure for the agent.
NN SETUP
Input representation:
How to represent the game state as input to the neural network?
=> use a CNN to capture spatial relationships within the board
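One common encoding for CNN input (a sketch — the plane layout and encode_state name are assumptions, not the project's actual code): stack the 9x9 board into feature planes for the current player's marks, the opponent's marks, and the empty cells.

```python
import numpy as np

def encode_state(board, current_player):
    """board: 9x9 array with 1 (X), -1 (O), 0 (empty).
    Returns a (3, 9, 9) tensor of binary feature planes."""
    board = np.asarray(board)
    return np.stack([
        (board == current_player).astype(np.float32),   # own marks
        (board == -current_player).astype(np.float32),  # opponent marks
        (board == 0).astype(np.float32),                # empty cells
    ])

board = np.zeros((9, 9), dtype=int)
board[0, 0] = 1    # X in the top-left cell
board[4, 4] = -1   # O in the centre cell
x = encode_state(board, current_player=1)
```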
Network architecture
Which architecture to choose for the NN?
e.g. deep neural networks, such as CNNs
=> experiment with different network architectures and layer sizes to find the best fit
=> our network should also be able to process the game state and produce Q-values for the different actions
Output layer
=> the output layer of the chosen NN should have as many nodes as there are possible actions in the game
=> each node in the output layer corresponds to a different action the agent can take
=> the network should produce Q-values. They represent the expected future rewards for taking each action in the current state
Activation functions
Which activation function to choose for the hidden layers?
e.g. ReLU or variants like Leaky ReLU
Loss function
Which loss function to choose?
e.g. MSE loss for Q-learning
Training Procedure
=> Train the NN (using data from self-play)
Exploration vs. Exploitation
Which exploration strategy to implement?
e.g. epsilon-greedy
=> Need to balance exploration (exploring new actions) vs. exploitation (choosing best-known actions)
=> crucial for a robust agent
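Epsilon-greedy can be sketched in a few lines (the q_values list here is indexed by action; the values are made up for illustration):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    # explore: with probability epsilon, pick a uniformly random action
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # exploit: otherwise pick the action with the highest known Q-value
    return max(range(len(q_values)), key=q_values.__getitem__)

action = epsilon_greedy([0.1, 0.7, 0.3], epsilon=0.0)  # pure exploitation
```

Epsilon is typically decayed over training, so the agent explores broadly early on and exploits its learned Q-values later.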
TUNING
Regularization & Hyperparameters
=> experiment with regularization techniques to prevent overfitting
=> tune hyperparameters
Iterate and Experiment
=> train, evaluate, and adjust the network architecture and parameters
Evaluate and Monitor
=> monitor the agent's performance (does it make progress?)
=> evaluate and test against different opponents or strategies
e.g. let generation n play against generation n-1 or n-2, etc.
QUESTIONS
do we want a self-play algorithm?
Here's a high-level overview of how you can implement MCTS for UTTT:
Define the Game Rules:
Start by defining the rules of UTTT. Understand the game mechanics, legal moves, and how the game state transitions from one position to another.
Create a UTTT Simulator:
Build a UTTT simulator that can represent the game state and allow you to make moves, check for wins, and determine valid moves.
MCTS Components:
Implement the core components of MCTS, which include the following:
UCT Algorithm:
Consider using the Upper Confidence Bound for Trees (UCT) algorithm, which is a widely used selection strategy within MCTS. UCT balances exploration and exploitation.
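The UCT selection rule can be sketched as follows (the function signature is illustrative; c is the exploration constant mentioned under "Tuning Parameters" below):

```python
import math

def uct_score(child_value, child_visits, parent_visits, c=math.sqrt(2)):
    # unvisited children score infinity, so they are always tried first
    if child_visits == 0:
        return float("inf")
    exploitation = child_value / child_visits  # average playout value
    # the bonus shrinks as a child is visited more often, relative
    # to how often its parent has been visited
    exploration = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploitation + exploration
```

During selection, the search descends from the root by repeatedly picking the child with the highest uct_score.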
Search and Decision-Making:
Create a search loop that repeatedly selects nodes to expand, simulate, and update based on MCTS until a stopping criterion (e.g., time limit or a maximum number of iterations) is reached.
Integration with UTTT Environment:
Integrate your MCTS implementation with the UTTT environment, allowing it to interact with the game and make decisions based on the search results.
Tuning Parameters:
Experiment with the parameters of the MCTS algorithm, such as exploration constant, to fine-tune its performance.
Parallelization (Optional):
If computational resources allow, consider parallelizing MCTS to speed up the search process.
MCTS Variants:
Explore advanced MCTS variants and enhancements beyond plain UCT (e.g. RAVE or parallelised tree search) that may improve performance.
Testing and Evaluation:
Test your MCTS-based UTTT AI against various opponents or strategies to evaluate its performance and iteratively refine your implementation.
Optimization (Optional):
If your MCTS implementation is running too slowly, consider optimization techniques to make the search process more efficient.
Debugging and Profiling:
Use debugging tools and profiling to identify and address issues in your implementation.
LINKS
Good GitHub Repos for coding MCTS:
MCTS install via pip:
Not too helpful Repos: