
cs344-opponent-exploitation-poker's Introduction

CS344-Opponent-Modelling-Poker

Overleaf: https://www.overleaf.com/project/6342ef568275739c63600bb5

For the refactored EFR algorithm code and deviation types, please see: https://github.com/Jamesflynn1/open_spiel.

Repository Structure:

EFR.py

Main EFR implementation logic, built off of CFR.py from OpenSpiel as a baseline.

Deviation_Types/

Provides an implementation of a deviation, namely the deviate and player_deviation_reach_probability functions alongside the deviation matrix.

Also contains the generation code for 8 deviation sets.

Notebooks/

Contains all data processing and visualisation steps beyond the EV and exploitability calculations.

Policy/

Contains a stored .csv version of the TabularPolicy object for each algorithm.

Output/

Contains per-iteration information for each run and a CFR benchmark for each run. Currently, the per-iteration information consists of the iteration time and the cumulative policy exploitability.

How to run the EFR code:

Note: A Linux environment is required in order to install OpenSpiel.

1. Clone the CS344 repository locally or copy across the following folders and files:

RunEFRExperiment.py

EFR.py

StoreTabularPolicy.py

Deviation_Types/

Deviation_Types/__init__.py

Deviation_Types/Deviation_Sets.py

Deviation_Types/Deviation.py

Deviation_Types/Swap_Transformation.py

Policy/

Optional:

RunOpponentValue.py (calculates the expected value from Policy/ files)

RunCFR.py (a wrapper to obtain the policy file for OpenSpiel CFR)

RunMCCFR.py (a wrapper to obtain the policy file for OpenSpiel ExternalSamplingMCCFR)

Notebooks/Exploitability graphs.ipynb (data processing and visualisation for exploitability data)

Notebooks/EV Opponent.ipynb (data processing and visualisation for expected value data)

2. Install Python 3 (tested on 3.9.13)

Download Python from https://www.python.org/downloads/ and follow the installation instructions.

3. Install the Python module requirements for the project

python -m pip install -q -r requirements.txt

4. Run the EFR algorithm on 2-player Leduc hold'em

python RunEFRExperiment.py (Filename) (Iterations) (Deviation_Type)

Options:

Filename: the name that will be used to save the policy and iteration data files.

Iterations: the number of update iterations that EFR will perform.

Deviation_Type: the type of deviation set that EFR will use (as defined in Deviation_Types/Deviation_Sets.py). Deviation options: ("blind_action", "informed_action", "blind_cf", "informed_cf", "blind_ps", "cfps", "csps", "tips", "bhv")

Game and player number can be modified in the RunEFRExperiment.py file. See OpenSpiel/games for a list of games that can be chosen.

5. (Optional) Install Jupyter Notebook to access the .ipynb notebooks

How to recreate the exploitability results

  1. Generate per iteration EFR data for EFR with all deviation options.

For all desired deviation types: "python RunEFRExperiment.py (Deviation_Type) 10000 (Deviation_Type)"

  2. Use the exploitability notebook to aggregate and visualise the results.

Rerun all cells in the Notebooks/Exploitability graphs.ipynb notebook to obtain the graphs found in the final report.
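The per-deviation-type runs described above could be scripted rather than typed by hand. The following is a minimal sketch, not part of the repository: the helper names `build_commands` and `run_all` are hypothetical, and it assumes RunEFRExperiment.py sits in the current working directory and takes its three arguments in the order shown in this README (Filename, Iterations, Deviation_Type).

```python
import subprocess
import sys

# Deviation options as listed in this README for Deviation_Types/Deviation_Sets.py.
DEVIATION_TYPES = [
    "blind_action", "informed_action", "blind_cf", "informed_cf",
    "blind_ps", "cfps", "csps", "tips", "bhv",
]


def build_commands(deviation_types, iterations=10000, script="RunEFRExperiment.py"):
    """Return one argument list per deviation type.

    The deviation type doubles as the output filename, matching the
    "(Deviation_Type) 10000 (Deviation_Type)" pattern used in this README.
    """
    return [
        [sys.executable, script, dev, str(iterations), dev]
        for dev in deviation_types
    ]


def run_all(commands):
    """Run each command in turn, stopping at the first failure (hypothetical helper)."""
    for cmd in commands:
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Calling `run_all(build_commands(DEVIATION_TYPES))` would then launch the runs sequentially; pass a shorter list to cover only the desired deviation types.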

How to recreate the expected value results

  1. Generate strategy data for EFR with all deviation options.

For all desired deviation types: "python RunEFRExperiment.py (Deviation_Type) 10000 (Deviation_Type)"

All policies are saved to Policy/; use Deviation_Type as the filename.

  2. Run all opponent strategy generation code.

Current opponents (in addition to the different EFR strategies):

MCCFR (low)

MCCFR (med)

MCCFR (high)

MCCFR (higher)

All saved to Policy/

Note that strategy data from the EFR strategy types has already been generated in step 1.

  3. Generate expected value data for all (EFR type, opponent type) combinations from policy files.

Run "python RunOpponentValue.py", ensuring that all EFR strategies and opponent strategies are defined in this file.

Uses strategy data from Policy/.

  4. Use the EV notebook to aggregate and visualise the results.

Rerun all cells in the Notebooks/EV Opponent.ipynb notebook to obtain the graphs found in the final report.
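As an illustration of what "all (EFR type, opponent type) combinations" covers, the sketch below enumerates the pairings. It is not taken from RunOpponentValue.py: the function name `matchups` is hypothetical, and it assumes that each EFR strategy is evaluated against every other EFR strategy (including itself) as well as the four MCCFR opponents listed above.

```python
from itertools import product

# EFR deviation types and MCCFR opponents as named in this README.
EFR_TYPES = [
    "blind_action", "informed_action", "blind_cf", "informed_cf",
    "blind_ps", "cfps", "csps", "tips", "bhv",
]
MCCFR_OPPONENTS = ["MCCFR (low)", "MCCFR (med)", "MCCFR (high)", "MCCFR (higher)"]


def matchups(efr_types, extra_opponents):
    """Pair every EFR strategy with every opponent.

    Assumption: the opponent pool is the EFR strategies themselves
    plus the extra (MCCFR) opponents.
    """
    opponents = list(efr_types) + list(extra_opponents)
    return list(product(efr_types, opponents))


pairs = matchups(EFR_TYPES, MCCFR_OPPONENTS)
print(len(pairs), "matchups, first:", pairs[0])
```

Under these assumptions there are 9 EFR strategies and 13 opponents, so 117 expected-value calculations in total.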

cs344-opponent-exploitation-poker's People

Contributors

jamesflynn1


cs344-opponent-exploitation-poker's Issues

Define what game exactly is being used for testing

What Poker ruleset? How many players? How many rounds? What kind of opponents?

ANSWER

What I am doing:

Limit Texas Hold 'em (a stretch goal would be full NL HE, but this is almost certainly intractable on DCS machines and 2 days of compute time).

2 players: this is the minimum number of players that we can choose, and increasing this number would again add a lot of complexity to the project. My resources are limited to the DCS batch compute system.

Computer opponents: this will allow for efficient learning and circumvents potential ethical and administrative issues in running such tests. See #17 for more details.

Create Readme

Create a good readme; this could serve as a brief introduction for the progress report.

General game theory background

Should include an overview of the theoretical background for the project.

Equilibrium concepts

Best response

Incomplete information

Extensive and strategic forms of games (information sets, actions, players, etc.)

Add more

Final Report

The length of the report is normally between 12,000 and 18,000 words (excluding appendices), and code (if applicable) will be submitted as a separate zip file on Tabula. Check the format page for further details of the requirements for the document's format.

Technical Content:

Use of reference and other information sources.
Problem solving methodologies and tools.
Effective problem analysis.
Innovative design.
Technical achievement.
Critical and fair discussion of the subject matter.

Project Management:

Well conceived and specified project.
Effort and motivation.
Organisation and professionalism.

Communication Skills:

All components present.
Chapters well designed.
Language and technical writing skills.
Composition.
Software and hardware documentation.
Readability.
Appropriate length.

Presentation

Technical content:
Innovative analysis and design.
Methods and solutions employed.
Quality of examples used to demonstrate results.
Overall level of technical achievement.
Conclusions and suggestions for further work.
Project management:
Well conceived project.
Unforeseen problems well detected and overcome.
Progress consistent with project specification.
All necessary research, analysis and design work completed.
Communication skills:
Care and preparation of visual aids.
Quality of oral delivery.
Effective use of time and appropriate length.
Effective response to questions.

the motivation for your project (Why is it worth doing?)
the necessary background (What is already known? What has been done so far? What are the gaps? How does your project fit into this?)
your methodology (How did you go about the project? E.g.: research methods used for investigative type project; development method chosen for system development project.)
your analysis/design/solution/research (the meat of what you actually did in your project)
evaluation
both the contribution of your project and the limitations of its scope
that your project has scope for future development/has shown up further questions and directions to explore
project management
for development projects - a demo to show what your system does
your understanding of the topic and the wider landscape - answering questions.

Decide opponent types for testing

    Opponent types will be decided after my algorithm and its details are nailed down and before implementation. 4 different types seems like a good range, and the types should be different enough to test the adaptability of agents.

Originally posted by @Jamesflynn1 in #16 (comment)

Progress Report

Write and submit the progress report

See:

https://warwick.ac.uk/fac/sci/dcs/teaching/material/cs310/components/progress/

Assessed on:

The following criteria will be used by the assessors in marking your progress report.

Technical content:
Student is well read in the project's subject area.
Effective analysis of problems and issues.
Quality of design work.
Good choice of methods and tools.
Project management:
Well conceived project.
Unforeseen problems well detected and overcome.
Progress consistent with the project specification.
All necessary research, analysis and design work completed.
Work for next term is well planned out.
Communication skills:
Basic written language skills such as spelling and grammar.
Effective composition and exposition.
Report is of an appropriate length for your particular project.

Extensive form regret minimisation

How max regret grows sublinearly (leading to an equilibrium)
How it works with all deviations subsets (unlike CFR)
Potential of equilibrium concepts to work better in some opponent cases.
Time selection regret minimisation
more??

Fix fixed point solver

I suspect many issues currently arise from a broken fixed point solver; the fixed point should be constrained to be a strategy in and of itself.
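One way to keep a fixed point constrained to be a strategy is to renormalise it onto the probability simplex at every step. The sketch below is an illustration only, not the repository's solver: it swaps in plain power iteration, the function name `stationary_strategy` is hypothetical, and it assumes the deviation matrix is row-stochastic (each row is non-negative and sums to one) so that the fixed point satisfies x = xM.

```python
def stationary_strategy(matrix, iterations=1000, tol=1e-12):
    """Approximate a fixed point x = x M of a row-stochastic matrix.

    Uses power iteration from the uniform strategy and renormalises each
    iterate, so the returned vector is non-negative and sums to one.
    Assumption: the iteration converges (it may not for periodic matrices).
    """
    n = len(matrix)
    x = [1.0 / n] * n  # start from the uniform strategy
    for _ in range(iterations):
        # One step of x <- x M (vector times matrix).
        nxt = [sum(x[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        total = sum(nxt)
        nxt = [v / total for v in nxt]  # project back onto the simplex
        if max(abs(a - b) for a, b in zip(x, nxt)) < tol:
            return nxt
        x = nxt
    return x


# Small worked example with a 2x2 row-stochastic matrix.
M = [[0.9, 0.1],
     [0.5, 0.5]]
print(stationary_strategy(M))  # approximately [5/6, 1/6]
```

For this matrix the fixed point can be checked by hand: x0 * 0.1 = x1 * 0.5 and x0 + x1 = 1 give x = (5/6, 1/6). A linear solve with an explicit sum-to-one constraint would be an alternative when power iteration converges slowly.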

Deviations and action transformation background

Action transforms
Swap transforms (internal and external)
Deviations
How this leads to memory states
Hindsight rationality
Observable Sequential Rationality and why this is required over standard rationality
Partial Sequence Deviations (correlation, deviation, recorrelation)

Definitions of all deviation sets (start with behavioural deviations)

Learning algorithms for game theory

Regret definition

Rationality and the problems with learning against an opponent's strategy (exploitation)

What no-regret learning is (no internal regret, no phi-regret, etc.)

Regret matching

Fictitious self play

CFR and primary theorems
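Of the topics above, regret matching is the easiest to make concrete. The sketch below shows the standard rule for a single decision point: play each action with probability proportional to its positive cumulative regret, falling back to uniform play when no regret is positive. The function names are illustrative and are not taken from EFR.py or OpenSpiel.

```python
def regret_matching(cumulative_regrets):
    """Map cumulative regrets to a strategy proportional to their positive parts."""
    positive = [max(r, 0.0) for r in cumulative_regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    # No action has positive regret: fall back to the uniform strategy.
    n = len(cumulative_regrets)
    return [1.0 / n] * n


def update_regrets(cumulative_regrets, action_utilities, strategy):
    """Add this round's instantaneous regret for each action.

    The instantaneous regret of an action is its utility minus the
    expected utility of the strategy that was actually played.
    """
    expected = sum(p * u for p, u in zip(strategy, action_utilities))
    return [r + (u - expected) for r, u in zip(cumulative_regrets, action_utilities)]


# One round with two actions: action 0 pays 1, action 1 pays 0.
regrets = [0.0, 0.0]
strategy = regret_matching(regrets)                       # uniform at the start
regrets = update_regrets(regrets, [1.0, 0.0], strategy)
print(regrets, regret_matching(regrets))
```

CFR applies this kind of update at every information set using counterfactual values; EFR generalises the set of deviations that regret is measured against.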

Compare and contrast EFR and CFR

Why I am using EFR as opposed to CFR, why haven't more people done this, what are the advantages/disadvantages.

All are good questions; look more into the Morrill paper.

Find NLTHE

Find the module name for the Poker version and try running some experiments on it.

Random Opponent

Develop/find a Python module that provides a random opponent for testing purposes.
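A random opponent needs very little code. The sketch below is a standalone illustration with hypothetical function names; it assumes the game exposes a list of legal actions at each decision point. OpenSpiel's Python API also ships a UniformRandomPolicy class in open_spiel.python.policy, which may make a custom module unnecessary.

```python
import random


def uniform_policy(legal_actions):
    """Return a dict mapping each legal action to equal probability."""
    actions = list(legal_actions)
    return {action: 1.0 / len(actions) for action in actions}


def sample_action(legal_actions, rng=None):
    """Sample one legal action uniformly at random.

    Pass a seeded random.Random instance as rng for reproducible tests.
    """
    rng = rng or random
    return rng.choice(list(legal_actions))


# Example: four legal actions, each played with probability 0.25.
print(uniform_policy([0, 1, 2, 3]))
print(sample_action([0, 1, 2, 3], random.Random(0)))
```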
