HSP

This is a repository for Hidden-utility Self-Play.

Installation

conda create -n hsp
conda activate hsp
pip install torch==1.5.1+cu101 torchvision==0.6.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html
cd hsp
pip install -e . 
pip install wandb icecream setproctitle gym seaborn tensorboardX slackweb psutil pyastar2d einops

We use wandb to monitor logs. See the official website and the code for some examples.

Overcooked

Our experiments use three layouts from "On the Utility of Learning about Humans for Human-AI Coordination" (Asymmetric Advantages, Coordination Ring, and Counter Circuit) and two layouts we designed (Distant Tomato and Many Orders). In the code, these layouts are named "unident_s", "random1", "random3", "distant_tomato", and "many_orders", respectively.
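The name mapping above can be written as a small lookup table. This is only a sketch: the dictionary and the helper `layout_id` are hypothetical names used for illustration and are not part of the repository.

```python
# Paper layout name -> identifier used in the code, as listed in the text above.
LAYOUT_IDS = {
    "Asymmetric Advantages": "unident_s",
    "Coordination Ring": "random1",
    "Counter Circuit": "random3",
    "Distant Tomato": "distant_tomato",
    "Many Orders": "many_orders",
}


def layout_id(paper_name: str) -> str:
    """Return the code identifier for a layout (hypothetical helper)."""
    try:
        return LAYOUT_IDS[paper_name]
    except KeyError:
        raise ValueError(f"unknown layout: {paper_name!r}") from None


if __name__ == "__main__":
    for name, ident in LAYOUT_IDS.items():
        print(f"{name} -> {ident}")
```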

Training

All training scripts are in the directory hsp/scripts. Every method has two stages: the first trains a pool of policies, and the second trains an adaptive policy against that pool.

Self-Play

To train self-play policies, set layout to one of "unident_s" (Asymmetric Advantages), "random1" (Coordination Ring), "random3" (Counter Circuit), "distant_tomato" (Distant Tomato), or "many_orders" (Many Orders), then run ./train_overcooked_sp.sh.

FCP

In the first stage, run ./train_sp_all_S1.sh to train 12 policies via self-play on each layout. After the first stage of training is done, run python extract_sp_S1_models.py to extract the initial, middle, and final checkpoints of the self-play policies into the policy pool. At this point, the FCP policy pools for all layouts should be in the directory hsp/policy_pool/LAYOUT/fcp/s1.
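As a rough illustration of the extraction step, the sketch below picks three checkpoints from an ordered list and builds the pool directory for a layout. It does not reproduce extract_sp_S1_models.py: the function names are hypothetical, and taking the midpoint index as the "middle" checkpoint is an assumption, since the script's actual rule is not stated here.

```python
from pathlib import PurePosixPath


def pick_init_middle_final(checkpoints):
    """Return the first, midpoint, and last entries of an ordered checkpoint list.

    Assumption: "middle" means the entry at index len(checkpoints) // 2.
    """
    if len(checkpoints) < 3:
        raise ValueError("need at least three checkpoints")
    return [checkpoints[0], checkpoints[len(checkpoints) // 2], checkpoints[-1]]


def fcp_pool_dir(layout):
    """Build hsp/policy_pool/LAYOUT/fcp/s1 for a given layout identifier."""
    return PurePosixPath("hsp/policy_pool") / layout / "fcp" / "s1"


if __name__ == "__main__":
    steps = [0, 10, 20, 30, 40]  # made-up checkpoint steps, for illustration only
    print(pick_init_middle_final(steps))
    print(fcp_pool_dir("random1"))
```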

In the second stage, run ./train_fcp_all_S2.sh to train an adaptive policy against the policy pool for each layout.

MEP

We reimplemented "Maximum Entropy Population-Based Training for Zero-Shot Human-AI Coordination" (MEP) and achieved significantly higher episode rewards when paired with human proxy models than those reported in the original paper.

For the first stage, run ./train_mep_all_S1.sh. After training is finished, run python extract_mep_all_S1_models.py to extract checkpoints of the MEP policies into the policy pool.

For the second stage, run ./train_mep_all_S2.sh.

HSP

Important: Make sure you have finished the first-stage training of MEP before starting the second stage of HSP.

For the first stage, run ./train_hsp_all_S1.sh. After training is finished, run python extract_hsp_S1_models.py to collect HSP policies into the policy pool.

Then run ./eval_events_all.sh to evaluate each pair of biased policy and adaptive policy in HSP and obtain its event features. Once evaluation is done, run python hsp/greedy_select.py --layout LAYOUT --k 18 for each layout to select HSP policies greedily and generate the policy pool configuration automatically.
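The general shape of a greedy subset selection can be sketched as follows. This is not the code in hsp/greedy_select.py, whose selection criterion is not described here. The sketch assumes that each candidate policy is summarised by a vector of event features and that diversity is measured as summed pairwise L1 distance; both are assumptions made for illustration, and the function name is hypothetical.

```python
def greedy_select(features, k):
    """Greedily pick k candidate indices with large summed pairwise L1 distance.

    features: list of equal-length lists of numbers, one row per candidate.
    Assumption: diversity = sum of pairwise L1 distances between selected rows.
    """
    n = len(features)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of candidates")

    def l1(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))

    # Seed with the candidate that is farthest, in total, from all the others.
    first = max(range(n), key=lambda i: sum(l1(features[i], features[j]) for j in range(n)))
    selected = [first]
    while len(selected) < k:
        remaining = [i for i in range(n) if i not in selected]
        # Add the candidate with the largest total distance to the current selection.
        best = max(remaining, key=lambda i: sum(l1(features[i], features[j]) for j in selected))
        selected.append(best)
    return selected


if __name__ == "__main__":
    toy = [[0, 0], [1, 0], [10, 0], [0, 8]]  # made-up feature vectors
    print(greedy_select(toy, 2))
```

Each step only compares the remaining candidates against the policies already chosen, so the result is a heuristic and not a guaranteed optimum.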

For the second stage, run ./train_hsp_all_S2.sh.
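The commands named in this section can be collected into one ordered plan per method. The sketch below only restates the commands from the text; the `PIPELINES` table and the `plan` helper are hypothetical names, and nothing is executed.

```python
# Ordered commands per method, copied from the instructions above.
PIPELINES = {
    "fcp": [
        "./train_sp_all_S1.sh",
        "python extract_sp_S1_models.py",
        "./train_fcp_all_S2.sh",
    ],
    "mep": [
        "./train_mep_all_S1.sh",
        "python extract_mep_all_S1_models.py",
        "./train_mep_all_S2.sh",
    ],
    "hsp": [
        "./train_hsp_all_S1.sh",
        "python extract_hsp_S1_models.py",
        "./eval_events_all.sh",
        # Run once per layout, replacing LAYOUT with the layout identifier.
        "python hsp/greedy_select.py --layout LAYOUT --k 18",
        # Requires the first stage of MEP to be finished beforehand.
        "./train_hsp_all_S2.sh",
    ],
}


def plan(method):
    """Return the numbered steps for a method (hypothetical helper)."""
    if method not in PIPELINES:
        raise ValueError(f"unknown method: {method!r}")
    return [f"{i}. {cmd}" for i, cmd in enumerate(PIPELINES[method], start=1)]


if __name__ == "__main__":
    for method in PIPELINES:
        print(method.upper())
        print("\n".join(plan(method)))
```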

Evaluation

Run ./eval_overcooked.sh for evaluation. In eval_overcooked.sh you can change the layout name, the path to the YAML file of the population configuration, and the policies to evaluate. To evaluate with script policies, set the policy name to a string prefixed with script:, for example script:place_onion_and_deliver_soup. For more script policies, see script_agent.py under the overcooked environment directories.
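The script: naming convention can be illustrated with a small parser. This is a sketch of the convention only, not the repository's own parsing code; `parse_policy_name` is a hypothetical helper, and "hsp1" is a made-up name for a learned policy.

```python
SCRIPT_PREFIX = "script:"


def parse_policy_name(name):
    """Split a policy name into (is_script, bare_name) (hypothetical helper)."""
    if name.startswith(SCRIPT_PREFIX):
        return True, name[len(SCRIPT_PREFIX):]
    return False, name


if __name__ == "__main__":
    print(parse_policy_name("script:place_onion_and_deliver_soup"))
    print(parse_policy_name("hsp1"))  # "hsp1" is a made-up policy name
```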

TODO: more detailed evaluation.

Publication

If you find this repository useful, please cite our paper:

@inproceedings{
yu2023learning,
title={Learning Zero-Shot Cooperation with Humans, Assuming Humans Are Biased},
author={Chao Yu and Jiaxuan Gao and Weilin Liu and Botian Xu and Hao Tang and Jiaqi Yang and Yu Wang and Yi Wu},
booktitle={The Eleventh International Conference on Learning Representations},
year={2023},
url={https://openreview.net/forum?id=TrwE8l9aJzs}
}
