
Learning with AMIGo: Adversarially Motivated Intrinsic Goals

This is an implementation of Learning with AMIGo: Adversarially Motivated Intrinsic Goals.

The method described in the AMIGo paper listed below is implemented in monobeast/minigrid/monobeast_amigo.py of this repository. Please consult that file for details of the teacher and student policies, the losses used to train them, and other aspects of training.

The student policy is defined in the class MinigridNet, and the teacher policy in the class Generator. The training loop is defined in train() and is split into act(), which collects the batches generated by the actors, and learn(), which updates the learner using V-trace. Training is based on the TorchBeast implementation of IMPALA (the MonoBeast version).
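As a rough illustration of the off-policy correction that learn() relies on, the following minimal sketch computes V-trace value targets for a single trajectory in plain Python. It is not the repository's code: the function name vtrace_targets and all of its arguments are hypothetical, and it assumes that no episode ends inside the trajectory. A full implementation would also need to handle episode boundaries and batched tensors.

```python
import math


def vtrace_targets(log_rhos, rewards, values, bootstrap_value,
                   discount=0.99, clip_rho=1.0, clip_c=1.0):
    """Return V-trace value targets for one trajectory (illustrative sketch).

    log_rhos[t] is log(pi(a_t|x_t) / mu(a_t|x_t)), where pi is the learner
    policy and mu is the actor (behaviour) policy. values[t] is the learner's
    value estimate V(x_t); bootstrap_value is V at the state after the last step.
    """
    n = len(rewards)
    rhos = [math.exp(lr) for lr in log_rhos]
    clipped_rhos = [min(clip_rho, r) for r in rhos]  # truncated weights for the TD error
    cs = [min(clip_c, r) for r in rhos]              # truncated weights for the trace
    next_values = list(values[1:]) + [bootstrap_value]

    # Importance-weighted temporal-difference errors.
    deltas = [
        clipped_rhos[t] * (rewards[t] + discount * next_values[t] - values[t])
        for t in range(n)
    ]

    # Backward recursion: (v_t - V_t) = delta_t + discount * c_t * (v_{t+1} - V_{t+1}).
    targets = [0.0] * n
    acc = 0.0
    for t in reversed(range(n)):
        acc = deltas[t] + discount * cs[t] * acc
        targets[t] = values[t] + acc
    return targets
```

When the actor and learner policies agree (all log_rhos are zero), the targets reduce to ordinary n-step bootstrapped returns, which is a convenient sanity check for a sketch like this.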

If you have any questions or feel the code needs further clarification in the form of comments, please do not hesitate to raise an issue.

Citation

If you use AMIGo in your research and find it helpful, or if you compare against our results, please consider citing the following paper:

@article{campero2020learning,
  title={Learning with AMIGo: Adversarially Motivated Intrinsic Goals},
  author={Campero, Andres and Raileanu, Roberta and K{\"u}ttler, Heinrich and Tenenbaum, Joshua B and Rockt{\"a}schel, Tim and Grefenstette, Edward},
  journal={arXiv preprint arXiv:2006.12122},
  year={2020}
}

Installation

# create a new conda environment
conda create -n amigo python=3.7
conda activate amigo

# install dependencies
git clone git@github.com:facebookresearch/adversarially-motivated-intrinsic-goals.git
cd adversarially-motivated-intrinsic-goals
pip install -r requirements.txt

Running Experiments

Train AMIGo on MiniGrid

# Run AMIGo on MiniGrid Environment
OMP_NUM_THREADS=1 python -m monobeast.minigrid.monobeast_amigo --env MiniGrid-KeyCorridorS5R3-v0 \
--num_actors 40 --modify --generator_batch_size 150 --generator_entropy_cost .05 \
--generator_threshold -.5 --total_frames 600000000 \
--generator_reward_negative -.3 --disable_checkpoint \
--savedir ./experimentMinigrid

Be sure to set --total_frames as in the paper:
6e8 (600000000) for KeyCorridorS4R3-v0, KeyCorridorS5R3-v0, ObstructedMaze-2Dlhb-v0, and ObstructedMaze-1Q-v0
3e7 (30000000) for KeyCorridorS3R3-v0 and ObstructedMaze-1Dl-v0
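The frame budgets listed above can be kept in a small lookup so that the right --total_frames value is always paired with the chosen environment. The sketch below is only a convenience wrapper and not part of the repository: the helper name amigo_command is hypothetical, and it assumes that the remaining flags from the example command are reused unchanged for every environment and that each environment ID takes the MiniGrid- prefix shown in that command.

```python
import shlex

# Frame budgets copied from the list above, keyed by the short environment name.
TOTAL_FRAMES = {
    "KeyCorridorS4R3-v0": int(6e8),
    "KeyCorridorS5R3-v0": int(6e8),
    "ObstructedMaze-2Dlhb-v0": int(6e8),
    "ObstructedMaze-1Q-v0": int(6e8),
    "KeyCorridorS3R3-v0": int(3e7),
    "ObstructedMaze-1Dl-v0": int(3e7),
}

PREFIX = "MiniGrid-"


def amigo_command(env, savedir="./experimentMinigrid"):
    """Build the AMIGo training command for `env` (hypothetical helper)."""
    # Accept both "MiniGrid-KeyCorridorS5R3-v0" and "KeyCorridorS5R3-v0".
    short = env[len(PREFIX):] if env.startswith(PREFIX) else env
    frames = TOTAL_FRAMES[short]  # raises KeyError for environments not listed
    args = [
        "python", "-m", "monobeast.minigrid.monobeast_amigo",
        "--env", PREFIX + short,
        # Assumption: these flags are taken verbatim from the example command.
        "--num_actors", "40", "--modify",
        "--generator_batch_size", "150",
        "--generator_entropy_cost", ".05",
        "--generator_threshold", "-.5",
        "--total_frames", str(frames),
        "--generator_reward_negative", "-.3",
        "--disable_checkpoint",
        "--savedir", savedir,
    ]
    return "OMP_NUM_THREADS=1 " + " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    print(amigo_command("ObstructedMaze-1Dl-v0"))
```

The printed string can be pasted into a shell inside the amigo conda environment.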

Train the baselines on MiniGrid

We used an open sourced implementation of the exploration baselines (i.e. RIDE, RND, ICM, and Count). This code should be pulled in a separate local repository and run within a separate environment.

# create a new conda environment
conda create -n ride python=3.7
conda activate ride 

# install dependencies
git clone git@github.com:facebookresearch/impact-driven-exploration.git
cd impact-driven-exploration
pip install -r requirements.txt

To reproduce the baseline results in the paper, run:

OMP_NUM_THREADS=1 python main.py --env MiniGrid-ObstructedMaze-1Q-v0 \
--intrinsic_reward_coef 0.01 --entropy_cost 0.0001

Replace the values of --intrinsic_reward_coef and --entropy_cost with the best values reported in the paper for each model.

Set --model to ride, rnd, curiosity, or count for RIDE, RND, ICM, or Count, respectively.

Set --use_fullobs_policy to use a full view of the environment as input to the policy network.

Set --use_fullobs_intrinsic to use full views of the environment when computing the intrinsic reward.

The default uses a partial view of the environment for both the policy and the intrinsic reward.
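The baseline options described above can be assembled programmatically, for example when launching one run per model. The sketch below is a hypothetical helper (baseline_command is not part of either repository). It uses only the flags named in this section and assumes that the two --use_fullobs_* options are boolean switches that take no value. The coefficient values must still be taken from the paper for each model.

```python
# Mapping from the method names used in the paper to the --model values listed above.
MODEL_FLAGS = {"RIDE": "ride", "RND": "rnd", "ICM": "curiosity", "Count": "count"}


def baseline_command(model, env, intrinsic_reward_coef, entropy_cost,
                     fullobs_policy=False, fullobs_intrinsic=False):
    """Build a baseline training command line (hypothetical helper)."""
    if model not in MODEL_FLAGS.values():
        raise ValueError(f"unknown model {model!r}; expected one of {sorted(MODEL_FLAGS.values())}")
    args = [
        "python", "main.py",
        "--model", model,
        "--env", env,
        "--intrinsic_reward_coef", str(intrinsic_reward_coef),
        "--entropy_cost", str(entropy_cost),
    ]
    # Assumption: both options are switches that are simply present or absent.
    if fullobs_policy:
        args.append("--use_fullobs_policy")
    if fullobs_intrinsic:
        args.append("--use_fullobs_intrinsic")
    return "OMP_NUM_THREADS=1 " + " ".join(args)


if __name__ == "__main__":
    # The two coefficients below are the ones from the example command above.
    for name, flag in MODEL_FLAGS.items():
        print(name, "->", baseline_command(flag, "MiniGrid-ObstructedMaze-1Q-v0", 0.01, 0.0001))
```

Leaving both keyword arguments at False corresponds to the default partial-view setting for the policy and the intrinsic reward.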

License

The code in this repository is released under the Creative Commons Attribution-NonCommercial 4.0 International License (CC-BY-NC 4.0).
