
impact-driven-exploration's Introduction

RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments

This is an implementation of the method proposed in

RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments

by Roberta Raileanu and Tim Rocktäschel, published at ICLR 2020.

We propose a novel type of intrinsic reward which encourages the agent to take actions that result in significant changes to its learned representation of the environment state.
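
Concretely, the RIDE bonus for a transition is the L2 distance between the learned embeddings of consecutive states, divided by the square root of the number of times the new state has been visited within the current episode. Below is a minimal sketch of that computation with hypothetical helper names; the actual implementation, including the forward- and inverse-dynamics losses used to train the embedding network, lives in src/algos/ride.py.

import torch

def ride_bonus(phi_s, phi_s_next, episodic_visit_count):
    # phi_s, phi_s_next: embeddings of the current and next state produced by
    # the learned state-embedding network, assumed shape [batch, embedding_dim].
    # episodic_visit_count: number of times the next state has been visited in
    # the current episode, assumed shape [batch].
    impact = torch.norm(phi_s_next - phi_s, p=2, dim=-1)
    return impact / torch.sqrt(episodic_visit_count.float())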

The code includes all the baselines and ablations used in the paper.

The code was also used to run the baselines in Learning with AMIGo: Adversarially Motivated Intrinsic Goals. See the associated repo for instructions on how to reproduce the results from that paper.

Citation

If you use this code in your own work, please cite our paper:

@inproceedings{
  Raileanu2020RIDE:,
  title={{RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments}},
  author={Roberta Raileanu and Tim Rockt{\"{a}}schel},
  booktitle={International Conference on Learning Representations},
  year={2020},
  url={https://openreview.net/forum?id=rkg-TJBFPB}
}

Installation

# create a new conda environment
conda create -n ride python=3.7
conda activate ride 

# install dependencies
git clone git@github.com:facebookresearch/impact-driven-exploration.git
cd impact-driven-exploration
pip install -r requirements.txt

# install MiniGrid
cd gym-minigrid
python setup.py install
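
As a quick sanity check that the environments registered correctly (a minimal snippet, assuming the classic gym API used by this codebase, where reset() returns only the observation):

import gym
import gym_minigrid  # noqa: F401 -- importing this registers the MiniGrid-* environments

env = gym.make("MiniGrid-MultiRoom-N7-S4-v0")
obs = env.reset()
print(obs["image"].shape)  # agent-centric partial view, typically (7, 7, 3)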

Train RIDE on MiniGrid

cd impact-driven-exploration

OMP_NUM_THREADS=1 python main.py --model ride --env MiniGrid-MultiRoom-N7-S4-v0 --total_frames 30000000 --intrinsic_reward_coef 0.1 --entropy_cost 0.0005

OMP_NUM_THREADS=1 python main.py --model ride --env MiniGrid-MultiRoomNoisyTV-N7-S4-v0 --total_frames 30000000 --intrinsic_reward_coef 0.1 --entropy_cost 0.0005

OMP_NUM_THREADS=1 python main.py --model ride --env MiniGrid-MultiRoom-N7-S8-v0 --total_frames 30000000 --intrinsic_reward_coef 0.5 --entropy_cost 0.001

OMP_NUM_THREADS=1 python main.py --model ride --env MiniGrid-MultiRoom-N10-S4-v0 --total_frames 30000000 --intrinsic_reward_coef 0.1 --entropy_cost 0.0005

OMP_NUM_THREADS=1 python main.py --model ride --env MiniGrid-KeyCorridorS3R3-v0 --total_frames 30000000 --intrinsic_reward_coef 0.1 --entropy_cost 0.0005

OMP_NUM_THREADS=1 python main.py --model ride --env MiniGrid-ObstructedMaze-2Dlh-v0 --total_frames 100000000 --intrinsic_reward_coef 0.5 --entropy_cost 0.001

OMP_NUM_THREADS=1 python main.py --model ride --env MiniGrid-MultiRoom-N10-S10-v0 --total_frames 100000000 --intrinsic_reward_coef 0.5 --entropy_cost 0.001

OMP_NUM_THREADS=1 python main.py --model ride --env MiniGrid-MultiRoom-N12-S10-v0 --total_frames 100000000 --intrinsic_reward_coef 0.5 --entropy_cost 0.001

To train RIDE on the other MiniGrid environments used in our paper, replace the --env argument above with each of the following:

MiniGrid-MultiRoom-N7-S4-v0
MiniGrid-MultiRoomNoisyTV-N7-S4-v0
MiniGrid-MultiRoom-N7-S8-v0
MiniGrid-MultiRoom-N10-S4-v0
MiniGrid-MultiRoom-N10-S10-v0
MiniGrid-MultiRoom-N12-S10-v0
MiniGrid-ObstructedMaze-2Dlh-v0 
MiniGrid-KeyCorridorS3R3-v0

Make sure to use the best hyperparameters for each environment, as listed in the paper.

To run different seeds for a model, change the --run_id argument.
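
For example, a small launcher along the following lines (a hypothetical convenience script, not part of the repo) runs several seeds of one configuration back to back:

import os
import subprocess

ENV_NAME = "MiniGrid-MultiRoom-N7-S4-v0"

for run_id in range(3):  # three seeds
    subprocess.run(
        [
            "python", "main.py",
            "--model", "ride",
            "--env", ENV_NAME,
            "--total_frames", "30000000",
            "--intrinsic_reward_coef", "0.1",
            "--entropy_cost", "0.0005",
            "--run_id", str(run_id),
        ],
        check=True,
        env={**os.environ, "OMP_NUM_THREADS": "1"},  # matches the commands above
    )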

Overview of RIDE

(Figure: RIDE overview)

Results on MiniGrid

(Figure: results on MiniGrid)

Analysis of RIDE

(Figure: intrinsic reward heatmaps)

(Figure: state visitation heatmaps)

Acknowledgements

Our vanilla RL algorithm is based on TorchBeast, an open-source implementation of IMPALA.

License

This code is under the CC-BY-NC 4.0 (Attribution-NonCommercial 4.0 International) license.

impact-driven-exploration's People

Contributors: dependabot[bot], egrefen, rockt, rraileanu


impact-driven-exploration's Issues

Problem reproducing MultiRoom-N10(N12)-SX

Hello,

Thanks for sharing the code for this paper, we greatly appreciate your implementations!

I'm running into a problem trying to reproduce MultiRoom-N12-S10 (and MultiRoom-N10-S10/S4):
the RIDE agent is not able to learn in these environments with the hyperparameters given in the paper.

entropy_coef: 0.001
intrinsic_reward_coef: 0.5
(for N12-S10)

It's sometimes able to reach the last room, but not consistently, and the final score remains close to 0.
Was this run on the procedurally-generated version? From the paper it seems so, but could it have been run on a singleton environment?

Are there any other hyperparameters that I forgot or that were not indicated in the paper (e.g., the loss weights)?

Thanks in advance

MiniGrid results appear to be using the fully observable space as opposed to the partially observable one

Just had a quick question regarding the MiniGrid experiments/code.

My initial impression was that you use the agent's partial observation (the default in MiniGrid); however, it looks like the code is actually using the fully observable space?

https://github.com/facebookresearch/impact-driven-exploration/blob/main/src/utils.py#L61-L62

Can you confirm that this is the correct behavior? It seems like a very important detail to know...

Is there a typo in impact-driven-exploration/src/algos/curiosity.py ?

Line 7 of impact-driven-exploration/src/algos/curiosity.py is:

simport logging

For context, the top of the file reads:

# Copyright (c) Facebook, Inc. and its affiliates.
# All rights reserved.

# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.

simport logging
import os
import sys
import threading
import time
import timeit

Some questions about BEBOLD

Sorry to bother you again.

I am trying to reproduce BeBold based on your code. I got results for RIDE/RND/ICM/Count that match the RIDE paper.

Then I tried to modify your RND agent to implement BeBold. However, I cannot get the performance reported in the BeBold paper.

Could you give me some advice?

I use the following code to calculate the BeBold bonus:

        random_embedding = random_target_network(batch['partial_obs'].to(device=flags.device))
        predicted_embedding = predictor_network(batch['partial_obs'].to(device=flags.device))

        intrinsic_rewards = torch.norm(predicted_embedding.detach() - random_embedding.detach(), dim=2, p=2)

        intrinsic_rewards = intrinsic_rewards[1:] - intrinsic_rewards[:-1]
        # ep_ind is an indicator
        intrinsic_rewards = torch.clamp(intrinsic_rewards, 0, 100000) * ep_ind * (1 - dones)

        rnd_loss = flags.rnd_loss_coef * losses.compute_forward_dynamics_loss(predicted_embedding[1:], random_embedding.detach()[1:])

Here are the hyperparameters:

    "args": {
        "alpha": 0.99,
        "baseline_cost": 0.5,
        "batch_size": 32,
        "checkpoint_num_frames": 10000000,
        "disable_checkpoint": false,
        "disable_cuda": false,
        "discounting": 0.99,
        "entropy_cost": 0.0005,
        "env": "MiniGrid-KeyCorridorS4R3-v0",
        "env_seed": 1,
        "epsilon": 1e-05,
        "fix_seed": false,
        "forward_loss_coef": 10.0,
        "intrinsic_reward_coef": 0.1,
        "inverse_loss_coef": 0.1,
        "learning_rate": 0.0001,
        "max_grad_norm": 40.0,
        "model": "bebold_count_rnd",
        "momentum": 0,
        "no_reward": false,
        "num_actors": 40,
        "num_buffers": 80,
        "num_input_frames": 1,
        "num_threads": 4,
        "queue_timeout": 1,
        "rnd_loss_coef": 0.1,
        "run_id": 0,
        "save_interval": 10000000,
        "seed": 0,
        "total_frames": 40000000,
        "unroll_length": 100,
        "use_fullobs_intrinsic": false,
        "use_fullobs_policy": false,
    }

Thanks in advance

Serious error

Traceback (most recent call last):
File "/220019034/anaconda3/envs/torch/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/220019034/anaconda3/envs/torch/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/220019034/code/impact-driven-exploration/src/algos/ride.py", line 317, in batch_and_learn
stats = learn(model, learner_model, state_embedding_model, forward_dynamics_model,
File "/220019034/code/impact-driven-exploration/src/algos/ride.py", line 92, in learn
learner_outputs, unused_state = model(batch, initial_agent_state)
File "/220019034/anaconda3/envs/torch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/220019034/code/impact-driven-exploration/src/models.py", line 339, in forward
action = torch.multinomial(
RuntimeError: probability tensor contains either inf, nan or element < 0
[INFO:57 ride:381 2021-07-09 05:55:38,284] After 0 frames: loss inf @ 0.0 fps. Stats:
{}

After 0 frames: loss inf @ 0.0 fps. Stats:

Thanks for your work.

I'm facing a problem when I run OMP_NUM_THREADS=1 python main.py --model ride --env MiniGrid-ObstructedMaze-2Dlh-v0.
The error is:

Creating log directory: ../torchbeast-20201124-192823
Saving arguments to ../torchbeast-20201124-192823/meta.json
Saving messages to ../torchbeast-20201124-192823/out.log
Saving logs data to ../torchbeast-20201124-192823/logs.csv
Saving logs' fields to ../torchbeast-20201124-192823/fields.csv
[INFO:26888 ride:183 2020-11-24 19:28:24,057] Using CUDA.
Traceback (most recent call last):
  File "main.py", line 39, in <module>
  File "main.py", line 27, in main
  File "/home/lq/impact-driven-exploration/src/algos/ride.py", line 218, in train
  File "/home/lq/impact-driven-exploration/src/utils.py", line 138, in create_buffers
  File "/home/lq/anaconda3/envs/py3.6-impact-driven-exploration/lib/python3.6/site-packages/torch/tensor.py", line 330, in share_memory_
  File "/home/lq/anaconda3/envs/py3.6-impact-driven-exploration/lib/python3.6/site-packages/torch/storage.py", line 120, in share_memory_
RuntimeError: unable to open shared memory object </torch_26888_130155866> in read-write mode

To work around this problem, I ran python main.py --model ride --env MiniGrid-ObstructedMaze-2Dlh-v0 (without OMP_NUM_THREADS=1).
However, there is still a problem.
The output I get is:

Creating log directory: ../torchbeast-20201124-192414
Saving arguments to ../torchbeast-20201124-192414/meta.json
Saving messages to ../torchbeast-20201124-192414/out.log
Saving logs data to ../torchbeast-20201124-192414/logs.csv
Saving logs' fields to ../torchbeast-20201124-192414/fields.csv
[INFO:26128 ride:183 2020-11-24 19:24:14,147] Using CUDA.
[INFO:26149 utils:147 2020-11-24 19:24:16,179] Actor 0 started.
[INFO:26161 utils:147 2020-11-24 19:24:16,227] Actor 1 started.
[INFO:26162 utils:147 2020-11-24 19:24:16,251] Actor 2 started.
[INFO:26174 utils:147 2020-11-24 19:24:16,283] Actor 3 started.
[INFO:26186 utils:147 2020-11-24 19:24:16,333] Actor 4 started.
[INFO:26209 utils:147 2020-11-24 19:24:16,444] Actor 5 started.
[INFO:26221 utils:147 2020-11-24 19:24:16,508] Actor 6 started.
[INFO:26233 utils:147 2020-11-24 19:24:16,564] Actor 7 started.
[INFO:26234 utils:147 2020-11-24 19:24:16,606] Actor 8 started.
[INFO:26257 utils:147 2020-11-24 19:24:16,716] Actor 9 started.
[INFO:26269 utils:147 2020-11-24 19:24:16,771] Actor 10 started.
[INFO:26281 utils:147 2020-11-24 19:24:16,890] Actor 11 started.
[INFO:26293 utils:147 2020-11-24 19:24:16,952] Actor 12 started.
[INFO:26305 utils:147 2020-11-24 19:24:17,012] Actor 13 started.
[INFO:26317 utils:147 2020-11-24 19:24:17,091] Actor 14 started.
[INFO:26329 utils:147 2020-11-24 19:24:17,172] Actor 15 started.
[INFO:26330 utils:147 2020-11-24 19:24:17,215] Actor 16 started.
[INFO:26354 utils:147 2020-11-24 19:24:17,292] Actor 17 started.
[INFO:26366 utils:147 2020-11-24 19:24:17,388] Actor 18 started.
[INFO:26378 utils:147 2020-11-24 19:24:17,468] Actor 19 started.
[INFO:26379 utils:147 2020-11-24 19:24:17,503] Actor 20 started.
[INFO:26402 utils:147 2020-11-24 19:24:17,609] Actor 21 started.
[INFO:26414 utils:147 2020-11-24 19:24:17,685] Actor 22 started.
[INFO:26426 utils:147 2020-11-24 19:24:17,788] Actor 23 started.
[INFO:26438 utils:147 2020-11-24 19:24:17,865] Actor 24 started.
[INFO:26450 utils:147 2020-11-24 19:24:17,949] Actor 25 started.
[INFO:26462 utils:147 2020-11-24 19:24:18,047] Actor 26 started.
[INFO:26475 utils:147 2020-11-24 19:24:18,104] Actor 28 started.
[INFO:26463 utils:147 2020-11-24 19:24:18,109] Actor 27 started.
[INFO:26476 utils:147 2020-11-24 19:24:18,146] Actor 29 started.
[INFO:26511 utils:147 2020-11-24 19:24:18,219] Actor 30 started.
[INFO:26523 utils:147 2020-11-24 19:24:18,340] Actor 31 started.
[INFO:26524 utils:147 2020-11-24 19:24:18,383] Actor 32 started.
[INFO:26536 utils:147 2020-11-24 19:24:18,431] Actor 33 started.
[INFO:26548 utils:147 2020-11-24 19:24:18,445] Actor 34 started.
[INFO:26549 utils:147 2020-11-24 19:24:18,486] Actor 35 started.
[INFO:26572 utils:147 2020-11-24 19:24:18,547] Actor 36 started.
[INFO:26584 utils:147 2020-11-24 19:24:18,593] Actor 37 started.
[INFO:26585 utils:147 2020-11-24 19:24:18,623] Actor 38 started.
[INFO:26586 utils:147 2020-11-24 19:24:18,657] Actor 39 started.
[INFO:26128 ride:383 2020-11-24 19:24:24,687] After 0 frames: loss inf @ 0.0 fps. Stats:
{}
[INFO:26128 ride:383 2020-11-24 19:24:29,691] After 0 frames: loss inf @ 0.0 fps. Stats:
{}
[INFO:26128 ride:383 2020-11-24 19:24:34,695] After 0 frames: loss inf @ 0.0 fps. Stats:
{}
[INFO:26128 ride:383 2020-11-24 19:24:39,699] After 0 frames: loss inf @ 0.0 fps. Stats:
{}
[INFO:26128 ride:383 2020-11-24 19:24:44,703] After 0 frames: loss inf @ 0.0 fps. Stats:
{}
[INFO:26128 ride:383 2020-11-24 19:24:49,707] After 0 frames: loss inf @ 0.0 fps. Stats:
{}
...

Can you give me some suggestions?
Thanks a lot!

Cannot get more than 0 rewards

Hello,
Thanks for your code.
I can't get more than 0 reward in any environment.
I tested the Count agent and the RIDE agent; neither gets any reward. I guess the RND/ICM agents are also failing.
I must be mistaken somewhere, please help me.

CUDA: 10.1
NVIDIA driver: 418.56
PyTorch (torch): 1.5.1
torchvision: 0.10.1
cudatoolkit: 10.1

For the code:
I followed #6 (comment) and modified dtype=torch.uint8 to dtype=torch.bool.

For the hyperparameters:
I modified epsilon to 0.01 and kept the others unchanged.
I set entropy_coef and intrinsic_reward_coef to the values given in the paper.

Thanks!
