noreward-rl's People

Contributors

adamstelmaszczyk, kimhc6028, pathak22

noreward-rl's Issues

A question about the designHead weights in LstmPolicy and StateActionPredictor classes

Hello, thanks for your great work!

I noticed that in /src/a3c.py line 271-277,
self.network = LSTMPolicy(env.observation_space.shape, numaction, designHead)
is defined within the scope "local", and
self.ap_network = StatePredictor(env.observation_space.shape, numaction, designHead, unsupType)
is defined within the scope "predictor" under the scope "local". I think (I tested this with MNIST and a simple CNN) this means the designHead weights used in the two classes are different (even though the designHead structures are the same), since they live under different scopes.

In the LstmPolicy class, the inputs are fed into the designHead and the outputs are fed into the LSTM for policy and value function prediction.
However, in the StatePredictor/StateActionPredictor class, the forward and inverse models are built on a designHead with different weights since, as mentioned, LstmPolicy and StatePredictor live in different scopes.

I was wondering why, in /src/a3c.py lines 271-277, LstmPolicy and StatePredictor are not placed under the same scope so that their designHeads would share weights. In other words, if they use different weights, it seems the forward and inverse models are trained independently of the A3C policy and value function, while the A3C policy/value function is still affected by the forward loss through the intrinsic reward.
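
For reference, here is a minimal TensorFlow 1.x scoping sketch (toy shapes, not the repo's actual head) of what I mean: structurally identical heads under different scopes get independent weights, and reuse would be needed to share them.

    import tensorflow as tf

    def design_head(x):
        # Toy stand-in for the conv head in model.py; only the scoping matters here.
        w = tf.get_variable("w", [4, 8], initializer=tf.truncated_normal_initializer())
        return tf.matmul(x, w)

    x = tf.placeholder(tf.float32, [None, 4])

    with tf.variable_scope("local"):
        feat_policy = design_head(x)            # creates local/w
        with tf.variable_scope("predictor"):
            feat_pred = design_head(x)          # creates local/predictor/w (independent weights)

    print([v.name for v in tf.trainable_variables()])
    # ['local/w:0', 'local/predictor/w:0'] -> two separate variable sets

    # Sharing would instead require reusing the same scope:
    # with tf.variable_scope("local", reuse=True):
    #     feat_shared = design_head(x)          # reuses local/w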

Thank you,

Li

Sessions in tmux don't use the "curiosity" virtualenv

First of all, thank you (and your team) for great research and for accompanying it with code.

On the issue:

Running train.py starts multiple tmux sessions, and in those sessions the "curiosity" virtualenv is not used. So unless one installs the same dependencies on the host (but then why use a virtualenv at all?), it fails with missing-import or mismatched-version errors.

Did you forget to add instructions that activate the "curiosity" virtualenv in the new tmux sessions, or am I missing something?
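
One workaround I can imagine, sketched in Python in the same style train.py uses to build its tmux commands (the names below are illustrative, not the repo's actual variables): prefix each command that gets sent with the virtualenv activation.

    # Hypothetical tweak to the command strings that are sent to tmux.
    activate = "source $HOME/noreward-rl/curiosity/bin/activate && "
    worker_cmd = "python worker.py --env-id doom --job-name worker --task 0"
    tmux_cmd = "tmux send-keys -t a3c:w-0 '%s' Enter" % (activate + worker_cmd)
    print(tmux_cmd)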

Running setup multiple times

Hello @pathak22,
The work you have done on RL rewards is really appreciated, and like your title suggests, I am also very curious to learn the proposed system. In the process I ran into the following issue after running the setup several times.
issue:
bash: /home/swagking0/noreward-rl/curiosity/bin/pip: /home/swagking0/noreward-rl/curiosity/bin/python2: bad interpreter: Too many levels of symbolic links

Could you please take a look and help me overcome this issue so that I can continue working with your paper?

Reproduce the result of Figure 5(a)

Hi,
Thank you for the code.

But when I try to reproduce the result of Figure 5(a) in the paper, I find that the results of A3C and ICM+A3C are very similar.

I use child mode because I am not familiar with tmux.

The only difference between the two algorithms in the code is the --unsup flag in train.py: ICM+A3C uses action while A3C uses None. Is that right?

Thank you in advance.

extrinsic and intrinsic combination

Hello, I am trying to implement ICM in PPO with a combination of extrinsic and intrinsic rewards. I have seen a few repos where they weight the extrinsic reward more than the intrinsic one, i.e. combined_reward = (1 - int_coef) * reward + int_coef * intrinsic_reward with int_coef = 0.01, which reduces the effect of the intrinsic reward significantly. In your paper you nowhere mention this sort of equation for combining the two rewards. I wonder if you can tell me whether the equation above is appropriate for a dual-reward setting.
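
For clarity, the two conventions I am comparing, in plain Python (coefficients are illustrative; as far as I can tell the paper simply adds the two rewards):

    def convex_mix(extrinsic, intrinsic, int_coef=0.01):
        # The weighting described above: a convex combination of the two rewards.
        return (1.0 - int_coef) * extrinsic + int_coef * intrinsic

    def scaled_sum(extrinsic, intrinsic, eta=1.0):
        # The other common convention: r = r_e + eta * r_i (a scaled sum).
        return extrinsic + eta * intrinsic

    print(convex_mix(1.0, 0.5), scaled_sum(1.0, 0.5))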

train.py tmux not found

I am new to this. When I run train.py, it reports the following:

(curiosity) yty@yty-GL502VML:~/noreward-rl/src$ python train.py --default --env-id doom
Executing the following commands:
mkdir -p tmp/doom
echo /home/yty/noreward-rl/curiosity/bin/python train.py --default --env-id doom > tmp/doom/cmd.sh
kill -9 $( lsof -i:12345 -t ) > /dev/null 2>&1
kill -9 $( lsof -i:12222-12242 -t ) > /dev/null 2>&1
tmux kill-session -t a3c
tmux new-session -s a3c -n ps -d bash
tmux new-window -t a3c -n w-0 bash
tmux new-window -t a3c -n w-1 bash
tmux new-window -t a3c -n w-2 bash
tmux new-window -t a3c -n w-3 bash
tmux new-window -t a3c -n w-4 bash
tmux new-window -t a3c -n w-5 bash
tmux new-window -t a3c -n w-6 bash
tmux new-window -t a3c -n w-7 bash
tmux new-window -t a3c -n w-8 bash
tmux new-window -t a3c -n w-9 bash
tmux new-window -t a3c -n w-10 bash
tmux new-window -t a3c -n w-11 bash
tmux new-window -t a3c -n w-12 bash
tmux new-window -t a3c -n w-13 bash
tmux new-window -t a3c -n w-14 bash
tmux new-window -t a3c -n w-15 bash
tmux new-window -t a3c -n w-16 bash
tmux new-window -t a3c -n w-17 bash
tmux new-window -t a3c -n w-18 bash
tmux new-window -t a3c -n w-19 bash
tmux new-window -t a3c -n tb bash
tmux new-window -t a3c -n htop bash
sleep 1
tmux send-keys -t a3c:ps 'CUDA_VISIBLE_DEVICES= python worker.py --log-dir tmp/doom --env-id doom --num-workers 20 --psPort 12222 --envWrap --designHead universe --unsup action --noLifeReward --job-name ps' Enter
tmux send-keys -t a3c:w-0 'CUDA_VISIBLE_DEVICES= python worker.py --log-dir tmp/doom --env-id doom --num-workers 20 --psPort 12222 --envWrap --designHead universe --unsup action --noLifeReward --job-name worker --task 0 --remotes 1' Enter
tmux send-keys -t a3c:w-1 'CUDA_VISIBLE_DEVICES= python worker.py --log-dir tmp/doom --env-id doom --num-workers 20 --psPort 12222 --envWrap --designHead universe --unsup action --noLifeReward --job-name worker --task 1 --remotes 1' Enter
tmux send-keys -t a3c:w-2 'CUDA_VISIBLE_DEVICES= python worker.py --log-dir tmp/doom --env-id doom --num-workers 20 --psPort 12222 --envWrap --designHead universe --unsup action --noLifeReward --job-name worker --task 2 --remotes 1' Enter
tmux send-keys -t a3c:w-3 'CUDA_VISIBLE_DEVICES= python worker.py --log-dir tmp/doom --env-id doom --num-workers 20 --psPort 12222 --envWrap --designHead universe --unsup action --noLifeReward --job-name worker --task 3 --remotes 1' Enter
tmux send-keys -t a3c:w-4 'CUDA_VISIBLE_DEVICES= python worker.py --log-dir tmp/doom --env-id doom --num-workers 20 --psPort 12222 --envWrap --designHead universe --unsup action --noLifeReward --job-name worker --task 4 --remotes 1' Enter
tmux send-keys -t a3c:w-5 'CUDA_VISIBLE_DEVICES= python worker.py --log-dir tmp/doom --env-id doom --num-workers 20 --psPort 12222 --envWrap --designHead universe --unsup action --noLifeReward --job-name worker --task 5 --remotes 1' Enter
tmux send-keys -t a3c:w-6 'CUDA_VISIBLE_DEVICES= python worker.py --log-dir tmp/doom --env-id doom --num-workers 20 --psPort 12222 --envWrap --designHead universe --unsup action --noLifeReward --job-name worker --task 6 --remotes 1' Enter
tmux send-keys -t a3c:w-7 'CUDA_VISIBLE_DEVICES= python worker.py --log-dir tmp/doom --env-id doom --num-workers 20 --psPort 12222 --envWrap --designHead universe --unsup action --noLifeReward --job-name worker --task 7 --remotes 1' Enter
tmux send-keys -t a3c:w-8 'CUDA_VISIBLE_DEVICES= python worker.py --log-dir tmp/doom --env-id doom --num-workers 20 --psPort 12222 --envWrap --designHead universe --unsup action --noLifeReward --job-name worker --task 8 --remotes 1' Enter
tmux send-keys -t a3c:w-9 'CUDA_VISIBLE_DEVICES= python worker.py --log-dir tmp/doom --env-id doom --num-workers 20 --psPort 12222 --envWrap --designHead universe --unsup action --noLifeReward --job-name worker --task 9 --remotes 1' Enter
tmux send-keys -t a3c:w-10 'CUDA_VISIBLE_DEVICES= python worker.py --log-dir tmp/doom --env-id doom --num-workers 20 --psPort 12222 --envWrap --designHead universe --unsup action --noLifeReward --job-name worker --task 10 --remotes 1' Enter
tmux send-keys -t a3c:w-11 'CUDA_VISIBLE_DEVICES= python worker.py --log-dir tmp/doom --env-id doom --num-workers 20 --psPort 12222 --envWrap --designHead universe --unsup action --noLifeReward --job-name worker --task 11 --remotes 1' Enter
tmux send-keys -t a3c:w-12 'CUDA_VISIBLE_DEVICES= python worker.py --log-dir tmp/doom --env-id doom --num-workers 20 --psPort 12222 --envWrap --designHead universe --unsup action --noLifeReward --job-name worker --task 12 --remotes 1' Enter
tmux send-keys -t a3c:w-13 'CUDA_VISIBLE_DEVICES= python worker.py --log-dir tmp/doom --env-id doom --num-workers 20 --psPort 12222 --envWrap --designHead universe --unsup action --noLifeReward --job-name worker --task 13 --remotes 1' Enter
tmux send-keys -t a3c:w-14 'CUDA_VISIBLE_DEVICES= python worker.py --log-dir tmp/doom --env-id doom --num-workers 20 --psPort 12222 --envWrap --designHead universe --unsup action --noLifeReward --job-name worker --task 14 --remotes 1' Enter
tmux send-keys -t a3c:w-15 'CUDA_VISIBLE_DEVICES= python worker.py --log-dir tmp/doom --env-id doom --num-workers 20 --psPort 12222 --envWrap --designHead universe --unsup action --noLifeReward --job-name worker --task 15 --remotes 1' Enter
tmux send-keys -t a3c:w-16 'CUDA_VISIBLE_DEVICES= python worker.py --log-dir tmp/doom --env-id doom --num-workers 20 --psPort 12222 --envWrap --designHead universe --unsup action --noLifeReward --job-name worker --task 16 --remotes 1' Enter
tmux send-keys -t a3c:w-17 'CUDA_VISIBLE_DEVICES= python worker.py --log-dir tmp/doom --env-id doom --num-workers 20 --psPort 12222 --envWrap --designHead universe --unsup action --noLifeReward --job-name worker --task 17 --remotes 1' Enter
tmux send-keys -t a3c:w-18 'CUDA_VISIBLE_DEVICES= python worker.py --log-dir tmp/doom --env-id doom --num-workers 20 --psPort 12222 --envWrap --designHead universe --unsup action --noLifeReward --job-name worker --task 18 --remotes 1' Enter
tmux send-keys -t a3c:w-19 'CUDA_VISIBLE_DEVICES= python worker.py --log-dir tmp/doom --env-id doom --num-workers 20 --psPort 12222 --envWrap --designHead universe --unsup action --noLifeReward --job-name worker --task 19 --remotes 1' Enter
tmux send-keys -t a3c:tb 'tensorboard --logdir tmp/doom --port 12345' Enter
tmux send-keys -t a3c:htop htop Enter

sh: 5: tmux: not found
sh: 6: tmux: not found
sh: 7: tmux: not found
sh: 8: tmux: not found
sh: 9: tmux: not found
sh: 10: tmux: not found
sh: 11: tmux: not found
sh: 12: tmux: not found
sh: 13: tmux: not found
sh: 14: tmux: not found
sh: 15: tmux: not found
sh: 16: tmux: not found
sh: 17: tmux: not found
sh: 18: tmux: not found
sh: 19: tmux: not found
sh: 20: tmux: not found
sh: 21: tmux: not found
sh: 22: tmux: not found
sh: 23: tmux: not found
sh: 24: tmux: not found
sh: 25: tmux: not found
sh: 26: tmux: not found
sh: 27: tmux: not found
sh: 28: tmux: not found
sh: 30: tmux: not found
sh: 31: tmux: not found
sh: 32: tmux: not found
sh: 33: tmux: not found
sh: 34: tmux: not found
sh: 35: tmux: not found
sh: 36: tmux: not found
sh: 37: tmux: not found
sh: 38: tmux: not found
sh: 39: tmux: not found
sh: 40: tmux: not found
sh: 41: tmux: not found
sh: 42: tmux: not found
sh: 43: tmux: not found
sh: 44: tmux: not found
sh: 45: tmux: not found
sh: 46: tmux: not found
sh: 47: tmux: not found
sh: 48: tmux: not found
sh: 49: tmux: not found
sh: 50: tmux: not found
sh: 51: tmux: not found
sh: 52: tmux: not found
Use tmux attach -t a3c to watch process output
Use tmux kill-session -t a3c to kill the job
Point your browser to http://localhost:12345 to see Tensorboard

Could anyone help me? Thanks in advance.
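
(Possibly useful to others hitting this: the sh: N: tmux: not found lines suggest tmux itself is not installed, so every command train.py sends through it fails. A quick Python check is below; child mode, mentioned in other issues on this page, avoids tmux entirely.)

    import distutils.spawn

    # None means tmux is not on PATH, which produces the "tmux: not found" lines above.
    print(distutils.spawn.find_executable("tmux"))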

Use this model with demo frames

Hello,

I am trying to modify this code to add a pretraining step that uses demo images of a human playing the game. I've been trying to see where exactly I should add this step, but I'm somewhat stumped. Any suggestions on how you would see this being done?

Failed to build doom-py

$pip install -r requirements.txt
Requirement already satisfied: atari-py==0.1.1 in ./curiosity/lib/python2.7/site-packages (from -r src/requirements.txt (line 1)) (0.1.1)
Requirement already satisfied: attrs==17.2.0 in ./curiosity/lib/python2.7/site-packages (from -r src/requirements.txt (line 2)) (17.2.0)
Requirement already satisfied: autobahn==17.6.2 in ./curiosity/lib/python2.7/site-packages (from -r src/requirements.txt (line 3)) (17.6.2)
Requirement already satisfied: Automat==0.6.0 in ./curiosity/lib/python2.7/site-packages (from -r src/requirements.txt (line 4)) (0.6.0)
Requirement already satisfied: backports.ssl-match-hostname==3.5.0.1 in ./curiosity/lib/python2.7/site-packages (from -r src/requirements.txt (line 5)) (3.5.0.1)
Requirement already satisfied: certifi==2017.4.17 in ./curiosity/lib/python2.7/site-packages (from -r src/requirements.txt (line 6)) (2017.4.17)
Requirement already satisfied: chardet==3.0.4 in ./curiosity/lib/python2.7/site-packages (from -r src/requirements.txt (line 7)) (3.0.4)
Requirement already satisfied: constantly==15.1.0 in ./curiosity/lib/python2.7/site-packages (from -r src/requirements.txt (line 8)) (15.1.0)
Requirement already satisfied: docker-py==1.10.3 in ./curiosity/lib/python2.7/site-packages (from -r src/requirements.txt (line 9)) (1.10.3)
Requirement already satisfied: docker-pycreds==0.2.1 in ./curiosity/lib/python2.7/site-packages (from -r src/requirements.txt (line 10)) (0.2.1)
Collecting doom-py==0.0.15
Using cached doom-py-0.0.15.tar.gz (4.4 MB)
Requirement already satisfied: fastzbarlight==0.0.14 in ./curiosity/lib/python2.7/site-packages (from -r src/requirements.txt (line 12)) (0.0.14)
Requirement already satisfied: funcsigs==1.0.2 in ./curiosity/lib/python2.7/site-packages (from -r src/requirements.txt (line 13)) (1.0.2)
Obtaining go_vncdriver from git+https://github.com/openai/go-vncdriver.git@33bd0dd9620e97acd9b4e559bca217df09ba89e6#egg=go_vncdriver (from -r src/requirements.txt (line 14))
Skipping because already up-to-date.
Obtaining gym from git+https://github.com/openai/gym.git@6f277090ed3323009a324ea31d00363afd8dfb3a#egg=gym (from -r src/requirements.txt (line 15))
Skipping because already up-to-date.
Obtaining gym_pull from git+https://github.com/pathak22/gym-pull.git@589039c29567c67fb3d5c0a315806419e0999415#egg=gym_pull (from -r src/requirements.txt (line 16))
Skipping because already up-to-date.
Requirement already satisfied: hyperlink==17.2.1 in ./curiosity/lib/python2.7/site-packages (from -r src/requirements.txt (line 17)) (17.2.1)
Requirement already satisfied: idna==2.5 in ./curiosity/lib/python2.7/site-packages (from -r src/requirements.txt (line 18)) (2.5)
Requirement already satisfied: incremental==17.5.0 in ./curiosity/lib/python2.7/site-packages (from -r src/requirements.txt (line 19)) (17.5.0)
Requirement already satisfied: ipaddress==1.0.18 in ./curiosity/lib/python2.7/site-packages (from -r src/requirements.txt (line 20)) (1.0.18)
Requirement already satisfied: mock==2.0.0 in ./curiosity/lib/python2.7/site-packages (from -r src/requirements.txt (line 21)) (2.0.0)
Requirement already satisfied: numpy==1.13.1 in ./curiosity/lib/python2.7/site-packages (from -r src/requirements.txt (line 22)) (1.13.1)
Requirement already satisfied: olefile==0.44 in ./curiosity/lib/python2.7/site-packages (from -r src/requirements.txt (line 23)) (0.44)
Requirement already satisfied: pbr==3.1.1 in ./curiosity/lib/python2.7/site-packages (from -r src/requirements.txt (line 24)) (3.1.1)
Requirement already satisfied: Pillow==4.2.1 in ./curiosity/lib/python2.7/site-packages (from -r src/requirements.txt (line 25)) (4.2.1)
Processing /home/l/.cache/pip/wheels/ef/15/b1/c1764316e4c096e42d9dafde396c8146c951a8d5c7dbdf69d4/ppaquette_gym_doom-0.0.3-py2-none-any.whl
Obtaining ppaquette_gym_super_mario from git+https://github.com/ppaquette/gym-super-mario.git@2e5ee823b6090af3f99b1f62c465fc4b033532f4#egg=ppaquette_gym_super_mario (from -r src/requirements.txt (line 27))
Skipping because already up-to-date.
Collecting protobuf==3.1.0
Using cached protobuf-3.1.0-py2.py3-none-any.whl (339 kB)
Requirement already satisfied: pyglet==1.2.4 in ./curiosity/lib/python2.7/site-packages (from -r src/requirements.txt (line 29)) (1.2.4)
Processing /home/l/.cache/pip/wheels/25/8d/b9/a988dfa73c8212184401a176454e275bddba4a88a126852f70/PyOpenGL-3.1.0-py2-none-any.whl
Processing /home/l/.cache/pip/wheels/b2/35/55/b4e9912486ca285daabe3f425eb30b115b96831fcfa2aeca7a/PyYAML-3.12-cp27-cp27mu-linux_x86_64.whl
Requirement already satisfied: requests>=2.20.0 in ./curiosity/lib/python2.7/site-packages (from -r src/requirements.txt (line 32)) (2.24.0)
Collecting scipy==0.19.1
Using cached scipy-0.19.1-cp27-cp27mu-manylinux1_x86_64.whl (45.0 MB)
Requirement already satisfied: six==1.10.0 in ./curiosity/lib/python2.7/site-packages (from -r src/requirements.txt (line 34)) (1.10.0)
Collecting tensorflow==0.12.0rc1
Using cached tensorflow-0.12.0rc1-cp27-cp27mu-manylinux1_x86_64.whl (43.1 MB)
Processing /home/l/.cache/pip/wheels/ad/98/8e/041ed967dab815cddd6cf713801e1d2d6545498559bb6d7a7c/Twisted-17.5.0-cp27-cp27mu-linux_x86_64.whl
Requirement already satisfied: txaio==2.8.0 in ./curiosity/lib/python2.7/site-packages (from -r src/requirements.txt (line 37)) (2.8.0)
Processing /home/l/.cache/pip/wheels/15/7a/1b/0804c274252cdae19c6556ec9c569585e2d71f2c811cd8b322/ujson-1.35-cp27-cp27mu-linux_x86_64.whl
Obtaining universe from git+https://github.com/openai/universe.git@e8037a103d8871a29396c39b2a58df439bde3380#egg=universe (from -r src/requirements.txt (line 39))
Skipping because already up-to-date.
Requirement already satisfied: urllib3==1.21.1 in ./curiosity/lib/python2.7/site-packages (from -r src/requirements.txt (line 40)) (1.21.1)
Requirement already satisfied: websocket-client==0.44.0 in ./curiosity/lib/python2.7/site-packages (from -r src/requirements.txt (line 41)) (0.44.0)
Collecting zope.interface==4.4.2
Using cached zope.interface-4.4.2-cp27-cp27mu-manylinux1_x86_64.whl (170 kB)
Requirement already satisfied: setuptools in ./curiosity/lib/python2.7/site-packages (from protobuf==3.1.0->-r src/requirements.txt (line 28)) (44.1.1)
Requirement already satisfied: wheel in ./curiosity/lib/python2.7/site-packages (from tensorflow==0.12.0rc1->-r src/requirements.txt (line 35)) (0.34.2)
Building wheels for collected packages: doom-py
Building wheel for doom-py (setup.py): started
Building wheel for doom-py (setup.py): finished with status 'error'
Running setup.py clean for doom-py
Failed to build doom-py
Installing collected packages: doom-py, go-vncdriver, gym, gym-pull, ppaquette-gym-doom, ppaquette-gym-super-mario, protobuf, PyOpenGL, PyYAML, scipy, tensorflow, zope.interface, Twisted, ujson, universe
Running setup.py install for doom-py: started
Running setup.py install for doom-py: still running...
Running setup.py install for doom-py: finished with status 'error'

Does this work on Atari?

The paper covers Mario and Doom, but if any experiments were done on Atari, it'd be nice to know.

Error when restoring model

After training the model for 1 day, I found that there was no meta graph file, so I couldn't restore it in src/demo.py. So I modified src/worker.py:23, in the FastSaver class, from super(FastSaver, self).save(sess, save_path, ..., False) to super(FastSaver, self).save(sess, save_path, ..., True), and this did create the meta graph file.

But when I used the demo.py to restore it, I got the error:

InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'global/predictor/flast/b/Adam_1': Operation was explicitly assigned to /job:ps/task:0/device:CPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0 ]. Make sure the device specification refers to a valid device.
	 [[Node: global/predictor/flast/b/Adam_1 = VariableV2[_class=["loc:@global/predictor/flast/b"], container="", dtype=DT_FLOAT, shape=[288], shared_name="", _device="/job:ps/task:0/device:CPU:0"]()]]

Why was the operation in the meta graph explicitly assigned to /job:ps/task:0/device:CPU:0?
And what can I do to solve it?
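
One workaround that looks plausible to me (checkpoint path is illustrative): import the meta graph with clear_devices=True so the /job:ps placements recorded at training time are dropped before restoring outside the training cluster.

    import tensorflow as tf

    # Strip the device placements baked into the meta graph during distributed training.
    saver = tf.train.import_meta_graph("tmp/doom/train/model.ckpt-0.meta", clear_devices=True)
    with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
        saver.restore(sess, "tmp/doom/train/model.ckpt-0")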

Trying to run simple Atari Pong

The run produced a checkpoint, but I can't visualize it; the inference fails, see below. I tried running with more than one worker and got the same error.

python train.py --env-id Pong-ram-v0 --num-workers 1  --visualise 


('w-0', "tmux send-keys -t a3c:w-0 'CUDA_VISIBLE_DEVICES= /home/rjn/projects/RL/noreward-rl/curiosity/bin/python worker.py --log-dir /tmp/Pong-ram-v0 --env-id Pong-ram-v0 --num-workers 1 --psPort 12222 --visualise --designHead universe --job-name worker --task 0 --remotes 1' Enter")
Executing the following commands:
mkdir -p /tmp/Pong-ram-v0
echo /home/rjn/projects/RL/noreward-rl/curiosity/bin/python train.py --env-id Pong-ram-v0 --num-workers 1 --visualise > /tmp/Pong-ram-v0/cmd.sh


kill -9 $( lsof -i:12345 -t ) > /dev/null 2>&1
kill -9 $( lsof -i:12222-12223 -t ) > /dev/null 2>&1
tmux kill-session -t a3c
tmux new-session -s a3c -n ps -d bash
tmux new-window -t a3c -n w-0 bash
tmux new-window -t a3c -n tb bash
tmux new-window -t a3c -n htop bash
sleep 1
tmux send-keys -t a3c:ps 'CUDA_VISIBLE_DEVICES= /home/rjn/projects/RL/noreward-rl/curiosity/bin/python worker.py --log-dir /tmp/Pong-ram-v0 --env-id Pong-ram-v0 --num-workers 1 --psPort 12222 --visualise --designHead universe --job-name ps' Enter
tmux send-keys -t a3c:w-0 'CUDA_VISIBLE_DEVICES= /home/rjn/projects/RL/noreward-rl/curiosity/bin/python worker.py --log-dir /tmp/Pong-ram-v0 --env-id Pong-ram-v0 --num-workers 1 --psPort 12222 --visualise --designHead universe --job-name worker --task 0 --remotes 1' Enter
tmux send-keys -t a3c:tb 'tensorboard --logdir /tmp/Pong-ram-v0 --port 12345' Enter
tmux send-keys -t a3c:htop htop Enter

Use `tmux attach -t a3c` to watch process output
Use `tmux kill-session -t a3c` to kill the job
Point your browser to http://localhost:12345 to see Tensorboard

(curiosity) rjn@rjn-Oryx-Pro:~/projects/RL/noreward-rl/src$ tmux kill-session -t a3c

(curiosity) rjn@rjn-Oryx-Pro:~/projects/RL/noreward-rl/src$ python src/inference.py --default --env-id Pong-ram-v0 --record
python: can't open file 'src/inference.py': [Errno 2] No such file or directory
(curiosity) rjn@rjn-Oryx-Pro:~/projects/RL/noreward-rl/src$ python inference.py --default --env-id Pong-ram-v0 --record
[2018-11-07 10:50:31,304] Writing logs to file: /tmp/universe-5865.log
[2018-11-07 10:50:31,545] Making new env: Pong-ram-v0
[2018-11-07 10:50:31,696] Creating monitor directory /tmp/Pong-ram-v0/inference
Using universe head design
[2018-11-07 10:50:31,959] Trainable vars:
[2018-11-07 10:50:31,959]   global/l1/W:0 (3, 3, 1, 32)
[2018-11-07 10:50:31,959]   global/l1/b:0 (1, 1, 1, 32)
[2018-11-07 10:50:31,959]   global/l2/W:0 (3, 3, 32, 32)
[2018-11-07 10:50:31,959]   global/l2/b:0 (1, 1, 1, 32)
[2018-11-07 10:50:31,960]   global/l3/W:0 (3, 3, 32, 32)
[2018-11-07 10:50:31,960]   global/l3/b:0 (1, 1, 1, 32)
[2018-11-07 10:50:31,960]   global/l4/W:0 (3, 3, 32, 32)
[2018-11-07 10:50:31,960]   global/l4/b:0 (1, 1, 1, 32)
[2018-11-07 10:50:31,960]   global/RNN/BasicLSTMCell/Linear/Matrix:0 (544, 1024)
[2018-11-07 10:50:31,960]   global/RNN/BasicLSTMCell/Linear/Bias:0 (1024,)
[2018-11-07 10:50:31,960]   global/value/w:0 (256, 1)
[2018-11-07 10:50:31,960]   global/value/b:0 (1,)
[2018-11-07 10:50:31,960]   global/action/w:0 (256, 6)
[2018-11-07 10:50:31,960]   global/action/b:0 (6,)
[2018-11-07 10:50:31,962] Inference events directory: /tmp/Pong-ram-v0/inference
[2018-11-07 10:50:31,963] Initializing all parameters.
[2018-11-07 10:50:32,058] Restoring trainable global parameters.
[2018-11-07 10:50:32,067] Restored model was trained for 0.00M global steps
[2018-11-07 10:50:32,067] Called reset on <Unvectorize<VectorizeFilter[<class 'envs.DiagnosticsInfoI'>]<AtariRescale42x42<Vectorize<Monitor<TimeLimit<AtariEnv instance>>>>>>> before configuring. Configuring automatically with default arguments
[2018-11-07 10:50:32,075] Starting new video recorder writing to /tmp/Pong-ram-v0/inference/openaigym.video.0.5865.video000000.mp4
Traceback (most recent call last):
  File "inference.py", line 216, in <module>
    tf.app.run()
  File "/home/rjn/projects/RL/noreward-rl/curiosity/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 43, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "inference.py", line 213, in main
    inference(args)
  File "inference.py", line 88, in inference
    last_state = env.reset()
  File "/home/rjn/projects/RL/noreward-rl/curiosity/src/gym/gym/core.py", line 123, in reset
    observation = self._reset()
  File "/home/rjn/projects/RL/noreward-rl/curiosity/src/universe/universe/wrappers/vectorize.py", line 46, in _reset
    observation_n = self.env.reset()
  File "/home/rjn/projects/RL/noreward-rl/curiosity/src/gym/gym/core.py", line 123, in reset
    observation = self._reset()
  File "/home/rjn/projects/RL/noreward-rl/curiosity/src/universe/universe/vectorized/vectorize_filter.py", line 29, in _reset
    observation_n = self.env.reset()
  File "/home/rjn/projects/RL/noreward-rl/curiosity/src/gym/gym/core.py", line 123, in reset
    observation = self._reset()
  File "/home/rjn/projects/RL/noreward-rl/curiosity/src/gym/gym/core.py", line 376, in _reset
    return self._observation(observation)
  File "/home/rjn/projects/RL/noreward-rl/src/envs.py", line 294, in _observation
    return [_process_frame42(observation) for observation in observation_n]
  File "/home/rjn/projects/RL/noreward-rl/src/envs.py", line 276, in _process_frame42
    frame = frame[34:34+160, :160]
IndexError: too many indices for array
[2018-11-07 10:50:32,201] Finished writing results. You can upload them to the scoreboard via gym.upload('/tmp/Pong-ram-v0/inference')
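
(My guess: the crop in envs.py, _process_frame42, assumes an image observation, but Pong-ram-v0 returns a 1-D vector of 128 RAM bytes, so frame[34:34+160, :160] has too many indices. A quick way to see the difference with gym directly, outside this repo:)

    import gym
    import numpy as np

    print(np.shape(gym.make("Pong-ram-v0").reset()))   # (128,)        -- RAM bytes, 1-D
    print(np.shape(gym.make("Pong-v0").reset()))       # (210, 160, 3) -- the image frame the crop expects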

Training Code

Hi Deepak,

The paper is great. Any chance of getting access to the training code?

Thanks!

Simon

Can it work for robotics with a continuous action space?

Hi, I am a 2nd-year graduate student at Seoul National University, Seoul, Korea.
I am studying RL and want to apply your idea to make a contribution to controlling robots.
I am just curious about the possibility of applying it with DDPG, so it can work for robotics with a continuous action space.

I am considering two areas of robotics: one is a manipulator with a sparse-reward task, and the other is navigation.

What do you think?
If you don't mind, I would like your advice.

WAD files

Hi @pathak22,

very interesting paper. I have been trying to replicate your results in PyTorch using ViZDoom. You mention that by default your code runs the "dense-reward" setup, with only 15 possible starting locations. In the original my_way_home.wad file provided by ViZDoom, however, all 17 locations are available.

Could you please tell me where I could get the "dense-reward" WAD file and the "sparse" WAD file to compare? I would like to avoid having to install Doom Builder and modify the files...

Thanks,

Performance of A3C on benchmark

Hi Deepak,

Thanks for sharing the code of your excellent work!
May I know the performance of your LSTMPolicy on RL benchmarks like Atari Breakout?

James

TensorBoard Version

What version of TensorBoard was used? I can't figure out which version is compatible with the rest of the software dependencies, and none is specified in requirements.txt.

Installation error

Running build with OpenGL rendering.
Building with OpenGL: GOPATH=/home/haotong/noreward-rl/curiosity/src/go-vncdriver/.build go build -buildmode=c-shared -o go_vncdriver.so github.com/openai/go-vncdriver. (Set GO_VNCDRIVER_NOGL to build without OpenGL.)
Traceback (most recent call last):
File "curiosity/src/go-vncdriver/build.py", line 121, in
main()
File "curiosity/src/go-vncdriver/build.py", line 20, in main
build()
File "curiosity/src/go-vncdriver/build.py", line 112, in build
if not build_gl():
File "curiosity/src/go-vncdriver/build.py", line 100, in build_gl
return not subprocess.call(cmd.split())
File "/usr/lib/python2.7/subprocess.py", line 523, in call
return Popen(*popenargs, **kwargs).wait()
File "/usr/lib/python2.7/subprocess.py", line 711, in init
errread, errwrite)
File "/usr/lib/python2.7/subprocess.py", line 1343, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
Can anyone help me figure out what is causing this error?
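
(A guess at the cause, not confirmed: the build script shells out to "go build", and an OSError: [Errno 2] No such file or directory raised from subprocess usually means the command it tried to run, here the Go toolchain, is not installed. A quick check:)

    import distutils.spawn

    # The go-vncdriver build runs "go build ..." via subprocess; if this prints None,
    # the Go toolchain is missing from PATH, which matches the OSError above.
    print(distutils.spawn.find_executable("go"))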

The program seems to get stuck?

Hello, Pathak. I tried to reproduce your code in the Mario env, but the program seems to get stuck: Mario does not act, and the output just stays like this for about ten minutes:

balabala

Using universe head design
Using universe head design
Optimizer: ADAM with lr: 0.000100
Input observation shape:  (224, 256, 3)

Doing hard mario fceux reset (40 seconds wait) !
[2017-10-15 17:53:37,185] Starting training at gobal_step=0

I added some print statements in a3c.py and found that the program was stuck at
self.queue.put(next(rollout_provider), timeout=600.0) (line 146)

Have you run into similar problems, or is this expected? Could you tell me how to tackle it?

Tried to run demo.py, got a syntax error

File "demo.py", line 8, in
from envs import create_env
File "/Users/noreward-rl/src/envs.py", line 8, in
import universe
File "/anaconda3/lib/python3.7/site-packages/universe/init.py", line 22, in
from universe import error, envs
File "/anaconda3/lib/python3.7/site-packages/universe/envs/init.py", line 1, in
import universe.envs.vnc_env
File "/anaconda3/lib/python3.7/site-packages/universe/envs/vnc_env.py", line 11, in
from universe.envs import diagnostics
File "/anaconda3/lib/python3.7/site-packages/universe/envs/diagnostics.py", line 94
async = self.qr_pool.apply_async(self.method, (self._last_img, time.time(), available_at))
^
SyntaxError: invalid syntax
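
(A likely explanation: async became a reserved keyword in Python 3.7, so the old universe code that assigns to a variable named async no longer parses; running demo.py from the repo's Python 2.7 virtualenv avoids the SyntaxError. Minimal check:)

    import sys

    # universe's diagnostics.py uses "async" as a variable name, which is illegal
    # syntax on Python >= 3.7 but fine on the Python 2.7 this repo targets.
    print(sys.version_info >= (3, 7))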

How to disable the fceux GUI when training Mario?

Hello, I think your idea is very cool, and I want to try it myself. However, when I train Mario, the fceux GUI appears, which slows down my training. Besides, I cannot run my code on a server because of the fceux GUI. How did you avoid this in your code?

Update Code to Tensorflow 2.0

Hi Researchers of Curiosity Learning!

Can you please update the source code to TensorFlow 2.0 so that it is easier to follow? Thanks!

  • OrdinaryHacker101

a GAN idea

Thank you for the work. I recently started working on reinforcement learning for mathematical research (with the formal language and deduction system of a proof assistant as the environment); it is not straightforward to design a proper reward, but novelty is certainly a good measure of progress, and your work is inspiring.

One idea I have, which I also intend to apply in my project, is about the measurement of prediction error; it seems to me that some GAN ideas are applicable here. The predictor can be seen as a generator, so how about training a discriminator (conditioned on the current state) with the predicted outcomes as negative samples and the actual outcomes as positive samples? Maybe then you could just predict the pixels, and the discriminator would extract features automatically and ignore any essentially unpredictable features, like the exact locations of tree leaves in a breeze. It would also be unnecessary to distinguish between things that affect or can be controlled by the agent and things that do not.

I am a beginner in reinforcement learning apart from my participation in the Leela Zero project. I haven't looked much into the details of the various algorithms and NN architectures, and just want to get some feedback about whether the general idea is promising. Thank you in advance!

RuntimeError: go_vncdriver must be imported before tensorflow

(curiosity) l@DESKTOP-82FI76C:/mnt/e/noreward-rl/src$ python3 demo.py --env-id SuperMarioBros-1-1-v0 --ckpt ../models/mario/mario_ICM >log.txt
/home/l/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/l/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/l/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/l/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/l/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/l/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
/home/l/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/l/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/l/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/l/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/l/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/l/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
/home/l/.local/lib/python3.6/site-packages/universe/runtimes/__init__.py:7: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
spec = yaml.load(f)
Traceback (most recent call last):
File "demo.py", line 8, in
from envs import create_env
File "/mnt/e/noreward-rl/src/envs.py", line 8, in
import universe
File "/home/l/.local/lib/python3.6/site-packages/universe/init.py", line 22, in
from universe import error, envs
File "/home/l/.local/lib/python3.6/site-packages/universe/envs/init.py", line 1, in
import universe.envs.vnc_env
File "/home/l/.local/lib/python3.6/site-packages/universe/envs/vnc_env.py", line 19, in
import go_vncdriver
File "/home/l/.local/lib/python3.6/site-packages/go_vncdriver/init.py", line 7, in
raise RuntimeError('go_vncdriver must be imported before tensorflow')
RuntimeError: go_vncdriver must be imported before tensorflow
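
(The error message itself points at the fix: the entry script has to import go_vncdriver before anything imports tensorflow. A minimal sketch of the required import order, e.g. at the very top of demo.py:)

    # go_vncdriver refuses to load if TensorFlow is already imported, so put this first.
    import go_vncdriver  # noqa: F401  (imported for its side effects)
    import tensorflow as tf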

Inconsistent actions between train and inference on Mario

I trained a policy for the Mario environment with
python train.py --default --env-id mario --noReward
and observed quite a high external reward during training:

[2019-03-15 01:24:17,798] True Game terminating: env_episode_reward=0.648666666667 episode_length=669
Episode finished. Sum of shaped rewards: 0.00. Length: 669. Bonus: 4.1677.

However, when I try to run the policy with inference.py as follows:
python inference.py --env-id SuperMarioBros-1-1-v0 --default --log-dir ../mario/train
the agent continuously tries to go left, which makes me think that the action spaces for training and inference are inconsistent (somehow swapped).

Is there a way to fix it?

Feature normalization?

Hello, I just read the paper today, and there are two points that remain unclear to me.
I looked at the code to try to understand them better, but they are still not clear.

The first point:
In model.py, the feature functions transforming the input state into feature space are defined in nipsHead, universeHead, etc.
In these definitions and their usage, I see no trace of normalization (something like an l2 normalize).
I expected to see a normalization because it seems very easy for the network to cheat: if it wants to maximize the reward, it just has to scale the features up (and scale them down in the inverse model to avoid being penalized).
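
For this first point, here is the kind of normalization I am expecting to see (a sketch of what I mean, not what the repo currently does; 288 is the universe-head feature size):

    import tensorflow as tf

    phi = tf.placeholder(tf.float32, [None, 288])   # designHead output features
    phi_unit = tf.nn.l2_normalize(phi, dim=1)       # force ||phi|| = 1 per state
    # With unit-norm features, the forward model's error can no longer be made
    # large (or small) just by rescaling the feature embedding.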

The second point:
It seems to me that every time the parameters of the feature function are modified, the intrinsic rewards, and therefore the rewards for the whole episode, change as well. We would then need to recompute the generalized advantages for the whole episode. Does this mean that episodes must be processed in their entirety? How does this interact with experience replay? Is there an approximation that avoids recomputing the advantages after every update?

Thanks.

I get an error when executing the code

Below is the error on worker-0 in tmux when I executed the code:

icsl@icsl:~/Downloads/noreward-rl-master/src$ CUDA_VISIBLE_DEVICES= /home/icsl/Downloads/noreward-rl-master/curiosity/bin/python worker.py --log-dir tmp/doom --env-id doom --num-workers 3 --psPort 12222 --designHead universe --job-name worker --task 0 --remotes 1
[2017-10-23 20:27:27,623] Writing logs to file: /tmp/universe-5969.log
I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:197] Initialize GrpcChannelCache for job ps -> {0 -> 127.0.0.1:12222}
I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:197] Initialize GrpcChannelCache for job worker -> {0 -> localhost:12223, 1 -> 127.0.0.1:12224, 2 -> 127.0.0.1:12225}
I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:211] Started server with target: grpc://localhost:12223
[2017-10-23 20:27:27,649] Making new env: ppaquette/DoomMyWayHome-v0
Using universe head design
Using universe head design
Optimizer: ADAM with lr: 0.000100
Input observation shape: (120, 160, 3)
[2017-10-23 20:27:29,646] Trainable vars:
[2017-10-23 20:27:29,646] global/l1/W:0 (3, 3, 3, 32)
[2017-10-23 20:27:29,646] global/l1/b:0 (1, 1, 1, 32)
[2017-10-23 20:27:29,646] global/l2/W:0 (3, 3, 32, 32)
[2017-10-23 20:27:29,647] global/l2/b:0 (1, 1, 1, 32)
[2017-10-23 20:27:29,647] global/l3/W:0 (3, 3, 32, 32)
[2017-10-23 20:27:29,647] global/l3/b:0 (1, 1, 1, 32)
[2017-10-23 20:27:29,647] global/l4/W:0 (3, 3, 32, 32)
[2017-10-23 20:27:29,647] global/l4/b:0 (1, 1, 1, 32)
[2017-10-23 20:27:29,647] global/RNN/BasicLSTMCell/Linear/Matrix:0 (2816, 1024)
[2017-10-23 20:27:29,647] global/RNN/BasicLSTMCell/Linear/Bias:0 (1024,)
[2017-10-23 20:27:29,647] global/value/w:0 (256, 1)
[2017-10-23 20:27:29,647] global/value/b:0 (1,)
[2017-10-23 20:27:29,647] global/action/w:0 (256, 4)
[2017-10-23 20:27:29,647] global/action/b:0 (4,)
[2017-10-23 20:27:29,648] local/l1/W:0 (3, 3, 3, 32)
[2017-10-23 20:27:29,648] local/l1/b:0 (1, 1, 1, 32)
[2017-10-23 20:27:29,648] local/l2/W:0 (3, 3, 32, 32)
[2017-10-23 20:27:29,648] local/l2/b:0 (1, 1, 1, 32)
[2017-10-23 20:27:29,648] local/l3/W:0 (3, 3, 32, 32)
[2017-10-23 20:27:29,648] local/l3/b:0 (1, 1, 1, 32)
[2017-10-23 20:27:29,648] local/l4/W:0 (3, 3, 32, 32)
[2017-10-23 20:27:29,648] local/l4/b:0 (1, 1, 1, 32)
[2017-10-23 20:27:29,648] local/RNN/BasicLSTMCell/Linear/Matrix:0 (2816, 1024)
[2017-10-23 20:27:29,648] local/RNN/BasicLSTMCell/Linear/Bias:0 (1024,)
[2017-10-23 20:27:29,648] local/value/w:0 (256, 1)
[2017-10-23 20:27:29,649] local/value/b:0 (1,)
[2017-10-23 20:27:29,649] local/action/w:0 (256, 4)
[2017-10-23 20:27:29,649] local/action/b:0 (4,)
[2017-10-23 20:27:29,649] Events directory: tmp/doom/train_0
[2017-10-23 20:27:30,381] Starting session. If this hangs, we're mostly likely waiting to connect to the parameter server. One common cause is that the parameter server DNS name isn't resolving yet, or is misspecified.
I tensorflow/core/distributed_runtime/master_session.cc:993] Start master session cce4496f26699077 with config:
device_filters: "/job:ps"
device_filters: "/job:worker/task:0/cpu:0"

Traceback (most recent call last):
File "worker.py", line 188, in
tf.app.run()
File "/home/icsl/Downloads/noreward-rl-master/curiosity/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 43, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "worker.py", line 180, in main
run(args, server)
File "worker.py", line 95, in run
with sv.managed_session(server.target, config=config) as sess, sess.as_default():
File "/usr/lib/python2.7/contextlib.py", line 17, in enter
return self.gen.next()
File "/home/icsl/Downloads/noreward-rl-master/curiosity/local/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 974, in managed_session
self.stop(close_summary_writer=close_summary_writer)
File "/home/icsl/Downloads/noreward-rl-master/curiosity/local/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 802, in stop
stop_grace_period_secs=self._stop_grace_secs)
File "/home/icsl/Downloads/noreward-rl-master/curiosity/local/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 386, in join
six.reraise(*self._exc_info_to_raise)
File "/home/icsl/Downloads/noreward-rl-master/curiosity/local/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 963, in managed_session
start_standard_services=start_standard_services)
File "/home/icsl/Downloads/noreward-rl-master/curiosity/local/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 720, in prepare_or_wait_for_session
init_feed_dict=self._init_feed_dict, init_fn=self._init_fn)
File "/home/icsl/Downloads/noreward-rl-master/curiosity/local/lib/python2.7/site-packages/tensorflow/python/training/session_manager.py", line 227, in prepare_session
config=config)
File "/home/icsl/Downloads/noreward-rl-master/curiosity/local/lib/python2.7/site-packages/tensorflow/python/training/session_manager.py", line 173, in _restore_checkpoint
saver.restore(sess, ckpt.model_checkpoint_path)
File "/home/icsl/Downloads/noreward-rl-master/curiosity/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1388, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/home/icsl/Downloads/noreward-rl-master/curiosity/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 766, in run
run_metadata_ptr)
File "/home/icsl/Downloads/noreward-rl-master/curiosity/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 964, in _run
feed_dict_string, options, run_metadata)
File "/home/icsl/Downloads/noreward-rl-master/curiosity/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1014, in _do_run
target_list, options, run_metadata)
File "/home/icsl/Downloads/noreward-rl-master/curiosity/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1034, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [3,3,3,32] rhs shape= [3,3,4,32]
[[Node: save/Assign_17 = Assign[T=DT_FLOAT, _class=["loc:@global/l1/W"], use_locking=true, validate_shape=true, _device="/job:ps/replica:0/task:0/cpu:0"](global/l1/W/Adam_1, save/RestoreV2_17)]]

Caused by op u'save/Assign_17', defined at:
File "worker.py", line 188, in
tf.app.run()
File "/home/icsl/Downloads/noreward-rl-master/curiosity/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 43, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "worker.py", line 180, in main
run(args, server)
File "worker.py", line 49, in run
saver = FastSaver(variables_to_save)
File "/home/icsl/Downloads/noreward-rl-master/curiosity/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1000, in init
self.build()
File "/home/icsl/Downloads/noreward-rl-master/curiosity/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1030, in build
restore_sequentially=self._restore_sequentially)
File "/home/icsl/Downloads/noreward-rl-master/curiosity/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 624, in build
restore_sequentially, reshape)
File "/home/icsl/Downloads/noreward-rl-master/curiosity/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 373, in _AddRestoreOps
assign_ops.append(saveable.restore(tensors, shapes))
File "/home/icsl/Downloads/noreward-rl-master/curiosity/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 130, in restore
self.op.get_shape().is_fully_defined())
File "/home/icsl/Downloads/noreward-rl-master/curiosity/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_state_ops.py", line 47, in assign
use_locking=use_locking, name=name)
File "/home/icsl/Downloads/noreward-rl-master/curiosity/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
op_def=op_def)
File "/home/icsl/Downloads/noreward-rl-master/curiosity/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/icsl/Downloads/noreward-rl-master/curiosity/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1128, in init
self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [3,3,3,32] rhs shape= [3,3,4,32]
[[Node: save/Assign_17 = Assign[T=DT_FLOAT, _class=["loc:@global/l1/W"], use_locking=true, validate_shape=true, _device="/job:ps/replica:0/task:0/cpu:0"](global/l1/W/Adam_1, save/RestoreV2_17)]]

What should I do...?
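
(A guess, not confirmed: the log dir already holds a checkpoint from an earlier run whose first conv layer had 4 input channels, so restoring it into the current 3-channel Doom graph fails. Inspecting the stale checkpoint makes this visible; pointing --log-dir at a fresh directory avoids the failed restore. The path below is illustrative.)

    import tensorflow as tf

    # Print the shape stored for the first conv layer in the old checkpoint.
    reader = tf.train.NewCheckpointReader("tmp/doom/train/model.ckpt-0")
    print(reader.get_variable_to_shape_map()["global/l1/W"])   # e.g. [3, 3, 4, 32]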

Question: I want to run a single worker

I want to run a single worker but it seems to hang -- message below.

python src/worker.py --env-id Pong-v0 --num-workers 1 --visualise --envWrap --designHead universe
After this, the process cannot be interrupted with Ctrl-C; it does not appear to be running.

[2018-11-07 00:04:54,858] Writing logs to file: /tmp/universe-28673.log
I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:197] Initialize GrpcChannelCache for job ps -> {0 -> 127.0.0.1:12222}
I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:197] Initialize GrpcChannelCache for job worker -> {0 -> localhost:12223}
I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:211] Started server with target: grpc://localhost:12223
[2018-11-07 00:04:54,868] Making new env: Pong-v0
Using universe head design
Using universe head design
Optimizer: ADAM with lr: 0.000100
Input observation shape:  (42, 42, 1)
[2018-11-07 00:04:56,771] Trainable vars:
[2018-11-07 00:04:56,771]   global/l1/W:0 (3, 3, 1, 32)
[2018-11-07 00:04:56,771]   global/l1/b:0 (1, 1, 1, 32)
[2018-11-07 00:04:56,772]   global/l2/W:0 (3, 3, 32, 32)
[2018-11-07 00:04:56,772]   global/l2/b:0 (1, 1, 1, 32)
[2018-11-07 00:04:56,772]   global/l3/W:0 (3, 3, 32, 32)
[2018-11-07 00:04:56,772]   global/l3/b:0 (1, 1, 1, 32)
[2018-11-07 00:04:56,772]   global/l4/W:0 (3, 3, 32, 32)
[2018-11-07 00:04:56,772]   global/l4/b:0 (1, 1, 1, 32)
[2018-11-07 00:04:56,772]   global/RNN/BasicLSTMCell/Linear/Matrix:0 (544, 1024)
[2018-11-07 00:04:56,772]   global/RNN/BasicLSTMCell/Linear/Bias:0 (1024,)
[2018-11-07 00:04:56,772]   global/value/w:0 (256, 1)
[2018-11-07 00:04:56,772]   global/value/b:0 (1,)
[2018-11-07 00:04:56,772]   global/action/w:0 (256, 6)
[2018-11-07 00:04:56,773]   global/action/b:0 (6,)
[2018-11-07 00:04:56,773]   local/l1/W:0 (3, 3, 1, 32)
[2018-11-07 00:04:56,773]   local/l1/b:0 (1, 1, 1, 32)
[2018-11-07 00:04:56,773]   local/l2/W:0 (3, 3, 32, 32)
[2018-11-07 00:04:56,773]   local/l2/b:0 (1, 1, 1, 32)
[2018-11-07 00:04:56,773]   local/l3/W:0 (3, 3, 32, 32)
[2018-11-07 00:04:56,773]   local/l3/b:0 (1, 1, 1, 32)
[2018-11-07 00:04:56,773]   local/l4/W:0 (3, 3, 32, 32)
[2018-11-07 00:04:56,773]   local/l4/b:0 (1, 1, 1, 32)
[2018-11-07 00:04:56,773]   local/RNN/BasicLSTMCell/Linear/Matrix:0 (544, 1024)
[2018-11-07 00:04:56,773]   local/RNN/BasicLSTMCell/Linear/Bias:0 (1024,)
[2018-11-07 00:04:56,774]   local/value/w:0 (256, 1)
[2018-11-07 00:04:56,774]   local/value/b:0 (1,)
[2018-11-07 00:04:56,774]   local/action/w:0 (256, 6)
[2018-11-07 00:04:56,774]   local/action/b:0 (6,)
[2018-11-07 00:04:56,774] Events directory: /tmp/Pong-v0/train_0
[2018-11-07 00:04:57,521] Starting session. If this hangs, we're mostly likely waiting to connect to the parameter server. One common cause is that the parameter server DNS name isn't resolving yet, or is misspecified.
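
(A plausible explanation: with --job-name worker the process blocks until it can reach a parameter server, and no ps job was started here, so the session never comes up. Below is a minimal sketch of running both jobs by hand; the paths, env, and flags are illustrative and mirror the log above.)

    import subprocess

    ps = subprocess.Popen(
        ["python", "src/worker.py", "--env-id", "Pong-v0", "--num-workers", "1",
         "--job-name", "ps"])
    worker = subprocess.Popen(
        ["python", "src/worker.py", "--env-id", "Pong-v0", "--num-workers", "1",
         "--envWrap", "--designHead", "universe",
         "--job-name", "worker", "--task", "0"])
    worker.wait()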

A problem about the wrappers

Hello, I am very interested in this repo and I want to reproduce the result. However, when I run train.py in child mode (I am not familiar with tmux), it reports this error:

Traceback (most recent call last):
  File "worker.py", line 188, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 43, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "worker.py", line 180, in main
    run(args, server)
  File "worker.py", line 27, in run
    noLifeReward=args.noLifeReward)
  File "/home/shin/noreward-rl/src/envs.py", line 23, in create_env
    return create_mario(env_id, client_id, **kwargs)
  File "/home/shin/noreward-rl/src/envs.py", line 106, in create_mario
    env = modewrapper(acwrapper(env))
  File "/home/shin/noreward-rl/src/ppaquette_gym_super_mario/wrappers/action_space.py", line 31, in __init__
    self.action_space = gym.spaces.multi_discrete.DiscreteToMultiDiscrete(self.action_space, mapping)
AttributeError: module 'gym.spaces.multi_discrete' has no attribute 'DiscreteToMultiDiscrete'

It seems that the environment runs into some problems; have you seen this error? I can use gym.spaces.multi_discrete.DiscreteToMultiDiscrete in my other Python scripts. Do you have any ideas about this issue?
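
(A guess at the cause: DiscreteToMultiDiscrete exists in the old gym commit pinned by src/requirements.txt but was removed from later gym releases, so the child-mode workers are probably importing a newer gym. A quick check of which gym is actually being picked up:)

    import gym
    from gym.spaces import multi_discrete

    print(getattr(gym, "__version__", "unknown"), gym.__file__)
    print(hasattr(multi_discrete, "DiscreteToMultiDiscrete"))   # False reproduces the AttributeError above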

Mario Distance

When running the default Mario, once Mario's distance becomes more than 40%, it always restarts at 40%.
How can I modify the code so that it always restarts at the beginning of the game (0%)?

Convergence without LSTM

Do you expect the A3C agent with curiosity to converge on the Sparse map setting without an LSTM? Has anyone tried running this code without the LSTM?

getting install error

Getting an install error when running requirements.txt; any idea how to resolve it?

  Could not build doom-py: Command '['cmake', '-DCMAKE_BUILD_TYPE=Release', '-DBUILD_PYTHON=ON', '-DBUILD_JAVA=OFF', '-DPYTHON_EXECUTABLE:FILEPATH=/home/rjn/projects/RL/noreward-rl/curiosity/bin/python', '-DPYTHON_LIBRARY=/usr/lib/python2.7/config-x86_64-linux-gnu/libpython2.7.so', '-DPYTHON_INCLUDE_DIR=/usr/include/python2.7']' returned non-zero exit status 1. (HINT: are you sure cmake is installed? You might also be missing a library. Try running 'apt-get install -y python-numpy cmake zlib1g-dev libjpeg-dev libboost-all-dev gcc libsdl2-dev wget unzip'
  
  Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "/tmp/pip-install-aEKIPe/doom-py/setup.py", line 88, in <module>
      include_package_data=True,
    File "/home/rjn/projects/RL/noreward-rl/curiosity/local/lib/python2.7/site-packages/setuptools/__init__.py", line 129, in setup
      return distutils.core.setup(**attrs)
    File "/usr/lib/python2.7/distutils/core.py", line 151, in setup
      dist.run_commands()
    File "/usr/lib/python2.7/distutils/dist.py", line 953, in run_commands
      self.run_command(cmd)
    File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command
      cmd_obj.run()
    File "/home/rjn/projects/RL/noreward-rl/curiosity/local/lib/python2.7/site-packages/wheel/bdist_wheel.py", line 202, in run
      self.run_command('build')
    File "/usr/lib/python2.7/distutils/cmd.py", line 326, in run_command
      self.distribution.run_command(command)
    File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command
      cmd_obj.run()
    File "/tmp/pip-install-aEKIPe/doom-py/setup.py", line 63, in run
      build_func()
    File "/tmp/pip-install-aEKIPe/doom-py/setup.py", line 41, in build_linux
      build_common('so')
    File "/tmp/pip-install-aEKIPe/doom-py/setup.py", line 28, in build_common
      subprocess.check_call(['cmake', '-DCMAKE_BUILD_TYPE=Release', '-DBUILD_PYTHON=ON', '-DBUILD_JAVA=OFF', '-DPYTHON_EXECUTABLE:FILEPATH={}'.format(sys.executable)] + cmake_arg_list, cwd='doom_py')
    File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
      raise CalledProcessError(retcode, cmd)
  subprocess.CalledProcessError: Command '['cmake', '-DCMAKE_BUILD_TYPE=Release', '-DBUILD_PYTHON=ON', '-DBUILD_JAVA=OFF', '-DPYTHON_EXECUTABLE:FILEPATH=/home/rjn/projects/RL/noreward-rl/curiosity/bin/python', '-DPYTHON_LIBRARY=/usr/lib/python2.7/config-x86_64-linux-gnu/libpython2.7.so', '-DPYTHON_INCLUDE_DIR=/usr/include/python2.7']' returned non-zero exit status 1
  
  ----------------------------------------
  Failed building wheel for doom-py
  Running setup.py clean for doom-py
Failed to build doom-py
docker-py 1.10.3 has requirement requests<2.11,>=2.5.2, but you'll have requests 2.18.1 which is incompatible.

Model modification

In order to use it I need to download pre-trained models using your bash script. How do I edit the model definition (layers, convolution stride, input and output shape, action space size)? How do I train the model from scratch?

P.S. I'm sorry for posting this here, but you have no email address specified.

Looking forward to your reply.

Generate maze code

Hi Pathak, would you share the maze generator code?

I am really interested in generating a maze in Doom myself.

I can actually build a maze WAD file with the omgifol module; unfortunately, I don't know how to set a goal in the maze. So would you please do me a favour?

Waiting for your reply. Thank you very much.

# TODO: historical accident

Hi @pathak22 ,

first of all thanks for releasing the code!

I have been taking a look at it and I have a question concerning those few lines of code where you wrote:
# TODO: historical accident ...

Why are you multiplying the loss by 20 and 288 in those lines?
grads = tf.gradients(self.loss * 20.0, pi.var_list)
self.forwardloss = self.forwardloss * 288.0

I understand this is related to the batch size (or rollout steps) and to the number of features representing a state, but I cannot really see the point of multiplying in this particular way... could you please give me a hint?
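
A plausible reading (my guess, not the authors' statement): both losses are built with tf.reduce_mean, and the constants rescale those means back into sums, with 288 matching the universe-head feature size and 20 roughly matching the rollout length. In numpy terms:

    import numpy as np

    f_pred   = np.random.randn(288)   # predicted next-state features (288 = universe head size)
    phi_next = np.random.randn(288)   # actual next-state features

    mean_loss = 0.5 * np.mean((f_pred - phi_next) ** 2)
    sum_loss  = mean_loss * 288.0     # equals 0.5 * np.sum(...) over the 288 feature dims
    print(np.allclose(sum_loss, 0.5 * np.sum((f_pred - phi_next) ** 2)))   # True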

Thanks,

Mario train.py script fails to train

When running the default mario training script,

python train.py --default --env-id mario --noReward

all workers seem to hang after the following point (and eventually error after ~10 minutes due to an empty Queue):

Doing hard mario fceux reset (40 seconds wait) !
[2018-04-24 20:35:53,198] Starting training at gobal_step=0

fceux seems to restart over and over since the environment doesn't appear to step properly. At some point, I also received the error 'Closing episode (appears to be stuck). See documentation for how to handle this issue.' from ppaquette's super mario code https://github.com/ppaquette/gym-super-mario/blob/master/ppaquette_gym_super_mario/nes_env.py.

Any ideas? Thanks in advance!
