Simple DQN model, as seen after around 10M frames (iterations)
python dqn_atari.py \
--env Enduro-v0 \
--gpu 0 \
--model convnet \
--train_policy epgreedy \
--std_img \
--optimizer adam \
--learning_rate 0.0001
Simply replace --model convnet with --model dueling_convnet in the above command. Also try out the other network architectures in deeprl/networks.py.
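As a rough illustration of what a dueling head computes, following Wang et al. [4], the network estimates a state value V(s) and per-action advantages A(s, a) separately and combines them as Q(s, a) = V(s) + A(s, a) - mean(A(s, ·)). The snippet below is a minimal pure-Python sketch of that aggregation step only. It is not the code in deeprl/networks.py, and the function name dueling_q_values is hypothetical.

```python
def dueling_q_values(value, advantages):
    """Combine a scalar state value and per-action advantages into Q-values.

    Uses the mean-subtracted aggregation from the dueling architecture paper:
    Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a').
    """
    mean_advantage = sum(advantages) / len(advantages)
    return [value + a - mean_advantage for a in advantages]


# Hypothetical outputs of the two streams for one state with three actions.
print(dueling_q_values(1.0, [1.0, 2.0, 3.0]))  # [0.0, 1.0, 2.0]
```

Subtracting the mean advantage keeps the value and advantage streams identifiable: the Q-values average to V(s), and the ranking of actions is determined by the advantages alone.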
The following curves compare the dueling (yellow), double (green), and simple (blue) deep Q-networks.
Episode length
Total Reward over 20 episodes
Loss
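For context on the simple and double variants compared in these curves, the two differ in how the bootstrap target is formed: the simple DQN target [2] takes the maximum of the target network's Q-values for the next state, while double DQN [3] picks the action with the online network and evaluates it with the target network. The following is a minimal pure-Python sketch under those definitions. The function names and the example numbers are hypothetical and are not taken from this repository.

```python
def dqn_target(reward, done, gamma, q_next_target):
    """Simple DQN: the target network both selects and evaluates the next action."""
    if done:
        return reward
    return reward + gamma * max(q_next_target)


def double_dqn_target(reward, done, gamma, q_next_online, q_next_target):
    """Double DQN: the online network selects the action, the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(q_next_online)), key=lambda a: q_next_online[a])
    return reward + gamma * q_next_target[best_action]


# Hypothetical next-state Q-values from the online and target networks.
q_online = [1.0, 3.0, 2.0]
q_target = [4.0, 0.5, 2.0]

print(dqn_target(1.0, False, 0.5, q_target))                   # 3.0
print(double_dqn_target(1.0, False, 0.5, q_online, q_target))  # 1.25
```

In this example the simple target uses the target network's largest estimate (4.0), whereas the double target uses the target network's estimate for the action the online network prefers (0.5), which is how double Q-learning reduces overestimation.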
This work was done as a course assignment for the CMU Deep RL course, so thanks to the instructors for guidance and providing starter code. Also thanks to Achal for hyperparameter suggestions.
[1] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
[2] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
[3] Hado van Hasselt, Arthur Guez, and David Silver. Deep reinforcement learning with double Q-learning. 2016.
[4] Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, and Nando de Freitas. Dueling network architectures for deep reinforcement learning. arXiv preprint arXiv:1511.06581, 2015.