An implementation of actor-critic with experience replay (ACER) [1]. The agent also receives its previous action and reward as input [2], and uses batched off-policy updates to improve stability.
Run with `python main.py <options>`. To run asynchronous advantage actor-critic (A3C) [3] (but with a Q-value head), use the `--on-policy` option.
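
For example, the two training modes described above would be launched as follows (a usage sketch; `--on-policy` is the only option confirmed by this README, and `<options>` stands for any further flags the script accepts):

```shell
# Default: ACER, i.e. off-policy training with experience replay
python main.py

# A3C-style on-policy training (retaining the Q-value head)
python main.py --on-policy
```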
[1] Sample Efficient Actor-Critic with Experience Replay
[2] Learning to Navigate in Complex Environments
[3] Asynchronous Methods for Deep Reinforcement Learning