This is a side project to learn more about deep reinforcement learning, by training an end-to-end architecture for playing Atari games from pixel input (not necessarily at superhuman level). We think that it's true what Feynman said: what you cannot build, you do not understand. Moreover, we think the best way of learning how to build things that work is, well, to try.
We're hoping to learn about many aspects of deep RL, including the theory underpinning policy gradient algorithms, the practical challenges in optimising the code for parallelisation on cloud GPUs, and how to structure an efficient pipeline for taking ML projects from conceptualisation through multiple training and testing iterations.
The project is written in Python using Tensorflow and OpenAI Gym, and primarily trained on Microsoft Azure.
Work in progress, expecting to update both the code and this description soon.
Vladimir Mikulik, Jacob Lagerros, Joar Skalse