This repo contains Keras implementations of reinforcement learning (RL) algorithms that learn to play Atari Pong. I wrote them while learning about these RL algorithms, so they are not well-tuned or performant. If you're looking for a reference implementation to compare against, check out OpenAI Baselines.
pong-ppo.py
- PPO. Defeats the "computer" opponent after 300 episodes of training.
pong-pg.py
- Policy gradient (REINFORCE algorithm). Defeats the "computer" opponent after 400 episodes of training.
pong-actor-critic.py
- On-policy batch actor-critic. Defeats the "computer" opponent after 300 episodes of training.
pong-ddqn-batch.py
- Off-policy double Q learning. Defeats the "computer" opponent after 2000 episodes of training.
pong-ddqn-per.py
- Off-policy double Q learning with prioritized experience replay. Defeats the "computer" opponent after 1000 episodes of training.
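For readers new to the double Q learning used by the two DDQN scripts, here is a minimal NumPy sketch of how the training targets are usually computed. It is not taken from this repo: the function name `double_q_targets`, its arguments, and the default `gamma` are assumptions made for illustration. The idea is that the online network picks the next action and the target network scores it.

```python
import numpy as np


def double_q_targets(rewards, dones, q_online_next, q_target_next, gamma=0.99):
    """Compute double Q-learning targets for a batch of transitions.

    Hypothetical helper for illustration, not code from this repo.

    rewards:       shape (B,), reward received for each transition
    dones:         shape (B,), 1.0 where the episode ended, else 0.0
    q_online_next: shape (B, A), online network's Q-values for the next state
    q_target_next: shape (B, A), target network's Q-values for the next state
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)
    q_online_next = np.asarray(q_online_next, dtype=np.float64)
    q_target_next = np.asarray(q_target_next, dtype=np.float64)

    # The online network chooses the greedy action in the next state.
    best_actions = np.argmax(q_online_next, axis=1)

    # The target network evaluates that chosen action.
    next_values = q_target_next[np.arange(len(rewards)), best_actions]

    # Terminal transitions get no bootstrapped value.
    return rewards + gamma * (1.0 - dones) * next_values
```

For example, with `rewards=[1, 0]`, `dones=[0, 1]`, `q_online_next=[[1, 2], [3, 0]]`, `q_target_next=[[20, 10], [30, 40]]` and `gamma=0.5`, the targets are `[6.0, 0.0]`. In the first row the online network selects action 1, so the target network's value 10 is used instead of its own maximum of 20, which is what separates this from a plain Q learning target.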
Requirements:
- Python 3.6
- NumPy
- TensorFlow 1.14
- OpenAI Gym