Reinforcement Learning Specialization offered by the University of Alberta and the Alberta Machine Intelligence Institute
The main goals of the first course are:
- Understand the exploration-exploitation tradeoff using multi-armed bandits
- Understand the structure and components of a (finite) Markov Decision Process
- Understand the definition of the state-value function and the action-value function
- Be able to explain how to derive the Bellman equations and the Bellman optimality equations
- Understand the framework of Dynamic Programming (policy evaluation, policy iteration, and generalized policy iteration)
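As a concrete illustration of the exploration-exploitation tradeoff, here is a minimal sketch of an epsilon-greedy agent on a stationary multi-armed bandit, using incremental sample-average action-value estimates. The function name and the Gaussian reward model are illustrative choices, not part of the course materials.

```python
import random

def run_bandit(true_means, steps=10000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a stationary multi-armed bandit.

    Action values are estimated with incremental sample averages:
        Q[a] <- Q[a] + (1 / N[a]) * (reward - Q[a])
    """
    rng = random.Random(seed)
    k = len(true_means)
    Q = [0.0] * k  # estimated value of each arm
    N = [0] * k    # number of times each arm was pulled
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:                 # explore: random arm
            a = rng.randrange(k)
        else:                                      # exploit: greedy arm
            a = max(range(k), key=lambda i: Q[i])
        reward = rng.gauss(true_means[a], 1.0)     # noisy reward (assumed Gaussian)
        N[a] += 1
        Q[a] += (reward - Q[a]) / N[a]             # incremental sample average
        total += reward
    return Q, total / steps
```

With a small epsilon the agent spends most pulls on the arm with the highest estimated value while still sampling the others often enough for the estimates to converge.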
The main goals of the second course are:
- Understand prediction problems using Monte Carlo methods
- Understand how Temporal Difference learning works in prediction problems compared to the Monte Carlo method
- Understand different TD learning methods for control problems: Q-learning, SARSA, and Expected SARSA
- Understand the Dyna architecture (Dyna-Q and Dyna-Q+)
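To make the TD control idea concrete, here is a small sketch of tabular Q-learning on a toy deterministic chain environment (the environment, its size, and all hyperparameters are illustrative assumptions, not taken from the course):

```python
import random

def q_learning_chain(n_states=6, episodes=500, alpha=0.5, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning on a deterministic chain.

    States are 0..n_states-1; action 0 moves left, action 1 moves right.
    Reaching the rightmost state gives reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            # epsilon-greedy behavior policy
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s2 = max(s - 1, 0) if a == 0 else s + 1
            r = 1.0 if s2 == goal else 0.0
            # Q-learning target bootstraps from the greedy next action,
            # regardless of which action the behavior policy will take
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q
```

Replacing `max(Q[s2])` with the value of the action actually taken next would turn this into SARSA; taking the epsilon-greedy expectation over `Q[s2]` would give Expected SARSA.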
The main goals of the third course are:
- Understand how value functions are approximated using parametrized functions
- Understand what coarse coding for feature generalization is and how to use neural networks for function approximation
- Be able to implement Episodic SARSA
- Understand the average-reward formulation
- Understand how policy-gradient and actor-critic methods work for continuing tasks
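A minimal example of value-function approximation is semi-gradient TD(0) prediction with state aggregation, which can be viewed as the simplest form of coarse coding. The random-walk environment and all constants below are illustrative assumptions:

```python
import random

def semi_gradient_td0(n_states=10, groups=5, episodes=2000,
                      alpha=0.05, gamma=1.0, seed=0):
    """Semi-gradient TD(0) prediction with state aggregation.

    v(s) is approximated by one shared weight per group of states:
        v(s) ~= w[s // group_size]
    Environment: random walk on states 0..n_states-1 under a uniform
    random policy; stepping off the right end gives reward 1, stepping
    off the left end gives 0, and both terminate the episode.
    """
    rng = random.Random(seed)
    size = n_states // groups
    w = [0.0] * groups

    def feature(s):
        return s // size  # index of the single active weight

    for _ in range(episodes):
        s = n_states // 2
        while True:
            s2 = s + (1 if rng.random() < 0.5 else -1)
            if s2 < 0:                     # terminated off the left end
                target, done = 0.0, True
            elif s2 >= n_states:           # terminated off the right end
                target, done = 1.0, True
            else:                          # bootstrap from the next state
                target, done = gamma * w[feature(s2)], False
            # semi-gradient update: only the active weight moves
            w[feature(s)] += alpha * (target - w[feature(s)])
            if done:
                break
            s = s2
    return w
```

The learned weights increase from left to right, approximating the true values, which grow linearly across the chain; replacing the aggregation with overlapping tiles (tile coding) or a neural network changes only the `feature` mapping and the update's gradient.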
The final course aims to implement a complete RL system by:
- Translating the problem setting into an RL framework and identifying a suitable solution method
- Coding our environment
- Coding our agent
Our setting is that we want a lunar lander to land safely on the surface of the moon without crashing.
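The environment-plus-agent structure above can be sketched as an interaction loop with `env_start`/`env_step` and `agent_start`/`agent_step`/`agent_end` hooks. The toy environment and placeholder agent below are stand-ins for illustration only; the real capstone uses the Lunar Lander environment and a learned policy:

```python
class Environment:
    """Toy stand-in environment: five steps, then the episode ends."""

    def env_start(self):
        self.t = 0
        return 0  # initial observation

    def env_step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else 0.0   # reward the "good" action
        terminal = self.t >= 5                 # fixed episode length
        return reward, self.t, terminal


class Agent:
    """Placeholder agent that always picks action 1 (no learning)."""

    def agent_start(self, state):
        return 1

    def agent_step(self, reward, state):
        return 1

    def agent_end(self, reward):
        pass  # a learning agent would do its final update here


def run_episode(env, agent):
    """Run one episode of agent-environment interaction."""
    state = env.env_start()
    action = agent.agent_start(state)
    total = 0.0
    while True:
        reward, state, terminal = env.env_step(action)
        total += reward
        if terminal:
            agent.agent_end(reward)
            return total
        action = agent.agent_step(reward, state)
```

Keeping the environment and the agent behind this narrow interface lets either side be swapped out, e.g. replacing the toy environment with the lunar lander while reusing the same loop.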