Prerequisite courses: Probability, Linear Algebra, Data Structures and Algorithms

# Learning Objective

Reinforcement learning (RL) is a paradigm of learning through interaction with an environment. RL algorithms are behind some of the most visible recent successes of AI: AlphaGo, the computer program that defeated top human professionals at the game of Go, relies on RL. The objective is to provide a bottom-up approach: starting from the foundations of Markov decision processes (MDPs), the course builds up to state-of-the-art RL algorithms.

# Learning Outcomes

The student should be able to: a) model a control task in the framework of MDPs; b) distinguish model-based from model-free methods; c) identify the stability/convergence and approximation properties of RL algorithms; d) apply deep learning methods to RL problems in practice.

# Course Content

1. Introduction: State-of-the-art applications in Atari and AlphaGo, relation to other problems in artificial intelligence [1 Week]
2. Markov Decision Processes (model-based): Formulation, Value Iteration (VI), Policy Iteration (PI), Linear Programming (LP) [2 Weeks]
3. Approximate Dynamic Programming (approximate model-based): curse of dimensionality, representations, approximate value iteration, approximate policy iteration, approximate linear programming, approximation and convergence guarantees [2 Weeks]
4. Stochastic Approximation: Single and multi-timescale stochastic approximation, introduction to ordinary differential equation based convergence results [1 Week]
5. Value function learning (approximate model-free): Temporal difference (TD) learning, TD(0), TD(lambda), Q-learning, State-Action-Reward-State-Action (SARSA), TD with function approximation, on/off-policy learning, gradient temporal difference learning [2 Weeks]
6. Actor-Critic: Policy gradient, Natural Actor-Critic [2 Weeks]
7. Deep RL [2 Weeks]
8. Exploration vs Exploitation: Upper Confidence Bound (UCB), Upper Confidence Reinforcement Learning (UCRL) [2 Weeks]

# Text books

1. Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, 2nd Edition, MIT Press. 2018. ISBN-13: 978-0262039246.
2. Dimitri P. Bertsekas and John N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific. 1996. ISBN-13: 978-1886529106.
3. V. S. Borkar, Stochastic Approximation: A Dynamical Systems Viewpoint, Hindustan Book Agency. 2009. ISBN-13: 978-0521515924.
4. Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press. 2016. ISBN-13: 978-0262035613.

# Past Offerings

• Offered in Jan-May, 2021 by Chandra Shekar
• Offered in July-Dec, 2019 by Chandra Shekar