Learning Objective

Reinforcement learning (RL) is a paradigm of learning via interactions with the environment. RL algorithms are at the frontier of current success of AI: AlphaGo, the computer program that beat humans is a RL algorithm. The objective is to provide a bottom up approach: starting from foundation in Markov decision processes (MDP), the course builds up to the state-of-the-art RL algorithms.

Learning Outcomes

The student should be able to a) model a control task in the framework of MDPs. b) Identify the model based from the model free methods. c) Identify stability/convergence and approximation properties of RL algorithms. d) Use deep learning methods to RL problems in practice.

Course Content:

Introduction: State of the art applications in Atari, Alpha Go, relation to other problems in artificial intelligence [1 Week]
Markov Decision Processes (model based): Formulation, Value Iteration (VI), Policy Iteration (PI), Linear Programming (LP) [2 Weeks]
Approximate Dynamic Programming (approximate model based): curse-of-dimensionality, representations, Approximate value iteration, approximate policy iteration, approximate linear program, approximation and convergence guarantees [2 Weeks]
Stochastic Approximation: Single and multi-timescale stochastic approximation, introduction to ordinary differential equation based convergence results. [1 Week]
Value function learning (approximate model-free): Temporal difference (TD learning, TD(0), TD(lambda), Q-learning, State-Action-Reward-State Algorithm (SARSA) , TD with function approximation, on/off-policy learning, gradient temporal difference learning [2 weeks]
Actor-Critic: Policy gradient, Natural Actor-Critic [2 Weeks]
Deep RL [2 Weeks]
Exploration vs Exploitation: Upper Confidence Bound (UCB), Upper Confidence Reinforcement Learning (UCRL) [2 Weeks]

Text books

Richard S. Sutton and Andrew G. Barto, Introduction to Reinforcement Learning, 2nd Edition, MIT Press. 2017. ISBN-13 978-0262039246.
Dimitri Bertsekas and John G. Tsitsiklis, Neuro Dynamic Programming, Athena Scientific. 1996. ISBN-13: 978-1886529106

References

V. S. Borkar, Stochastic Approximation: A Dynamical Systems Viewpoint, Hindustan Book Agency, 2009. ISBN-13: 978-0521515924
Deep Learning. Ian Goodfellow and Yoshua Bengio and Aaron Courville. MIT Press. 2016.ISBN-13: 978-0262035613.

Past Offerings

Offered in Jan-May, 2021 by Chandra Shekar
Offered in July-Dec, 2019 by Chandra Shekar

Course Metadata

Item	Details
Course Title	Reinforcement Learning
Course Code	CS5008
Course Credits	3-0-0-3
Course Category	ERC
Proposing Faculty	Chandra Sekhar Lakshminarayanan
Approved on	Senate 7 of IIT Palakkad
Course prerequisites	Probability, Linear Algebra, Data Structures & Algorithms
Course status	New