**Code:** CS5008

**Category:** ERC

**Credits:** 3-0-0-3

Prerequisite course: Probability, Linear Algebra, Data Structures and Algorithms

# Learning Objective

Reinforcement learning (RL) is a paradigm of learning via interaction with an environment. RL algorithms are at the frontier of the current success of AI: AlphaGo, the computer program that defeated human champions at the game of Go, is built on RL. The objective of this course is to provide a bottom-up approach: starting from foundations in Markov decision processes (MDPs), the course builds up to state-of-the-art RL algorithms.

# Learning Outcomes

The student should be able to (a) model a control task in the framework of MDPs, (b) distinguish model-based from model-free methods, (c) identify stability/convergence and approximation properties of RL algorithms, and (d) apply deep learning methods to RL problems in practice.

# Course Content:

- Introduction: State-of-the-art applications in Atari and AlphaGo; relation to other problems in artificial intelligence [1 Week]
- Markov Decision Processes (model based): Formulation, Value Iteration (VI), Policy Iteration (PI), Linear Programming (LP) [2 Weeks]
- Approximate Dynamic Programming (approximate model based): curse-of-dimensionality, representations, Approximate value iteration, approximate policy iteration, approximate linear program, approximation and convergence guarantees [2 Weeks]
- Stochastic Approximation: Single and multi-timescale stochastic approximation, introduction to ordinary differential equation based convergence results. [1 Week]
- Value function learning (approximate model-free): Temporal difference (TD) learning: TD(0), TD(λ), Q-learning, State-Action-Reward-State-Action (SARSA), TD with function approximation, on/off-policy learning, gradient temporal difference learning [2 Weeks]
- Actor-Critic: Policy gradient, Natural Actor-Critic [2 Weeks]
- Deep RL [2 Weeks]
- Exploration vs Exploitation: Upper Confidence Bound (UCB), Upper Confidence Reinforcement Learning (UCRL) [2 Weeks]
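For concreteness, the value iteration algorithm from the Markov decision processes module can be sketched on a toy example. The two-state MDP below (its transition matrices, rewards, and discount factor) is an illustrative assumption, not course material:

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP used only for illustration.
# P[a][s, s'] = probability of moving from s to s' under action a;
# R[s, a] = expected one-step reward for taking action a in state s.
P = {
    0: np.array([[0.9, 0.1],
                 [0.1, 0.9]]),
    1: np.array([[0.2, 0.8],
                 [0.8, 0.2]]),
}
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9  # discount factor

def value_iteration(P, R, gamma, tol=1e-8):
    """Iterate the Bellman optimality operator until the value function
    changes by less than tol in the max norm; return V* and a greedy policy."""
    n_states, n_actions = R.shape
    V = np.zeros(n_states)
    while True:
        # Q[s, a] = R[s, a] + gamma * sum_{s'} P[a][s, s'] * V[s']
        Q = np.stack([R[:, a] + gamma * P[a] @ V for a in range(n_actions)],
                     axis=1)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

V_star, policy = value_iteration(P, R, gamma)
print("V* =", V_star, "greedy policy =", policy)
```

Because the Bellman optimality operator is a γ-contraction in the max norm, the iteration converges geometrically to the unique fixed point V*, a fact developed in the approximate dynamic programming module.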

# Text books

- Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, 2nd Edition, MIT Press, 2018. ISBN-13: 978-0262039246.
- Dimitri P. Bertsekas and John N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, 1996. ISBN-13: 978-1886529106.

# References

- V. S. Borkar, Stochastic Approximation: A Dynamical Systems Viewpoint, Hindustan Book Agency, 2009. ISBN-13: 978-0521515924
- Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press, 2016. ISBN-13: 978-0262035613.

# Past Offerings

- Offered in Jan-May 2021 by Chandra Sekhar Lakshminarayanan
- Offered in July-Dec 2019 by Chandra Sekhar Lakshminarayanan

# Course Metadata

Item | Details
---|---
Course Title | Reinforcement Learning
Course Code | CS5008
Course Credits | 3-0-0-3
Course Category | ERC
Proposing Faculty | Chandra Sekhar Lakshminarayanan
Approved on | Senate 7 of IIT Palakkad
Course prerequisites | Probability, Linear Algebra, Data Structures & Algorithms
Course status | New