Course Objectives:

While traditional areas of computer science remain highly important, researchers will increasingly use computers to understand and extract usable information from the massive data sets that arise in applications. The main objective of this course is to introduce students to the theoretical and mathematical foundations of data science. The course is rigorous and explores the rich mathematics behind some of the popular techniques and ideas of modern-day data science and machine learning.

Course Content

High-dimensional space: Law of large numbers, the geometry of high dimensions (6 lectures)

Best-fit subspaces and SVD: Introduction, singular vectors, singular value decomposition (SVD), best rank-k approximations, left singular vectors, eigenvectors, applications of SVD (9 lectures)

Random walks and Markov chains: Introduction, stationary distribution, Markov chain Monte Carlo, areas and volumes, convergence of random walks on undirected graphs, random walks on undirected graphs with unit edge weights, random walks in Euclidean space (13 lectures)

Machine learning: Introduction, the perceptron algorithm, kernel functions, generalizing to new data, overfitting and uniform convergence, Occam’s razor, regularization, online learning, support-vector machines, VC-dimension, boosting, stochastic gradient descent, deep learning (14 lectures)

Learning Outcomes:

Upon successful completion of this course, the student will:

  1. have an understanding of basic mathematical concepts in data science, relating to linear algebra, probability, and calculus.
  2. be able to employ methods related to these concepts in a variety of data science applications.
  3. be able to adopt a rigorous and mathematical approach to solving problems in machine learning and data science.
  4. be able to apply the mathematical concepts discussed over the duration of the course.

Textbooks:

  1. Avrim Blum, John Hopcroft and Ravindran Kannan, Foundations of Data Science, Cambridge University Press, February 29, 2020, ISBN-13: 978-1108485067

References:

To be prescribed by the instructor on a topic-by-topic basis.

Past Offerings

  • Offered in Jan-May, 2024 by Deepak Rajendraprasad
  • Offered in Jan-May, 2023 by Deepak Rajendraprasad
  • Offered in Jan-May, 2022 by Deepak Rajendraprasad
  • Offered in Jan-May, 2021 by Deepak

Course Metadata

Item                  Details
Course Title          Foundations of Data Science and Machine Learning
Course Code           CS5014
Course Credits        3-0-0-3
Course Category       PMT
Proposing Faculty     Albert Sunny
Approved on           Senate 11 of IIT Palakkad
Course prerequisites  Probability & Linear Algebra
Course status         NEW