**Code:**CS5654 |

**Category:**PME |

**Credits:**3-0-0-3

**Pre-requisites**: Algorithms, Probability.

(CS2030 Data Structures and Algorithms OR CS3070/CS2040 Design and Analysis of Algorithms OR CS5009 Algorithms OR CS4999) AND (MA2041 - Probability and Statistics OR MA5007 - Probability and Statistics OR MA4999 OR CS4999 OR PH4999)

XX4999 stands for having completed a bachelorâ€™s degree in XX program.

## Course Overview

Classical Algorithm Design assumes that all the required data is stored on a fast memory and any piece of it can be accessed in unit time. This assumption is no longer true when one has to deal with massive amounts of data - much more than what could be stored in the main memory of any computer. This course describes models and algorithmic techniques developed for handling such massive amounts of data that is typically made available with access limitations. Topics covered include data stream algorithms, sampling techniques, sketching techniques, sparsification and lower bounds. The emphasis will be in the design of these techniques, proving performance and efficiency guarantees and understanding the fundamental limitations of these approaches.

## Course Content

**Introduction**. Introduction to massive data problems. Introduction to
techniques like randomisation, streaming, sampling, and sketching. [2
lectures ]

**Streaming Algorithms**. The Data Stream Model. Finding Frequent Items,
Frequency Estimation. Estimating the Number of Distinct Elements: The
Tidemark Algorithm, The Median Trick, The BJKST Algorithm, Hyperloglog
Algorithm, CVM algorithm. Approximate Counting. [12 lectures]

**Sketches and Linear Sketches**. Finding Frequent Items via Linear
Sketching. CountSketch and Count-Min Sketch Algorithms. Comparison of
Frequency Estimation Methods. Estimating Frequency Moments: The AMS
Estimator, The Tug-of-War Sketch. [10 lectures]

**Sampling, Weight-Based Sampling**: The $\ell_0$-Sampling, $\ell_2$-
Sampling. [2 lectures]

**Graph Streams**. Basic Algorithms: The Connectedness Problem, The
Bipartiteness Problem, Shortest Paths and Distance Estimation via
Spanner. Finding Maximum Matchings: Maximum Cardinality Matching,
Maximum Weight Matching. [6 lectures]

**Graph Sketching**. Testing Connectivity Using Boundary Edges, Testing
Bipartiteness, The AGM Sketch. Counting Triangles: A Sampling-Based
Algorithm, A Sketch-Based Algorithm. [4 lectures]

**Lower Bounds**. Communication Complexity and Lower Bounds.
Communication Games, Protocols, Complexity. Two-Player Communication
Games. Data Streaming Lower Bounds: for Majority, for Frequency
Estimation. [6 lectures]

## Text Books

- Amit Chakrabarti.
*Data Stream Algorithms*Lecture Notes. (2024). (Available online)

## References

- Avrim Blum, John Hopcroft, and Ravindran Kannan.
*Foundations of data science*. Cambridge University Press, 2020. (ISBN: 978-1108485067. Also available online) - Jelani Nelson,
*Sketching Algorithms,*Lecture Notes (2020) (Available online)

# Past Offerings

# Course Metadata

Item | Details |
---|---|

Course Title | Algorithms for Big Data |

Course Code | CS5654 |

Course Credits | 3-0-0-3 |

Course Category | PME |

Proposing Faculty | Deepak Rajendraprasad and Krithika Ramaswamy |

Approved on | Senate of IIT Palakkad |

Course status | New |