Course Content

Introduction to Information Retrieval (3 hours)
- Basic Text Processing: Tokenization, Stopwords, Stemming, Lemmatization, Zipf’s and Heaps’ laws
- Spelling Correction and Edit Distances: Hamming distance, Longest Common Subsequence, Levenshtein edit distance
- Boolean Retrieval Model
Basic Ranking and Evaluation Measures (4 hours)
- Vector Space Model
- TF*IDF
- IR Evaluation: Precision, Recall, F-measures, Mean Reciprocal Rank (MRR), Mean Average Precision (MAP), Normalized Discounted Cumulative Gain (NDCG)
- Designing test collections, relevance judgments
Probabilistic Retrieval Model
- Introduction: Generative Model
- Probabilistic Ranking Principle
- Binary Independence Model
- Okapi BM25
- Bayesian Networks for IR
Statistical Language Model
- Basics of Language Models
- Query-likelihood Approach and different Smoothing Methods
- Advanced Query Types: Query expansion, Relevance feedback, Novelty & Diversity
Topic Model
- Introduction to topic models
- Latent Semantic Indexing
- Probabilistic Latent Semantic Indexing
- Latent Dirichlet Allocation
- Topic models for IR
Link Analysis
- Introduction: World Wide Web as Graph
- PageRank
- HITS
- Topic-specific and Personalized PageRank
Indexing and Searching
- Different Compression Methods: Ziv-Lempel, Variable-Byte, Gamma, Golomb, Gap encoding
- Query Processing: TAAT, DAAT, WAND, Fagin’s algorithm
- Near-Duplicate Detection: Shingling, Min-wise independent permutations, Locality-sensitive hashing
Retrieval using unsupervised techniques
- Retrieval using word-embeddings and clustering
Retrieval using Supervised ML (4 hours)
- Introduction to Learning to Rank for retrieval
- Retrieval using classification
Advanced topics: One or two contemporary topics, which can change from semester to semester. For example, Fairness in ranking (

Learning Outcomes

This course is designed to provide an in-depth understanding of how unstructured texts are processed, indexed, and queried to meet users’ information needs. It also discusses different methods for clustering and classifying documents to enhance the efficiency of the retrieval system.

Text Books

Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze. Introduction to Information Retrieval, Cambridge University Press, 2008. ISBN-13: 978-0521865715 ebook
Stefan Büttcher, Charles L. A. Clarke, Gordon V. Cormack. Information Retrieval: Implementing and Evaluating Search Engines, MIT Press, ISBN-13: 978-0262026512.

References

Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman. Mining of Massive Datasets, Cambridge University Press, 2011. ISBN: 978-1107077232. ebook
Larry Wasserman. All of Statistics, Springer, 2004. ISBN-13: 978-0387402727

Past Offerings

(Note: Past offerings could be under a different course number.)

Offered in Aug-Nov, 2025 by Koninika Pal
Offered in Aug-Dec, 2022 by Koninika Pal
Offered in Jul-Dec, 2021 by Mrinal, Koninika

Course Metadata

Item	Details
Course Title	Information Retrieval
Course Code	DS5603
Course Credits	3-0-0-3
Course Category	PME
Proposing Faculty	Koninika Pal & Mrinal Kanti Das
Approved on	Senate 19 of IIT Palakkad
Course prerequisites	Probability and Statistics, Programming
Course status	New
Course pre-revision code	CS5621