Prerequisites: Probability and Statistics, Programming

Course Content

  1. Introduction to Information Retrieval (3 hours)
    • Basic Text Processing: Tokenization, Stopwords, Stemming, Lemmatization, Zipf’s and Heaps’ laws
    • Spelling Correction and Edit Distances: Hamming distance, Longest Common Subsequence, Levenshtein edit distance
    • Boolean Retrieval Model
  2. Basic Ranking and Evaluation Measures (4 hours)
    • Vector Space Model
    • TF*IDF
    • IR Evaluation: Precision, Recall, F-measures, Mean Reciprocal Rank (MRR), Mean Average Precision (MAP), Normalized Discounted Cumulative Gain (NDCG)
    • Designing test collections, relevance judgments
  3. Probabilistic Retrieval Model
    • Introduction: Generative Model
    • Probabilistic Ranking Principle
    • Binary Independence Model
    • Okapi BM25
    • Bayesian Networks for IR
  4. Statistical Language Model
    • Basics of Language Model
    • Query-likelihood Approach and different Smoothing Methods
    • Advanced Query Types: Query expansion, Relevance feedback, Novelty & Diversity
  5. Topic Model
    • Introduction to topic model
    • Latent Semantic Indexing
    • Probabilistic Latent Semantic Indexing
    • Latent Dirichlet Allocation
    • Topic model for IR
  6. Link Analysis
    • Introduction: World Wide Web as Graph
    • PageRank
    • HITS
    • Topic-specific and Personalized PageRank
  7. Indexing and Searching
    • Different Compression Methods: Ziv-Lempel, Variable-Byte, Gamma, Golomb, Gap encoding
    • Query Processing: TAAT, DAAT, WAND, Fagin’s algorithm
    • Near-Duplicate Detection: Shingling, Min-wise independent permutations, Locality-sensitive hashing
  8. Retrieval using unsupervised techniques
    • Retrieval using word-embeddings and clustering
  9. Retrieval using Supervised ML (4 hours)
    • Introduction to Learning to Rank for retrieval
    • Retrieval using classification
  10. Advanced Topics: One or two contemporary topics, which may change from semester to semester. For example, Fairness in ranking (

Learning Outcomes

This course is designed to provide an in-depth understanding of how unstructured texts are processed, indexed, and queried to meet users’ information needs. It also discusses different methods for clustering and classifying documents to enhance the efficiency of the retrieval system.

Text Books

  1. Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze. Introduction to Information Retrieval, Cambridge University Press, 2008. ISBN-13: 978-0521865715 ebook
  2. Stefan Büttcher, Charles L. A. Clarke, Gordon V. Cormack. Information Retrieval: Implementing and Evaluating Search Engines, MIT Press, ISBN-13: 978-0262026512.

References

  1. Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman. Mining of Massive Datasets, Cambridge University Press, 2011. ISBN: 978-1107077232. ebook
  2. Larry Wasserman. All of Statistics, Springer, 2004. ISBN-13: 978-0387402727

Past Offerings

(Note: Past offerings could be under a different course number.)
  • Offered in Aug-Dec, 2022 by Koninika Pal
  • Offered in Jul-Dec, 2021 by Mrinal, Koninika

Course Metadata

Item                        Details
Course Title                Information Retrieval
Course Code                 DS5603
Course Credits              3-0-0-3
Course Category             PME
Proposing Faculty           Koninika Pal & Mrinal Kanti Das
Approved on                 Senate 16 of IIT Palakkad
Course prerequisites        Probability and Statistics, Programming
Course status               New
Course pre-revision code    CS5621