Contents

  1. Lab on setting up and manipulating files in HDFS [2 (lecture), 9 (lab)]

  2. Basic programs of Hadoop MapReduce: Driver code, Mapper code, Reducer code, RecordReader, Combiner, Partitioner [4 (lecture), 9 (lab)]

  3. Pig: Introduction to Pig, Execution Modes of Pig, Comparison of Pig with Databases, Grunt, Pig Latin, User Defined Functions, Data Processing operators [4 (lecture), 12 (lab)]

  4. Big data analytics in Spark using PySpark: Installing Apache Spark, Spark Ecosystem, Resilient Distributed Dataset (RDD) in Spark, building machine learning models using PySpark [4 (lecture), 12 (lab)]

Learning Outcomes

  • Preparing for data summarization, query, and analysis
  • Applying data modelling techniques to large data sets
  • Creating applications for Big Data analytics
  • Building a complete business data analytic solution

Learning Objectives

The primary objective of this course is to enable students to optimize business decisions and create a competitive advantage with Big Data analytics. The course introduces the basics required to develop MapReduce programs and to derive business benefit from unstructured data. It also gives an overview of the architectural concepts of Hadoop and introduces the MapReduce paradigm. Another objective is to introduce the programming tools Pig and Hive in the Hadoop ecosystem.

Text Books

  1. Big Java, 4th Edition, Cay Horstmann, John Wiley & Sons, Inc., ISBN: 9780470509487
  2. Hadoop: The Definitive Guide, 3rd Edition, Tom White, O’Reilly, ISBN: 9781449328917

References

  1. Hadoop MapReduce Cookbook, Srinath Perera, Thilina Gunarathne, O’Reilly, ISBN: 9781849517287
  2. Hadoop for Dummies by Dirk deRoos, Paul C. Zikopoulos, Roman B. Melnyk, Bruce Brown, Rafael Coss, John Wiley & Sons, 2014, ISBN: 1118607554
  3. Hadoop in Practice by Alex Holmes, Manning Publications, ISBN: 9351197425

Past Offerings

(Note: Past offerings could be under a different course number.)
  • Offered in Jan-May, 2023 by Satyajit Das
  • Offered in Jan-May, 2022 by Satyajit Das
  • Offered in Jan-May, 2021 by Satyajit
  • Offered in Jul-Dec, 2020 by Satyajit

Course Metadata

Item: Details
Course Title: Big Data Lab
Course Code: DS5102
Course Credits: 1-0-3-3
Course Category: PMP
Approved on: Senate of IIT Palakkad
Course pre-revision code: CS5104