Prerequisite: Familiarity with Algorithms, Probability, Linear Algebra, Programming
Course Content

Data Collection: Various sources and types of data: text, video, audio, biological data, etc. (3 hours)

Data Preprocessing: Cleaning data, missing data imputation, noise elimination, feature selection and dimensionality reduction, normalization (6 hours)

Data Storage: Database, schema, ER diagram, SQL, functions, stored procedures, indexing (B+ tree), MongoDB, client-server architecture (9 hours)

Information Retrieval: Index construction, scoring models, complete search engine mechanism, evaluation methods (6 hours)

Data Processing: Data structures: stack, queue, linked list, associative memory, graphs. Algorithms: searching, sorting, graph traversal, complexity (9 hours)

Data Analysis: regression, principal component analysis, canonical correlation analysis, analysis of variance (6 hours)

Data Visualization: Table, graph, histogram, pie chart, area plot, box plot, scatter plot, bubble plot, waffle chart, word cloud (3 hours)
Learning Outcomes
To be able to state and analyse:
 Preprocessing techniques for various datasets
 Standard database system concepts such as tables, relations, and queries
 Information retrieval techniques such as indexing, scoring, ranking, and evaluation
 Data processing algorithms and data structures
 Visualization techniques
Learning Objectives: To learn about the entire pipeline of a typical data system: collection, preprocessing, storage, retrieval, processing, analysis, and visualization.
Text Books
 Introduction to Algorithms. Cormen, Leiserson, Rivest, Stein. MIT Press, 3rd ed. ISBN13: 9780262533058
 Database System Concepts. Silberschatz, Korth, Sudarshan. McGraw Hill Education, 6th ed. ISBN13: 9789332901384
 Introducing Data Science: Big Data, Machine Learning, and More, Using Python Tools. Cielen, Meysman, Ali. Dreamtech Press. ISBN13: 9789351199373
References
 Data Engineering: A Novel Approach to Data Design. Brian Shive. Technics Publications. ISBN13: 9781935504603
 Python Data Science Handbook: Essential Tools for Working with Data. Jake VanderPlas. O'Reilly. ISBN13: 9789352134915
Past Offerings
(Note: Past offerings could be under a different course number.) Offered in Jul-Dec 2020 by Mrinal
Course Metadata
Item  Details 

Course Title  Data Engineering 
Course Code  CS5015 
Course Credits  3-0-0-3 
Course Category  PMT 
Approved on  Senate of IIT Palakkad 
Course pre-revision code  DS5003 