Prerequisite: Familiarity with Algorithms, Probability, Linear Algebra, Programming
Course Content
- Data Collection: Various sources and types of data: text, video, audio, biology, etc. (3 hours)
- Data Preprocessing: Cleaning data, missing data imputation, noise elimination, feature selection and dimensionality reduction, normalization (6 hours)
- Data Storage: Database, Schema, ER diagram, SQL, functions, stored procedures, indexing B+tree, MongoDB, Client-Server Architecture (9 hours)
- Information Retrieval: index construction, scoring models, complete search engine mechanism, evaluation methods (6 hours)
- Data Processing: Data structures: Stack, Queue, Linked List, Associated memory, Graphs. Algorithms: Searching, Sorting, Graph traversal, Complexity (9 hours)
- Data Analysis: regression, principal component analysis, canonical correlation analysis, analysis of variance (6 hours)
- Data Visualization: table, graph, histogram, pie-chart, area-plot, box-plot, scatter-plot, bubble-plot, waffle charts, word clouds (3 hours)
Learning Outcomes
To be able to state and analyse:
- Preprocessing techniques for various datasets
- Standard database systems concepts such as tables, relations, and queries
- Information retrieval techniques such as indexing, scoring, ranking, evaluation
- Data processing algorithms and data structures
- Visualization techniques
Learning Objectives: To learn about the entire pipeline of a typical system involving data: collection, preprocessing, storage, retrieval, processing, analysis, and visualization.
Text Books
- Introduction to Algorithms. Cormen, Leiserson, Rivest, Stein. MIT Press, 3rd edition. ISBN-13: 978-0262533058
- Database System Concepts. Silberschatz, Korth, Sudarshan. McGraw Hill Education, 6th edition. ISBN-13: 978-9332901384
- Introducing Data Science: Big Data, Machine Learning, and More, Using Python Tools. Cielen, Meysman, Ali. Dreamtech Press. ISBN-13: 978-9351199373
References
- Data Engineering: A Novel Approach to Data Design. Brian Shive. Technics Publications. ISBN-13: 978-1935504603
- Python Data Science Handbook: Essential Tools for Working with Data. Jake VanderPlas. O’Reilly. ISBN-13: 978-9352134915
Past Offerings
(Note: Past offerings could be under a different course number.)
- Offered in Jul-Dec, 2020 by Mrinal
Course Metadata
Item | Details |
---|---|
Course Title | Data Engineering |
Course Code | CS5015 |
Course Credits | 3-0-0-3 |
Course Category | PMT |
Approved on | Senate of IIT Palakkad |
Course pre-revision code | DS5003 |