Prerequisite: Familiarity with Algorithms, Probability, Linear Algebra, and Programming
Course Content

Data Collection: Various sources and types of data: text, video, audio, biological data, etc. (3 hours)

Data Preprocessing: Cleaning data, missing data imputation, noise elimination, feature selection and dimensionality reduction, normalization (6 hours)

Data Storage: Databases, schemas, ER diagrams, SQL, functions, stored procedures, indexing (B+ tree), MongoDB, client-server architecture (9 hours)

Information Retrieval: Index construction, scoring models, complete search engine mechanism, evaluation methods (6 hours)

Data Processing: Data structures: stack, queue, linked list, associative memory, graphs. Algorithms: searching, sorting, graph traversal, complexity (9 hours)

Data Analysis: Regression, principal component analysis, canonical correlation analysis, analysis of variance (6 hours)

Data Visualization: Table, graph, histogram, pie chart, area plot, box plot, scatter plot, bubble plot, waffle chart, word cloud (3 hours)
Learning Outcomes
To be able to state and analyse:
 Preprocessing techniques for various datasets
 Standard database system concepts such as tables, relations, and queries
 Information retrieval techniques such as indexing, scoring, ranking, and evaluation
 Data processing algorithms and data structures
 Visualization techniques
Learning Objectives: To learn about the entire pipeline of a typical data-driven system: collection, preprocessing, storage, retrieval, processing, analysis, and visualization.
Text Books
 Introduction to Algorithms. Cormen, Leiserson, Rivest, Stein. MIT Press, 3rd ed. ISBN13: 9780262533058
 Database System Concepts. Silberschatz, Korth, Sudarshan. McGraw Hill Education, 6th ed. ISBN13: 9789332901384
 Introducing Data Science: Big Data, Machine Learning, and More, Using Python Tools. Cielen, Meysman, Ali. Dreamtech Press. ISBN13: 9789351199373
References
 Data Engineering: A Novel Approach to Data Design. Brian Shive. Technics Publications. ISBN13: 9781935504603
 Python Data Science Handbook: Essential Tools for Working with Data. Jake VanderPlas. O’Reilly. ISBN13: 9789352134915