Data Integration and Large-Scale Analysis WS2024/25
(VU, DAT.B32UF Data Integration and Large-Scale Analysis)

DIA is a 5 ECTS bachelor's and master's course, applicable to the bachelor's programs in computer science or software engineering and management, as well as the master's catalog 'Data Science'. This course covers major data integration architectures, key techniques for data integration and cleaning, as well as methods for large-scale, i.e., distributed, data storage and analysis.


Lectures

In detail, the course covers the following topics, which also reflect the course calendar. All slides will be made available prior to the individual lectures, which take place Fridays at 3pm in HS-i5 or virtually on Webex.

A: Data Integration and Preparation

  • 01 Introduction and Overview [Oct 11, pdf]
  • 02 Data Warehousing, ETL, and SQL/OLAP [Oct 18, pdf, pptx]
  • 03 Message-oriented Middleware, EAI, and Replication [Oct 25, pdf, pptx]
  • 04 Schema Matching and Mapping [Nov 08, pdf, pptx]
  • 05 Entity Linking and Deduplication [Nov 15, pdf, pptx]
  • 06 Data Cleaning and Data Fusion [Nov 22, pdf, pptx]

B: Large-Scale Data Management and Analysis

  • Lectures will be delivered by Dr. Lucas Iacono


Exercises

The lectures are accompanied by mandatory programming exercises (to the extent of 2 ECTS, i.e., roughly 50 working hours), preferably in Python.

TBA


Organization

  • Lecturer: Dr. Shafaq Siddiqi, ISDS
  • Lecturer: Dr. Lucas Iacono, ISDS, Know Center
  • Final written exam: 07 Feb 2025, 15:00 - 17:00 in HS i12
  • Grading: 30% project, 70% final exam