
Data Warehousing and Data Mining

Short name: DWDM
SITS code: COIY026H7
Credits: 15 credits
Level: 7
Module leader: Nigel Martin
Lecturer(s): Nigel Martin

Module outline

This module covers the organisation, analysis and mining of large data sets to support business intelligence applications. Students study the principles and commercial application of the technologies, as well as research results and emerging architectures underpinning the analysis and mining of "big data".

Aims

To study advanced aspects of data warehousing and data mining, encompassing the principles, research results and commercial application of the technologies.

Syllabus

  • Data warehousing requirements.
  • Data warehouse conceptual design.
  • Data warehouse architectures.
  • Data warehouse logical design: star schemas, snowflake schemas, fact tables, dimensions, measures.
  • OLAP architectures, OLAP operations. SQL extensions for OLAP.
  • Data warehouse physical design: partitioning, parallelism, compression, indexes, materialized views, column stores.
  • Data warehouse construction: data extraction, transformation, loading and refreshing. Warehouse metadata. Continuous ETL.
  • Data warehouse architecture trends. MapReduce and warehouse architectures: Pig, Hive, Spark.
  • Data mining concepts, tasks and algorithms.
  • Data mining technologies and implementations. Techniques for mining large data sets, stream mining, architecture trends, standards, products.
  • Research trends in data warehousing and data mining.
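
As an illustration of the logical design and OLAP topics listed above, the following is a minimal sketch of a star schema with a roll-up query. It is not taken from the module materials. The table names (dim_product, dim_date, fact_sales), the sample rows and the function names are all hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3


def build_star_schema(conn):
    """Create a tiny star schema: one fact table and two dimension tables.

    All table names and rows are hypothetical, chosen only for illustration.
    """
    conn.executescript("""
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY,
            name       TEXT,
            category   TEXT
        );
        CREATE TABLE dim_date (
            date_id INTEGER PRIMARY KEY,
            year    INTEGER,
            quarter INTEGER
        );
        -- The fact table holds the measure (amount) plus one foreign key
        -- per dimension.
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            date_id    INTEGER REFERENCES dim_date(date_id),
            amount     REAL
        );
    """)
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?, ?)",
        [(1, "Laptop", "Hardware"), (2, "Mouse", "Hardware"), (3, "Editor", "Software")],
    )
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?, ?)",
        [(1, 2023, 1), (2, 2023, 2), (3, 2024, 1)],
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        [(1, 1, 1000.0), (2, 1, 20.0), (3, 2, 50.0), (1, 3, 1200.0), (3, 3, 60.0)],
    )


def roll_up_by_category_and_year(conn):
    """Roll the sales measure up from (product, quarter) to (category, year)."""
    return conn.execute("""
        SELECT p.category, d.year, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_date    AS d ON d.date_id    = f.date_id
        GROUP BY p.category, d.year
        ORDER BY p.category, d.year
    """).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_star_schema(connection)
    for category, year, total in roll_up_by_category_and_year(connection):
        print(category, year, total)
```

The roll-up here is an ordinary GROUP BY over the joined dimensions. The SQL extensions for OLAP named in the syllabus (for example ROLLUP and CUBE) are not supported by SQLite, so they are not shown.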

Prerequisites

A first module in Database Systems (e.g. as taught in a typical UK undergraduate degree in computer science).

Timetable

All dates and timetables are listed in the programme handbooks of individual programmes.

Coursework

Practical exercise involving programming and design aspects of a data warehouse.

Assessment

By 2-hour written examination and practical coursework. The final module mark is the mark attained in the examination. The practical coursework component must also be passed in order to pass the module overall.

Recommended reading

  • R. Ramakrishnan, J. Gehrke, Database Management Systems (3rd ed.), McGraw Hill, 2003, ISBN 0-07-246563-8.
  • M. Golfarelli, S. Rizzi, Data Warehouse Design: Modern Principles and Methodologies, McGraw Hill, 2009, ISBN 978-0-07-161039-1.
  • J. Celko, Joe Celko's Analytics and OLAP in SQL, Morgan Kaufmann, 2006, ISBN 978-0-12-369512-3.
  • J. Han, M. Kamber, J. Pei, Data Mining: Concepts and Techniques (3rd ed.), Morgan Kaufmann, 2011, ISBN 978-0-12-381479-1.
  • Research papers will be distributed to students; students will also be directed to Web resources on the subject.