Data Science Applications and Techniques

Short name: DSAT
SITS code: BUCI071H6
Credits: 15
Level: 6
Module leader: Alessandro Provetti
Lecturer(s): Alessandro Provetti
Online material: https://moodle.bbk.ac.uk/

Module outline

This undergraduate module has been designed in tandem with the DSTA module for MSc Data Science students.

It presents Data Science as a set of 9 computational problems, then examines the geometrical interpretation of data and its consequences.

Finally, the "Rating and Ranking" and "Complex network" models are studied in some depth.

This module has been designed for minimal overlap with the Concepts of Machine Learning and Practical Machine Learning modules, which are available as electives to undergraduate students.

Learning Outcomes

On successful completion of this module a student will be expected to be able to:

  • understand Data Science as 9 computational and modelling problems;
  • deploy techniques for quantitative data analysis, such as spectral analysis, matrix decomposition and support vector machines;
  • use Python to apply the techniques learned on the module;
  • validate and evaluate data analysis results; and
  • demonstrate satisfactory knowledge of network models.

Syllabus

  • introduction to the module;
  • Data Science as 9 computational problems;
  • a refresher on concepts from Statistics, Linear Algebra and Information Theory as they come into play;
  • the geometric view of data, the curse of dimensionality, spectral and decomposition techniques;
  • traditional Data Mining techniques such as dimensionality reduction (e.g., PCA, SVD), SVMs and kernelization;
  • advanced techniques: Non-negative Matrix Factorization and Factorization Machines;
  • Rating and Ranking, in the domain of sport predictions;
  • lab experience with Python modules for Data Analytics such as NumPy, Pandas and Scikit-learn;
  • from data to networks (graphs), and their relevant properties;
  • network analysis in various domains (Biology, International trade, Web search and Finance), centrality measures and communities.

Prerequisites

Introduction to Data Analytics using R.

In general, the ability to program in Python and SQL and a basic knowledge of Statistics, normally obtained by taking the relevant undergraduate modules, or as approved by the module leader.

Timetables

Indicative timetables can be found in the handbooks available on programme pages. Personalised teaching timetables for students are available via My Birkbeck.

Assessment

Coursework assignments (20%) and a 2-hour exam (80%).

Recommended reading

The required study materials may be made available electronically during the term.

  • F. Provost and T. Fawcett, Data Science for Business. O’Reilly, 2013 (2nd edition is expected).
  • M. Zaki and W. Meira, Data Mining and Machine Learning: Fundamental Concepts and Algorithms (2nd ed.). CUP, 2020.
  • A. N. Langville and C. D. Meyer, Who’s #1?: The Science of Rating and Ranking. Princeton University Press, 2012.
  • G. Caldarelli and A. Chessa, Data Science and Complex Networks. Oxford University Press, 2016.
  • J. Grus, Data Science from Scratch – First principles with Python (2nd ed.). O’Reilly, 2019.
  • J. Voss, An Introduction to Statistical Computing: A Simulation-based Approach. Wiley, 2013.
  • J. Vanderplas, A Whirlwind Tour of Python. O’Reilly, 2016.