Data Science: Techniques and Applications

Click here for the 2022-23 edition

Time: Tuesday, 18:00 - 21:00 during weeks 2-11 of Summer Term.

Place: HyFlex
Labs MAL 404/405, and
MS Teams via Moodle (paywall): DSTA.

Module Coordinator: Alessandro Provetti

Teaching Assistants: Abul Hasan, Paschalis Lagias and Alberto Matuozzo.

Contents, resources and study materials:
The calendar below gives a general overview of the module.
Presentations, their order and the study materials are constantly reviewed, updated and amended.
The study materials may become final only at the end of the module. For a preview of the study programme, please see the shaded part below.


How to read the programme table
White background for regular lectures with slides and note-taking.
Light-blue background for online lab experiences.
Grey background for work in progress or extra reference material (not examined).
Gold background for in-class assessments.
Date Unit Where Presentation (by revealjs) Resources PDF (by decktape or by revealjs)
Apr. 16 Week 0 (no class)
Apr. 23 Week 1 (no class)
Apr. 30 2.a Class Class presentation Markdown PDF
2.b Class Data Science as 9 problems Markdown
From Provost-Fawcett's textbook:
  1. PF-ch. 2: Excerpts
PDF
2.c Class Math Concepts for Data Science Markdown
From Goodfellow et al. textbook:
  1. GBC-Ch. 2: Excerpts
PDF
NEW Lab Relevant Python modules:
  1. Numpy
  2. Pandas
Quarto Markdown for
  1. Numpy
  2. Pandas
Jupyter notebook for
  1. Numpy
  2. Pandas
PDF:
  1. Numpy
  2. Pandas
May 7 3.a Class Spectral Methods Markdown PDF
3.b Class Information Entropy Markdown for
  1. lecture
  2. pen-and-paper exercise
  3. for reference only, an advanced lecture on divergence
PDF
3.c Class Classification: The Iris Dataset Markdown
For reference: Excerpts from Zaki-Meira textbook.
PDF
3.d Lab 2D visualisation Markdown
  1. Download a Seaborn notebook
PDF
May 14 4.a Class Eigenpairs Markdown
From Leskovec et al. textbook (MMDS):
  1. MMDS-Ch. 11 Excerpts, part A
PDF
4.b Class The Gini index Markdown PDF
4.c Class Decision trees Markdown
  1. PF-ch. 3: Predictive Modelling
PDF
4.d Lab Introduction: the k-NN algorithm
Classification with Scikit-learn
  1. baseline notebook
    Click here to see it on Colab
    A solution notebook is also available from the repo.
  2. k-NN Markdown
k-NN PDF
The lab presentation is in remarkjs format
Extra Non-binary classification

Evaluating Classification Performance
May 21 5.a Class High-dimensional data Markdown PDF
5.b Text as data Markdown PDF
5.c Lab Live coding lab: implementing Decision trees Markdown
This lab experience will be conducted on Colab:
First notebook (baseline)
Second notebook.
(create PDF directly from the browser; see
Remarkjs for details)
5.d Lab Computing Eigenvalues and Eigenvectors Markdown PDF
May 28 New! Online In-class quiz
6.a Class Singular-Value Decomposition Markdown PDF
6.b Natural Language Processing and Entropy measures Markdown PDF
6.c Class Introduction to Network models: Food Webs Markdown
From Caldarelli-Chessa textbook (CC):
  1. CC-Ch.1: Food webs Excerpts
PDF
Jun 4 7.a Class Discovering latent dimensions
  1. MMDS Ch. 11 Excerpts, part B
  2. Code of the SVD example from the textbook
  3. video presentations are available from the textbook website
PDF
7.b Class Rating and ranking: Massey's ranking Markdown
From Langville-Meyer's textbook (LM):
  1. LM-ch.2: Massey's method
PDF
7.c Class Trade Networks Markdown

From Caldarelli-Chessa textbook (CC):
  1. Ch.2: Trade Networks Excerpts
PDF
7.d Lab The Food Web notebook
  1. An exercise notebook;
  2. its worked-out solution, and
  3. data.
(create PDF directly from the browser; see
Remarkjs for details)
Jun 11 8.a Class Non-negative Matrix Factorization Markdown

The code below is also available in a repl.
  1. A Scikit-learn use example
  2. A direct implementation,
  3. A tutorial on the direct implementation above; a slightly extended version is here.

For reference:
the Nature article;
the NIPS article, and
an IEEE Computer review article which explains applications in recommender systems.
PDF
8.b Class Rating, ranking: Keener Markdown
  1. LM-ch.4: Keener's method
PDF
8.c Class The Internet network Markdown
  1. CC-Ch. 3: The Internet Excerpts
PDF
8.d Lab The Trade networks notebook.
  1. A local image of the trade networks notebook with questions.
  2. The complete notebook.
Jun 18 9.a Factorization Machines Markdown
For reference:
  1. The [Rendle, ICDM 2010] article.
  2. The [Rendle, TIST 2012] article.
PDF
9.b Lab Computing ratings and rankings: The Premier league

  1. Local image of the exercise notebook
  2. A solution notebook
  3. data
(create PDF directly from the browser; see
PDF)
9.c Class Self-organised networks: WWW, Wikipedia etc. Markdown
  1. CC-Ch. 4: WWW, Wikipedia etc. Excerpts
PDF
9.d Lab The Internet notebook
  1. The Internet notebook exercise.
  2. The solution notebook.
The WWW, Wikipedia and OSNs notebook
  1. The WWW, Wikipedia exercise notebook.
  2. and its complete solutions.
Jun 25 10.a Class Rating, ranking: Markov Chains Markdown
  1. LM-ch.6: Markov's method
PDF
10.b Lab Matrix factorisation and Recommender Systems

Local image of the exercise notebook
  1. Local image of the data.
(create PDF directly from the browser)
10.c Class Financial Networks Markdown
  1. CC-Ch. 5: Financial Networks excerpt
PDF
10.d Lab The Financial networks notebook
  1. The local exercise notebook.
  2. The local data for the Financial notebook. The notebook requires the Yahoofinancials module.
Jul 2 New! Online Final in-class test
New Free discussion

Presentations here have been produced using Revealjs (v. 5) or Remark.
To print Revealjs presentations, or to save them locally as PDF files, please follow the Revealjs instructions or install and run decktape on your computer.
Mathematical formulae are rendered online by MathJax, so some of your browser's security settings may need adjusting.

A note on learning support from the department.
