Data Science Applications and Techniques
This undergraduate module has been designed in tandem with the DSTA module for MSc Data Science students.
It presents Data Science as a set of 9 computational problems, then examines the geometrical interpretation of data and its consequences.
Finally, the “Rating and Ranking” and “Complex network” models are studied in some depth.
This module has been designed for minimal overlap with the Concepts of Machine Learning and Practical Machine Learning modules, which are available as electives to undergraduate students.
On successful completion of this module a student will be expected to be able to:
- understand Data Science as 9 computational and modelling problems;
- deploy techniques for quantitative data analysis, such as spectral analysis, matrix decomposition and support vector machines;
- use Python to apply the techniques learned on the module;
- validate and evaluate data analysis results; and
- demonstrate satisfactory knowledge of network models.
- introduction to the module;
- Data Science as 9 computational problems;
- a refresher on concepts of Statistics, Linear Algebra and Information Theory as they come into play;
- the geometric view of data, the curse of dimensionality, spectral and decomposition techniques;
- traditional Data Mining techniques such as dimensionality reduction (e.g., PCA and SVD), SVMs and kernelization;
- advanced techniques: Non-negative Matrix Factorization and Factorization Machines;
- Rating and Ranking, in the domain of sport predictions;
- lab experience with Python modules for Data Analytics such as NumPy, Pandas and Scikit-learn;
- from data to networks (graphs), and their relevant properties;
- network analysis in various domains: Biology, International trade, Web search and Finance; centrality measures and communities.
Introduction to Data Analytics using R.
In general, the ability to program in Python and SQL and a basic knowledge of Statistics, normally obtained by taking the relevant undergraduate modules, or as approved by the module leader.
Coursework assignments (20%) and a 2-hour exam (80%).
The required study materials may be made available electronically during the term.
- F. Provost and T. Fawcett, Data Science for Business. O’Reilly, 2013 (2nd edition is expected).
- M. Zaki and W. Meira, Data Mining and Machine Learning: Fundamental Concepts and Algorithms (2nd ed.). CUP, 2020.
- A. N. Langville and C. D. Meyer, Who’s #1?: The Science of Rating and Ranking. Princeton University Press, 2012.
- G. Caldarelli and A. Chessa, Data Science and Complex Networks. Oxford University Press, 2016.
- J. Grus, Data Science from Scratch – First principles with Python (2nd ed.). O’Reilly, 2019.
- J. Voss, An Introduction to Statistical Computing: A Simulation-based Approach. Wiley, 2013.
- J. Vanderplas, A Whirlwind Tour of Python. O’Reilly, 2016.