Programming with Data
This module builds on a first module on Python programming, adding material that is relevant to students of Data Science. It therefore pursues a style of software development in Python that is centred on data manipulation, focussing on:
- the Python modules that are most commonly used for Data Science,
- the Relational Data Model and SQL for data manipulation and data querying, and
- the ethical issues related to Data Science that are required for accreditation by the British Computer Society (previously covered in the Information Systems module which is no longer compulsory).
On successful completion of this module a student will be expected to be able to:
- demonstrate satisfactory knowledge of Python modules that are specific to Data Science;
- deal with different data formats and data sources (e.g., spatial data, time series data);
- understand the organisation of data in the relational data model, and show fluency in writing SQL queries;
- understand and apply basic Python/SQL techniques for quantitative data analysis/visualisation.
- Python modules for Data Manipulation/Data Science;
- Basic data visualisation in Python;
- The relational data model;
- Querying and retrieving data using SQL;
- Basics of semi-structured information (XML/HTML);
- Web data extraction; and
- Ethical issues in Data Science.
Various topics will be demonstrated in practical lab sessions.
Guest lecturers from industry may present parts of certain topics.
The coursework will consist of two short assignments related to the data analytics methods presented in class.
Coursework assignments (20%) and a 2-hour exam (80%).
The syllabus will draw from chapters of the following three books:
- Joel Grus, “Data Science from Scratch – First principles with Python.” O’Reilly, 2015.
- Ryan Mitchell, “Web Scraping with Python.” O’Reilly, 2015.
- Stephen Marsland, “Machine Learning – An Algorithmic Perspective (2nd ed.).” CRC Press, 2015.