Introduction to Data Analytics using R

Short name: IDAR
SITS code: BUCI045H6
Credits: 15
Level: 6
Module leader: Cen Wan
Lecturer(s): Cen Wan
Online material: https://moodle.bbk.ac.uk/

Module outline

This module covers the principal concepts and techniques of data analytics and how to apply them to large-scale data sets. Students develop the core skills and expertise needed by data scientists, including the use of techniques such as linear regression, classification and clustering. The module will show you how to use the popular and powerful data analysis language and environment R to solve practical problems based on use cases extracted from real domains.

Aims

To study advanced aspects of big data analytics, applying appropriate machine learning techniques to analyse big data sets, assessing the statistical significance of data mining results, and using the open-source tool R to perform basic data mining tasks on big data.

Syllabus

  • Introduction to big data analytics: big data overview, data pre-processing, concepts of supervised and unsupervised learning.
  • Basic statistics: mean, median, standard deviation, variance, correlation, covariance.
  • Linear regression: simple linear regression, introduction to multiple linear regression.
  • Classification: logistic regression, decision trees, SVM.
  • Ensemble methods: bagging, random forests, boosting.
  • Clustering: K-means, K-medoids, hierarchical clustering, X-means.
  • Evaluation and validation: cross-validation, assessing the statistical significance of data mining results.
  • A selection of advanced topics, such as scalable machine learning, big-data-related techniques, mining stream data and social networks.
  • Tools: R.

Prerequisites

Experience with a modern programming language.

Timetables, locations and term dates

Programme-specific timetables are listed in the handbooks available for each programme. Please consult the relevant programme page.

Enrolled students can find their personal teaching timetable and the location of classes on their My Birkbeck profile.

Coursework

Several practical exercises involving learning from and mining big data sets using the tool R.

Assessment

Coursework (20%). Examination (80%).

Recommended reading

  • An Introduction to Statistical Learning: With Applications in R, by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani.