Students in this module will develop an understanding of the emerging area of cloud computing and how it relates to traditional models of computing, and will gain competence in MapReduce as a programming model for the distributed processing of big data.
This module aims to introduce back-end cloud computing techniques for processing "big data" (terabytes/petabytes) and developing scalable systems (with up to millions of users). We focus mostly on MapReduce, which is presently the most accessible and practical means of computing for "Web-scale" problems, but will discuss other techniques as well.
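To give a flavour of the MapReduce programming model, the following is a minimal single-process sketch of word counting in Python. It is illustrative only: a real Hadoop job distributes the map, shuffle, and reduce phases across a cluster, and the function names here are our own, not part of any Hadoop API.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle: group intermediate values by key (done by the framework in Hadoop)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["the cloud scales", "the cloud computes"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts)  # {'the': 2, 'cloud': 2, 'scales': 1, 'computes': 1}
```

The key idea is that the programmer writes only the map and reduce functions; the framework handles partitioning, grouping, and fault tolerance at scale.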
- Introduction to Cloud Computing
- Cloud Computing Technologies and Types
- Parallel Computing and Distributed Systems
- Big Data
- MapReduce and Hadoop
- Running Hadoop in the Cloud [Practical Lab Class]
- Developing MapReduce Programs
- Link Analysis in the Cloud
- Data Management in the Cloud
- Information Retrieval in the Cloud
- Beyond MapReduce (e.g., Apache Spark)
Good knowledge of object-oriented programming in Python is required.
MSc students without substantial software-development experience prior to joining their postgraduate programmes should have already taken the Principles of Programming I (POP1) module.
All dates and timetables are listed in the programme handbooks of individual programmes.
Two programming assignments.
Coursework (20%). Examination (80%).
- Jothy Rosenberg and Arthur Mateos, The Cloud at Your Service, Manning, 2010.
- Jimmy Lin and Chris Dyer, Data-Intensive Text Processing with MapReduce, Morgan and Claypool, 2010.
- Extensive use is made of other relevant book chapters and research papers that are distributed in class or provided online.