Students in this module will develop an understanding of the emerging area of cloud computing and how it relates to traditional models of computing, and will gain competence in MapReduce as a programming model for distributed processing of big data.
This module aims to introduce back-end cloud computing techniques for processing "big data" (terabytes/petabytes) and developing scalable systems (with up to millions of users). We focus mostly on MapReduce, which is presently the most accessible and practical means of computing for "Web-scale" problems, but will discuss other techniques as well.
- Introduction to Cloud Computing
- Cloud Computing Technologies and Types
- Parallel Computing and Distributed Systems
- Big Data
- MapReduce and Hadoop
- Running Hadoop in the Cloud [Practical Lab Class]
- Developing MapReduce Programs
- Link Analysis in the Cloud
- Data Management in the Cloud
- Information Retrieval in the Cloud
- Beyond MapReduce (e.g., Apache Spark)
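The MapReduce model covered above can be illustrated with the classic word-count example in plain Python (an illustrative sketch only; an actual Hadoop job would use the Java API or Hadoop Streaming):

```python
from collections import defaultdict

def map_phase(documents):
    # Mapper: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield word.lower(), 1

def shuffle(pairs):
    # Shuffle/sort: group all emitted values by key,
    # as the MapReduce framework does between the two phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: aggregate (here, sum) the values for each key.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["the cloud", "the big data cloud"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts)  # {'the': 2, 'cloud': 2, 'big': 1, 'data': 1}
```

Because the map and reduce functions are independent per document and per key, the framework can run them in parallel across many machines, which is what makes the model scale to terabyte- and petabyte-sized inputs.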
A good knowledge of object-oriented programming in Python is required.
All dates and timetables are listed in the programme handbooks of individual programmes.
The coursework consists of programming assignments.
Coursework (20%). Examination (80%).
- Jothy Rosenberg and Arthur Mateos, The Cloud at Your Service, Manning, 2010.
- Jimmy Lin and Chris Dyer, Data-Intensive Text Processing with MapReduce, Morgan and Claypool, 2010.
- Extensive use is made of other relevant book chapters and research papers that are distributed in class or provided online.