Information Retrieval and Organisation

Tutors: Dell Zhang and Mark Levene
Time: Tuesday evenings 6pm - 9pm (Spring Term)
Room: UCL Engineering Front Suite 103
Code: COIY064H7
Document: Module Specification
Teaching Assistant: Cosmin Stamate


Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze,
Introduction to Information Retrieval,
Cambridge University Press, 2008.

Companion Website


Week | Date | Session I | Session II
1 | 10/01/2017 | Chapter 00 | Chapter 01: Boolean Retrieval [slides] [classwork-p] [classwork-s]
2 | 17/01/2017 | Chapter 02: The Term Vocabulary and Postings Lists | Chapter 03: Dictionaries and Tolerant Retrieval [slides] [classwork-p] [classwork-s]
3 | 24/01/2017 | Chapter 19.6: Near-Duplicates and Shingling [slides] [classwork-p] [classwork-s] | Chapter 04: Index Construction
4 | 31/01/2017 | Chapter 05: Index Compression [slides] [classwork-p] [classwork-s] | Chapter 06: Scoring, Term Weighting, and the Vector Space Model [slides] [classwork-p] [classwork-s] [example]
5 | 07/02/2017 | Chapter 07: Computing Scores in a Complete Search System | Chapter 08: Evaluation in Information Retrieval [slides] [example]
6 | 14/02/2017 | Chapter 09: Relevance Feedback and Query Expansion | Suffix Trees [slides] [example]
7 | 21/02/2017 | Chapter 10: XML Retrieval | A Brief Introduction to Probability and Statistics [slides] [example]
-- | 19/02/2017 | Coursework Part 1 - Submission Deadline
8 | 28/02/2017 | Chapter 11: Probabilistic Information Retrieval | Chapter 12: Language Models for Information Retrieval [slides] [example]
9 | 07/03/2017 | Chapter 13: Text Classification & Naive Bayes [slides] [example] | Chapter 14: Vector Space Classification [slides] [demo] [example]
10 | 14/03/2017 | Chapter 15: Support Vector Machines & Machine Learning on Documents | Chapter 16: Flat Clustering [slides] [demo] [example]
11 | 21/03/2017 | Chapter 17: Hierarchical Clustering [slides] [example] | Chapter 18: Matrix Decompositions & Latent Semantic Indexing
-- | 26/03/2017 | Coursework Part 2 - Submission Deadline
-- | Tuesday 6pm - 9pm | Revision Lecture at ??
Past Exam Paper 2008-09 2009-10 2010-11 2011-12 2012-13 2013-14 2014-15


Coursework: 20%
Part 1
Normal deadline: Sun 19/02/2017 23:55
Cut-off deadline: Sun 05/03/2017 23:55
Part 2
Normal deadline: Sun 26/03/2017 23:55
Cut-off deadline: Sun 09/04/2017 23:55
Penalty for late submission (i.e., after the normal deadline): the coursework mark will be capped at the minimum pass mark (i.e., 50% for MSc students).
Please submit your solutions in electronic form, through the Moodle system.

Examination: 80%
Past exam papers can be found in the Birkbeck eLibrary.


Students committed to excellence are welcome to contact me for final project ideas.

Python Programming

Python [BPSN Course]

Information Retrieval Software

Apache Lucene
Terrier IR Platform
The Lemur Project
Python Package - Whoosh


David Forsyth and Jean Ponce: An Introduction to Probability.

Peter Norvig: How to Write a Spelling Corrector.
Peter Norvig: Natural Language Corpus Data, in Beautiful Data: The Stories Behind Elegant Data Solutions.
Paul Graham: A Plan for Spam.
Paul Graham: Better Bayesian Filtering.
Robert M. Bell et al.: The Million Dollar Programming Prize, IEEE Spectrum, May 2009.

Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, 3rd edition, Prentice Hall, 2010. (Chapter 22 Natural Language Processing)
Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Modern Information Retrieval: The Concepts and Technology Behind Search, 2nd edition, Addison Wesley, 2010.
Bruce Croft, Donald Metzler, and Trevor Strohman, Search Engines: Information Retrieval in Practice, international edition, Pearson Education, 2009.
Stefan Büttcher, Charles Clarke, and Gordon Cormack, Information Retrieval: Implementing and Evaluating Search Engines, MIT Press, 2010.
David Grossman and Ophir Frieder, Information Retrieval: Algorithms and Heuristics, 2nd edition, Springer, 2004.

Jeffrey Dean: Challenges in Building Large-Scale Information Retrieval Systems (WSDM-2009 Keynote Speech). [VideoLecture]
UC Berkeley Course SIMS141: Search Engines: Technology, Society, and Business [Guest Lecture Videos].

Michael McCandless, Erik Hatcher, and Otis Gospodnetic, Lucene in Action, 2nd edition, Manning, 2010.

Toby Segaran, Programming Collective Intelligence: Building Smart Web 2.0 Applications, O'Reilly, 2007.
Matthew Russell, Mining the Social Web: Analyzing Data from Facebook, Twitter, LinkedIn, and Other Social Media Sites, O'Reilly, 2011.
Satnam Alag, Collective Intelligence in Action, Manning, 2008.
Haralambos Marmanis and Dmitry Babenko, Algorithms of the Intelligent Web, Manning, 2009.

Ron Zacharski, A Programmer's Guide to Data Mining, Free Online eBook.

Hans Rosling: The Joy of Stats [Video].

Related Courses

Stanford Course CS276/LING286: Information Retrieval and Web Mining
Stuttgart Course: Introduction to Information Retrieval

MSU Course CSE484: Information Retrieval
Cornell Course CS430/INFO430: Information Retrieval
UNT Course CSCE5200: Information Retrieval and Web Search
UIUC Course CS410: Introduction to Text Information Systems (Spring 2008)
UIUC Course CS598: Integrative Intelligent Information Systems (Spring 2008)
UMass Course CS646: Information Retrieval
UCSC Course ISM260: Information Retrieval
UTexas Course CS 371R: Information Retrieval and Web Search
UPenn Course CIS 430: Introduction to Human Language Technology
PSU Course IST 441: Information Retrieval and Search Engines
UNC Course INLS 490-154: Introduction to Information Retrieval System Design and Implementation (Fall 2008)
IIT Course CS429: Introduction to Information Retrieval
Columbia Course COMS 6998: Search Engine Technology

Colorado Course CSCI 7000-001: Introduction to Information Retrieval
JHU Course 605.744: Information Retrieval (Spring 2009)
UCL Course M052: Information Retrieval


My Blog - Research on Search