Information Retrieval and Organisation

Tutor: Dell Zhang
Time: Tuesday evenings 6pm - 9pm (Spring Term)
Room: Woburn House (WBH) Boardroom
Code: COIY064H7
Document: Module Spec
Teaching Assistant: Cosmin Stamate


Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze.
Introduction to Information Retrieval.
Cambridge University Press, 2008.

Companion Website
Dan Jurafsky and James H. Martin.
Speech and Language Processing, 2nd edition.
Pearson, 2008.

Companion Website (3rd edition draft)


Week Date Session I Session II
1 10/01/2017 [IIR-00] [SLP-01]
Boolean Retrieval
[slides] [classwork-p] [classwork-s]
2 17/01/2017 [IIR-02]
The Term Vocabulary and Postings Lists
Regular Expressions and Text Normalization
3 24/01/2017 [IIR-03]
Dictionaries and Tolerant Retrieval
[slides] [classwork-p] [classwork-s]
Edit Distance
4 31/01/2017 [IIR-05]
Index Compression
[slides] [classwork-p] [classwork-s]
Scoring, Term Weighting, and the Vector Space Model
[slides] [classwork-p] [classwork-s] [example]
5 07/02/2017 [IIR-08]
Evaluation in Information Retrieval
[slides] [example]
Probabilistic Information Retrieval
[slides] [example]
6 14/02/2017 [IIR-12]
Language Models for Information Retrieval
[slides] [example]
Language Modeling with N-Grams
-- 19/02/2017 Coursework Part 1 - Submission Deadline
7 21/02/2017 [SLP-05]
Spelling Correction and the Noisy Channel
Text Classification & Naive Bayes
[slides] [example]
8 28/02/2017 [SLP-06]
Naive Bayes
Sentiment Classification
9 07/03/2017 [IIR-14]
Vector Space Classification
[slides] [demo] [example]
[IIR-16] [IIR-07.1.6]
Flat Clustering
[slides] [demo] [example]
10 14/03/2017 [IIR-17]
Hierarchical Clustering
[slides] [example]
Vector Semantics
11 21/03/2017 [SLP-16]
Semantics with Dense Vectors
Matrix Decompositions & Latent Semantic Indexing
-- 26/03/2017 Coursework Part 2 - Submission Deadline
-- Tuesday 6pm - 9pm Revision Lecture at ??
Past Exam Papers: 2008-09, 2009-10, 2010-11, 2011-12, 2012-13, 2013-14, 2014-15
-- ---------- [IIR-04]
Index Construction
Computing Scores in a Complete Search System
-- ---------- [IIR-09]
Relevance Feedback and Query Expansion
XML Retrieval
-- ---------- [IIR-15]
Support Vector Machines & Machine Learning on Documents
Near-Duplicates and Shingling
[slides] [classwork-p] [classwork-s]
-- ---------- [Gusfield1997]
Suffix Trees
[slides] [example]
Probabilistic Topic Models


Coursework: 20%
Part 1
Normal deadline: Sun 19/02/2017 23:55
Cut-off deadline: Sun 05/03/2017 23:55
Part 2
Normal deadline: Sun 26/03/2017 23:55
Cut-off deadline: Sun 09/04/2017 23:55
Penalty for late submission (i.e., after the normal deadline): the coursework mark will be capped at the minimum pass mark (i.e., 50% for MSc students).
Please submit your solutions electronically through the Moodle system.

Examination: 80%
Past exam papers can be found in the Birkbeck eLibrary.


Students committed to excellence are welcome to contact me for final project ideas.

Python Programming

Python [A Short Course for BGRS and BPSN]

Information Retrieval Software

Apache Lucene
Terrier IR Platform
The Lemur Project
Python Package - Whoosh


David Forsyth and Jean Ponce: An Introduction to Probability.

Peter Norvig: How to Write a Spelling Corrector.
Peter Norvig: Natural Language Corpus Data, in Beautiful Data: The Stories Behind Elegant Data Solutions.
Paul Graham: A Plan for Spam.
Paul Graham: Better Bayesian Filtering.
Robert M. Bell et al.: The Million Dollar Programming Prize, IEEE Spectrum, May 2009.

Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, 3rd edition, Prentice Hall, 2010. (Chapter 22: Natural Language Processing)
Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Modern Information Retrieval: The Concepts and Technology Behind Search, 2nd edition, Addison Wesley, 2010.
Bruce Croft, Donald Metzler, and Trevor Strohman, Search Engines: Information Retrieval in Practice, international edition, Pearson Education, 2009.
Stefan Büttcher, Charles Clarke, and Gordon Cormack, Information Retrieval: Implementing and Evaluating Search Engines, MIT Press, 2010.
David Grossman and Ophir Frieder, Information Retrieval: Algorithms and Heuristics, 2nd edition, Springer, 2004.

Jeffrey Dean: Challenges in Building Large-Scale Information Retrieval Systems (WSDM-2009 Keynote Speech). [VideoLecture]
UC Berkeley Course SIMS141: Search Engines: Technology, Society, and Business [Guest Lecture Videos].

Michael McCandless, Erik Hatcher, and Otis Gospodnetic, Lucene in Action, 2nd edition, Manning, 2010.

Toby Segaran, Programming Collective Intelligence: Building Smart Web 2.0 Applications, O'Reilly, 2007.
Matthew Russell, Mining the Social Web: Analyzing Data from Facebook, Twitter, LinkedIn, and Other Social Media Sites, O'Reilly, 2011.
Satnam Alag, Collective Intelligence in Action, Manning, 2008.
Haralambos Marmanis and Dmitry Babenko, Algorithms of the Intelligent Web, Manning, 2009.

Ron Zacharski, A Programmer's Guide to Data Mining, Free Online eBook.

Hans Rosling: The Joy of Stats [Video].

Related Courses

Stanford Course CS276/LING286: Information Retrieval and Web Mining
Stuttgart Course: Introduction to Information Retrieval

MSU Course CSE484: Information Retrieval
Cornell Course CS430/INFO430: Information Retrieval
UNT Course CSCE5200: Information Retrieval and Web Search
UIUC Course CS410: Introduction to Text Information Systems (Spring 2008)
UIUC Course CS598: Integrative Intelligent Information Systems (Spring 2008)
UMass Course CS646: Information Retrieval
UCSC Course ISM260: Information Retrieval
UTexas Course CS 371R: Information Retrieval and Web Search
UPenn Course CIS 430: Introduction to Human Language Technology
PSU Course IST 441: Information Retrieval and Search Engines
UNC Course INLS 490-154: Introduction to Information Retrieval System Design and Implementation (Fall 2008)
IIT Course CS429: Introduction to Information Retrieval
Columbia Course COMS 6998: Search Engine Technology

Colorado Course CSCI 7000-001: Introduction to Information Retrieval
JHU Course 605.744: Information Retrieval (Spring 2009)
UCL Course M052: Information Retrieval


My Blog - Research on Search