Information Retrieval and Organisation

Tutors: Dell Zhang and Mark Levene
Time: Tuesday evenings 6pm - 9pm (Spring Term)
Room: SOP B42 [BBK-DCS Teaching Map]
Code: COIY064H7
Document: Module Specification


Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze,
Introduction to Information Retrieval,
Cambridge University Press, 2008.

Companion Website


Week | Date | Session I | Session II
1 | 06/01/2015 | Chapter 00 | Chapter 01: Boolean Retrieval [slides] [classwork-p]
2 | 13/01/2015 | Chapter 02: The Term Vocabulary and Postings Lists | Chapter 03: Dictionaries and Tolerant Retrieval [slides] [classwork-p]
3 | 20/01/2015 | Chapter 19.6: Near-Duplicates and Shingling [slides] [classwork-p] | Chapter 04: Index Construction
4 | 27/01/2015 | Chapter 05: Index Compression [slides] [classwork-p] | Chapter 06: Scoring, Term Weighting, and the Vector Space Model [slides] [classwork-p] [example]
5 | 03/02/2015 | Chapter 07: Computing Scores in a Complete Search System | Suffix Trees [slides] [example]
6 | 10/02/2015 | Chapter 08: Evaluation in Information Retrieval [slides] [example] | Chapter 09: Relevance Feedback and Query Expansion
7 | 17/02/2015 | * A Brief Introduction to Probability and Statistics [slides] [example] | Chapter 11: Probabilistic Information Retrieval
-- | 22/02/2015 | Coursework Part 1 - Submission Deadline
8 | 24/02/2015 | Chapter 12: Language Models for Information Retrieval [slides] [example] | Chapter 13: Text Classification & Naive Bayes [slides] [example]
9 | 03/03/2015 | Chapter 14: Vector Space Classification [slides] [demo] [example] | Chapter 15: Support Vector Machines & Machine Learning on Documents
10 | 10/03/2015 | Chapter 16: Flat Clustering [slides] [demo] [example] | Chapter 17: Hierarchical Clustering [slides] [example]
11 | 17/03/2015 | Chapter 18: Matrix Decompositions & Latent Semantic Indexing | Advanced Topics in Information Retrieval
-- | 29/03/2015 | Coursework Part 2 - Submission Deadline
-- | Tuesday, 6pm - 9pm | Revision Lecture at MAL 251

Past Exam Papers: 2008-09, 2009-10, 2010-11, 2011-12, 2012-13


Coursework: 20%
Part 1
Normal deadline: Sun 22/02/2015 23:55
Cut-off deadline: Sun 08/03/2015 23:55
Part 2
Normal deadline: Sun 29/03/2015 23:55
Cut-off deadline: Sun 12/04/2015 23:55
Penalty for late submission (i.e., after the normal deadline): the coursework mark will be capped at the minimum pass mark (i.e., 50% for MSc students).
Please submit your solutions in electronic form, through the Moodle system.

Examination: 80%
Past exam papers can be found at Birkbeck eLibrary.


Students committed to excellence are welcome to contact me for final project ideas.

Python Programming

Python [BPSN Course]

Information Retrieval Software

Apache Lucene
Terrier IR Platform
The Lemur Project
Python Package - Whoosh


David Forsyth and Jean Ponce: An Introduction to Probability.

Peter Norvig: How to Write a Spelling Corrector.
Peter Norvig: Natural Language Corpus Data, in Beautiful Data: The Stories Behind Elegant Data Solutions.
Paul Graham: A Plan for Spam.
Paul Graham: Better Bayesian Filtering.
Robert M. Bell et al.: The Million Dollar Programming Prize, IEEE Spectrum, May 2009.

Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, 3rd edition, Prentice Hall, 2010. (Chapter 22 Natural Language Processing)
Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Modern Information Retrieval: The Concepts and Technology Behind Search, 2nd edition, Addison Wesley, 2010.
Bruce Croft, Donald Metzler, and Trevor Strohman, Search Engines: Information Retrieval in Practice, international edition, Pearson Education, 2009.
Stefan Büttcher, Charles Clarke, and Gordon Cormack, Information Retrieval: Implementing and Evaluating Search Engines, MIT Press, 2010.
David Grossman and Ophir Frieder, Information Retrieval: Algorithms and Heuristics, 2nd edition, Springer, 2004.

Jeffrey Dean: Challenges in Building Large-Scale Information Retrieval Systems (WSDM-2009 Keynote Speech). [VideoLecture]
UC Berkeley Course SIMS141: Search Engines: Technology, Society, and Business [Guest Lecture Videos].

Michael McCandless, Erik Hatcher, and Otis Gospodnetic, Lucene in Action, 2nd edition, Manning, 2010.

Toby Segaran, Programming Collective Intelligence: Building Smart Web 2.0 Applications, O'Reilly, 2007.
Matthew Russell, Mining the Social Web: Analyzing Data from Facebook, Twitter, LinkedIn, and Other Social Media Sites, O'Reilly, 2011.
Satnam Alag, Collective Intelligence in Action, Manning, 2008.
Haralambos Marmanis and Dmitry Babenko, Algorithms of the Intelligent Web, Manning, 2009.

Ron Zacharski, A Programmer's Guide to Data Mining, Free Online eBook.

Hans Rosling: The Joy of Stats [Video].

Related Courses

Stanford Course CS276/LING286: Information Retrieval and Web Mining
Stuttgart Course: Introduction to Information Retrieval

MSU Course CSE484: Information Retrieval
Cornell Course CS430/INFO430: Information Retrieval
UNT Course CSCE5200: Information Retrieval and Web Search
UIUC Course CS410: Introduction to Text Information Systems (Spring 2008)
UIUC Course CS598: Integrative Intelligent Information Systems (Spring 2008)
UMass Course CS646: Information Retrieval
UCSC Course ISM260: Information Retrieval
UTexas Course CS 371R: Information Retrieval and Web Search
UPenn Course CIS 430: Introduction to Human Language Technology
PSU Course IST 441: Information Retrieval and Search Engines
UNC Course INLS 490-154: Introduction to Information Retrieval System Design and Implementation (Fall 2008)
IIT Course CS429: Introduction to Information Retrieval
Columbia Course COMS 6998: Search Engine Technology

Colorado Course CSCI 7000-001: Introduction to Information Retrieval
JHU Course 605.744: Information Retrieval (Spring 2009)
UCL Course M052: Information Retrieval


My Blog - Research on Search