Information Retrieval and Organisation

Tutors: Dell Zhang and Mark Levene
Time: Tuesday evenings 6pm - 9pm (Spring Term)
Room: Stewart House (STB) 9 [BBK-DCS Teaching Map]
Code: COIY064H7
Document: Module Specification


Textbook

Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze,
Introduction to Information Retrieval,
Cambridge University Press, 2008.

Companion Website

Syllabus

Week | Date | Session I | Session II
1 | 07/01/2014 | Chapter 00: Motivation [slides] | Chapter 01: Boolean Retrieval [slides] [classwork-p] [classwork-s]
2 | 14/01/2014 | Chapter 02: The Term Vocabulary and Postings Lists [slides] | Chapter 03: Dictionaries and Tolerant Retrieval [slides] [classwork-p] [classwork-s]
3 | 21/01/2014 | Chapter 04: Index Construction [slides] | Chapter 05: Index Compression [slides] [classwork-p] [classwork-s]
4 | 28/01/2014 | Chapter 06: Scoring, Term Weighting, and the Vector Space Model [slides] [classwork-p] [classwork-s] [example] | Chapter 07: Computing Scores in a Complete Search System [slides]
5 | 04/02/2014 | [Gusfield1997] Suffix Trees [slides] [example] | Chapter 08: Evaluation in Information Retrieval [slides] [example]
6 | 11/02/2014 | Chapter 09: Relevance Feedback and Query Expansion [slides] | Chapter 09: Relevance Feedback and Query Expansion [slides]
7 | 18/02/2014 | * A Brief Introduction to Probability and Statistics [slides] [example] | Chapter 11: Probabilistic Information Retrieval [slides]
-- | 23/02/2014 | Coursework Part 1 - Submission Deadline
8 | 25/02/2014 | Chapter 12: Language Models for Information Retrieval [slides] [example] | Chapter 13: Text Classification & Naive Bayes [slides] [example]
9 | 04/03/2014 | Chapter 14: Vector Space Classification [slides] [demo] [example] | Chapter 15: Support Vector Machines & Machine Learning on Documents [slides]
10 | 11/03/2014 | Chapter 16: Flat Clustering [slides] [demo] [example] | Chapter 17: Hierarchical Clustering [slides] [example]
11 | 18/03/2014 | Chapter 18: Matrix Decompositions & Latent Semantic Indexing [slides] | * Advanced Topics in Information Retrieval [slides]
-- | 30/03/2014 | Coursework Part 2 - Submission Deadline
-- | Tuesday 13/05/2014, 6pm - 9pm | Revision Lecture at MAL 251

Assessment

Coursework: 20%
Part 1
Normal deadline: Sun 23/02/2014 23:55
Cut-off deadline: Sun 09/03/2014 23:55
Part 2
Normal deadline: Sun 30/03/2014 23:55
Cut-off deadline: Sun 13/04/2014 23:55
Penalty for late submission (i.e., after the normal deadline): the coursework mark will be capped at the minimum pass mark (i.e., 50% for MSc students).
Please submit your solutions in electronic form, through the Moodle system.

Examination: 80%
Past exam papers can be found at Birkbeck eLibrary.

Projects

Students committed to excellence are welcome to contact me for final project ideas.

Python Programming

Python [BPSN Course]

Information Retrieval Software

Apache Lucene
Terrier IR Platform
The Lemur Project
Python Package - Whoosh

Supplements

David Forsyth and Jean Ponce: An Introduction to Probability.

Peter Norvig: How to Write a Spelling Corrector.
Peter Norvig: Natural Language Corpus Data, in Beautiful Data: The Stories Behind Elegant Data Solutions.
Paul Graham: A Plan for Spam.
Paul Graham: Better Bayesian Filtering.
Robert M. Bell et al.: The Million Dollar Programming Prize, IEEE Spectrum, May 2009.

Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, 3rd edition, Prentice Hall, 2010. (Chapter 22 Natural Language Processing)
Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Modern Information Retrieval: The Concepts and Technology Behind Search, 2nd edition, Addison Wesley, 2010.
Bruce Croft, Donald Metzler, and Trevor Strohman, Search Engines: Information Retrieval in Practice, international edition, Pearson Education, 2009.
Stefan Büttcher, Charles Clarke, and Gordon Cormack, Information Retrieval: Implementing and Evaluating Search Engines, MIT Press, 2010.
David Grossman and Ophir Frieder, Information Retrieval: Algorithms and Heuristics, 2nd edition, Springer, 2004.

Jeffrey Dean: Challenges in Building Large-Scale Information Retrieval Systems (WSDM-2009 Keynote Speech). [VideoLecture]
UC Berkeley Course SIMS141: Search Engines: Technology, Society, and Business [Guest Lecture Videos].

Michael McCandless, Erik Hatcher, and Otis Gospodnetic, Lucene in Action, 2nd edition, Manning, 2010.

Toby Segaran, Programming Collective Intelligence: Building Smart Web 2.0 Applications, O'Reilly, 2007.
Matthew Russell, Mining the Social Web: Analyzing Data from Facebook, Twitter, LinkedIn, and Other Social Media Sites, O'Reilly, 2011.
Satnam Alag, Collective Intelligence in Action, Manning, 2008.
Haralambos Marmanis and Dmitry Babenko, Algorithms of the Intelligent Web, Manning, 2009.

Ron Zacharski, A Programmer's Guide to Data Mining, Free Online eBook.

Hans Rosling: The Joy of Stats [Video].

Related Courses

Stanford Course CS276/LING286: Information Retrieval and Web Mining
Stuttgart Course: Introduction to Information Retrieval

MSU Course CSE484: Information Retrieval
Cornell Course CS430/INFO430: Information Retrieval
UNT Course CSCE5200: Information Retrieval and Web Search
UIUC Course CS410: Introduction to Text Information Systems (Spring 2008)
UIUC Course CS598: Integrative Intelligent Information Systems (Spring 2008)
UMass Course CS646: Information Retrieval
UCSC Course ISM260: Information Retrieval
UTexas Course CS 371R: Information Retrieval and Web Search
UPenn Course CIS 430: Introduction to Human Language Technology
PSU Course IST 441: Information Retrieval and Search Engines
UNC Course INLS 490-154: Introduction to Information Retrieval System Design and Implementation (Fall 2008)
IIT Course CS429: Introduction to Information Retrieval
Columbia Course COMS 6998: Search Engine Technology

Colorado Course CSCI 7000-001: Introduction to Information Retrieval
JHU Course 605.744: Information Retrieval (Spring 2009)
UCL Course M052: Information Retrieval

Links

My Blog - Research on Search

