Natural Language Processing and Information Retrieval

Module Tutor: Dell Zhang
Time: Tuesday evenings 6pm - 9pm (Spring Term)
Room: UCL 1-19 Torrington Place, Room B17
Code: COIY064H7
Teaching Assistant: Cosmin Stamate


Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze.
Introduction to Information Retrieval.
Cambridge University Press, 2008.

Companion Website
Dan Jurafsky and James H. Martin.
Speech and Language Processing, 2nd edition,
Pearson, 2008.

Companion Website (3rd edition draft)


Week Date Session I Session II
1 15/01/2019 [IIR-00] [SLP-01]
Boolean Retrieval
[slides] [classwork-p] [classwork-s]
2 22/01/2019 [IIR-02]
The Term Vocabulary and Postings Lists
Regular Expressions and Text Normalization
3 29/01/2019 [IIR-03]
Dictionaries and Tolerant Retrieval
[slides] [classwork-p] [classwork-s]
Edit Distance
4 05/02/2019 [IIR-05]
Index Compression
[slides] [classwork-p] [classwork-s]
Scoring, Term Weighting, and the Vector Space Model
[slides] [classwork-p] [classwork-s] [example]
5 12/02/2019 [IIR-08]
Evaluation in Information Retrieval
[slides] [example]
Probabilistic Information Retrieval
[slides] [example]
6 19/02/2019 Reading Week: no lecture this week.
Please read the materials listed below.

Lexicons for Sentiment, Affect, and Connotation (1/2)
-- 24/02/2019 Coursework Part 1 - Submission Deadline
7 26/02/2019 [IIR-12]
Language Models for Information Retrieval
[slides] [example]
Language Modeling with N-Grams
8 05/03/2019 [SLP-B]
Spelling Correction and the Noisy Channel
[IIR-13] [SLP-04]
Text Classification, Naive Bayes, and Sentiment Analysis
[slides] [slides] [slides] [example]
9 12/03/2019 [IIR-14]
Vector Space Classification
[slides] [demo] [example]
Logistic Regression
10 19/03/2019 [IIR-18]
Matrix Decompositions & Latent Semantic Indexing
Vector Semantics
[slides] [slides]
11 26/03/2019 [SLP-07]
Neural Nets and Neural Language Models
Sequence Processing with Recurrent Networks
-- 07/04/2019 Coursework Part 2 - Submission Deadline
-- Tuesday 6pm - 9pm: Revision Lecture at MAL G16
Past exam papers: 2008-09, 2009-10, 2010-11, 2011-12, 2012-13, 2013-14, 2014-15, 2015-16, 2016-17, 2017-18
-- ---------- [IIR-04]
Index Construction
Computing Scores in a Complete Search System
-- ---------- [IIR-09]
Relevance Feedback and Query Expansion
XML Retrieval
-- ---------- [IIR-16] [IIR-07.1.6]
Flat Clustering
[slides] [demo] [example]
Hierarchical Clustering
[slides] [example]
-- ---------- [SLP-C]
Computing with Word Senses: WSD and WordNet
[slides] [slides]
Lexicons for Sentiment, Affect, and Connotation
-- ---------- [IIR-15]
Support Vector Machines & Machine Learning on Documents
Near-Duplicates and Shingling
[slides] [classwork-p] [classwork-s]
-- ---------- [Gusfield1997]
Suffix Trees
[slides] [example]
Probabilistic Topic Models


Coursework: 20%
Part 1
Normal deadline: Sun 24/02/2019 23:55
Cut-off deadline: Sun 10/03/2019 23:55
Part 2
Normal deadline: Sun 07/04/2019 23:55
Cut-off deadline: Sun 21/04/2019 23:55
Penalty for late submission (i.e., after the normal deadline): the coursework mark will be capped at the minimum pass mark (i.e., 50% for MSc students).
Please submit your solutions in electronic form, through the Moodle system.

Examination: 80%
Past exam papers are available from the Birkbeck eLibrary.


MSc students committed to excellence are welcome to contact me for project ideas.

Python Programming

Python [A Short Course for BGRS and BPSN]

Information Retrieval Software

Apache Lucene
Terrier IR Platform
The Lemur Project
Python Package - Whoosh


David Forsyth and Jean Ponce: An Introduction to Probability.

Peter Norvig: How to Write a Spelling Corrector.
Peter Norvig: Natural Language Corpus Data, in Beautiful Data: The Stories Behind Elegant Data Solutions.
Paul Graham: A Plan for Spam.
Paul Graham: Better Bayesian Filtering.
Robert M. Bell et al.: The Million Dollar Programming Prize, IEEE Spectrum, May 2009.

Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, 3rd edition, Prentice Hall, 2010. (Chapter 22 Natural Language Processing)
Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Modern Information Retrieval: The Concepts and Technology Behind Search, 2nd edition, Addison Wesley, 2010.
Bruce Croft, Donald Metzler, and Trevor Strohman, Search Engines: Information Retrieval in Practice, international edition, Pearson Education, 2009.
Stefan Büttcher, Charles Clarke, and Gordon Cormack, Information Retrieval: Implementing and Evaluating Search Engines, MIT Press, 2010.
David Grossman and Ophir Frieder, Information Retrieval: Algorithms and Heuristics, 2nd edition, Springer, 2004.

Jeffrey Dean: Challenges in Building Large-Scale Information Retrieval Systems (WSDM-2009 Keynote Speech). [VideoLecture]
UC Berkeley Course SIMS141: Search Engines: Technology, Society, and Business [Guest Lecture Videos].

Michael McCandless, Erik Hatcher, and Otis Gospodnetic, Lucene in Action, 2nd edition, Manning, 2010.

Toby Segaran, Programming Collective Intelligence: Building Smart Web 2.0 Applications, O'Reilly, 2007.
Matthew Russell, Mining the Social Web: Analyzing Data from Facebook, Twitter, LinkedIn, and Other Social Media Sites, O'Reilly, 2011.
Satnam Alag, Collective Intelligence in Action, Manning, 2008.
Haralambos Marmanis and Dmitry Babenko, Algorithms of the Intelligent Web, Manning, 2009.

Ron Zacharski, A Programmer's Guide to Data Mining, Free Online eBook.

Hans Rosling: The Joy of Stats [Video].

Related Courses

Stanford Course CS276/LING286: Information Retrieval and Web Mining
Stuttgart Course: Introduction to Information Retrieval

MSU Course CSE484: Information Retrieval
Cornell Course CS430/INFO430: Information Retrieval
UNT Course CSCE5200: Information Retrieval and Web Search
UIUC Course CS410: Introduction to Text Information Systems (Spring 2008)
UIUC Course CS598: Integrative Intelligent Information Systems (Spring 2008)
UMass Course CS646: Information Retrieval
UCSC Course ISM260: Information Retrieval
UTexas Course CS 371R: Information Retrieval and Web Search
UPenn Course CIS 430: Introduction to Human Language Technology
PSU Course IST 441: Information Retrieval and Search Engines
UNC Course INLS 490-154: Introduction to Information Retrieval System Design and Implementation (Fall 2008)
IIT Course CS429: Introduction to Information Retrieval
Columbia Course COMS 6998: Search Engine Technology

Colorado Course CSCI 7000-001: Introduction to Information Retrieval
JHU Course 605.744: Information Retrieval (Spring 2009)
UCL Course M052: Information Retrieval


My Blog - Research on Search