Natural Language Processing and Information Retrieval

Module Tutor: Dell Zhang
Time: Tuesday evenings 6pm - 9pm (Spring Term)
Venue: Online (Blackboard Collaborate) on Moodle
Code: COIY064H7
Teaching Assistant: Ehshan Veerabangsa (e.veerabangsa@bbk.ac.uk) - Coursework Marking & Asynchronous Support


Textbook

Dan Jurafsky and James H. Martin.
Speech and Language Processing, 2nd edition,
Pearson, 2008.

Companion Website (3rd edition draft)

Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze.
Introduction to Information Retrieval.
Cambridge University Press, 2008.

Companion Website

Syllabus

Week 1 (11/01/2022)
  Session 1: [IIR-00] [SLP-01] Motivation [slides]
  Session 2: [IIR-01] Boolean Retrieval [slides] [classwork-p] [classwork-s]

Week 2 (18/01/2022)
  Session 1: [IIR-02] The Term Vocabulary and Postings Lists [slides]
  Session 2: [SLP-02] [SLP-08a] Regular Expressions, Text Normalization, Sequence Labeling [slides] [slides]

Week 3 (25/01/2022)
  Session 1: [IIR-03] Dictionaries and Tolerant Retrieval [slides] [classwork-p] [classwork-s]
  Session 2: [SLP-02] Edit Distance [slides]

Week 4 (01/02/2022)
  Session 1: [IIR-05] Index Compression [slides] [classwork-p] [classwork-s]
  Session 2: [IIR-06] Scoring, Term Weighting, and the Vector Space Model [slides] [classwork-p] [classwork-s] [example]

Week 5 (08/02/2022)
  Session 1: [IIR-08] Evaluation in Information Retrieval [slides] [example]
  Session 2: [IIR-11] Probabilistic Information Retrieval [slides] [example]

Week 6 (15/02/2022) Reading Week: No Lecture for All Students.
  Please find below the materials to read.
  [SLP-20a] Lexicons for Sentiment, Affect, and Connotation (1/2) [slides]
  [SLP-24] Chatbots and Dialogue Systems [slides]

20/02/2022: Coursework Part 1 - Submission Deadline

Week 7 (22/02/2022)
  Session 1: [IIR-12] Language Models for Information Retrieval [slides] [example]
  Session 2: [SLP-03a] Language Modeling with N-Grams [slides]

Week 8 (01/03/2022)
  Session 1: [SLP-B] Spelling Correction and the Noisy Channel [slides]
  Session 2: [IIR-13] [SLP-04] Text Classification, Naive Bayes, and Sentiment Analysis [slides] [slides] [slides] [example]

Week 9 (08/03/2022)
  Session 1: [IIR-14] Vector Space Classification [slides] [demo] [example]
  Session 2: [SLP-05] Logistic Regression [slides]

Week 10 (15/03/2022)
  Session 1: [IIR-18] Matrix Decompositions and Latent Semantic Indexing [slides] [article]
  Session 2: [SLP-06] Vector Semantics [slides] [slides] [slides]

Week 11 (22/03/2022)
  Session 1: [SLP-07] Neural Nets and Neural Language Models [slides]
  Session 2: [SLP-09] Deep Learning Architectures for Sequence Processing [slides]

03/04/2022: Coursework Part 2 - Submission Deadline

Tuesday 03/05/2022, 6pm - 9pm
  Revision Lecture [slides]
  Past Exam Papers: 2008-09 2009-10 2010-11 2011-12 2012-13 2013-14 2014-15 2015-16 2016-17 2017-18 2018-19 2019-20 2020-21

-- (no date)
  Session 1: [IIR-04] Index Construction [slides]
  Session 2: [IIR-07] Computing Scores in a Complete Search System [slides]

-- (no date)
  Session 1: [IIR-09] Relevance Feedback and Query Expansion [slides]
  Session 2: [IIR-10] XML Retrieval [slides]

-- (no date)
  Session 1: [IIR-16] [IIR-07.1.6] Flat Clustering [slides] [demo] [example]
  Session 2: [IIR-17] Hierarchical Clustering [slides] [example]

-- (no date)
  Session 1: [IIR-15] Support Vector Machines & Machine Learning on Documents [slides]
  Session 2: [IIR-19.6] Near-Duplicates and Shingling [slides] [classwork-p] [classwork-s]

-- (no date)
  Session 1: [Gusfield1997] Suffix Trees [slides] [example]
  Session 2: [CACM12-04-blei] Probabilistic Topic Models [slides]

Assessment

Coursework: 20%
Part 1 [Reassessment]
Normal deadline: Fri 05/08/2022 13:00
Cut-off deadline: Fri 19/08/2022 13:00
Part 2 [Reassessment]
Normal deadline: Fri 05/08/2022 13:00
Cut-off deadline: Fri 19/08/2022 13:00
Please submit your solutions as a PDF file through Moodle.

Examination: 80%
Past exam papers can be found at Birkbeck eLibrary.

Projects

MSc students committed to excellence are welcome to contact me for project ideas.

Python Programming

Python [A Short Course for BGRS and BPSN]

Information Retrieval Software

Apache Lucene
Terrier IR Platform
The Lemur Project
Python Package - Whoosh

Supplements

David Forsyth and Jean Ponce: An Introduction to Probability.

Peter Norvig: How to Write a Spelling Corrector.
Peter Norvig: Natural Language Corpus Data, in Beautiful Data: The Stories Behind Elegant Data Solutions.
Paul Graham: A Plan for Spam.
Paul Graham: Better Bayesian Filtering.
Robert M. Bell et al.: The Million Dollar Programming Prize, IEEE Spectrum, May 2009.

Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, 3rd edition, Prentice Hall, 2010. (Chapter 22 Natural Language Processing)
Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Modern Information Retrieval: The Concepts and Technology Behind Search, 2nd edition, Addison Wesley, 2010.
Bruce Croft, Donald Metzler, and Trevor Strohman, Search Engines: Information Retrieval in Practice, international edition, Pearson Education, 2009.
Stefan Büttcher, Charles Clarke, and Gordon Cormack, Information Retrieval: Implementing and Evaluating Search Engines, MIT Press, 2010.
David Grossman and Ophir Frieder, Information Retrieval: Algorithms and Heuristics, 2nd edition, Springer, 2004.

Jeffrey Dean: Challenges in Building Large-Scale Information Retrieval Systems (WSDM-2009 Keynote Speech). [VideoLecture]
UC Berkeley Course SIMS141: Search Engines: Technology, Society, and Business [Guest Lecture Videos].

Michael McCandless, Erik Hatcher, and Otis Gospodnetic, Lucene in Action, 2nd edition, Manning, 2010.

Toby Segaran, Programming Collective Intelligence: Building Smart Web 2.0 Applications, O'Reilly, 2007.
Matthew Russell, Mining the Social Web: Analyzing Data from Facebook, Twitter, LinkedIn, and Other Social Media Sites, O'Reilly, 2011.
Satnam Alag, Collective Intelligence in Action, Manning, 2008.
Haralambos Marmanis and Dmitry Babenko, Algorithms of the Intelligent Web, Manning, 2009.

Ron Zacharski, A Programmer's Guide to Data Mining, Free Online eBook.

Hans Rosling: The Joy of Stats [Video].

Related Courses

Stanford Course CS276/LING286: Information Retrieval and Web Mining
Stuttgart Course: Introduction to Information Retrieval

MSU Course CSE484: Information Retrieval
Cornell Course CS430/INFO430: Information Retrieval
UNT Course CSCE5200: Information Retrieval and Web Search
UIUC Course CS410: Introduction to Text Information Systems (Spring 2008)
UIUC Course CS598: Integrative Intelligent Information Systems (Spring 2008)
UMass Course CS646: Information Retrieval
UCSC Course ISM260: Information Retrieval
UTexas Course CS 371R: Information Retrieval and Web Search
UPenn Course CIS 430: Introduction to Human Language Technology
PSU Course IST 441: Information Retrieval and Search Engines
UNC Course INLS 490-154: Introduction to Information Retrieval System Design and Implementation (Fall 2008)
IIT Course CS429: Introduction to Information Retrieval
Columbia Course COMS 6998: Search Engine Technology

Colorado Course CSCI 7000-001: Introduction to Information Retrieval
JHU Course 605.744: Information Retrieval (Spring 2009)
UCL Course M052: Information Retrieval

Links

My Blog - Research on Search

