Information Retrieval and Organisation
Due to the explosive growth of digital information in recent years, modern Information Retrieval (IR) systems such as search engines have become increasingly important in almost everyone's work and life (consider, for example, the phenomenal rise of Google). IR research and development is one of the most active areas in academia as well as industry. This module will convey the basic principles of modern IR systems to students.
The aim of this module is to introduce modern IR concepts and techniques, from basic text indexing to advanced text mining and Web IR. Both theoretical and practical aspects of IR systems will be presented, and the most recent issues in the field will be discussed. This will give students an insight into how modern search engines work and how they are developed.
- Boolean Retrieval
- The Term Vocabulary and Postings Lists
- Regular Expressions and Text Normalization
- Dictionaries and Tolerant Retrieval
- Edit Distance
- Index Compression
- Scoring, Term Weighting and the Vector Space Model
- Evaluation in Information Retrieval
- Probabilistic Information Retrieval
- Language Models for Information Retrieval
- Language Modeling with N-Grams
- Spelling Correction and the Noisy Channel
- Text Classification
- Naive Bayes
- Sentiment Classification
- Vector Space Classification
- Flat Clustering
- Hierarchical Clustering
- Vector Semantics
- Semantics with Dense Vectors
- Matrix Decompositions & Latent Semantic Indexing
All dates and timetables are listed in the programme handbooks of individual programmes.
The coursework includes two assignments.
Coursework (20%). Examination (80%).
- Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.
- Dan Jurafsky and James H. Martin, Speech and Language Processing, 3rd ed. draft.