Information Retrieval and Organisation

Short name: IRO
SITS code: COIY064H7
Credits: 15 credits
Level: 7
Module leader: Dell Zhang
Lecturer(s): Dell Zhang

Module outline

Due to the explosive growth of digital information in recent years, modern Information Retrieval (IR) systems such as search engines have become increasingly important in almost everyone's work and life (see, for example, the phenomenal rise of Google). IR research and development is one of the most active areas in academia as well as industry. This module conveys the basic principles of modern IR systems to students.

Aims

The aim of this module is to introduce modern Information Retrieval (IR) concepts and techniques, from basic text indexing to advanced text mining and Web IR. Both theoretical and practical aspects of IR systems will be presented and the most recent issues in the field of IR will be discussed. This will give students an insight into how modern search engines work and are developed.

Syllabus

  • Boolean Retrieval
  • The Term Vocabulary and Postings Lists
  • Regular Expressions and Text Normalization
  • Dictionaries and Tolerant Retrieval
  • Edit Distance
  • Index Compression
  • Scoring, Term Weighting and the Vector Space Model
  • Evaluation in Information Retrieval
  • Probabilistic Information Retrieval
  • Language Models for Information Retrieval
  • Language Modeling with N-Grams
  • Spelling Correction and the Noisy Channel
  • Text Classification
  • Naive Bayes
  • Sentiment Classification
  • Vector Space Classification
  • Flat Clustering
  • Hierarchical Clustering
  • Vector Semantics
  • Semantics with Dense Vectors
  • Matrix Decompositions & Latent Semantic Indexing
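
As an informal illustration of the first two syllabus topics (Boolean retrieval and postings lists), the following is a minimal sketch and not part of the official module material. It assumes whitespace tokenisation and lowercasing as the only normalisation; the names build_index, intersect and boolean_and and the four toy documents are hypothetical, chosen only for this example.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased term to a sorted list of document IDs (its postings list)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # assumption: whitespace tokenisation
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def intersect(p1, p2):
    """Merge two sorted postings lists, keeping only IDs present in both."""
    i = j = 0
    result = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result


def boolean_and(index, *terms):
    """Answer a conjunctive query: documents containing every given term."""
    # Process the shortest postings lists first to keep intermediate results small.
    postings = sorted((index.get(t.lower(), []) for t in terms), key=len)
    if not postings:
        return []
    result = postings[0]
    for p in postings[1:]:
        result = intersect(result, p)
    return result


# Hypothetical toy collection, for illustration only.
DOCS = {
    1: "new home sales top forecasts",
    2: "home sales rise in july",
    3: "increase in home sales in july",
    4: "july new home sales rise",
}
INDEX = build_index(DOCS)
```

With this toy collection, boolean_and(INDEX, "home", "july") walks the two postings lists in step and returns the IDs of the documents that contain both terms. A term that does not occur in the collection has an empty postings list, so any conjunctive query that includes it returns no documents.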

Prerequisites

None

Timetable

All dates and timetables are listed in the programme handbooks of individual programmes.

Coursework

The coursework includes two assignments.

Assessment

Coursework (20%). Examination (80%).

Recommended reading