Information Retrieval and Organisation

Short name: IRO
SITS code: COIY064H7
Credits: 15 credits
Level: 7
Module leader: Dell Zhang
Lecturer(s): Dell Zhang

Module outline

Due to the explosive growth of digital information in recent years, modern Natural Language Processing (NLP) and Information Retrieval (IR) systems such as search engines have become increasingly important in almost everyone's work and life (consider, for example, the phenomenal rise of Google). NLP & IR research and development is one of the most active areas in both academia and industry. This module conveys the basic principles of modern NLP & IR systems to students.

Aims

The aim of this module is to introduce modern NLP & IR concepts and techniques, from basic text indexing to advanced text analysis. Both theoretical and practical aspects of NLP & IR systems will be presented, and the most recent issues in the field will be discussed. This will give students an insight into how modern search engines work and how they are developed.

Syllabus

  • Boolean Retrieval
  • The Term Vocabulary and Postings Lists
  • Regular Expressions and Text Normalization
  • Dictionaries and Tolerant Retrieval
  • Edit Distance
  • Index Compression
  • Scoring, Term Weighting and the Vector Space Model
  • Evaluation in Information Retrieval
  • Probabilistic Information Retrieval
  • Language Models for Information Retrieval
  • Language Modeling with N-Grams
  • Spelling Correction and the Noisy Channel
  • Text Classification, Naive Bayes, and Sentiment Analysis
  • Vector Space Classification
  • Logistic Regression
  • Matrix Decompositions and Latent Semantic Indexing
  • Vector Semantics
  • Neural Nets and Neural Language Models
  • Sequence Processing with Recurrent Networks

Prerequisites

None

Timetable

All dates and timetables are listed in the programme handbooks of individual programmes.

Coursework

The coursework includes two assignments.

Assessment

Coursework (20%). Examination (80%).

Recommended reading