
Hierarchical Bayesian Models of Language and Text

  • Speaker: Dr Yee Whye Teh, Lecturer, Gatsby Computational Neuroscience Unit, University College London
  • Date: Tuesday, 24 May 2011 from 16:45 to 17:45
  • Location: Room 745, Malet Street

In this talk I will present a new approach to modelling sequence data called the sequence memoizer. Unlike most other sequence models, our model does not make any Markovian assumptions. Instead, we use a hierarchical Bayesian approach that enforces sharing of statistical strength across the different parts of the model. To make computations with the model efficient, and to better model the power-law statistics often observed in sequence data, we use a Bayesian nonparametric prior, the Pitman-Yor process, as the building block of the hierarchical model. We show state-of-the-art results on language modelling and text compression. This is joint work with Frank Wood, Jan Gasthaus, Cedric Archambeau and Lancelot James.
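The power-law behaviour the abstract mentions can be illustrated with the Pitman-Yor process's Chinese restaurant construction: customers join an existing table with probability proportional to its occupancy minus a discount d, or start a new table with probability proportional to theta + d times the number of tables. With d > 0 the number of tables grows as a power of n rather than logarithmically. The sketch below is purely illustrative and is not code from the talk; the function name and parameter values are my own choices.

```python
import random

def pitman_yor_crp(n, d=0.5, theta=1.0, seed=0):
    """Sample table occupancies from a Pitman-Yor Chinese restaurant
    process with discount d in [0, 1) and concentration theta > -d.

    Illustrative sketch only; not the sequence memoizer itself, which
    arranges such processes hierarchically over contexts.
    """
    rng = random.Random(seed)
    counts = []   # customers seated at each table
    total = 0     # customers seated so far
    for _ in range(n):
        # New table with probability (theta + d * num_tables) / (theta + total);
        # when total == 0 this probability is 1.
        if total == 0 or rng.random() < (theta + d * len(counts)) / (theta + total):
            counts.append(1)
        else:
            # Otherwise join table k with probability proportional to counts[k] - d.
            r = rng.random() * (total - d * len(counts))
            acc = 0.0
            for k in range(len(counts)):
                acc += counts[k] - d
                if r < acc:
                    counts[k] += 1
                    break
        total += 1
    return counts
```

With d = 0 this reduces to the ordinary Chinese restaurant process (Dirichlet process), whose number of tables grows only logarithmically in n; a positive discount produces the heavier, power-law tail of table sizes that matches word-frequency statistics in text.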