Hierarchical Bayesian Models of Language and Text
- Speaker: Dr Yee Whye Teh, Lecturer, Gatsby Computational Neuroscience Unit, University College London
- Date: Tuesday, 24 May 2011 from 16:45 to 17:45
- Location: Room 745, Malet Street
In this talk I will present a new approach to modelling sequence data called the sequence memoizer. As opposed to most other sequence models, our model does not make any Markovian assumptions. Instead, we use a hierarchical Bayesian approach which enforces sharing of statistical strength across the different parts of the model. To make computations with the model efficient, and to better model the power-law statistics often observed in sequence data, we use a Bayesian nonparametric prior called the Pitman-Yor process as a building block in the hierarchical model. We show state-of-the-art results on language modelling and text compression. This is joint work with Frank Wood, Jan Gasthaus, Cedric Archambeau and Lancelot James.
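To illustrate the power-law behaviour the abstract refers to, here is a minimal sketch of sampling from a single Pitman-Yor process via its Chinese restaurant process representation. This is not the sequence memoizer itself (which arranges such processes hierarchically over contexts); the discount and concentration values are illustrative choices, not parameters from the talk.

```python
import random

def pitman_yor_crp(n, d=0.5, theta=1.0, seed=0):
    """Seat n customers in a Pitman-Yor Chinese restaurant process.

    d is the discount and theta the concentration; both are
    illustrative values, not taken from the talk.
    Returns the list of table occupancy counts.
    """
    rng = random.Random(seed)
    tables = []  # tables[k] = number of customers at table k
    for _ in range(n):
        total = sum(tables)
        # Existing table k is chosen with probability proportional to
        # (tables[k] - d); a new table with probability proportional to
        # (theta + d * len(tables)).
        u = rng.random() * (total + theta)
        acc = 0.0
        for k in range(len(tables)):
            acc += tables[k] - d
            if u < acc:
                tables[k] += 1
                break
        else:
            tables.append(1)  # seat customer at a fresh table
    return tables

tables = pitman_yor_crp(10000, d=0.5, theta=1.0)
print("distinct tables:", len(tables))
print("largest tables:", sorted(tables, reverse=True)[:5])
```

With discount d > 0 the number of distinct tables grows like n^d and the occupancy counts are heavy-tailed, matching the power-law statistics of word frequencies that motivate using Pitman-Yor rather than Dirichlet priors for language data.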