Zheng Zhu (PhD)

Sessional Teacher
Birkbeck College, Malet Street, London, WC1E 7HX

Summary: I currently teach Quantitative Methods 1 for the Department of Management at Birkbeck. Previously, I worked on the Migen project, where I carried out research into visualizing and mining data arising from learners' interactions in an online learning environment that supports 11-14 year old students in learning algebraic generalization. In early 2011, I completed my PhD on Improving Search Engines via Classification, which was jointly supervised by Professor Mark Levene (Head of Department, Birkbeck) and Professor Ingemar J Cox (IEEE Fellow, Director of Research, UCL).
Contact Details:
Address:
London Knowledge Lab
23-29 Emerald Street
London WC1N 3QS
Email: zheng (at) dcs (dot) bbk (dot) ac (dot) uk
Phone: +44 207 763 2115
Fax: +44 207 242 2754
IM&WT Group Information Management & Web Technologies
the london knowledge lab

Research

Information Retrieval on the web

Some Publications


Migen


My code

Neural Network, a neural network algorithm for pairwise ranking (Matlab).

My other code and notes for machine learning (learning models from data)

Note that many machine learning libraries are available online, e.g., PyBrain for Python, Weka for Java, and Mahout on Hadoop. Here I only demonstrate the fundamental concepts of machine learning.

Part 1. Linear Regression

Part 2. Logistic Regression

Part 3. Multiple Class Classification

Part 4. Neural Networks

Part 5. Bias and Variance

The error of an ML algorithm comes from bias and variance. If the model is too simple (too few parameters), it suffers from bias, or underfitting. In this case, adding more training data will not improve performance: the training and testing errors will be very close to each other, and both will be high. If the model is too complicated (too many parameters), it suffers from variance, or overfitting. In this case, the model fits the training data very well but does not generalize, so there is a gap between the training error and the testing error. Once we know the cause of the error, we can improve the algorithm: if it is bias, we need to add more features or decrease lambda, the weight of the regularization term; if it is variance, we need to do feature selection, add more data, or increase lambda.
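The diagnosis above can be tried on a toy problem. The following is a minimal sketch, not the code from the linked notes: it assumes noisy samples of sin(pi x) as data, polynomial features as the model family, and an L2 penalty with weight lam standing in for lambda. The function names (design, fit_ridge, mse) and all numeric settings are illustrative choices.

```python
import numpy as np

def design(x, degree):
    """Polynomial features 1, x, x^2, ..., x^degree."""
    return np.vander(x, degree + 1, increasing=True)

def fit_ridge(x, y, degree, lam):
    """Least squares with an L2 penalty of weight lam on the non-intercept weights."""
    X = design(x, degree)
    P = np.eye(degree + 1)
    P[0, 0] = 0.0  # leave the intercept unpenalized
    # Stacking sqrt(lam) * P under X turns the penalized problem into plain least squares.
    A = np.vstack([X, np.sqrt(lam) * P])
    b = np.concatenate([y, np.zeros(degree + 1)])
    return np.linalg.lstsq(A, b, rcond=None)[0]

def mse(w, x, y):
    """Mean squared error of the polynomial with weights w."""
    return float(np.mean((design(x, len(w) - 1) @ w - y) ** 2))

rng = np.random.default_rng(0)

def make_data(n):
    # Assumed toy data: a sine curve plus Gaussian noise.
    x = rng.uniform(-1.0, 1.0, n)
    return x, np.sin(np.pi * x) + rng.normal(0.0, 0.2, n)

x_train, y_train = make_data(20)
x_test, y_test = make_data(500)

results = {}
settings = [("high bias", 1, 0.0), ("high variance", 12, 0.0), ("regularized", 12, 1e-2)]
for name, degree, lam in settings:
    w = fit_ridge(x_train, y_train, degree, lam)
    results[name] = (mse(w, x_train, y_train), mse(w, x_test, y_test))
    print(f"{name:14s} degree={degree:2d} lambda={lam:g} "
          f"train={results[name][0]:.3f} test={results[name][1]:.3f}")
```

Comparing the train and test columns is the whole point: a straight line should show two similar, high errors, while the unregularized degree-12 fit should show a low training error and a visibly larger testing error. Increasing lam trades some training error for a smaller gap.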

Bayesian Statistics

Bayesian theory is widely used in ML. We usually infer the parameter from the training data by maximum likelihood, and use that single estimate to predict the test data. In the Bayesian framework, we instead treat the parameter as an "ensemble" of models, each with a different probability. If the prior and posterior distributions are in the same family, the prior is called conjugate. Some examples include a Beta prior with a Binomial likelihood, a Gamma prior with a Poisson likelihood, and a Gaussian prior with a Gaussian likelihood. However, with nonconjugate priors we have to use an approximate algorithm. This leads to Markov chain Monte Carlo (MCMC). The procedure of the Metropolis algorithm is as follows:
1. Sample a proposal theta* ~ J(theta|theta_old), where J is a symmetric proposal distribution;
2. Compute the acceptance ratio r = p(theta*|y)/p(theta_old|y);
3. Set theta_new = theta* with probability min(r,1), otherwise set theta_new = theta_old;
The intuition is that for r greater than 1, since theta_old is already in our set, we should include theta* as it has a higher probability than theta_old.
If r is less than 1, then for every instance of theta_old we should have only a fraction r of an instance of theta* in our set.
Metropolis algorithm for a Gaussian distribution, one example of MCMC (R).
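As an illustration of the accept/reject rule with ratio r, here is a minimal Python sketch (the linked example is in R and may differ). It assumes a Gaussian likelihood with known standard deviation sigma and a Gaussian prior N(mu0, tau0^2) on the mean theta, so that the sampler can be checked against the conjugate closed-form posterior. The names log_posterior and metropolis, the random-walk proposal, and all numeric settings are assumptions made for this example.

```python
import numpy as np

def log_posterior(theta, y, sigma, mu0, tau0):
    """Unnormalized log posterior: Gaussian prior on theta times Gaussian likelihood."""
    log_prior = -0.5 * ((theta - mu0) / tau0) ** 2
    log_lik = -0.5 * np.sum(((y - theta) / sigma) ** 2)
    return log_prior + log_lik

def metropolis(y, sigma, mu0, tau0, n_iter=20000, step=0.4, seed=0):
    rng = np.random.default_rng(seed)
    theta_old = mu0
    samples = np.empty(n_iter)
    for s in range(n_iter):
        # Symmetric random-walk proposal around the current state.
        theta_star = theta_old + rng.normal(0.0, step)
        # log r = log p(theta*|y) - log p(theta_old|y); normalizing constants cancel.
        log_r = (log_posterior(theta_star, y, sigma, mu0, tau0)
                 - log_posterior(theta_old, y, sigma, mu0, tau0))
        # Accept with probability min(1, r); otherwise keep theta_old.
        if np.log(rng.uniform()) < log_r:
            theta_old = theta_star
        samples[s] = theta_old
    return samples

# Assumed toy data and prior.
data_rng = np.random.default_rng(1)
sigma, mu0, tau0 = 1.0, 0.0, 10.0
y = data_rng.normal(3.0, sigma, 30)

samples = metropolis(y, sigma, mu0, tau0)[2000:]  # drop burn-in

# Conjugate closed form for comparison: precisions add, means are precision-weighted.
post_prec = 1.0 / tau0**2 + len(y) / sigma**2
post_mean = (mu0 / tau0**2 + np.sum(y) / sigma**2) / post_prec
post_sd = 1.0 / np.sqrt(post_prec)

print("Metropolis mean/sd: ", samples.mean(), samples.std())
print("closed-form mean/sd:", post_mean, post_sd)
```

Working with log r rather than r avoids numerical underflow when the likelihood is a product over many observations. In a genuinely nonconjugate model the closed-form check is not available, which is exactly when the sampler is needed.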

Gibbs Sampler

The Gibbs sampler is a special case of MCMC. In practice, it is often hard to sample from the joint posterior distribution directly, in particular for multi-parameter models, while it can be easy to sample from the full conditional distribution of each parameter. The full conditional distribution of a parameter is its distribution given the data and all the other parameters. One example is a normal model in which the prior on the mean is a normal distribution and the prior on the precision is a gamma distribution. Given the data and the precision, the posterior distribution of the mean is also normal: its mean is a precision-weighted average of the prior mean and the data mean, and its precision is the sum of the prior precision and the data precision. Given the data and the mean, the posterior distribution of the precision is a gamma distribution (equivalently, the variance follows an inverse gamma distribution). We can generate a new state of the parameters as follows:
1. Sample theta^{(s+1)} ~ p(theta|prec^{(s)},y_1,...,y_n);
2. Sample prec^{(s+1)} ~ p(prec|theta^{(s+1)},y_1,...,y_n);
3. Set the new state of the parameters to {theta^{(s+1)},prec^{(s+1)}};
Gibbs Sampler example (R). Note that the Gibbs sampler can also be used in practice to solve the topic modelling problem.
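The two sampling steps can be sketched in Python as follows (the linked example is in R and may differ). This sketch assumes a prior theta ~ N(mu0, 1/prec0) on the mean and a prior prec ~ Gamma(a, b) on the precision, with shape a and rate b. Under these assumptions the full conditional of the precision is a gamma distribution with shape a + n/2 and rate b + sum((y_i - theta)^2)/2. The function name gibbs_normal and all hyperparameter values are illustrative.

```python
import numpy as np

def gibbs_normal(y, mu0, prec0, a, b, n_iter=5000, seed=0):
    """Gibbs sampler for the mean theta and precision prec of a normal model."""
    rng = np.random.default_rng(seed)
    n, ybar = len(y), float(np.mean(y))
    prec = 1.0 / np.var(y)  # initial state for the precision
    theta_draws = np.empty(n_iter)
    prec_draws = np.empty(n_iter)
    for s in range(n_iter):
        # Step 1: theta | prec, y is normal; precisions add and means are precision-weighted.
        prec_n = prec0 + n * prec
        mu_n = (prec0 * mu0 + n * prec * ybar) / prec_n
        theta = rng.normal(mu_n, 1.0 / np.sqrt(prec_n))
        # Step 2: prec | theta, y is gamma; NumPy takes scale = 1 / rate.
        rate = b + 0.5 * np.sum((y - theta) ** 2)
        prec = rng.gamma(a + 0.5 * n, 1.0 / rate)
        # Step 3: record the new state.
        theta_draws[s] = theta
        prec_draws[s] = prec
    return theta_draws, prec_draws

# Assumed toy data: true mean 5, true standard deviation 2 (precision 0.25).
data_rng = np.random.default_rng(1)
y = data_rng.normal(5.0, 2.0, 200)

theta_draws, prec_draws = gibbs_normal(y, mu0=0.0, prec0=0.01, a=0.01, b=0.01)
theta_draws, prec_draws = theta_draws[500:], prec_draws[500:]  # drop burn-in

print("posterior mean of theta:", theta_draws.mean(), " sample mean:", y.mean())
print("posterior mean of prec: ", prec_draws.mean(), " 1/sample var:", 1.0 / np.var(y))
```

With priors this weak, the posterior means should sit close to the sample mean and the inverse sample variance, which gives a quick sanity check on both full conditionals.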

Give Me Some Credit, one practical machine learning problem.

Interests


Languages


Resource


Hobby



Last updated by Z Zhu on 19 December 2011