Collaborative media and social networks

Finding and Summarizing Answers from Online Communities
Thursday, 12 August 2010



Long Chen



Dell Zhang

Mark Levene


Project Details

PhD research, started 10/2009, expected to finish 09/2012





Information Retrieval,

Web Mining



Project Aims
Traditional question answering (QA) systems deal only with fixed corpora, factual questions, and short answers. The aim of this project is to overcome these limitations and develop a scalable automated QA system by harnessing the human knowledge embedded in online communities.

Key themes
Our approach is to collect, extract, index, and mine the huge number of question-answer pairs from fast-growing user-generated content on Web 2.0 sites (such as Yahoo! Answers), and to distil them by using multi-document text summarization techniques.

Application of the Research
This research aims to serve as an auxiliary component for current search engines, leading to a new means of Web information retrieval that complements or even rivals keyword-based search engines such as Google.
Because the question retrieval approach also applies to other problems, the proposed method is intended to be broad enough to benefit related areas such as text retrieval and text summarization.

Results to Date
To improve the accuracy of question retrieval, we have carried out experiments from two different perspectives: an iterative k-nearest neighbours model for question classification, and a dynamic edit-distance approach for question ranking.
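
As a rough illustration of how edit distance and nearest-neighbour voting can be applied to archived questions, the following Python sketch ranks questions by word-level Levenshtein distance and classifies a new question by a plain k-nearest-neighbour vote. It is a simplified stand-in, not the iterative k-nearest neighbours model or the dynamic edit-distance approach used in the experiments; the function names, the toy archive, and the category labels are all hypothetical.

```python
from collections import Counter


def tokenize(question):
    """Lowercase a question and split it into word tokens."""
    return question.lower().replace("?", " ").split()


def edit_distance(a, b):
    """Word-level Levenshtein distance between token lists a and b."""
    previous = list(range(len(b) + 1))
    for i, token_a in enumerate(a, start=1):
        current = [i]
        for j, token_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (token_a != token_b)
            # Cheapest of: delete token_a, insert token_b, substitute/match.
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def rank_questions(query, archive):
    """Return archived (question, category) pairs, closest to the query first."""
    query_tokens = tokenize(query)
    return sorted(archive, key=lambda pair: edit_distance(query_tokens, tokenize(pair[0])))


def classify_question(query, archive, k=3):
    """Plain k-nearest-neighbour vote over the k closest archived questions."""
    neighbours = rank_questions(query, archive)[:k]
    votes = Counter(category for _, category in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical toy archive of (question, category) pairs.
ARCHIVE = [
    ("how do i install python on windows", "computers"),
    ("how do i install java on windows", "computers"),
    ("how do i uninstall python on linux", "computers"),
    ("how do i cook rice", "food"),
    ("what is the best way to cook pasta", "food"),
]

if __name__ == "__main__":
    query = "How do I install Python on Mac?"
    print(rank_questions(query, ARCHIVE)[0][0])
    print(classify_question(query, ARCHIVE))
```

For the query "How do I install Python on Mac?", the closest archived question differs by a single word and is therefore ranked first, and its three nearest neighbours all carry the "computers" label. On such a small archive, ties in distance are common, so the choice of k strongly affects the vote.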

We have also run experiments to verify the effectiveness of a novel Wikipedia-based language modelling approach to question retrieval. Our experimental results on a generic text sample indicate that this approach significantly improves on both the traditional two-stage language modelling approach [1] and the translation-based language modelling approach [2].
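
To make the two-stage baseline [1] more concrete, the sketch below scores archived questions by query likelihood, applying Dirichlet prior smoothing first and then interpolating with a background model. It does not implement the Wikipedia-based approach. Several details are assumptions made for illustration: the background model is approximated by the collection model, the collection model uses add-one smoothing so that unseen words keep a non-zero probability, and the class name, parameter values (mu, lam), and toy questions are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into word tokens."""
    return text.lower().replace("?", " ").split()


class TwoStageRanker:
    """Query-likelihood ranking with two-stage smoothing (illustrative only)."""

    def __init__(self, questions, mu=10.0, lam=0.1):
        self.questions = questions
        self.docs = [Counter(tokenize(q)) for q in questions]
        self.doc_lengths = [sum(d.values()) for d in self.docs]
        self.collection = Counter()
        for d in self.docs:
            self.collection.update(d)
        self.collection_length = sum(self.collection.values())
        self.mu = mu    # Dirichlet prior strength (first stage)
        self.lam = lam  # interpolation weight (second stage)

    def collection_prob(self, word):
        # Assumption: add-one smoothing so unseen words get non-zero mass.
        vocab = len(self.collection) + 1
        return (self.collection[word] + 1) / (self.collection_length + vocab)

    def word_prob(self, word, index):
        p_c = self.collection_prob(word)
        # Stage 1: Dirichlet prior smoothing of the question's own model.
        dirichlet = (self.docs[index][word] + self.mu * p_c) / (self.doc_lengths[index] + self.mu)
        # Stage 2: interpolation with a background model, approximated here
        # by the collection model.
        return (1 - self.lam) * dirichlet + self.lam * p_c

    def score(self, query, index):
        """Log-likelihood of the query under the smoothed question model."""
        return sum(math.log(self.word_prob(w, index)) for w in tokenize(query))

    def rank(self, query):
        """Return archived questions, most likely to generate the query first."""
        order = sorted(range(len(self.questions)),
                       key=lambda i: self.score(query, i), reverse=True)
        return [self.questions[i] for i in order]


# Hypothetical toy archive of questions.
QUESTIONS = [
    "how do i install python on windows",
    "how do i cook rice",
    "what is the best way to cook pasta",
]

if __name__ == "__main__":
    ranker = TwoStageRanker(QUESTIONS)
    print(ranker.rank("how to cook rice")[0])
```

With these toy questions, the query "how to cook rice" ranks "how do i cook rice" first, since that question shares three of the four query words and is also the shortest. In practice, mu and lam would be tuned or estimated from data rather than fixed by hand.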

Next steps
As Web 2.0 continues to take hold, we plan to integrate more Web 2.0 resources, beyond Wikipedia, into question retrieval in the coming year, so as to investigate further the power of online collaboration.


1. Zhai, C. and Lafferty, J. 2002. Two-stage language models for information retrieval. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR '02. ACM, New York, NY, 49-56.
2. Berger, A. and Lafferty, J. 1999. Information retrieval as statistical translation. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR '99. ACM, New York, NY, 222-229.