Flexible Querying of Semi-Structured Data
Friday, 20 August 2010



Petra Selmer




Alex Poulovassilis

Peter Wood

Project Details

PhD research

Started October 2008, expected to finish October 2014


Flexible Queries, Ranking Answers, Query Approximation, Query Relaxation, Semantic Web, Semi-structured Data

Project aims

Current proposals for languages to query semi-structured data provide only limited capabilities for flexible querying, with no ability to rank the answers for users.

Our research investigates and develops techniques that enable users to query semi-structured data flexibly. Users can apply approximation and relaxation operations to the conditions of a query, and the query results are then returned ranked according to how closely they match the original query.

Application areas

The outcomes from this research will be useful in domains where users may not be familiar with the structure of the data or where they may want to browse the data in an exploratory manner. One application which is currently being investigated as a case study is the L4All system. This system allows users to create and maintain a record of their personal learning and work experiences to date (visualised in the form of a timeline), as well as their future learning and career aspirations. Users can search over this information, with the aim of supporting collaborative formulation of future learning goals and aspirations.

Even though L4All users can pose queries to find relevant timelines and the learning and work episodes within them, the querying mechanisms provided by the system offer only limited flexibility. The case study aims to extend L4All by allowing users to specify approximations and relaxations to be applied to their initial search query. Query results will then be returned incrementally, ranked in order of increasing "edit distance" from the original query.
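Ranking by edit distance can be illustrated on the simplest case. The sketch below is a simplification, not the project's implementation: it assumes that a query condition and a candidate path through the data are each a plain sequence of edge labels (such as "next" and "prereq" from the L4All timelines), and that inserting, deleting or substituting one label costs 1. The function names `label_edit_distance` and `rank_paths` are hypothetical.

```python
def label_edit_distance(query, path):
    """Levenshtein distance between two sequences of edge labels, with an
    assumed unit cost for inserting, deleting or substituting a label."""
    prev = list(range(len(path) + 1))  # distances for an empty query prefix
    for i, q in enumerate(query, start=1):
        cur = [i] + [0] * len(path)
        for j, p in enumerate(path, start=1):
            cur[j] = min(
                prev[j] + 1,                 # delete query label q
                cur[j - 1] + 1,              # insert path label p
                prev[j - 1] + int(q != p),   # match (0) or substitute (1)
            )
        prev = cur
    return prev[-1]


def rank_paths(query, paths):
    """Order candidate label sequences by increasing edit distance from the
    query (Python's sort is stable, so ties keep their input order)."""
    return sorted(paths, key=lambda p: label_edit_distance(query, p))


print(rank_paths(["prereq"], [["next", "next"], ["next"], ["prereq"]]))
```

Here an exact match ranks first (distance 0), a single substitution second (distance 1), and a substitution plus an insertion last (distance 2).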

An example

To illustrate query approximation with an example, consider the following data from the timeline of a user, Jane.



Figure: Jane's timeline. "next" indicates the sequencing of successive episodes in the timeline; "prereq" indicates that Jane has stated that undertaking an earlier episode was necessary for her to be able to proceed to a later episode.

Tom might pose a query asking which jobs have an "English Studies" degree as a "prereq" (prerequisite). Without query approximation, no results from Jane's timeline would be returned, because her English Studies episode is not linked to any job by a "prereq" edge, even though this timeline would clearly be of interest to Tom.

However, applying query approximation to Tom's query does yield answers. Replacing "prereq" in the query by "next", at an edit cost of 1, returns the answer Air Travel Assistant (from episode ep2). Replacing "prereq" by "next" and inserting a second "next", at a combined edit cost of 2, returns the answer Journalist. Inserting "next" twice in front of "prereq", also at a combined cost of 2, returns the answer Assistant Editor. Ranked by edit distance, Air Travel Assistant is therefore returned first, followed by Journalist and Assistant Editor.
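The example can be reproduced with a small sketch. Because the figure is not available, the graph below is a hypothetical reconstruction inferred from the text: only ep2 is named there, so the identifiers ep1, ep3 and ep4, and the exact edges, are assumptions. The evaluation strategy is one common way to answer an approximate path query, not necessarily the project's: a shortest-path (Dijkstra) search over pairs of (graph node, number of query labels consumed), where matching a label costs 0 and inserting, deleting or substituting a label each cost an assumed 1. Because the priority queue pops states in order of cost, answers are produced incrementally in order of increasing edit distance. The names `approx_answers`, `EDGES` and `EPISODES` are hypothetical.

```python
import heapq
from collections import defaultdict

# HYPOTHETICAL reconstruction of Jane's timeline, inferred from the worked
# example; ep1, ep3 and ep4 are invented identifiers.
EDGES = [
    ("ep1", "next", "ep2"),
    ("ep2", "next", "ep3"),
    ("ep3", "prereq", "ep4"),
]
EPISODES = {  # node -> (description, kind)
    "ep1": ("English Studies", "degree"),
    "ep2": ("Air Travel Assistant", "job"),
    "ep3": ("Journalist", "job"),
    "ep4": ("Assistant Editor", "job"),
}


def approx_answers(edges, start, query, max_cost=3):
    """Yield (node, cost) for nodes reachable from `start` along a path whose
    label sequence is within edit distance `max_cost` of `query`, cheapest
    first. Insertion, deletion and substitution each cost 1 (an assumption)."""
    out = defaultdict(list)
    for src, label, dst in edges:
        out[src].append((label, dst))
    k = len(query)
    dist = {(start, 0): 0}  # (graph node, query labels consumed) -> best cost
    heap = [(0, start, 0)]
    done = set()
    while heap:
        cost, node, i = heapq.heappop(heap)
        if (node, i) in done:
            continue  # stale queue entry; a cheaper cost was already settled
        done.add((node, i))
        if i == k:  # whole query consumed: node is an answer at this cost
            yield node, cost
        moves = []
        if i < k:
            moves.append((node, i + 1, 1))  # delete query label query[i]
        for label, dst in out[node]:
            moves.append((dst, i, 1))  # insert an extra edge label
            if i < k:  # match (cost 0) or substitute (cost 1) query[i]
                moves.append((dst, i + 1, 0 if label == query[i] else 1))
        for nxt, j, step in moves:
            new_cost = cost + step
            if new_cost <= max_cost and new_cost < dist.get((nxt, j), float("inf")):
                dist[(nxt, j)] = new_cost
                heapq.heappush(heap, (new_cost, nxt, j))


# Tom's query: which jobs follow the English Studies degree via "prereq"?
for node, cost in approx_answers(EDGES, "ep1", ["prereq"]):
    name, kind = EPISODES[node]
    if kind == "job":  # keep only answers that are jobs
        print(cost, name)
```

With `max_cost=0` (no approximation) the search returns nothing, matching the exact query. With approximation it prints Air Travel Assistant at cost 1, then Journalist and Assistant Editor at cost 2. Deleting "prereq" altogether would also match the English Studies episode itself at cost 1, which is why the loop keeps only episodes whose kind is "job".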


 23-29 Emerald Street, London WC1N 3QS    phone: +44 (0)20 7763 2174     www.lkl.ac.uk