Project Outline and Aims

Semi-structured data is usually modelled as a graph with labelled edges. The original aim of this project was to investigate algorithms for answering path queries on such graphs. A crucial assumption was that the queries asked about nodes connected by simple paths, that is, paths in which no node is repeated. Under this assumption query answering becomes intractable in general, so the project focussed on identifying classes of queries and graphs for which query answering is guaranteed to take time polynomial in the size of the graph being queried. Our early work on finding regular simple paths has received over 200 citations, according to Google Scholar.
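
To make the simple-path restriction concrete, the following is a minimal sketch of a naive backtracking search, not the algorithms developed in the project. It assumes the regular expression has already been compiled into a deterministic finite automaton; the graph encoding, the automaton encoding and the function name simple_path_targets are all hypothetical choices made for illustration. The search can take time exponential in the size of the graph, which is the behaviour the tractable classes are meant to avoid.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encodings, chosen for illustration only.
Graph = Dict[str, List[Tuple[str, str]]]  # node -> [(edge label, target node)]
Dfa = Dict[Tuple[int, str], int]          # (state, edge label) -> next state


def simple_path_targets(graph: Graph, dfa: Dfa, start_state: int,
                        accepting: Set[int], source: str) -> Set[str]:
    """Nodes reachable from `source` by a simple path whose labels the DFA accepts."""
    answers: Set[str] = set()

    def explore(node: str, state: int, visited: Set[str]) -> None:
        if state in accepting:
            answers.add(node)
        for label, target in graph.get(node, []):
            next_state = dfa.get((state, label))
            # Skip edges the automaton rejects and nodes already on the path.
            if next_state is None or target in visited:
                continue
            visited.add(target)
            explore(target, next_state, visited)
            visited.remove(target)  # backtrack

    explore(source, start_state, {source})
    return answers


# Example: the query (aa)* asks for paths of even length.
example_graph: Graph = {
    "n1": [("a", "n2")],
    "n2": [("a", "n2"), ("a", "n3")],  # n2 has a self-loop
}
even_length: Dfa = {(0, "a"): 1, (1, "a"): 0}

result = simple_path_targets(example_graph, even_length, 0, {0}, "n1")
```

In this example n2 is reachable from n1 by an even-length path only by using the self-loop, which repeats n2, so it is not an answer under simple-path semantics, whereas n1 (the empty path) and n3 are.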

More recently, the project has been concerned with flexible querying of semi-structured data. Flexible querying allows users to request that conditions in queries be relaxed so that more general answers are returned, ranked by how closely they match the original query. This is useful in areas where users are not familiar with the structure of the data or where they want to browse the data in an exploratory manner. The project initially focussed on RDF data, but more general solutions for semi-structured data are now being investigated.
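
As an illustration of relaxation and ranking, the sketch below relaxes a single class condition by replacing the requested class with successively more general superclasses, and ranks each answer by the number of steps needed before the condition covers it. This is only one simple form of relaxation under stated assumptions, not the project's actual technique: the triple encoding, the single-parent acyclic class hierarchy, the "type" predicate name and the function names are all hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object), hypothetical encoding


def ancestors_or_self(cls: str, superclass: Dict[str, str]) -> List[str]:
    """The class followed by its superclasses (assumes an acyclic, single-parent hierarchy)."""
    chain = [cls]
    while chain[-1] in superclass:
        chain.append(superclass[chain[-1]])
    return chain


def relaxed_type_query(triples: List[Triple], superclass: Dict[str, str],
                       wanted: str) -> List[Tuple[int, str]]:
    """Return (cost, subject) pairs sorted by cost, where cost counts relaxation steps."""
    relaxations = ancestors_or_self(wanted, superclass)  # most specific first
    ranked = []
    for subject, predicate, declared in triples:
        if predicate != "type":
            continue
        declared_chain = ancestors_or_self(declared, superclass)
        # Fewest relaxation steps after which the condition covers the declared class.
        costs = [cost for cost, cls in enumerate(relaxations) if cls in declared_chain]
        if costs:
            ranked.append((min(costs), subject))
    return sorted(ranked)


# Example data (invented for illustration).
hierarchy = {
    "Lecturer": "AcademicStaff",
    "Professor": "AcademicStaff",
    "AcademicStaff": "Staff",
    "Administrator": "Staff",
}
data = [
    ("ann", "type", "Lecturer"),
    ("bob", "type", "Professor"),
    ("cat", "type", "Administrator"),
    ("dan", "type", "Student"),
    ("ann", "worksFor", "dept1"),
]

ranking = relaxed_type_query(data, hierarchy, "Lecturer")
```

Here ann matches the original query exactly (cost 0), bob is returned once Lecturer is relaxed to AcademicStaff (cost 1), cat only once it is relaxed to Staff (cost 2), and dan is never returned.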


Funding and Staffing Details

The initial project ran until 1999 and was undertaken by Peter Wood, Alberto Mendelzon and Zhivko Nedev. Early work on the project, not listed in the publications below, evolved into the Hy+ project at the University of Toronto, with many additional contributors. Funding for Peter Wood's work was provided by the Foundation for Research Development (FRD), now the National Research Foundation, of South Africa.

Flexible querying is being investigated by Peter Wood, Alex Poulovassilis, Carlos Hurtado and Pablo Barcelo. This work was funded by the Royal Society between 2007 and 2009. Petra Selmer is working in the area for her PhD.


Project Publications