Flexible Querying and Integration of Linked Data
Wednesday, 25 February 2015

Project Staff 

Andrea Calì

Alex Poulovassilis

Peter Wood

Research Students

Mirko Dimartino

Riccardo Frosini

Petra Selmer

Other Project Partners

George Fletcher, Eindhoven

Project funding

JISC, Royal Society

Research Challenges

The volume of graph-structured data on the web continues to grow, most recently in the form of RDF Linked Data. Such data can be complex, heterogeneous and evolving, and users may lack full knowledge of its structure, its irregularities and the URIs used within it. This can make it difficult for users to formulate queries that precisely express their information retrieval requirements. To date, languages for querying graph-structured data have provided limited capabilities for flexible querying, with no ability to generate or rank approximate answers to queries. There has also been limited work on the semantic integration of heterogeneous Linked Data in a dynamic environment that encompasses multiple data sources with arbitrary mapping topologies between them.

Research Approaches

We have designed and implemented extensions that support query relaxation and query approximation in Conjunctive Regular Path queries and in SPARQL 1.1. One application of these techniques is in the L4All system, which allows users to create and maintain a record of their personal learning and work experiences, structured in the form of a timeline (see Fig. 1). Flexible search is provided over this information, with the aim of supporting collaborative formulation of future learning goals and aspirations.
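To give a flavour of what ranking approximate answers can mean for a path query, the following is a minimal sketch and not the project's implementation. It assumes a query that is a plain sequence of edge labels rather than a full regular expression, and it assumes three edit operations (substituting, inserting or deleting a label), each with a cost of 1. The function name, the edge labels and the sample timeline data are all hypothetical.

```python
import heapq
from collections import defaultdict

# Hypothetical fragment of a timeline graph: (source, label, target) triples.
EDGES = [
    ("al", "hasEpisode", "ep1"),
    ("ep1", "next", "ep2"),
    ("ep1", "qualif", "BScComputing"),
    ("ep2", "job", "SoftwareEngineer"),
]

def approx_path_answers(edges, start, query, max_cost):
    """Return (node, cost) pairs reachable from `start` by a path whose label
    sequence is within `max_cost` unit-cost edits of `query`, cheapest first."""
    out = defaultdict(list)
    for s, label, t in edges:
        out[s].append((label, t))

    settled = set()
    answers = {}
    heap = [(0, start, 0)]  # (cost so far, current node, position in query)
    while heap:
        cost, node, i = heapq.heappop(heap)
        if (node, i) in settled:
            continue
        settled.add((node, i))
        if i == len(query):
            answers[node] = cost  # first time settled, so cost is minimal
        moves = []
        if i < len(query):
            moves.append((cost + 1, node, i + 1))  # delete a query label
        for label, target in out.get(node, []):
            if i < len(query):
                step = 0 if label == query[i] else 1  # match or substitute
                moves.append((cost + step, target, i + 1))
            moves.append((cost + 1, target, i))  # insert an extra edge
        for move in moves:
            if move[0] <= max_cost:
                heapq.heappush(heap, move)
    return sorted(answers.items(), key=lambda item: (item[1], item[0]))

if __name__ == "__main__":
    # No exact answer exists, but answers at edit cost 1 can be returned.
    print(approx_path_answers(EDGES, "al", ["hasEpisode", "job"], 0))
    print(approx_path_answers(EDGES, "al", ["hasEpisode", "job"], 1))
```

In this sketch the exact query returns nothing, while allowing one edit returns, among others, the job reached by inserting a "next" step. Query relaxation, which would instead replace a label by a more general one using an ontology, is not shown.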

A major challenge in dealing with web-scale Linked Data sets is query processing performance. In ongoing research we are designing query optimisation techniques based on the construction and use of path indexes and the development of new cost models. The challenges become even greater when considering flexible query processing, and for this we are exploring query minimisation and multi-query optimisation techniques.
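As an illustration of the idea behind a path index, the sketch below precomputes, for every label path of length at most k, the set of (source, target) node pairs connected by that path. This is one plausible form of such an index, assumed here for illustration only; the function name and sample data are hypothetical, and the project's actual index structures and cost models are not described on this page.

```python
from collections import defaultdict

# Hypothetical fragment of a timeline graph: (source, label, target) triples.
EDGES = [
    ("al", "hasEpisode", "ep1"),
    ("ep1", "next", "ep2"),
    ("ep1", "qualif", "BScComputing"),
    ("ep2", "job", "SoftwareEngineer"),
]

def build_path_index(edges, k):
    """Map each label path of length 1..k to the (source, target) pairs it connects."""
    out = defaultdict(list)
    for s, label, t in edges:
        out[s].append((label, t))

    # Paths of length 1 are the edges themselves, grouped by label.
    level = defaultdict(set)
    for s, label, t in edges:
        level[(label,)].add((s, t))

    index = {}
    for length in range(1, k + 1):
        index.update(level)
        if length == k:
            break
        # Extend every path of the current length by one outgoing edge.
        longer = defaultdict(set)
        for path, pairs in level.items():
            for s, mid in pairs:
                for label, t in out.get(mid, []):
                    longer[path + (label,)].add((s, t))
        level = longer
    return index

if __name__ == "__main__":
    index = build_path_index(EDGES, 2)
    # A two-step path lookup becomes a single dictionary access.
    print(index[("hasEpisode", "next")])
```

The trade-off in this sketch is the usual one: larger k answers longer path fragments with a single lookup, at the price of a larger index to build and store.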

Finally, to support semantic integration and transparent querying of multiple data sources by users, we have designed a decentralised, extensible RDF-oriented peer data management system, together with a new peer mapping language suitable for the RDF data model, and new query answering and query rewriting techniques for graph-pattern queries.
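The sketch below illustrates, in a deliberately simplified form, how a triple pattern posed against one peer might be rewritten using mappings that form an arbitrary topology. It assumes that mappings are symmetric equivalences between predicate names, which is far simpler than a peer mapping language over graph patterns; the function name, the peer prefixes and the predicates are all hypothetical.

```python
from collections import deque

# Hypothetical mappings: pairs of predicates assumed to be equivalent across peers.
MAPPINGS = [
    ("peerA:worksAs", "peerB:occupation"),
    ("peerB:occupation", "peerC:jobTitle"),
]

def rewrite_pattern(pattern, mappings):
    """Rewrite one (subject, predicate, object) pattern into all patterns whose
    predicate is reachable from the original through the mappings."""
    s, p, o = pattern
    neighbours = {}
    for a, b in mappings:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # Breadth-first search over the mapping graph, so chains of mappings
    # between peers are followed regardless of topology.
    seen = {p}
    queue = deque([p])
    while queue:
        current = queue.popleft()
        for nxt in neighbours.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted((s, q, o) for q in seen)

if __name__ == "__main__":
    for rewritten in rewrite_pattern(("?x", "peerA:worksAs", "?y"), MAPPINGS):
        print(rewritten)
```

Under these assumptions, the union of the rewritten patterns can be evaluated across the peers, so a user querying with one peer's vocabulary also retrieves data described with the others.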

Selected Publications

Flexible Querying of Lifelong Learner Metadata. A. Poulovassilis, P. Selmer, P.T. Wood. IEEE Trans. on Learning Technologies, 5(2), pp 117-129, 2012.
A Structural Approach to Indexing Triples. F. Picalausa, Y. Luo, G.H.L. Fletcher, J. Hidders, S. Vansummeren. ESWC 2012, pp 406-421.
Flexible Querying for SPARQL. A. Calì, R. Frosini, A. Poulovassilis, P.T. Wood. OTM Conferences 2014, pp 473-490.
Implementing Flexible Operators for Regular Path Queries. P. Selmer, A. Poulovassilis, P.T. Wood. To appear in Proc. GraphQ 2015, March 2015.
Peer-to-Peer Semantic Integration of Linked Data. M.M. Dimartino, A. Calì, A. Poulovassilis, P.T. Wood. To appear in Proc. LWDM 2015, March 2015.

Figure 1. Example of graph-structured data: a fragment of Al's timeline, comprising episodes of learning and work, and associated metadata.