Quality-driven Heterogeneous Data Integration
Thursday, 24 June 2010



Jianing Wang


Alex Poulovassilis
Nigel Martin


Key Themes
This project develops techniques and tools to assist in creating high-quality integrated resources from a set of heterogeneous data sources. Integrating such sources can be complex and error-prone owing to factors such as syntactic and semantic differences between the data sources, the need to meet the differing requirements of the various users of the resulting integrated resource, and the need to incorporate the users' domain knowledge into that resource.

Project Aims
The aims of this project include: (i) eliciting end-users' and integrators' requirements regarding quality-driven data integration (DI); (ii) identifying appropriate quality criteria for DI settings; (iii) developing metrics and measurement methods for these quality criteria and for the trade-offs between them; and (iv) identifying formal representation and reasoning methods in order to produce an integrated and consistent quality view of a DI setting.

Results to Date
So far, a quality framework has been defined (see Figure 1) comprising four major aspects - item, metric, quality criteria and user - together with the relationships between related concepts. Quality criteria in the context of DI have been categorised into completeness, consistency, accuracy, minimality and performance criteria. For each quality criterion, factors and metrics are now being defined. The framework is being formally expressed in the OWL-DL ontology language, which will enable an integrated and consistent quality view to be generated using ontology reasoning techniques. An overall data integration architecture has also been defined; it incorporates several existing and new tools and components, and embeds these quality assurance capabilities throughout the DI life-cycle.
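As an informal illustration only, the four aspects and five criterion categories named above could be modelled roughly as follows. This is a minimal Python sketch, not the project's OWL-DL formalisation: the class names, the `coverage_metric` helper, the per-user weights and the weighted-average scoring are all hypothetical assumptions introduced here for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Set


class CriterionCategory(Enum):
    """The five categories of DI quality criteria listed in the text."""
    COMPLETENESS = "completeness"
    CONSISTENCY = "consistency"
    ACCURACY = "accuracy"
    MINIMALITY = "minimality"
    PERFORMANCE = "performance"


@dataclass(frozen=True)
class Item:
    """Something whose quality is assessed (assumed here: a named schema construct)."""
    name: str


@dataclass
class Metric:
    """A named measurement over a set of items, returning a score in [0, 1]."""
    name: str
    measure: Callable[[Set[Item]], float]


@dataclass
class QualityCriterion:
    name: str
    category: CriterionCategory
    metrics: List[Metric] = field(default_factory=list)


@dataclass
class User:
    name: str
    # Hypothetical: relative importance this user gives each criterion, by name.
    weights: Dict[str, float] = field(default_factory=dict)


def coverage_metric(required: Set[Item]) -> Metric:
    """Hypothetical completeness metric: fraction of required items present."""
    def measure(present: Set[Item]) -> float:
        if not required:
            return 1.0
        return len(required & present) / len(required)
    return Metric("coverage", measure)


def user_score(user: User, criteria: List[QualityCriterion], items: Set[Item]) -> float:
    """Weighted average of per-criterion scores, using the user's own weights."""
    total, total_weight = 0.0, 0.0
    for criterion in criteria:
        weight = user.weights.get(criterion.name, 0.0)
        if weight <= 0 or not criterion.metrics:
            continue  # criterion is irrelevant to this user or not yet measurable
        scores = [m.measure(items) for m in criterion.metrics]
        total += weight * (sum(scores) / len(scores))
        total_weight += weight
    return total / total_weight if total_weight else 0.0
```

In this sketch a user's view of quality is simply a weighted average over the criteria that user cares about. An ontology-based formalisation, as the project proposes, would additionally allow reasoning over the relationships between items, metrics, criteria and users, which plain data classes do not capture.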


1. J. Wang, A Quality Framework for Data Integration. To appear in Proc. British National Conference on Databases, June 2010.
2. L. Zamboulis, A. Poulovassilis, J. Wang, Ontology-Assisted Data Transformation and Integration. Proc. Ontologies-based Techniques for DataBases in Information Systems and Knowledge Systems (ODBIS'08), August 2008, pp 29-36.