Project Directors
Nigel Martin
Christine Orengo

Rachel Hamill
Michael Maibaum
Alex Poulovassilis
Galia Rimon
Stathis Sideris

Project Details
The Wellcome Trust, duration 3 years

Human Genome Project, Data Warehouse, Data Integration

Project website


BioMap: Integration and analysis of biological data


The success of the human genome project is a major milestone in the battle against disease. Biologists now aim to describe where and when genes are expressed, and how they function in normal and diseased states. To this end the expression of thousands of genes is analyzed simultaneously using microarray experiments. These experiments generate vast quantities of data.

The aim of this project is to combine such experimental data with integrated genomic and protein data resources to understand how genes operate and to predict outcomes of events such as drug treatment and disease.


The project is using real data from collaborating scientists. Keeping track of this data is a major challenge. Meditor (a BioMap software application) captures laboratory information. We have developed techniques to transform the captured data into a format suitable for searching. We are integrating this data into the BioMap data warehouse of existing protein structure, function and pathway data.

Connecting biological entities from many inter-related databases is a major challenge because of inconsistent nomenclature and the high frequency of errors. To correctly identify related entities in the BioMap warehouse, we have exploited the link between gene sequence similarity and evolutionary relatedness to provide a basis for integration.
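The idea of using sequence similarity as a basis for linking entries can be illustrated with a minimal sketch. This is not the BioMap implementation: the function name, the identifiers, the identity scores and the 0.9 threshold are all hypothetical, and single-linkage grouping is simply an assumed stand-in for whatever clustering the warehouse actually uses. Entries from different databases that share a sufficiently similar sequence are placed in the same family, regardless of how each database names them.

```python
from collections import defaultdict


def cluster_by_similarity(entries, hits, min_identity=0.9):
    """Group database entries into families by single-linkage clustering.

    entries      -- iterable of entry identifiers (hypothetical format "db:id")
    hits         -- iterable of (entry_a, entry_b, identity) tuples, where
                    identity is a sequence identity score between 0 and 1
    min_identity -- assumed threshold above which two entries are linked
    """
    # Union-find structure: every entry starts as its own family.
    parent = {entry: entry for entry in entries}

    def find(x):
        # Walk up to the family representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the families of any two entries with a strong enough hit.
    for a, b, identity in hits:
        if identity >= min_identity:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    # Collect the members of each family.
    families = defaultdict(set)
    for entry in entries:
        families[find(entry)].add(entry)
    return sorted(sorted(members) for members in families.values())


# Invented entries from three imaginary databases with inconsistent names.
entries = ["dbA:P1", "dbA:P2", "dbB:gene_x", "dbC:0042"]
hits = [
    ("dbA:P1", "dbB:gene_x", 0.97),
    ("dbB:gene_x", "dbC:0042", 0.93),
    ("dbA:P2", "dbC:0042", 0.41),  # too dissimilar to link
]
families = cluster_by_similarity(entries, hits)
```

In this toy input the three differently named entries end up in one family, while "dbA:P2" stays on its own because its only hit falls below the threshold. Single linkage is easy to follow but can chain loosely related entries together, so a real system would need a more careful criterion.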

The next step will be to develop the sophisticated techniques required to search and analyze the data. Finally, to cope with this new level of complex data integration, innovative visualisation tools are in development.


Over the last ten years there has been a huge effort to sequence the genome of humans and many other organisms.


The goal now is to integrate genomic sequence data with protein structure, function and pathways data to provide a basis for understanding how whole organisms work.


Maibaum, M., Rimon, G., Orengo, C., Martin, N. and Poulovassilis, A. BioMap: Gene Family based Integration of Heterogeneous Biological Databases Using AutoMed Metadata (2004) BIDM
Maibaum, M., Zamboulis, L., Rimon, G., Orengo, C., Martin, N. and Poulovassilis, A. Cluster based integration of Heterogeneous Biological Databases using the AutoMed toolkit (2005) DILS

Last Updated ( Saturday, 05 May 2007 )