Automated Protein Structure Prediction Using Templates from the CATH Protein Family Database


Funding and Staffing Details

This project has been funded by a 3-year BBSRC/EPSRC grant under the Bioinformatics Initiative. The project is a collaboration between Christine Orengo of the Biomolecular Structure & Modelling Group in the Department of Biochemistry & Molecular Biology at UCL, and Nigel Martin and Roger Johnson of the Database and Web Technologies and Bioinformatics Groups in the School of Computer Science and Information Systems at Birkbeck. Adrian Shepherd was a full-time research assistant working on the project until February 2002.


Project Aims

The project has developed techniques for integrating complementary data on evolutionary, structural and functional relationships with sequence data, in order to support genome analysis and to derive consensus sequence/structure templates for protein structural families. Our approach has been to store metadata on these relationships explicitly, beneath materialized views, thereby giving a design flexible enough to accommodate new sources of derived structural and functional data readily.
Based on this approach, we have designed and implemented the Protein Family Database (PFDB) in Oracle 8i. PFDB manages data on behalf of the CATH, VIDA and Gene3D databases, and integrates and validates data from a number of primary data sources, including the Protein Data Bank, SwissProt and GenBank. A preliminary demonstration query interface has been implemented.
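The metadata-driven design described above can be illustrated in miniature. The following Python sketch is hypothetical and is not the PFDB schema: it uses SQLite from the standard library instead of Oracle 8i, and the table names `relationship_type` and `relationship` and the function `refresh_view` are invented for the example. Each kind of relationship is recorded as a metadata row rather than as a table of its own. Because SQLite has no materialized views, a table recomputed from a query stands in for one.

```python
import sqlite3

# Hypothetical schema (not the actual PFDB design): each kind of relationship
# is a row of metadata in relationship_type rather than a table of its own.
SCHEMA = """
CREATE TABLE relationship_type (
    name        TEXT PRIMARY KEY,  -- e.g. 'structural_family'
    description TEXT NOT NULL
);
CREATE TABLE relationship (
    src      TEXT NOT NULL,        -- e.g. a domain or sequence identifier
    dst      TEXT NOT NULL,        -- e.g. a family identifier
    rel_type TEXT NOT NULL REFERENCES relationship_type(name),
    source   TEXT NOT NULL         -- primary resource the link came from
);
"""


def connect() -> sqlite3.Connection:
    """Open an in-memory database with the generic schema installed."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces REFERENCES when this pragma is on; with it, a link
    # whose rel_type has no metadata row is rejected.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def refresh_view(conn: sqlite3.Connection, view: str, rel_type: str) -> None:
    """Rebuild a table that plays the role of a materialized view.

    The 'view' is a plain table recomputed from the generic relationship
    table. `view` is interpolated into SQL and must be a trusted name.
    """
    conn.execute(f'DROP TABLE IF EXISTS "{view}"')
    conn.execute(f'CREATE TABLE "{view}" (member TEXT, family TEXT, source TEXT)')
    conn.execute(
        f'INSERT INTO "{view}" (member, family, source) '
        "SELECT src, dst, source FROM relationship WHERE rel_type = ?",
        (rel_type,),
    )
    conn.commit()


if __name__ == "__main__":
    conn = connect()
    # Adding a new kind of derived data is an INSERT, not a schema change.
    conn.executemany(
        "INSERT INTO relationship_type VALUES (?, ?)",
        [("structural_family", "domain assigned to a structural family"),
         ("sequence_family", "sequence assigned to a sequence family")],
    )
    conn.executemany(
        "INSERT INTO relationship VALUES (?, ?, ?, ?)",
        [("dom1", "famA", "structural_family", "PDB"),
         ("dom2", "famA", "structural_family", "PDB"),
         ("seq1", "famA", "sequence_family", "SwissProt")],
    )
    refresh_view(conn, "structural_members", "structural_family")
    print(conn.execute(
        "SELECT member FROM structural_members ORDER BY member").fetchall())
    # prints [('dom1',), ('dom2',)]
```

Under these assumptions, supporting a new source of derived data means adding a `relationship_type` row and refreshing a view, rather than altering the schema. The foreign key also provides a simple form of validation by rejecting links whose type has no metadata row.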


Project Publications

VIDA: a virus database system for the organisation of virus genome open reading frames. M.M. Alba, D. Lee, F.M.G. Pearl, A.J. Shepherd, N.J. Martin, C.A. Orengo and P. Kellam. Nucleic Acids Res, 29(1), 133-136, (2001).

A rapid classification protocol for the CATH domain database to support structural genomics. F.M.G. Pearl, N.J. Martin, J.E. Bray, D.W.A. Buchan, A.P. Harrison, D. Lee, G.A. Reeves, A.J. Shepherd, I. Sillitoe, A.E. Todd, J.M. Thornton and C.A. Orengo. Nucleic Acids Res, 29(1), 223-227, (2001).

Investigation of Methods for Representing Protein Sequence Data Using an Oracle Data Cartridge. L. Huang. MSc Project Report, School of Computer Science and Information Systems, Birkbeck College, (2001).

Implementing Path Queries on Graph Views of Relational Data. R. Hamill and N. Martin. Technical Report BBKCS-02-08, School of Computer Science and Information Systems, Birkbeck College, (2002).

PFDB: A Generic Protein Family Database Integrating the CATH Domain Structure Database with Sequence Based Protein Family Resources. A. Shepherd, N.J. Martin, R.G. Johnson, P. Kellam and C. Orengo. Bioinformatics, 18, 1666-1672, (2002).

The CATH database: an extended protein family resource for structural and functional genomics. F.M.G. Pearl, C.F. Bennett, J.E. Bray, A.P. Harrison, N. Martin, A. Shepherd, I. Sillitoe, J. Thornton and C.A. Orengo. Nucleic Acids Res, 31(1), 452-455, (2003).