Automated Protein Structure Prediction Using Templates from the
CATH Protein Family Database
Funding and Staffing Details
This project has been funded by a 3-year BBSRC/EPSRC grant under
the Bioinformatics Initiative.
The project is a collaboration between Christine Orengo of the Biomolecular Structure &
Modelling Group in the Department of Biochemistry & Molecular Biology at UCL, and
Nigel Martin and Roger Johnson of the Database and Web Technologies and Bioinformatics
Groups in the School of Computer Science and Information Systems at Birkbeck.
Adrian Shepherd was a full-time research assistant working on the project until
February 2002.
Project Aims
The project has developed techniques for integrating complementary data on
evolutionary, structural and functional relationships with sequence data, both to
support genome analysis and to derive consensus sequence/structure templates for
protein structural families.
Our approach has been to store metadata on these relationships explicitly, beneath
materialized views. This gives a design flexible enough to accommodate new sources
of derived structural and functional data readily.
Based on this approach, we have designed and implemented the Protein Family Database
(PFDB) in Oracle 8i. PFDB manages data on behalf of the CATH, VIDA and Gene3D
databases, and integrates and validates data from a number of primary data sources,
including the Protein Data Bank, Swiss-Prot and GenBank.
A preliminary demonstration query interface has been implemented.
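The idea of storing relationship metadata explicitly beneath materialized views can be illustrated with a minimal sketch. This is not the PFDB schema: all table, column and function names below are hypothetical, SQLite stands in for Oracle 8i, and because SQLite has no materialized views, a summary table rebuilt on demand plays that role.

```python
import sqlite3

# Hypothetical schema (not the PFDB schema): relationship kinds are stored
# as metadata rows, so a new kind of derived data needs no schema change.
SCHEMA = """
CREATE TABLE relationship_type (
    type_id  INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE,
    category TEXT NOT NULL          -- e.g. evolutionary, structural, functional
);
CREATE TABLE relationship (
    seq_a   TEXT NOT NULL,
    seq_b   TEXT NOT NULL,
    type_id INTEGER NOT NULL REFERENCES relationship_type(type_id)
);
"""


def add_relationship(conn, name, category, seq_a, seq_b):
    """Record a link between two sequences, registering its type if new."""
    conn.execute(
        "INSERT OR IGNORE INTO relationship_type (name, category) VALUES (?, ?)",
        (name, category),
    )
    type_id = conn.execute(
        "SELECT type_id FROM relationship_type WHERE name = ?", (name,)
    ).fetchone()[0]
    conn.execute("INSERT INTO relationship VALUES (?, ?, ?)", (seq_a, seq_b, type_id))


def refresh_summary(conn):
    """Rebuild the summary table (a stand-in for a materialized view refresh)."""
    conn.execute("DROP TABLE IF EXISTS relationship_summary")
    conn.execute(
        """
        CREATE TABLE relationship_summary AS
        SELECT t.category, t.name, COUNT(*) AS n_links
        FROM relationship r JOIN relationship_type t USING (type_id)
        GROUP BY t.category, t.name
        """
    )


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Sequence identifiers are placeholders, not real database entries.
    add_relationship(conn, "homologous_superfamily", "evolutionary", "seqA", "seqB")
    add_relationship(conn, "same_fold", "structural", "seqA", "seqC")
    add_relationship(conn, "shared_function", "functional", "seqB", "seqC")
    refresh_summary(conn)
    return conn.execute(
        "SELECT category, name, n_links FROM relationship_summary ORDER BY name"
    ).fetchall()


if __name__ == "__main__":
    for row in demo():
        print(row)
```

In this sketch, adding a new source of derived data amounts to inserting a row into the metadata table and refreshing the summary; queries against the summary table do not change.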
Project Publications
VIDA: a virus database system for the organisation of
virus genome open reading frames, M.M. Alba, D. Lee, F.M.G. Pearl, A.J. Shepherd,
N.J. Martin, C.A. Orengo and P. Kellam, Nucleic Acids Res, 29(1), 133-136, (2001).
A rapid classification protocol for the CATH domain
database to support structural genomics, F.M.G. Pearl, N.J. Martin, J.E. Bray, D.W.A. Buchan,
A.P. Harrison, D. Lee, G.A. Reeves, A.J. Shepherd, I. Sillitoe, A.E. Todd, J.M. Thornton and
C.A. Orengo, Nucleic Acids Res, 29(1), 223-227, (2001).
Investigation of Methods for Representing Protein Sequence Data Using an Oracle Data
Cartridge, L. Huang, MSc Project Report, School of Computer Science and
Information Systems, Birkbeck College, (2001).
Implementing Path Queries on Graph Views of Relational Data,
R. Hamill and N. Martin,
Technical Report BBKCS-02-08, School of Computer Science and
Information Systems, Birkbeck College, (2002).
PFDB: A Generic Protein Family Database Integrating the CATH Domain Structure
Database with Sequence Based Protein Family Resources, A. Shepherd, N.J. Martin,
R.G. Johnson, P. Kellam and C. Orengo, Bioinformatics, 18, 1666-1672, (2002).
The CATH database: an extended protein family resource for structural and functional genomics,
F.M.G. Pearl, C.F. Bennett, J.E. Bray, A.P. Harrison, N. Martin, A. Shepherd, I. Sillitoe, J. Thornton and C.A. Orengo,
Nucleic Acids Res, 31(1), 452-455, (2003).