The CATH domain structure database: new protocols and classification levels give a more comprehensive resource for exploring evolution

Nucleic Acids Res. 2007 Jan;35(Database issue):D291-7. doi: 10.1093/nar/gkl959. Epub 2006 Nov 29.

Abstract

We report the latest release (version 3.0) of the CATH protein domain database (http://www.cathdb.info). There has been a 20% increase in the number of structural domains classified in CATH, up to 86 151 domains. Release 3.0 comprises 1110 fold groups and 2147 homologous superfamilies. To cope with the increases in diverse structural homologues being determined by the structural genomics initiatives, more sensitive methods have been developed for identifying boundaries in multi-domain proteins and for recognising homologues. The CATH classification update is now driven by an integrated pipeline that links these automated procedures with validation steps. Validation has been made easier by information-rich web pages that summarise comparison scores and provide relevant links to external sites for each domain being classified. An analysis of the population of domains in the CATH hierarchy and several domain characteristics are presented for version 3.0. We also report an update of the CATH Dictionary of homologous structures (CATH-DHS), which now contains multiple structural alignments, consensus information and functional annotations for 1459 well-populated superfamilies in CATH. CATH is directly linked to the Gene3D database, which is a projection of CATH structural data onto approximately 2 million sequences in completed genomes and UniProt.
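The release figures quoted above can be related to one another with some simple arithmetic. The following is a minimal sketch, not part of the original record: it assumes the "20% increase" is exact (the abstract gives it only as a rounded figure) and that the 1459 CATH-DHS superfamilies are a subset of the 2147 superfamilies in release 3.0. The function and key names are illustrative only.

```python
# Figures quoted in the abstract for CATH release 3.0.
DOMAINS = 86_151
FOLD_GROUPS = 1_110
SUPERFAMILIES = 2_147
DHS_SUPERFAMILIES = 1_459

# Assumption: the reported "20% increase" is treated as exactly 0.20,
# so the previous domain count below is only an estimate.
REPORTED_INCREASE = 0.20


def summarise():
    """Return simple ratios derived from the quoted release figures."""
    return {
        "estimated_previous_domains": DOMAINS / (1 + REPORTED_INCREASE),
        "mean_domains_per_superfamily": DOMAINS / SUPERFAMILIES,
        "mean_superfamilies_per_fold": SUPERFAMILIES / FOLD_GROUPS,
        # Assumes the DHS superfamilies are drawn from the 2147 above.
        "dhs_fraction_of_superfamilies": DHS_SUPERFAMILIES / SUPERFAMILIES,
    }


if __name__ == "__main__":
    for name, value in summarise().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions, the previous release would have held roughly 72 000 domains, each superfamily holds about 40 domains on average, and CATH-DHS covers roughly two thirds of the superfamilies. The averages say nothing about how unevenly domains are distributed across superfamilies.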

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Classification / methods
  • Databases, Protein*
  • Evolution, Molecular
  • Internet
  • Protein Folding
  • Protein Structure, Tertiary* / genetics
  • Proteins / classification
  • Sequence Homology, Amino Acid
  • Structural Homology, Protein
  • User-Computer Interface

Substances

  • Proteins