CDD: a conserved domain database for interactive domain family analysis

Aron Marchler-Bauer; John B Anderson; Myra K Derbyshire; Carol DeWeese-Scott; Noreen R Gonzales; Marc Gwadz; Luning Hao; Siqian He; David I Hurwitz; John D Jackson; Zhaoxi Ke; Dmitri Krylov; Christopher J Lanczycki; Cynthia A Liebert; Chunlei Liu; Fu Lu; Shennan Lu; Gabriele H Marchler; Mikhail Mullokandov; James S Song; Narmada Thanki; Roxanne A Yamashita; Jodie J Yin; Dachuan Zhang; Stephen H Bryant

doi:10.1093/nar/gkl951

Nucleic Acids Res. 2007 Jan;35(Database issue):D237-40. doi: 10.1093/nar/gkl951. Epub 2006 Nov 29.

Affiliation

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38 A, Room 8N805, 8600 Rockville Pike, Bethesda, MD 20894, USA. bauer@ncbi.nlm.nih.gov

Abstract

The conserved domain database (CDD) is part of NCBI's Entrez database system and serves as a primary resource for the annotation of conserved domain footprints on protein sequences in Entrez. Entrez's global query interface can be accessed at http://www.ncbi.nlm.nih.gov/Entrez and will search CDD and many other databases. Domain annotation for proteins in Entrez has been pre-computed and is readily available in the form of 'Conserved Domain' links. Novel protein sequences can be scanned against CDD using the CD-Search service (http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi), which searches databases of CDD-derived profile models with protein sequence queries using BLAST heuristics. Protein query sequences submitted to NCBI's protein BLAST search service are scanned for conserved domain signatures by default. The CDD collection contains models imported from Pfam, SMART and COG, as well as domain models curated at NCBI. NCBI-curated models are organized into hierarchies of domains related by common descent. Here we report on the status of the curation effort and present a novel helper application, CDTree, which enables users of the CDD resource to examine curated hierarchies. More importantly, CDD and CDTree, used in concert, serve as a powerful tool in protein classification, as they allow users to analyze protein sequences in the context of domain family hierarchies.

Publication types

Research Support, N.I.H., Intramural

MeSH terms

Amino Acid Sequence
Animals
Conserved Sequence
Databases, Protein*
Internet
Phylogeny
Protein Structure, Tertiary* / genetics
Proteins / classification
Sequence Analysis, Protein
User-Computer Interface

Substances

Proteins

Grants and funding

Intramural NIH HHS/United States