NCBI's Conserved Domain Database and Tools for Protein Domain Analysis

Curr Protoc Bioinformatics. 2020 Mar;69(1):e90. doi: 10.1002/cpbi.90.

Abstract

The Conserved Domain Database (CDD) is a freely available resource for annotating sequences with the locations of conserved protein domain footprints, as well as functional sites and motifs inferred from these footprints. It includes protein domain and protein family models curated in-house by CDD staff, as well as models imported from a variety of other sources. The latest CDD release (v3.17, April 2019) contains more than 57,000 domain models, of which almost 15,000 were curated by CDD staff. The CDD curation effort increases coverage and provides finer-grained classifications of common and widely distributed protein domain families, for which a wealth of functional and structural data has become available. The CDD maintains both live search capabilities and an archive of pre-computed domain annotations for a selected subset of sequences tracked by NCBI's Entrez protein database. Annotations can be retrieved or computed for a single sequence using CD-Search, retrieved or computed in bulk using Batch CD-Search, or computed locally using standalone RPS-BLAST together with the rpsbproc software package. The CDD can be accessed via https://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml. The three protocols listed here describe how to perform a CD-Search (Basic Protocol 1), a Batch CD-Search (Basic Protocol 2), and a standalone RPS-BLAST search with rpsbproc post-processing (Basic Protocol 3). © 2019 The Authors.

Basic Protocol 1: CD-search
Basic Protocol 2: Batch CD-search
Basic Protocol 3: Standalone RPS-BLAST and rpsbproc
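As a rough illustration of the standalone route in Basic Protocol 3, the sketch below assembles the two command lines involved: an RPS-BLAST search that writes a BLAST archive, followed by an rpsbproc pass over that archive. It only builds and prints the commands; it does not run them. The helper name `build_cdd_commands`, the file names, the database name `Cdd`, and the E-value of 0.01 are assumptions made for illustration, and the rpsbproc options shown (`-i`, `-o`, `-e`, `-m rep`) should be checked against the README shipped with the installed version.

```python
import shlex
from pathlib import Path


def build_cdd_commands(query_fasta, cdd_db, evalue=0.01):
    """Return (rpsblast_argv, rpsbproc_argv) for a two-step local domain search.

    Hypothetical helper: file naming and default E-value are illustrative only.
    """
    query = Path(query_fasta)
    archive = query.with_suffix(".asn")  # BLAST archive written by rpsblast
    report = query.with_suffix(".out")   # text report written by rpsbproc

    # Step 1: search the query against the CDD profiles.
    # -outfmt 11 requests BLAST archive (ASN.1) output.
    rpsblast = [
        "rpsblast",
        "-query", str(query),
        "-db", str(cdd_db),
        "-evalue", str(evalue),
        "-outfmt", "11",
        "-out", str(archive),
    ]

    # Step 2: post-process the archive into domain annotations.
    # "-m rep" is assumed to select the concise (representative) data mode.
    rpsbproc = [
        "rpsbproc",
        "-i", str(archive),
        "-o", str(report),
        "-e", str(evalue),
        "-m", "rep",
    ]
    return rpsblast, rpsbproc


if __name__ == "__main__":
    for argv in build_cdd_commands("query.fa", "Cdd"):
        print(shlex.join(argv))
```

Each returned list can be passed to `subprocess.run(argv, check=True)` once rpsblast, rpsbproc, and a local copy of the CDD profile database are installed.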

Keywords: Conserved Domain Database; domain architecture; protein annotation; protein classification; protein domains; protein function; protein naming.

Publication types

  • Research Support, N.I.H., Intramural

MeSH terms

  • Amino Acid Sequence
  • Computational Biology / methods*
  • Conserved Sequence*
  • Databases, Protein*
  • Guidelines as Topic
  • Phylogeny
  • Protein Domains
  • Proteins / chemistry*

Substances

  • Proteins