Conserved Domains and Protein Classification
 
 
 
What's New
 

To receive e-mail news about changes to the Conserved Domain Database and its associated resources, subscribe to the cdd-announce@ncbi.nlm.nih.gov mailing list by completing a brief form or sending an e-mail message with the word subscribe in the subject line to cdd-announce-request@ncbi.nlm.nih.gov.

 

CDD v3.19

[8 MAR 2021]  A new version of the Conserved Domain Database has been released. Version 3.19 contains 3,148 new or updated NCBI-curated domains and now mirrors Pfam version 33.1 as well as models from the NCBIfam collection. Fine-grained classifications of the immunoglobulin, RRM, cytochrome P450, 7-transmembrane GPCR, KH, calponin homology, and C1 domain superfamilies have also been added. With this release, CDD introduces model-specific word-score thresholds for the RPS-BLAST heuristics. These thresholds are stored in the position-specific score matrices (PSSMs) and are used by the database formatting software when it builds the word lookup tables for the heuristic stage of BLAST. The current implementation makes RPS-BLAST searches about 3 times faster and misses annotation for about 0.6% of query proteins, mostly for hits at the borderline of significance. You can access CDD at https://www.ncbi.nlm.nih.gov/cdd and find updated content on the CDD ftp site at ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd.
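As a rough illustration of how a word-score threshold shapes a lookup table, the Python sketch below enumerates every three-residue word at each position of a PSSM and keeps the words whose summed score reaches a per-model threshold, which is the classic BLAST neighborhood-word rule. The PSSM layout (a list of residue-to-score dicts), the function name build_word_table, and the word size of 3 are assumptions for illustration; this is not the code used by NCBI's database formatting software.

```python
from itertools import product

# The 20 standard amino acids; real PSSMs may carry extra columns (e.g. B, Z, X).
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def build_word_table(pssm, threshold, word_size=3):
    """Return {word: [start positions]} for words scoring >= threshold.

    pssm: hypothetical layout, one dict per model position mapping residue -> score.
    threshold: the model-specific word-score threshold.
    """
    table = {}
    for start in range(len(pssm) - word_size + 1):
        for letters in product(ALPHABET, repeat=word_size):
            # Score the word by summing the PSSM scores of the columns it covers.
            score = sum(pssm[start + k][aa] for k, aa in enumerate(letters))
            if score >= threshold:
                table.setdefault("".join(letters), []).append(start)
    return table


# Toy 4-position model: +5 for one favoured residue per column, -1 otherwise.
toy_pssm = [{aa: (5 if aa == fav else -1) for aa in ALPHABET} for fav in "MKWL"]
strict = build_word_table(toy_pssm, threshold=13)
relaxed = build_word_table(toy_pssm, threshold=9)
print(strict)        # {'MKW': [0], 'KWL': [1]}
print(len(relaxed))  # 116
```

Raising the threshold shrinks the table, so fewer seed hits have to be extended (faster searches) at the cost of occasionally missing a weak match, which is the trade-off reported for this release.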

 

New viral protein domain models for annotation of coronaviruses

[30 APR 2020]  NLM's Conserved Domain Database (CDD) has expanded its scope to include 153 new viral protein domain family models for the annotation of coronaviruses, including models for the S1 subunit of the coronavirus Spike protein (cd21527), the coronavirus nucleocapsid (N) protein (cd21595), and the coronavirus RNA-dependent RNA polymerase (cd21530). Each curated domain model consists of a multiple sequence alignment illustrating conserved amino acids and may define conserved sequence features that have been confirmed experimentally, plus links to relevant publications. When available, the domain models include 3D structures with links to interactive 3D views and interacting partners.

A tabular summary of SARS-CoV-2 gene products, with links to matching conserved domain models and representative 3D protein structures, is also available.

 

CDD v3.18

[26 MAR 2020]  A new version of the Conserved Domain Database has been released. Version 3.18 contains 2,128 new or updated NCBI-curated domains and now mirrors Pfam version 32 as well as models from the NCBIfam collection. Fine-grained classifications of the cupin and PBP1 superfamilies have also been added. You can access CDD at https://www.ncbi.nlm.nih.gov/cdd and find updated content on the CDD ftp site at ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd.

 

Print version now available

[08 JAN 2020]  The print version of "CDD/SPARCLE: the conserved domain database in 2020"  by Lu S et al. is now available in the Nucleic Acids Research database issue (PubMed ID 31777944; full text at Oxford Academic; full text in PubMed Central).

 

CDD/SPARCLE: the conserved domain database in 2020

[28 NOV 2019]  An article by Lu S et al. about the NCBI Conserved Domain Database became available in Nucleic Acids Research as an e-publication ahead of print (PubMed ID 31777944): "As NLM's Conserved Domain Database (CDD) enters its 20th year of operations as a publicly available resource, CDD curation staff continues to develop hierarchical classifications of widely distributed protein domain families, and to record conserved sites associated with molecular function, so that they can be mapped onto user queries in support of hypothesis-driven biomolecular research..."

 
 
 
Database Statistics
 
CDD v3.19, as of 25 February 2021:

 
62,852 total models from all Source Databases
17,937 models from NCBI CDD curation effort
772 models from NCBIfams
1,011 models from SMART v6.0
18,271 models from PFAM v32
4,871 models from COGs v1.0
11,657 models from Entrez Protein Clusters
4,488 models from TIGRFAM v15
organized into 4,617 multi-model Superfamilies
 
 
 

 
News Archive
 
 

CDD v3.17

[03 APR 2019]  A new version of the Conserved Domain Database has been released. Version 3.17 contains 3,272 new or updated NCBI-curated domains and now mirrors Pfam version 31 as well as models from NCBIfams, a collection of protein family hidden Markov models (HMMs) for improving bacterial genome annotation. A fine-grained classification of the major facilitator superfamily has also been added. You can access CDD at https://www.ncbi.nlm.nih.gov/cdd and find updated content on the CDD ftp site at ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd.

 

CDD v3.16

[29 MAR 2017]  A new version of the Conserved Domain Database has been released. Version 3.16 contains 1,659 new or updated NCBI-curated domains, including models specifically built to annotate structural motifs (accession prefix "sd"), and now mirrors Pfam version 30. A fine-grained classification of the 7-transmembrane domain of GPCRs has been added. In addition, the default database size parameters for CD-Search have been adjusted, resulting in slightly higher E-values. Fewer models are now assigned multi-domain-model status, which affects the domain annotation of a large number of proteins. You can access CDD at https://www.ncbi.nlm.nih.gov/cdd and find updated content on the CDD ftp site at ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd.
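Why a change in database size parameters shifts E-values can be seen from the standard Karlin-Altschul statistics used by the BLAST family: the expected number of chance hits grows in proportion to the search space. In the textbook form below, m' and n' are the effective query and database lengths, S is the raw alignment score, and K and λ are parameters of the scoring system. The exact length corrections applied by CD-Search are not described in this announcement, so the ratio is only approximate.

```latex
E = K\,m'\,n'\,e^{-\lambda S}
\qquad\Longrightarrow\qquad
\frac{E_{\mathrm{new}}}{E_{\mathrm{old}}} \approx \frac{n'_{\mathrm{new}}}{n'_{\mathrm{old}}}
\quad \text{(same query, same model, same score } S\text{)}
```

For example, roughly doubling the effective database length roughly doubles the E-value of an otherwise identical hit.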

 

SPARCLE: Protein Classification

[12 OCT 2016]  The Subfamily Protein Architecture Labeling Engine (SPARCLE) is now available. It is a resource for the functional characterization and labeling of protein sequences that have been grouped by their characteristic domain architecture. To use SPARCLE, you can either: (1) enter a query protein sequence into CD-Search, which will display a "Protein Classification" on the results page if the query protein has a hit to a curated domain architecture in the SPARCLE database, or (2) search the SPARCLE database by keyword to retrieve domain architectures that contain the term(s) of interest in their descriptions. With either approach, the corresponding SPARCLE record(s) will display the name and functional label of the architecture, supporting evidence, and links to other proteins with the same architecture. Additional information and illustrated examples are provided on the "About SPARCLE" page.
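To make "grouped by their characteristic domain architecture" concrete, the Python sketch below treats an architecture as the ordered tuple of domain accessions found along a protein and groups proteins that share the same tuple. That definition, the input layout, and every protein ID, accession, and label shown are hypothetical; SPARCLE's own architecture definitions and curated labels are not reproduced here.

```python
from collections import defaultdict


def architecture(hits):
    """Order domain hits along the sequence and return their accessions as a tuple.

    hits: list of (start, accession) pairs; the layout is an assumption.
    """
    return tuple(accession for _, accession in sorted(hits))


def group_by_architecture(proteins):
    """Group protein IDs that share exactly the same domain architecture."""
    groups = defaultdict(list)
    for protein_id, hits in proteins.items():
        groups[architecture(hits)].append(protein_id)
    return dict(groups)


# Hypothetical proteins and domain accessions, for illustration only.
proteins = {
    "prot1": [(120, "domB"), (10, "domA")],
    "prot2": [(5, "domA"), (200, "domB")],
    "prot3": [(1, "domB")],
}
# Hypothetical functional labels keyed by architecture.
labels = {("domA", "domB"): "example label for domA-domB proteins"}

for arch, members in group_by_architecture(proteins).items():
    print(arch, members, labels.get(arch, "no label"))
```

Keying a dictionary on the architecture tuple is what lets a single label be shared by every protein with that architecture, mirroring how a SPARCLE record links to other proteins with the same architecture.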

 

CDD v3.15

[27 JUN 2016]  A new version of the Conserved Domain Database has been released. Version 3.15 contains 290 new or updated NCBI-curated domains, including models specifically built to annotate structural motifs (accession prefix "sd"), and now mirrors Pfam version 28. A fine-grained classification of the beta-lactamase-like metallohydrolases has been added. In addition, the default sort order of conserved domain hits in CD-Search has been changed: hits are now ranked by E-value, without giving preference to NCBI-curated models.

 

Updated version of the "rpsbproc" utility is now available

[29 JUN 2015]  An updated version of the "rpsbproc" command line utility for RPS-BLAST is now available from the CDD FTP site: ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd/rpsbproc/. The output generated by the updated version includes a non-redundant list of structural motifs (accession prefix "sd"), eliminating overlapping structural motifs. Additional information about the "rpsbproc" command line utility is provided in the December 4, 2014 announcement of its initial release.
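The announcement does not spell out how "rpsbproc" chooses among overlapping structural motifs. A common approach, sketched below in Python as an assumption rather than as rpsbproc's documented rule, is greedy selection: take hits in order of increasing E-value and drop any hit that overlaps one already kept. The tuple layout and the "sd" accessions are hypothetical.

```python
def non_redundant(hits):
    """Keep the most significant hits, dropping any hit that overlaps one already kept.

    hits: list of (start, end, evalue, accession) tuples with inclusive coordinates.
    """
    kept = []
    for hit in sorted(hits, key=lambda h: h[2]):  # lowest E-value first
        start, end = hit[0], hit[1]
        if all(end < k[0] or start > k[1] for k in kept):
            kept.append(hit)
    return sorted(kept)  # report in sequence order


motifs = [(10, 40, 1e-5, "sdA"), (30, 60, 1e-3, "sdB"), (70, 90, 1e-2, "sdC")]
print([m[3] for m in non_redundant(motifs)])  # ['sdA', 'sdC']
```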

 

CDD v3.14

[28 MAY 2015]  A new version of the Conserved Domain Database has been released. Version 3.14 contains 560 new or updated NCBI-curated domains, including models specifically built to annotate structural motifs (accession prefix "sd"), and contains corrections to some short names for TIGRFAM records as well as updated names and classifications for many models derived from COGs. A fine-grained classification of the myosin motor domains has been added.

 

Improved consistency of domain annotation

[20 APR 2015]  The CD-Search service now offers two new options that are designed to improve the consistency of domain annotation based on known domain architectures. The "Rescue Borderline Hits" option shows hits that have an E-value above the RPS-BLAST reporting threshold (anywhere between 0.01 and 1.0) and that are consistent with known domain architectures. The "Suppress Weak Overlapping Hits" option suppresses hits that have an E-value close to the RPS-BLAST reporting threshold (between 0.001 and 0.01) but that overlap with stronger hits. Additional details are provided in a publication by Derbyshire et al., 2015.
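The two options can be pictured as a post-filter on RPS-BLAST hits. The Python sketch below uses the E-value ranges quoted above and simplifies "consistent with known domain architectures" to an exact match against a set of known accession tuples; that simplification, the hit layout, and the accessions are assumptions, and the actual CD-Search procedure is the one described by Derbyshire et al., 2015.

```python
REPORT_EVALUE = 0.01  # RPS-BLAST reporting threshold assumed by this sketch
RESCUE_MAX = 1.0      # borderline hits up to this E-value may be rescued
WEAK_MIN = 0.001      # reported hits above this E-value count as "weak"


def overlaps(a, b):
    return a["start"] <= b["end"] and b["start"] <= a["end"]


def architecture(hits):
    return tuple(h["acc"] for h in sorted(hits, key=lambda h: h["start"]))


def refine(hits, known_architectures):
    reported = [h for h in hits if h["evalue"] <= REPORT_EVALUE]
    # Suppress weak hits that overlap a stronger (lower E-value) reported hit.
    kept = [
        h for h in reported
        if not (h["evalue"] > WEAK_MIN
                and any(o["evalue"] < h["evalue"] and overlaps(h, o) for o in reported))
    ]
    # Rescue borderline hits that fill a gap and complete a known architecture.
    for h in sorted(hits, key=lambda h: h["evalue"]):
        borderline = REPORT_EVALUE < h["evalue"] <= RESCUE_MAX
        if (borderline and not any(overlaps(h, k) for k in kept)
                and architecture(kept + [h]) in known_architectures):
            kept.append(h)
    return sorted(kept, key=lambda h: h["start"])


hits = [
    {"acc": "domA", "start": 1, "end": 100, "evalue": 1e-20},
    {"acc": "domB", "start": 90, "end": 150, "evalue": 0.005},  # weak, overlaps domA
    {"acc": "domC", "start": 200, "end": 300, "evalue": 0.5},   # borderline
    {"acc": "domD", "start": 310, "end": 400, "evalue": 0.8},   # borderline
]
known = {("domA", "domC")}
print([h["acc"] for h in refine(hits, known)])  # ['domA', 'domC']
```

Here domB is suppressed because it is weak and overlaps the much stronger domA, domC is rescued because it completes a known architecture, and domD stays hidden because adding it would produce an architecture not in the known set.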

 

CDD v3.13

[09 JAN 2015]  A new version of the Conserved Domain Database has been released. Version 3.13 contains 286 new or updated NCBI-curated domains, including models specifically built to annotate structural motifs (accession prefix "sd"), and now mirrors TIGRFAMs version 15.

 

New post-processing utility is now available for RPS-BLAST

[04 DEC 2014]  A new "rpsbproc" command line utility is now available, as an addition to the standalone version of Reverse Position-Specific BLAST (RPS-BLAST).

Standalone RPS-BLAST ("rpsblast") continues to be packaged with the BLAST executables at ftp://ftp.ncbi.nih.gov/blast/executables/LATEST/, as it has been since 2000. For each of your query protein sequences, it lists the conserved domain models that score above a given threshold (by default, an E-value of 10), sorted by score.
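As an example of working with raw rpsblast output, the Python sketch below parses BLAST+ tabular output (for instance from "rpsblast -query proteins.fasta -db Cdd -outfmt 6 -out hits.tsv", where "Cdd" stands for whatever name your locally installed CDD search database has) and reproduces the behavior described above: keep hits at or below an E-value cutoff, then sort by score. The field labels follow the default -outfmt 6 column order; the function name and sample rows are hypothetical.

```python
import csv
import io

# Column order of BLAST+ tabular output (-outfmt 6) with default fields.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_hits(text, max_evalue=10.0):
    """Return hits at or below max_evalue, highest bit score first."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        rec = dict(zip(FIELDS, row))
        rec["evalue"] = float(rec["evalue"])
        rec["bitscore"] = float(rec["bitscore"])
        if rec["evalue"] <= max_evalue:
            hits.append(rec)
    return sorted(hits, key=lambda r: r["bitscore"], reverse=True)


sample = "\n".join("\t".join(fields) for fields in [
    ["q1", "CDD:1", "30.0", "100", "70", "0", "1", "100", "1", "100", "0.002", "35.1"],
    ["q1", "CDD:2", "25.0", "90", "67", "1", "5", "95", "2", "92", "1e-20", "80.5"],
    ["q1", "CDD:3", "20.0", "40", "32", "0", "200", "240", "1", "40", "20", "15.0"],
])
print([h["sseqid"] for h in parse_hits(sample)])  # ['CDD:2', 'CDD:1']
```

This only lists raw hits; the non-redundant view, superfamily assignments, and conserved-site annotation are what the "rpsbproc" post-processing adds.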

The new "rpsbproc" utility is available from the CDD FTP site: ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd/rpsbproc/. It post-processes the results of local RPS-BLAST searches in order to provide a non-redundant view of the conserved domains found in your protein query sequences, and to provide additional annotation on query sequences, such as domain superfamilies and conserved sites, similar to the annotation provided by the corresponding web services (e.g., the NCBI Batch CD-Search web service at https://www.ncbi.nlm.nih.gov/Structure/bwrpsb/bwrpsb.cgi). The README file provides additional details about the new "rpsbproc" utility.

 

CDD v3.12

[03 OCT 2014]  A new version of the Conserved Domain Database has been released. Version 3.12 contains 1,526 new or updated NCBI-curated domains.

 

CDD v3.11

[15 FEB 2014]  A new version of the Conserved Domain Database (CDD) has been released. Version 3.11 contains 596 new or updated NCBI-curated domain models and now contains the most recent Pfam release 27. Also, position-specific scoring matrices (PSSMs) have been re-computed for many models in CDD, and frequency tables have been added to the PSSMs. The search databases distributed as part of this release can now be used with the composition-based scoring that is now available in the more recent versions of RPS-BLAST (version 2.2.28 and up). The new search databases also remain compatible with previous versions of RPS-BLAST.

 

CD-Search, Batch CD-Search, and CDART display style revised

[12 FEB 2014]  The display style for drawing domain models in CD-Search, Batch CD-Search, and CDART has been revised. The display style is now uniform among those tools, with a given domain model rendered in the same shape and color by all three tools. Additionally, a new display option, "Standard Results," is available in CD-Search and Batch CD-Search, and shows the top-scoring domain model from each source database. The Batch CD-Search graphical display of search results also offers a new "compact mode," which displays the domain architecture of each query sequence on a single line. This display type is particularly useful if you select two or more query sequences from the list and want to compare their domain architectures. All three tools (CD-Search, Batch CD-Search, and CDART) employ the latest version of RPS-BLAST, which, as of version 2.2.28, uses composition-based scoring and abolishes the need to mask out compositionally biased regions in query sequences. Live and pre-computed searches generated by the CD-Search web tool now use these settings, and as a result the domain annotations have changed for a number of protein sequences in Entrez.

 

CDD v3.10

[21 MAR 2013]  A new version of the Conserved Domain Database (CDD) has been released. Version 3.10 contains 1,104 new or updated NCBI-curated domain models. Also, position-specific score matrices (PSSMs) have been re-computed for a large fraction of the models in CDD, which has slightly affected the resulting sequence annotations. PSSMs are now provided in an extended format: they contain 28 rows instead of 26 and include intermediate data in addition to the final scoring matrix. The extended format makes it possible to generate search databases directly for the current version of RPS-BLAST, for DELTA-BLAST, and for an upcoming version of RPS-BLAST that supports composition-corrected scoring.

 

CD-Search "specific hits" now include domain models from external sources

[06 AUG 2012]  A specific hit is a high confidence association between a protein query sequence and a conserved domain, resulting in a high confidence level for the inferred function of the protein query sequence. The algorithm for identifying specific hits has been revised to include domain models from external sources. Previously, specific hits were limited to NCBI-curated domains. Now, if domain models from both the NCBI-curated data set and external sources meet a domain-specific threshold, the NCBI-curated domain will still be listed preferentially as the specific hit because it has been annotated with fine-grained evolutionary relationships, conserved sequence blocks, specific functions, and conserved features/sites based on careful review of sequence data, 3D structures, and literature. However, if no NCBI-curated domain meets the criteria for a specific hit, then the top-ranked domain model from an external source will be shown in the CD-Search results concise display if it meets all the criteria for a specific hit. As a result of this change, more sequences in the Entrez Protein database are now annotated with specific functional information.
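The selection rule described above can be sketched in a few lines of Python. Each hit carries a flag for NCBI curation and a per-model cutoff; modeling the domain-specific threshold as a bit-score cutoff, treating "meets all the criteria" as that single test, and ranking by bit score are simplifying assumptions, and the accessions are hypothetical.

```python
def pick_specific_hit(hits):
    """Choose the specific hit for one query region, or None.

    Each hit is a dict with acc, bitscore, curated (True for NCBI-curated models)
    and threshold (a hypothetical per-model bit-score cutoff).
    """
    qualifying = [h for h in hits if h["bitscore"] >= h["threshold"]]
    curated = [h for h in qualifying if h["curated"]]
    pool = curated or qualifying  # prefer NCBI-curated models when any qualify
    return max(pool, key=lambda h: h["bitscore"]) if pool else None


hits = [
    {"acc": "cdX", "bitscore": 120.0, "curated": True, "threshold": 100.0},
    {"acc": "pfamY", "bitscore": 150.0, "curated": False, "threshold": 90.0},
]
print(pick_specific_hit(hits)["acc"])  # cdX: curated model preferred
print(pick_specific_hit([dict(hits[0], bitscore=80.0), hits[1]])["acc"])  # pfamY
```

The second call shows the new behavior: when the curated model falls below its cutoff, the top-ranked qualifying external model becomes the specific hit instead of no specific hit at all.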

 

CDD v3.09

[01 NOV 2012]  The CDD v3.09 release includes 42 new or updated NCBI-curated domains and now mirrors TIGRFAM v13.

 

CDD v3.08

[17 SEP 2012]  The CDD v3.08 release includes 239 new or updated NCBI-curated domains.

 

CDD v3.07

[06 AUG 2012]  The CDD v3.07 release includes 495 new or updated NCBI-curated domains.

 

CDD v3.06

[29 MAY 2012]  The CDD v3.06 release includes 310 new or updated NCBI-curated domains.

 

Conserved Domain searches now launched for blastx queries

[28 MAR 2012]  Conserved Domain searches are now being launched for all nucleotide queries shorter than 10,000 base pairs submitted to blastx, the BLAST program that translates a nucleotide query sequence in six reading frames and compares each translation against the protein data set. The blastx search results page includes a concise display of the conserved domains found on the translated reading frames, and that graphic links to the corresponding interactive view in the CD-Search tool.

 

CDD v3.05

[23 MAR 2012]  The CDD v3.05 release includes 161 new or updated NCBI-curated domain models and now mirrors TIGRFAM v12.

 

CDD v3.04

[08 MAR 2012]  The CDD v3.04 release includes 166 new or updated NCBI-curated domain models.

 

CDD v3.03

[19 JAN 2012]  The CDD v3.03 release includes 174 new or updated NCBI-curated domain models and now mirrors Pfam v26.

 

CDD v3.02

[07 DEC 2011]  The CDD v3.02 release includes 170 new or updated NCBI-curated domain models and now mirrors TIGRFAM v11.

 

CDD v3.01

[09 NOV 2011]  The CDD v3.01 release includes 298 new or updated NCBI-curated domains.

 

CD-Search now accepts nucleotide sequences as queries

[04 NOV 2011]  The CD-Search tool now accepts nucleotide sequences as queries. It translates them in all six reading frames and searches each protein product against the RPS-BLAST databases. CD-Search will combine the results for all the proteins into a single page, but will only display the translated reading frames that picked up a match in CDD. The help document provides additional details.
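The six-frame translation step can be illustrated with the Python sketch below, which uses the standard genetic code (NCBI translation table 1) and labels frames +1 to +3 on the given strand and -1 to -3 on the reverse complement. The frame labels and function names are choices made for this sketch; CD-Search would then search each translated product and report only the frames with hits.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons in TCAG order; * = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(nt):
    """Translate complete codons; codons with ambiguous bases become X."""
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3))


def six_frames(nt):
    nt = nt.upper()
    rev_comp = nt.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(nt[offset:])
        frames[f"-{offset + 1}"] = translate(rev_comp[offset:])
    return frames


print(six_frames("ATGGCCTAA"))
# {'+1': 'MA*', '-1': 'LGH', '+2': 'WP', '-2': '*A', '+3': 'GL', '-3': 'RP'}
```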

 

CDD v3.00

[28 OCT 2011]  The CDD v3.00 release contains the same conserved domain models as the previous release (CDD v2.32). However, the composition of some superfamily clusters has changed, as has the single-domain/multidomain status assigned to some conserved domain models. These changes are due to a slightly revised E-value calculation in the BLAST program suite (including RPS-BLAST), which now uses a new variant of the finite size correction that produces more accurate E-values for short query and/or subject sequences.

 

CDTree now includes the latest version of Cn3D

[18 OCT 2011]  A new CDTree software bundle is now available and includes the most recent version of NCBI's 3D structure viewing program, Cn3D 4.3. CDTree is:

  • a powerful tool to aid in the classification of protein sequences and investigate their evolutionary relationships
  • a web-based helper application used by the CDD on-line search service to permit user interaction with pre-defined protein domain hierarchies
  • an integrated software environment organized to help users assimilate large amounts of biological data from various resources by access to a suite of analysis methods
  • an alignment editor and 3D structure visualization program through its integration with Cn3D 4.3

 

New version of CDART offers new functions & features

[05 OCT 2011]  A new release of the Conserved Domain Architecture Retrieval Tool (CDART) is available and offers new functions and features for finding proteins that have domain architectures similar to that of a query protein.

CDD v2.32

[02 SEP 2011]  The CDD v2.32 release includes 14 new or updated NCBI-curated domains, suppresses pfam10695, and fixes errors in superfamily clustering.

 

Entrez CDD interface redesign

[25 JUL 2011]  The Conserved Domain Database now has a revised home page, search interface, and search results display, with functions similar to those available in PubMed. Changes include: (a) a streamlined home page with links to related resources; (b) an "Advanced Search" page, which provides the ability to build a query one term at a time, browse the index of any search field, and combine earlier searches; and (c) new search results displays that provide links in the right margin to search filters, related data, and tools.

 

CDD v2.31

[19 AUG 2011]  The CDD v2.31 release includes 292 new or updated NCBI-curated domains and now mirrors SMART v6.0.

 

CDD v2.30

[16 JUN 2011]  The CDD v2.30 release includes 201 new or updated NCBI-curated domains.

 

CDD v2.29

[13 MAY 2011]  The CDD v2.29 release includes 256 new or updated NCBI-curated domains, and now mirrors TIGRFAM v10.1 and Pfam v25.

 

CDD v2.28

[30 MAR 2011]  The CDD v2.28 release includes 611 new or updated NCBI-curated domains and now mirrors TIGRFAM v10.

 

CDD v2.27

[02 MAR 2011]  The CDD v2.27 release includes 357 new or updated NCBI-curated domains.

 

CDD v2.26

[29 DEC 2010]  The CDD v2.26 release includes 124 new or updated NCBI-curated domains as well as the most recent data from PRK.

 

CDD v2.25

[07 OCT 2010]  The CDD v2.25 release includes 88 new or updated NCBI-curated domains.

 

Batch CD-Search

[30 SEP 2010]   A Batch CD-Search tool is now available for the computation and download of conserved domain annotation on large sets of protein queries. Input up to 100,000 protein query sequences as a list of sequence identifiers and/or raw sequence data, then download output in a variety of formats (including tab-delimited text files) or view the search results graphically. See the help document for additional details, including information on using Batch CD-Search for scripted data downloads.

 

CDD v2.24

[09 SEP 2010]  The CDD v2.24 release includes 196 new or updated NCBI-curated domains.

 

CDD v2.23

[29 JUL 2010]  The CDD v2.23 release includes 174 new or updated NCBI-curated domains.

 

CDD v2.22

[26 MAY 2010]  The CDD v2.22 release includes 443 new or updated NCBI-curated domains.

 

CDD v2.21

[13 APR 2010]  The CDD v2.21 release includes 489 new or updated NCBI-curated domains as well as the most recent data from PRK, which now includes domain models for plant-specific (non-chloroplast) proteins, indicated by PLN accession number prefixes.

 

CDD v2.20

[19 MAR 2010]  The CDD v2.20 release includes 107 new or updated NCBI-curated domains, and now mirrors TIGRFAM v9.0.

 

CDD v2.19

[01 FEB 2010]  The CDD v2.19 release includes Pfam v24.0 as well as 532 new or updated NCBI-curated domains.

 

CDD v2.18

[10 DEC 2009]  The CDD v2.18 release includes 489 new or updated NCBI-curated domains.

 

CDD v2.17

[04 JUN 2009]  The CDD v2.17 release includes 484 new or updated NCBI-curated domains, as well as records from a new data source, TIGRFAM, and protozoan domains from the Protein Clusters (PRK) database.

 

Specific Hits

[08 MAY 2008]  The CD-Search tool now shows four types of hits in search results, including specific hits. A specific hit is a high confidence association between a protein query sequence and a conserved domain, resulting in a high confidence level for the inferred function of the protein query sequence.

 

Superfamilies

[08 MAY 2008]  The Conserved Domain Database (CDD) is now organized into superfamilies. A superfamily cluster is a set of conserved domain models that generate overlapping annotation on the same protein sequences. These models are assumed to represent evolutionarily related domains and may be redundant with each other.
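Because "overlapping annotation" is transitive in practice (model A overlaps B on one protein, B overlaps C on another), superfamily-style clusters can be built with single-linkage clustering. The Python sketch below does this with a union-find structure; counting any shared residue as an overlap is an assumption made for simplicity, the actual CDD clustering criteria are not given here, and the model and protein names are hypothetical.

```python
def cluster_models(annotations):
    """Single-linkage clustering of models that overlap on any shared protein.

    annotations: {protein_id: [(model, start, end), ...]} (hypothetical layout).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for hits in annotations.values():
        for model, _, _ in hits:
            find(model)  # register every model, even if it never overlaps
        for i, (m1, s1, e1) in enumerate(hits):
            for m2, s2, e2 in hits[i + 1:]:
                if s1 <= e2 and s2 <= e1:  # any shared residue counts as overlap
                    union(m1, m2)

    clusters = {}
    for model in list(parent):
        clusters.setdefault(find(model), set()).add(model)
    return sorted(sorted(c) for c in clusters.values())


annotations = {
    "p1": [("m1", 1, 100), ("m2", 50, 150), ("m3", 200, 300)],
    "p2": [("m2", 10, 90), ("m4", 80, 120)],
    "p3": [("m5", 1, 50)],
}
print(cluster_models(annotations))  # [['m1', 'm2', 'm4'], ['m3'], ['m5']]
```

Note that m1 and m4 never overlap directly but end up in the same cluster through m2, which is why such clusters can group models that are redundant with each other.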

 
 
 
 
 Revised 8 March 2021