Names of clinical features, conditions, genes, proteins, and variants used in ClinVar, GTR, and MedGen


Over the years, a condition or gene may have been described by a variety of terms. For disorders with a genetic basis, drug responses, and other conditions that are of interest in medical genetics, the Genetic Testing Registry (GTR) and ClinVar aggregate comparable terms, assign them a stable identifier, and select one term from each set as preferred. These terms are then integrated with the subset of terms in the UMLS that are provided without restriction, and with other ontologies, to generate terms to support MedGen. All resources support searching on both preferred and alternate designations, and their displays include alternates. The full XML extract of ClinVar includes the sources of those alternates. GTR and ClinVar share the database infrastructure for curating names of genes, phenotypes, and proteins.

Standard terms for other categories of data are documented as Authorities used in ClinVar.

Diseases and other phenotypes

Integration of names

Names are integrated from multiple sources. These include:

  1. UMLS (as the source for names in A.D.A.M., ICD9, ICD10, MeSH, SNOMED CT®, OMIM®)
  2. OMIM® (to supplement what may not be incorporated in the current release of UMLS)
  3. Office of Rare Diseases
  4. GeneReviews®
  5. Genetics Home Reference
  6. Human Phenotype Ontology (HPO)
  7. Monarch Disease Ontology (MONDO)
  8. Review papers
  9. Disorder-specific stakeholders

When available, ClinVar, GTR, and MedGen use the preferred term from SNOMED CT®, because these names are now provided without licensing restrictions. Most sources of disease names provide a preferred term and some alternates. The category of an alternate name is not always clear from an informatics perspective, so we are currently very cautious about grouping any name as an alternate to another when the sources we use conflict about the relationship. We use the disease concept defined by UMLS as our default, and collaborate with the UMLS team when questions arise.

Descriptions and definitions of disorders

GTR and MedGen display definitions or descriptions of disorders from multiple sources. These sources include GeneReviews®, OMIM®, the Clinical Pharmacogenetics Implementation Consortium (CPIC), Medical Genetic Summaries, and others. GTR and MedGen provide attribution for each source, with links to the specific record providing the description when available. Read the list of Sources of definitions for MedGen records.

Names of genes and proteins

The authority for symbols and full names for human genes is the HUGO Gene Nomenclature Committee (HGNC). If an official symbol has not yet been assigned, the preferred name in NCBI's Gene database will be used in ClinVar and GTR, and an official name will be requested.

Proteins are named based on Swiss-Prot, in accordance with the NCBI RefSeq practice.

Representation of variants

Simple variants

A single variant in the genome may be represented by multiple HGVS expressions, depending on the sequence (represented by accession and version) used for reference. When ClinVar receives a submission using one of the possible HGVS expressions, it first validates that representation relative to the reference. For example, if the HGVS expression is ACCESSION.version:c.12345G>T, is there a G at position 12345, numbering from the first nucleotide of the coding region annotated on ACCESSION.version? ClinVar then aligns that sequence to the reference assemblies (currently GRCh37 and GRCh38; see the GRC site for more details about reference assemblies) to calculate all the HGVS expressions that might also be used to describe the variant, based on alignments of RefSeq transcripts and RefSeqGenes.
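
The reference check described above can be illustrated with a minimal sketch. This is not ClinVar's validation code: the function name, the regular expression, and the `cds_start` argument are assumptions made for illustration, and only simple coding substitutions are handled.

```python
import re

# Hypothetical pattern for a simple coding substitution such as
# "ACCESSION.1:c.5G>T"; real HGVS syntax covers many more variant types.
SUBSTITUTION = re.compile(
    r"^(?P<accession>[^:]+):c\.(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def reference_matches(hgvs: str, sequence: str, cds_start: int) -> bool:
    """Return True if the stated reference base is found at the stated position.

    sequence  -- transcript sequence for the accession.version (assumed input)
    cds_start -- 1-based transcript position of the first coding nucleotide
    """
    match = SUBSTITUTION.match(hgvs)
    if match is None:
        raise ValueError(f"unsupported expression: {hgvs}")
    # c.1 is the first coding nucleotide, so c.N sits at 1-based transcript
    # position cds_start + N - 1, which is 0-based index cds_start + N - 2.
    index = cds_start + int(match["pos"]) - 2
    if index < 0 or index >= len(sequence):
        return False
    return sequence[index] == match["ref"]
```

With the toy sequence `GGATGGCC` and a coding region starting at position 3, `reference_matches("TOY.1:c.3G>T", "GGATGGCC", 3)` returns `True`, because the third coding nucleotide is a G.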

Based on this processing, each variant is likely to be represented by more than one HGVS expression. In the absence of a community standard for selecting one preferred 'name' for a variant, ClinVar generates a preferred name for each variant based on the following choices:

  • Official nomenclature
    • If there is an official name for a variant, ClinVar will use that. There are not many of these.
  • HGVS representation of the variant
    • The selection of the RefSeq used to define the sequence depends on the following:
      • If a MANE transcript is available, use it
      • If only one submitter, use what the submitter provided
      • If more than one submitter, and more than one splice variant, give precedence to the representation within an exon
      • If in a coding region, and the molecular consequence is a missense, nonsense, or frameshift, report the change as the c. HGVS expression, including the official gene symbol, with predicted protein change in parentheses afterwards.
      • If there is more than one splice variant, and multiple numbering systems for coding changes based on different splice variants, and no submissions to guide selection of the RefSeq, use the RefSeq standard transcript defined by a RefSeqGene sequence.
      • If there is more than one RefSeq standard defined by a RefSeqGene sequence, and an LRG has been defined, use the RefSeq sequence that corresponds to the t1 transcript.
      • If the variant lies in multiple genes, use the gene and corresponding RefSeqs identified by the submitter.
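
The naming rule for coding-region changes (the c. expression with the gene symbol, followed by the predicted protein change in parentheses) might be sketched as follows. The function name, its parameters, and the exact string layout are assumptions for illustration, and the accession and gene symbol in the usage note are placeholders, not real records.

```python
from typing import Optional

# Consequences that, per the list above, get the protein change appended.
PROTEIN_LEVEL_CONSEQUENCES = {"missense", "nonsense", "frameshift"}


def preferred_name(accession: str, gene_symbol: str, c_change: str,
                   consequence: str, p_change: Optional[str] = None) -> str:
    """Compose a display name from a coding change and its predicted protein change."""
    # Assumed layout: ACCESSION(SYMBOL):c.change (p.change)
    name = f"{accession}({gene_symbol}):{c_change}"
    if consequence in PROTEIN_LEVEL_CONSEQUENCES and p_change:
        name += f" ({p_change})"
    return name
```

For example, `preferred_name("NM_000000.0", "GENE1", "c.100G>A", "missense", "p.Ala34Thr")` yields `NM_000000.0(GENE1):c.100G>A (p.Ala34Thr)`. The sketch covers only the formatting step; choosing which RefSeq to use follows the precedence rules listed above.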

You can download a list of the HGVS expressions ClinVar calculates for each VariationID and AlleleID here: https://ftp.ncbi.nlm.nih.gov/pub/clinvar/tab_delimited/variation_allele.txt.gz .
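
Once downloaded, a gzipped tab-delimited file of this kind can be read with the Python standard library. The sketch below assumes that comment lines start with `#` and that the last such line before the data holds the column names; check the file itself before relying on this. The function name `read_clinvar_table` is hypothetical.

```python
import csv
import gzip
from typing import Dict, Iterator


def read_clinvar_table(path: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per data row of a gzipped, tab-delimited file."""
    header = None
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if not row:
                continue  # skip blank lines
            if row[0].startswith("#"):
                # Assumption: the last '#' line before the data names the columns.
                header = [row[0].lstrip("#")] + row[1:]
                continue
            if header is None:
                raise ValueError("no header line found before data")
            yield dict(zip(header, row))
```

Each yielded row is keyed by whatever column names the file declares, so no column names are hard-coded here.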

More information is available about HGVS Expressions in ClinVar.

Structural variants

Names follow the pattern: assembly, cytogenetic location, sequence location in parentheses, then x and the copy number, e.g.

GRCh37/hg19 16p11.2(chr16:154974667-155226096)x3

  • Copy number is computed relative to the normal copy number, so for a male, a copy number gain on the X chromosome could be reported as x2.
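
A name in this pattern can be split into its parts with a regular expression. The following is a minimal sketch based only on the single example above; the class and function names are hypothetical, and names that deviate from the example's layout will be rejected.

```python
import re
from typing import NamedTuple


class CopyNumberName(NamedTuple):
    assembly: str
    cytoband: str
    chrom: str
    start: int
    stop: int
    copy_number: int


# Assumed layout: "<assembly> <cytoband>(<chrom>:<start>-<stop>)x<copies>"
PATTERN = re.compile(
    r"^(?P<assembly>\S+)\s+"
    r"(?P<cytoband>[^()\s]+)"
    r"\((?P<chrom>chr[^:]+):(?P<start>\d+)-(?P<stop>\d+)\)"
    r"\s*x(?P<copies>\d+)$"
)


def parse_copy_number_name(name: str) -> CopyNumberName:
    """Split a copy number variant name into its labeled components."""
    match = PATTERN.match(name.strip())
    if match is None:
        raise ValueError(f"name does not follow the expected pattern: {name}")
    return CopyNumberName(
        assembly=match["assembly"],
        cytoband=match["cytoband"],
        chrom=match["chrom"],
        start=int(match["start"]),
        stop=int(match["stop"]),
        copy_number=int(match["copies"]),
    )
```

Applied to the example above, the parser returns assembly `GRCh37/hg19`, cytogenetic location `16p11.2`, chromosome `chr16`, and a copy number of 3.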

Complex variants

Under discussion.

ClinVar, GTR, Gene and MedGen

ClinVar, the Genetic Testing Registry (GTR), Gene and MedGen use the same preferred term for the same concept.


Last updated: 2020-04-23T22:08:54Z