NCBI Eukaryotic Genome Annotation Policy On Which Genomes Are Annotated

Only genomes with assemblies that are public in INSDC (DDBJ, ENA or GenBank) are considered for inclusion in RefSeq and processing by the eukaryotic genome annotation pipeline. NCBI selects which of these genomes to annotate based on several factors, including:

  • NIH/NCBI priorities: Mammals are important to the NIH, so annotation of high-quality genome assemblies for new mammalian species is given a higher priority.
  • Assembly quality: Assemblies with higher contig and scaffold N50 are prioritized. We do not use strict N50 thresholds but generally prefer to work with assemblies that have a contig N50 > 50,000 bases and/or a scaffold N50 > 2,000,000 bases. NCBI may decide not to annotate assemblies that are extremely fragmented, even if they meet other criteria.
  • Community interest/requests (Request form)
  • Biological, evolutionary, or economic importance
  • Public availability of supporting transcript evidence: Some annotation plans may be put on hold until the generated transcriptome data has been fully submitted and is publicly available.
  • Availability of gene annotation on the INSDC records: NCBI takes into consideration the availability of annotation submitted on the INSDC genome records and may elect to propagate that annotation onto the RefSeq records or may opt to generate RefSeq annotation via the eukaryotic annotation pipeline. These decisions are made based on taxonomic groups with some exceptions:
    • NCBI always generates annotation for mammalian organisms of interest to the RefSeq project in order to provide a more consistent dataset for organisms of interest to the NIH.
    • Non-mammalian vertebrate and invertebrate genomes may be annotated as soon as they are available in INSDC.
    • NCBI annotation of plant genomes may be delayed by a year, as there has historically been a lag between the submission of the assembly and the submission of annotation. NCBI may annotate the assembly earlier if the submitter indicates they do not intend to submit annotation or if the submitter requests annotation. If no annotation is available on the INSDC records one year after their release, NCBI assumes that the submitter does not intend to submit annotation and may annotate the assembly.
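
For readers who want to see how an assembly compares with the N50 guideline in the assembly quality factor above, the following is a minimal illustrative sketch, not NCBI's actual selection code. It uses the standard definition of N50 (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total length). The function and constant names are hypothetical, and the "and/or" in the guideline is interpreted here as "either value exceeds its preferred minimum". Because the thresholds are not strict, the result is only a rough indicator.

```python
# Illustrative sketch only: names are hypothetical, not part of any NCBI tool.

# Preferred (not strict) minimums taken from the policy text above.
PREFERRED_CONTIG_N50 = 50_000
PREFERRED_SCAFFOLD_N50 = 2_000_000


def n50(lengths):
    """Return the N50 of a collection of sequence lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half of the
        # total assembly length is the N50.
        if running >= half_total:
            return length
    return 0


def meets_preferred_n50(contig_lengths, scaffold_lengths):
    """Check the guideline, reading "and/or" as: either threshold is exceeded."""
    return (
        n50(contig_lengths) > PREFERRED_CONTIG_N50
        or n50(scaffold_lengths) > PREFERRED_SCAFFOLD_N50
    )
```

An assembly that passes this check may still be declined (for example, if it is extremely fragmented), and one that fails it may still be annotated on the strength of the other factors listed above.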

Last updated: 2019-04-12T14:54:32Z