GenBank Database Divisions

GenBank divisions fall into two general categories, described by Ouellette and Boguski in the article "Database Divisions and Homology Search Files: A Guide for the Perplexed" (Genome Research (1997) 7(10)); the full-text article is available. The "Organismal" divisions hold sequences derived from specific organisms, and the "Functional" divisions hold the different types of sequence data being collected.

Each sequence record exists in only one GenBank division. For example, the HTG division includes unfinished sequences (phases 0, 1, and 2) being generated from several different organisms. When a sequence is updated to phase 3, it is moved into the appropriate organismal division. For instance, human phase 3 (finished) HTG sequences are located in the PRI division.

The GenBank divisions listed here represent the location of the annotated sequence records; for homology searches, the records are reformatted and stored in the BLAST databases. The database divisions currently available, along with the related BLAST databases, are listed below. An example of a submission (one accession number) that has progressed through phase 1, phase 2, and phase 3 is available (Examples).

Organismal Divisions

Database  Division                           BLAST         Example
BCT       Bacterial sequences                nr, month
PRI       Primate sequences                  nr, month     Human Phase 3
ROD       Rodent sequences                   nr, month
MAM       Other mammalian sequences          nr, month
VRT       Other vertebrate sequences         nr, month
INV       Invertebrate sequences             nr, month     Drosophila, C. elegans Phase 3
PLN       Plant and fungal sequences         nr, month     Arabidopsis Phase 3
VRL       Viral sequences                    nr, month
PHG       Phage sequences                    nr, month
RNA       Structural RNA sequences           nr, month
SYN       Synthetic and chimeric sequences   nr, month
UNA       Unannotated sequences              nr, month

Functional Divisions

Database  Division                           BLAST         Example
EST       Expressed Sequence Tags            dbest, month
STS       Sequence Tagged Sites              dbsts, month
GSS       Genome Survey Sequences            dbgss, month
HTG       High Throughput Genomic sequences  htgs, month   All organisms: Phase 0, 1, and 2

  • Phase 0 sequences are single-pass or few-pass reads of a single clone (usually not assembled into contigs).
  • Phase 1 sequences are unfinished, unordered, and contain gaps.
  • Phase 2 sequences are unfinished, ordered, and can contain one or more gaps.
  • Phase 3 sequences are high-quality, finished sequences that do not contain gaps.
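
The phase rules above determine which division holds a high-throughput genomic record: phases 0, 1, and 2 stay in HTG, and phase 3 moves to the organismal division. The following is a minimal Python sketch of that rule, not an NCBI tool or API. The function name division_for and the lowercase organism keys are hypothetical, and the organism-to-division pairs are limited to the examples given in the tables on this page.

```python
# Illustrative sketch only: encodes the rules described on this page.
# Nothing here queries GenBank; all names are hypothetical.

# Organismal divisions for the phase 3 examples listed in the tables.
ORGANISM_DIVISION = {
    "human": "PRI",
    "drosophila": "INV",
    "c. elegans": "INV",
    "arabidopsis": "PLN",
}

# Short summaries of the four HTG phases.
PHASE_DESCRIPTION = {
    0: "single-pass or few-pass reads of a single clone",
    1: "unfinished, unordered, contains gaps",
    2: "unfinished, ordered, may contain gaps",
    3: "finished, high quality, no gaps",
}


def division_for(organism: str, phase: int) -> str:
    """Return the GenBank division code for an HTG-project sequence."""
    if phase not in PHASE_DESCRIPTION:
        raise ValueError(f"phase must be 0, 1, 2, or 3, got {phase!r}")
    if phase < 3:
        # Unfinished sequences from all organisms share the HTG division.
        return "HTG"
    key = organism.strip().lower()
    if key not in ORGANISM_DIVISION:
        raise KeyError(f"no organismal division in this sketch for {organism!r}")
    # Finished sequences move to the organismal division.
    return ORGANISM_DIVISION[key]


if __name__ == "__main__":
    for phase in range(4):
        print(phase, division_for("human", phase), "-", PHASE_DESCRIPTION[phase])
```

Because a record exists in only one division, the function returns a single code; extending the sketch to other organisms would mean adding entries to the lookup table.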
