Submitting HTG Sequences

The FTP-based HTG submission system has been decommissioned

The HTG submission system was created more than 20 years ago for fast processing of the BAC clones that were being sequenced for the Human Genome Project. As sequencing technologies have advanced over the years, the need for a specialized pipeline for depositing and updating BAC-based sequences has declined significantly. Furthermore, the center tracking that was done for HTG submissions is also no longer needed, since the likelihood of two groups sequencing the same clone has dropped substantially since the completion of clone-based genome sequencing efforts. In addition, NCBI has alternative submission systems for easy deposition of all types of genomic sequence data. Unfortunately, we are no longer able to maintain the older FTP-based HTG submission system and it has been decommissioned.

You can still submit BAC-based HTG-like sequences to GenBank, but you should use the standard GenBank submission pathway: either BankIt for FASTA submissions (at https://0-www-ncbi-nlm-nih-gov.brum.beds.ac.uk/WebSub/) or an ASN.1 file emailed to gb-sub@ncbi.nlm.nih.gov. Similarly, records that were originally submitted through the old HTG submission system are now updated through the standard GenBank update route described at https://0-www-ncbi-nlm-nih-gov.brum.beds.ac.uk/genbank/update/.

The HTG division contains unfinished DNA sequences generated by the high-throughput sequencing centers using traditional clone-based Sanger sequencing.

Draft genomes sequenced using non-clone-based whole genome shotgun sequencing are not appropriate for HTG; these should be submitted as WGS submissions, as described at www.ncbi.nlm.nih.gov/Genbank/wgs.html.

Submission Tools

There are two ways to create HTG records:

  1. BankIt (https://0-www-ncbi-nlm-nih-gov.brum.beds.ac.uk/WebSub/), our general sequence submission tool, will accept HTG submissions with a few simple modifications. Assemble all the contig sequences from a single BAC into a single sequence. If the sequence contains gaps, mark each gap by inserting a run of Ns into the sequence. You will also need to tell us whether the sequence is phase 1 or phase 2. Phase 3 sequences should be complete and contain no gaps. Gene and coding region annotation is not required for phase 1 and is optional for phase 2. For phase 3 we typically expect some annotation.
    • if the gap length is estimated, insert the equivalent number of Ns;
    • if the gap length is unknown, insert a run of 100 Ns to represent the gap;
    • annotate a misc_feature and include an appropriate note telling us what the Ns represent, for example:
      • /note="phase 1 HTG of BAC clone AGCD and runs of 100 Ns are gaps of unknown length"
      • /note="Ns represent a gap of estimated length, ## base pairs"
  2. The tbl2asn tool. tbl2asn is a command-line program that reads a FASTA sequence file (or an Ace Contig file with Phrap sequence quality values), a submission template file (for contact and citation information), and a series of command-line arguments (for additional information), and then generates the ASN.1 file for submission. tbl2asn can be incorporated into scripts to speed up the processing of records. Once you have created your ASN.1 file with tbl2asn, email it to gb-sub@ncbi.nlm.nih.gov.
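
The gap rules for BankIt submissions can be illustrated with a short script. The following is a minimal sketch, not an NCBI tool: the function name join_contigs, the constant UNKNOWN_GAP_LENGTH, and the shape of the returned gap list are all assumptions made for this example. It joins the ordered contigs of one BAC into a single sequence, inserting the estimated number of Ns where a gap size is known and a run of 100 Ns where it is not, and it reports where each run of Ns falls so the accompanying misc_feature notes can be written.

```python
from typing import List, Optional, Tuple

# Run of Ns used when the gap length is unknown (per the rule above).
UNKNOWN_GAP_LENGTH = 100

# (start, end, is_estimated): 1-based inclusive coordinates of a run of Ns.
Gap = Tuple[int, int, bool]


def join_contigs(contigs: List[str],
                 gap_lengths: List[Optional[int]]) -> Tuple[str, List[Gap]]:
    """Join ordered contigs from a single BAC into one sequence.

    gap_lengths[i] is the estimated size of the gap between contigs[i]
    and contigs[i + 1], or None when the size is unknown.
    (Hypothetical helper for illustration only.)
    """
    if not contigs:
        raise ValueError("at least one contig is required")
    if len(gap_lengths) != len(contigs) - 1:
        raise ValueError("need exactly one gap length per contig junction")

    parts = [contigs[0]]
    gaps: List[Gap] = []
    position = len(contigs[0])  # length of the sequence built so far

    for contig, gap in zip(contigs[1:], gap_lengths):
        n_count = UNKNOWN_GAP_LENGTH if gap is None else gap
        gaps.append((position + 1, position + n_count, gap is not None))
        parts.append("N" * n_count)
        parts.append(contig)
        position += n_count + len(contig)

    return "".join(parts), gaps


if __name__ == "__main__":
    sequence, gaps = join_contigs(["ACGT", "GGCC", "TTAA"], [5, None])
    for start, end, estimated in gaps:
        kind = "estimated length" if estimated else "unknown length"
        print(f"Ns at {start}..{end}: gap of {kind}")
```

Because an estimated gap of exactly 100 bases would look the same as a gap of unknown length, the note on the misc_feature remains the place to state which case applies.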

Last updated: 2021-01-19T13:44:53Z