The NCBI Genome Remapping Service (Remap) will be retired in November 2023.

NCBI Remap API Documentation

ATTENTION API USERS

A recent update to the NCBI Remapping service made one of its data structures incompatible with the API, causing remap_api.pl to fail. We have updated the API to address this. If you downloaded the API before April 21, 2015, you will need to download the latest version of remap_api.pl before doing your remapping. We apologize for any inconvenience this may have caused. - The NCBI Remap team

The API for NCBI Remap is a Perl script that can be found on our FTP site:

ftp://ftp.ncbi.nlm.nih.gov/pub/remap/

remap_api.pl

Once you download the script, you will need to modify the shebang line (the first line of the script) to point to your installation of Perl. Alternatively, you can leave the script unaltered and run 'perl remap_api.pl'.

Use of this API requires the following Perl modules:

  • Getopt::Long
  • LWP::UserAgent
  • HTTP::Request::Common qw(POST), qw(GET)
  • HTTP::Headers
  • XML::XPath
  • XML::XPath::XMLParser
  • JSON
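Before running the script, it can help to confirm that your Perl installation can load every module in the list above. The following is a minimal sketch, not part of the Remap distribution: it relies on the standard Perl invocation 'perl -MModule -e 1', which exits with a non-zero status when a module cannot be loaded. The helper names check_command and missing_modules are hypothetical.

```python
import shutil
import subprocess

# Module names copied from the list above.
REQUIRED_MODULES = [
    "Getopt::Long",
    "LWP::UserAgent",
    "HTTP::Request::Common",
    "HTTP::Headers",
    "XML::XPath",
    "XML::XPath::XMLParser",
    "JSON",
]


def check_command(module, perl="perl"):
    """Return the argv that asks Perl to load one module and then exit."""
    return [perl, "-M" + module, "-e", "1"]


def missing_modules(modules=REQUIRED_MODULES, perl="perl"):
    """Return the modules that the given Perl interpreter cannot load."""
    if shutil.which(perl) is None:
        raise RuntimeError(f"no '{perl}' executable found on PATH")
    missing = []
    for module in modules:
        result = subprocess.run(check_command(module, perl), capture_output=True)
        if result.returncode != 0:
            missing.append(module)
    return missing
```

Calling missing_modules() returns an empty list when everything is installed; any names it returns still need to be installed, for example from CPAN.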

To use the API, you first need to determine which alignments are available to the service. To do this, run:

./remap_api.pl --mode batches

This will return a list of available alignments; sample output is shown below:

Information in Alignment Batches
batch_id  query_species  query_name  query_ucsc  query_acc         target_species  target_name  target_ucsc  target_acc        alignment_date
3851      Danio rerio    Zv8                     GCF_000002035.3   Danio rerio     Zv9          danRer7      GCF_000002035.4   09/04/2011
5591      Homo sapiens   HuRef                   GCF_000002125.1   Homo sapiens    GRCh37.p5                 GCF_000001405.17  01/15/2012
5621      Homo sapiens   NCBI36      hg18        GCF_000001405.12  Homo sapiens    GRCh37.p5                 GCF_000001405.17  01/16/2012
0         Homo sapiens   GRCh37      hg19        GCF_000001405.13  Homo sapiens    RefSeqGene                RefSeqGene

The first column is the batch ID. If this column contains a non-zero number, the pair can be used for normal (assembly-assembly) remap. If the ID is 0, the alignments in that batch can be used only for Clinical Remap.
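The batch ID rule can be applied programmatically once the 'batches' output has been captured. The sketch below assumes the script writes tab-delimited text with a title line followed by a header row; that delimiter is an assumption, since this page does not state it. The function names parse_batches and split_by_use are hypothetical, and SAMPLE reuses two rows from the sample output above.

```python
import csv
import io


def parse_batches(text):
    """Parse 'batches' output (assumed tab-delimited) into dicts keyed by column name."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Skip anything before the header row, such as the title line.
    start = next(i for i, ln in enumerate(lines) if ln.startswith("batch_id"))
    reader = csv.DictReader(io.StringIO("\n".join(lines[start:])), delimiter="\t")
    return list(reader)


def split_by_use(batches):
    """Separate assembly-assembly batches (non-zero ID) from Clinical Remap batches (ID 0)."""
    normal = [b for b in batches if int(b["batch_id"]) != 0]
    clinical = [b for b in batches if int(b["batch_id"]) == 0]
    return normal, clinical


HEADER = ["batch_id", "query_species", "query_name", "query_ucsc", "query_acc",
          "target_species", "target_name", "target_ucsc", "target_acc", "alignment_date"]
SAMPLE = "\n".join([
    "Information in Alignment Batches",
    "\t".join(HEADER),
    "\t".join(["5621", "Homo sapiens", "NCBI36", "hg18", "GCF_000001405.12",
               "Homo sapiens", "GRCh37.p5", "", "GCF_000001405.17", "01/16/2012"]),
    "\t".join(["0", "Homo sapiens", "GRCh37", "hg19", "GCF_000001405.13",
               "Homo sapiens", "RefSeqGene", "", "RefSeqGene", ""]),
])

normal, clinical = split_by_use(parse_batches(SAMPLE))
```

If the real output uses a different delimiter, only the delimiter argument in parse_batches needs to change.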

To remap a set of features from one assembly to another, you need to identify the assembly accession.versions that represent those assemblies; for more information on assembly accessions, see the assembly help pages. The assembly accession.versions are listed in the columns labeled 'query_acc' and 'target_acc'. An accession from either column can be supplied to either the --from or the --dest parameter (see below), so each alignment can be used in both directions. For example, you can remap from NCBI36 (hg18) to GRCh37 (hg19) or from GRCh37 (hg19) to NCBI36 (hg18).
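Because an alignment can be used in either direction, checking whether a pair of assemblies is supported amounts to comparing the two accessions without regard to order. A minimal sketch, assuming the batch rows have already been loaded as dictionaries with 'query_acc' and 'target_acc' keys; the function name find_batch and the BATCHES list are hypothetical, with values taken from the sample output above.

```python
def find_batch(batches, from_acc, dest_acc):
    """Return the first batch aligning the two assemblies in either direction, else None."""
    wanted = {from_acc, dest_acc}
    for batch in batches:
        if {batch["query_acc"], batch["target_acc"]} == wanted:
            return batch
    return None


BATCHES = [
    {"batch_id": "3851", "query_acc": "GCF_000002035.3", "target_acc": "GCF_000002035.4"},
    {"batch_id": "5621", "query_acc": "GCF_000001405.12", "target_acc": "GCF_000001405.17"},
]

forward = find_batch(BATCHES, "GCF_000001405.12", "GCF_000001405.17")
reverse = find_batch(BATCHES, "GCF_000001405.17", "GCF_000001405.12")
unknown = find_batch(BATCHES, "GCF_000001405.12", "GCF_000002035.4")
```

Here forward and reverse refer to the same batch, while unknown is None because no listed alignment connects those two assemblies.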

Parameters used for running Remap:

  • Provide a name for the file containing the Remap report. If you don't provide this, a default name will be used.
  • Provide a name for the output file containing the Remap report. If you don't provide this, a default name will be used.

For best results, please limit your submissions to input files of ~250,000 rows and at most 4 simultaneous submissions.
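One way to stay within these limits is to split a large input into pieces of at most 250,000 rows and cap the number of concurrent submissions at 4. The sketch below shows only the batching logic. The submit_one callable is a placeholder you would supply yourself (for example, a function that writes a chunk to a file and runs remap_api.pl on it), and the code does not repeat format-specific header lines (such as those in VCF or GFF3 files) in each chunk.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_ROWS = 250_000   # approximate per-file row limit suggested above
MAX_PARALLEL = 4     # suggested ceiling on simultaneous submissions


def chunk_rows(rows, size=MAX_ROWS):
    """Yield consecutive lists of at most `size` rows."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def submit_all(chunks, submit_one, max_parallel=MAX_PARALLEL):
    """Call submit_one(chunk) for every chunk with at most max_parallel running at once.

    Results are returned in the same order as the chunks.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(submit_one, chunks))
```

Threads are sufficient here because each submission would spend its time waiting on an external process and the network, not on Python computation.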

Remap Parameters

--mode
    Options: asm-asm, asm-rsg, rsg-asm, alt-loci, batches
    Comment: Specifies the remap mode:
        asm-asm: Assembly-Assembly
        asm-rsg: Clinical Remap (from assembly -> RefSeqGene)
        rsg-asm: Clinical Remap (from RefSeqGene -> assembly)
        alt-loci: Alt loci Remap (between a primary assembly and its related alt-loci)
        batches: returns the list of alignments available to the remap service
    Optional/Required: Required

--from
    Options: assembly accession.version
    Comment: Assembly accession.version for the assembly you want to map from. You can find valid assembly accession.version numbers by running the 'batches' command.
    Optional/Required: Required

--dest
    Options: assembly accession.version
    Comment: Assembly accession.version for the assembly you want to map to. Note: irrelevant for --mode alt-loci.
    Optional/Required: Required

--annotation
    Options: file name
    Comment: The name of the file containing your annotation.
    Optional/Required: Required

--annot_out
    Options: file name
    Comment: Provide a name for the output file containing the remapped annotation. If you don't provide this, a default name will be used.
    Optional/Required: Optional

--report_out
    Options: file name
    Comment: Provide a name for the file containing the Remap report. If you don't provide this, a default name will be used.
    Optional/Required: Optional

--gbench_out
    Options: file name
    Comment: Provide a name for the file containing the Genome Workbench project. If you do not provide this parameter, no Genome Workbench file will be produced.
    Optional/Required: Optional

--allowdupes
    Options: on, off
    Comment: Allow multiple locations to be returned (on) or not (off). Default is 'on'.
    Optional/Required: Optional

--merge
    Options: on, off
    Comment: Allow alignments to merge across alignment gaps (on) or not (off). Default is 'on'.
    Optional/Required: Optional

--mincov
    Options: number between 0.1 and 1
    Comment: Specify the minimum coverage a feature must have to be remapped. Default is 0.5.
    Optional/Required: Optional

--maxexp
    Options: a number
    Comment: Specify the maximum amount a feature is allowed to 'expand' upon remapping. Default is 2.
    Optional/Required: Optional

--in_format
    Options: guess (default), hgvs, bed, gvf, gff, gtf, gff3, vcf, asnt, asnb, region
    Comment: Specify the format of the input file. 'guess' is the default.
    Optional/Required: Optional

--out_format
    Options: hgvs, bed, gvf, gff, gtf, gff3, vcf, asnt, asnb, region
    Comment: Force the output file into a specific format. The default is to match the format of the input file. Note: if you choose an output format that differs from the input format, you may lose metadata.
    Optional/Required: Optional
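When remap_api.pl is driven from another program, it can be useful to assemble and check the argument list before anything is submitted. The following is an illustrative sketch only: the flag names and allowed values are taken from the parameter table above, while the function names build_args and check_option are hypothetical and the validation mirrors the table, not the script's own checks. It assumes that --mode batches needs no other flags, as in the 'batches' command shown earlier.

```python
MODES = {"asm-asm", "asm-rsg", "rsg-asm", "alt-loci", "batches"}
FORMATS = {"hgvs", "bed", "gvf", "gff", "gtf", "gff3", "vcf", "asnt", "asnb", "region"}
FILE_FLAGS = {"annot_out", "report_out", "gbench_out"}
SWITCH_FLAGS = {"allowdupes", "merge"}


def check_option(name, value):
    """Raise ValueError if an optional flag or its value is not listed in the table."""
    if name in FILE_FLAGS:
        return
    if name == "maxexp":
        float(value)  # must be numeric; float() raises ValueError otherwise
        return
    if name in SWITCH_FLAGS:
        ok = value in ("on", "off")
    elif name == "mincov":
        ok = 0.1 <= float(value) <= 1
    elif name == "in_format":
        ok = value == "guess" or value in FORMATS
    elif name == "out_format":
        ok = value in FORMATS
    else:
        raise ValueError(f"unknown option: --{name}")
    if not ok:
        raise ValueError(f"invalid value for --{name}: {value!r}")


def build_args(mode, from_acc=None, dest_acc=None, annotation=None, **options):
    """Return the argument list to pass to remap_api.pl."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    args = ["--mode", mode]
    if mode == "batches":
        return args
    if not from_acc or not annotation:
        raise ValueError("--from and --annotation are required")
    args += ["--from", from_acc]
    if mode != "alt-loci":  # --dest is irrelevant for alt-loci
        if not dest_acc:
            raise ValueError("--dest is required for this mode")
        args += ["--dest", dest_acc]
    args += ["--annotation", annotation]
    for name, value in options.items():
        check_option(name, value)
        args += ["--" + name, str(value)]
    return args
```

The returned list can be appended to ["perl", "remap_api.pl"] and handed to subprocess.run, which avoids shell quoting problems with file names.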

Examples

Remap a file of annotations named 'my_annotes.gff' from NCBI36 to GRCh37.p5:

./remap_api.pl --mode asm-asm --from GCF_000001405.12 --dest GCF_000001405.17 --annotation my_annotes.gff --annot_out my_annotes.GRCh37.p5.gff --report_out my_annotes_NCBI36_GRCh37.p5.tsv --gbench_out my_annotes_GRCh37.p5.gbp

Use the clinical remap service to map a file of annotations called 'my_annotes.bed' from GRCh37 to the RefSeqGene set: 

./remap_api.pl --mode asm-rsg --from GCF_000001405.13 --dest RefSeqGene --annotation my_annotes.bed