The NCBI Genome Remapping Service (Remap) will be retired in November 2023.
NCBI Remap API Documentation
ATTENTION API USERS
A recent update to the NCBI Remapping service rendered one of the data structures incompatible with the API, causing remap_api.pl to fail. We have updated the API to address this. If you downloaded the API prior to 4/21/15, you will need to download the latest version of remap_api.pl before doing your remapping. We apologize for any inconvenience this may have caused. - The NCBI Remap team
The API for NCBI Remap is a Perl script that can be found on our FTP site:
ftp://ftp.ncbi.nlm.nih.gov/pub/remap/
remap_api.pl
Once you download the script, you will need to modify the shebang line to point to your installation of Perl. Alternatively, you can leave the script unaltered and run 'perl remap_api.pl'.
Use of this API requires the following Perl modules:
- Getopt::Long
- LWP::UserAgent
- HTTP::Request::Common qw(POST), qw(GET)
- HTTP::Headers
- XML::XPath
- XML::XPath::XMLParser
- JSON
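Before running the script, it can help to confirm that each module loads. The following is a minimal sketch, not part of the Remap distribution: it assumes a `perl` executable on the PATH and uses the standard `perl -MModule -e 1` idiom, which exits with a non-zero status when a module cannot be loaded. The function names are hypothetical.

```python
import subprocess

# Modules from the list above. Import options such as qw(POST) are dropped
# because only the module name matters for an installation check.
REQUIRED_MODULES = [
    "Getopt::Long",
    "LWP::UserAgent",
    "HTTP::Request::Common",
    "HTTP::Headers",
    "XML::XPath",
    "XML::XPath::XMLParser",
    "JSON",
]


def module_check_command(module, perl="perl"):
    """Return the argv that asks Perl to load `module` and exit."""
    return [perl, f"-M{module}", "-e", "1"]


def missing_modules(modules=REQUIRED_MODULES, run=subprocess.run):
    """Return the modules that Perl fails to load.

    `run` is injectable so the logic can be exercised without Perl installed.
    """
    missing = []
    for module in modules:
        result = run(module_check_command(module), capture_output=True)
        if result.returncode != 0:
            missing.append(module)
    return missing
```

Calling `missing_modules()` with no arguments returns an empty list when every module is installed. If Perl itself is absent, `subprocess.run` raises `FileNotFoundError`.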
To use the API, you first need to determine which alignments are available to the service. To do this, run:
./remap_api.pl --mode batches
This will return a list of available alignments; sample output is shown below:
batch_id | query_species | query_name | query_ucsc | query_acc | target_species | target_name | target_ucsc | target_acc | alignment_date |
---|---|---|---|---|---|---|---|---|---|
3851 | Danio rerio | Zv8 | | GCF_000002035.3 | Danio rerio | Zv9 | danRer7 | GCF_000002035.4 | 09/04/2011 |
5591 | Homo sapiens | HuRef | | GCF_000002125.1 | Homo sapiens | GRCh37.p5 | | GCF_000001405.17 | 01/15/2012 |
5621 | Homo sapiens | NCBI36 | hg18 | GCF_000001405.12 | Homo sapiens | GRCh37.p5 | | GCF_000001405.17 | 01/16/2012 |
0 | Homo sapiens | GRCh37 | hg19 | GCF_000001405.13 | Homo sapiens | RefSeqGene | | RefSeqGene | |
The first column is the batch id. If there is a non-zero number in this column the pair can be used for normal remap (assembly-assembly). If the ID is 0, the alignments in that batch are only to be used for Clinical Remap.
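The batch ID rule can be applied programmatically once the listing is parsed. The sketch below is illustrative only. It assumes the script prints a header line followed by tab-separated fields, which this page does not confirm, so the `delimiter` argument may need adjusting. The function names and the abbreviated sample text are hypothetical.

```python
import csv
import io


def parse_batches(text, delimiter="\t"):
    """Parse `--mode batches` output into a list of dicts keyed by column name.

    ASSUMPTION: the first line is a header and fields are separated by
    `delimiter`; the real output layout is not documented on this page.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)


def split_by_use(rows):
    """Separate assembly-assembly batches (non-zero ID) from Clinical Remap batches (ID 0)."""
    normal = [row for row in rows if int(row["batch_id"]) != 0]
    clinical = [row for row in rows if int(row["batch_id"]) == 0]
    return normal, clinical


# Abbreviated sample built from two rows of the table above.
SAMPLE = "\n".join([
    "batch_id\tquery_name\tquery_acc\ttarget_name\ttarget_acc",
    "5621\tNCBI36\tGCF_000001405.12\tGRCh37.p5\tGCF_000001405.17",
    "0\tGRCh37\tGCF_000001405.13\tRefSeqGene\tRefSeqGene",
])

normal, clinical = split_by_use(parse_batches(SAMPLE))
```

Here `normal` holds the NCBI36 to GRCh37.p5 pair and `clinical` holds the GRCh37 to RefSeqGene pair.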
To remap a set of features from one assembly to another, you need to identify the assembly accession.versions that represent those assemblies; for more information on assembly accessions, see the assembly help pages. The assembly accession.versions can be found in the columns labeled 'query_acc' and 'target_acc'. An assembly listed in the query_acc column can be used with either the --from or the --dest parameter (see below), and the same is true of an assembly listed in the target_acc column. That is, you can remap from NCBI36 (hg18) to GRCh37 (hg19) or from GRCh37 (hg19) to NCBI36 (hg18).
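Because an alignment can be used in either direction, a lookup for a usable pair should ignore which column each accession appears in. A minimal sketch follows, using accessions from the sample output. The `find_batch` helper and the dict layout are hypothetical, not part of remap_api.pl.

```python
# Two alignments taken from the sample output; only the fields needed here.
BATCHES = [
    {"batch_id": 3851, "query_acc": "GCF_000002035.3", "target_acc": "GCF_000002035.4"},
    {"batch_id": 5621, "query_acc": "GCF_000001405.12", "target_acc": "GCF_000001405.17"},
]


def find_batch(batches, from_acc, dest_acc):
    """Return the alignment covering both accessions, in either direction, or None."""
    wanted = {from_acc, dest_acc}
    for batch in batches:
        if {batch["query_acc"], batch["target_acc"]} == wanted:
            return batch
    return None


forward = find_batch(BATCHES, "GCF_000001405.12", "GCF_000001405.17")
reverse = find_batch(BATCHES, "GCF_000001405.17", "GCF_000001405.12")
```

Both calls return the same alignment, so the pair is valid for `--from`/`--dest` in either order.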
Parameters used for running Remap:
For best results, please limit your submissions to input files of ~250,000 rows and at most 4 simultaneous submissions.
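One way to stay within these limits is to split a large input into pieces and cap the number of concurrent submissions. The sketch below assumes the input can be split row by row. Formats with header lines (VCF or GFF, for example) would need the header repeated in each piece, which this sketch does not do. `submit` is a hypothetical callable standing in for whatever runs remap_api.pl on one piece.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_ROWS = 250_000   # approximate per-file row limit suggested above
MAX_PARALLEL = 4     # simultaneous submissions suggested above


def chunk_rows(rows, size=MAX_ROWS):
    """Yield successive lists of at most `size` rows."""
    chunk = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def submit_all(chunks, submit, max_parallel=MAX_PARALLEL):
    """Call `submit(chunk)` for every chunk, running at most `max_parallel` at once.

    Results are returned in the same order as the chunks.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(submit, chunks))
```

For example, `submit_all(chunk_rows(open("my_annotes.bed")), submit)` would process a headerless BED file in pieces of up to 250,000 rows.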
Flag | Options | Comment | Optional/Required |
---|---|---|---|
--mode | asm-asm asm-rsg rsg-asm alt-loci batches | Specifies the remap mode. asm-asm: Assembly-Assembly; asm-rsg: Clinical Remap (from assembly to RefSeqGene); rsg-asm: Clinical Remap (from RefSeqGene to assembly); alt-loci: Alt loci Remap (between a primary assembly and its related alt-loci); batches: returns the list of alignments available to the remap service. | Required |
--from | assembly accession.version | Assembly accession.version for the assembly you want to map from. You can find valid assembly accession.version numbers by running the 'batches' command. | Required |
--dest | assembly accession.version | Assembly accession.version for the assembly you want to map to. Note: irrelevant for --mode alt-loci. | Required |
--annotation | file name | The name of the file containing your annotation. | Required |
--annot_out | file name | Provide a name for the output file containing the remapped annotation. If you don't provide this, a default name will be used. | Optional |
--report_out | file name | Provide a name for the file containing the Remap report. If you don't provide this, a default name will be used. | Optional |
--gbench_out | file name | Provide a name for the file containing the Genome Workbench project. If you do not provide this parameter, no Genome Workbench file will be produced. | Optional |
--allowdupes | on off | Allow multiple locations to be returned (on) or not (off). Default is 'on'. | Optional |
--merge | on off | Allow alignments to merge across alignment gaps (on) or not (off). Default is 'on'. | Optional |
--mincov | number between 0.1 and 1 | Specify the minimum coverage a feature must have to be remapped. Default is 0.5. | Optional |
--maxexp | a number | Specify the maximum amount a feature is allowed to 'expand' upon remapping. Default is 2. | Optional |
--in_format | guess (default) | Specify the format of the input file. 'guess' is the default. | Optional |
--out_format | hgvs bed gvf gff gtf gff3 vcf asnt asnb region | Force the format of the output file to a specific type. The default is to match the format of the input file. Note: if you choose an output format that differs from the input format, you may lose metadata. | Optional |
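When calling the script from another program, the parameter table can be turned into a small validating command builder. The following is a sketch under stated assumptions, not an official wrapper: `build_remap_command` and its argument names are hypothetical, the script path defaults to `./remap_api.pl`, and it assumes `--dest` may be omitted for alt-loci mode, which the table describes only as "irrelevant".

```python
MODES = ("asm-asm", "asm-rsg", "rsg-asm", "alt-loci", "batches")
OPTIONAL_FLAGS = ("annot_out", "report_out", "gbench_out", "allowdupes",
                  "merge", "mincov", "maxexp", "in_format", "out_format")
OUT_FORMATS = ("hgvs", "bed", "gvf", "gff", "gtf", "gff3", "vcf",
               "asnt", "asnb", "region")


def build_remap_command(mode, annotation=None, from_acc=None, dest_acc=None,
                        script="./remap_api.pl", **options):
    """Return an argv list for remap_api.pl, checked against the table above."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    argv = [script, "--mode", mode]
    if mode == "batches":
        return argv  # listing alignments takes no further parameters
    if not from_acc or not annotation:
        raise ValueError("--from and --annotation are required")
    # ASSUMPTION: --dest may be left out for alt-loci, where it is irrelevant.
    if not dest_acc and mode != "alt-loci":
        raise ValueError("--dest is required for this mode")
    argv += ["--from", from_acc]
    if dest_acc:
        argv += ["--dest", dest_acc]
    argv += ["--annotation", annotation]
    for name, value in options.items():
        if name not in OPTIONAL_FLAGS:
            raise ValueError(f"unknown option: --{name}")
        if name in ("allowdupes", "merge") and value not in ("on", "off"):
            raise ValueError(f"--{name} must be 'on' or 'off'")
        if name == "mincov" and not 0.1 <= float(value) <= 1:
            raise ValueError("--mincov must be between 0.1 and 1")
        if name == "out_format" and value not in OUT_FORMATS:
            raise ValueError(f"unsupported output format: {value}")
        argv += [f"--{name}", str(value)]
    return argv
```

The returned list can be passed directly to `subprocess.run`. Optional flags are appended in the order the keyword arguments are given.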
Examples
Remap a file of annotations named 'my_annotes.gff' from NCBI36 to GRCh37.p5:
./remap_api.pl --mode asm-asm --from GCF_000001405.12 --dest GCF_000001405.17 --annotation my_annotes.gff --annot_out my_annotes.GRCh37.p5.gff --report_out my_annotes_NCBI36_GRCh37.p5.tsv --gbench_out my_annotes_GRCh37.p5.gbp
Use the clinical remap service to map a file of annotations called 'my_annotes.bed' from GRCh37 to the RefSeqGene set:
./remap_api.pl --mode asm-rsg --from GCF_000001405.13 --dest RefSeqGene --annotation my_annotes.bed