The NCBI Genome Remapping Service (Remap) will be retired in November 2023.
FAQ
- Can you map features from an assembly for one organism to an assembly from another organism?
- Why isn't my favorite organism included on the organism list?
- Does feature projection from one assembly to another provide the same results as performing a de novo annotation?
- I keep getting an error that says 'unrecognized format', but I'm using a BED/GFF/GTF file.
- What is a RefSeqGene?
- I'm having trouble mapping variants using HGVS nomenclature, why isn't this working?
- Is there an API for the remapping service?
- Why do I sometimes get identical locations returned?
- The alignments I want are not available. Can these be made available?
- What is LRG?
- Why don't my features map to/from the GRCh37 (hg19) mitochondria?
- Why do I sometimes see edits to REF and ALT bases in the output?
- Does NCBI Remap left or right shift my variant coordinates?
- What INFO tags does NCBI Remap add to output VCF?
-
Can you map features from an assembly for one organism to an assembly from another organism?
At this time, cross-species remapping is supported for only a limited number of organisms. We are investigating the expansion of this feature to additional organisms in the future.
-
Why isn't my favorite organism included on the organism list?
We add organisms as we re-annotate their assemblies. To ask for an organism to be added sooner, use the Support Center link and tell us which organisms and assemblies you are interested in.
-
Does feature projection from one assembly to another provide the same results as performing a de novo annotation?
No, feature projection is not as robust as de novo annotation in most cases. All feature projection does is correlate the features on one assembly to features on another. Most processes for performing de novo annotation use additional heuristics in order to perform annotation and will likely come up with different answers, in some regions, when annotating a new assembly.
-
I keep getting an error that says 'unrecognized format', but I'm using a BED/GFF/GTF file.
Tab-delimited formats such as these have many valid variations. Please use the Support Center link and provide us with a sample of your file. We will either update our parsers to handle this variation or report to you any errors we find in the file.
-
What is a RefSeqGene?
RefSeqGenes are human genomic sequences to be used as reference standards for reporting sequence variation on well characterized genes. Learn more about RefSeqGenes.
-
I'm having trouble mapping variants using HGVS nomenclature, why isn't this working?
There could be several issues causing this problem. First, check that your HGVS expression is valid using the HGVS nomenclature web site, and review how NCBI handles HGVS expressions. If you are mapping variants that are defined on an NM accession, make sure you select "I have data on: RefSeqGene" and "I want to map data to: Assembly name". Additionally, if you provide a file that doesn't specify an allele change (such as a BED file) and request HGVS as an output on Assembly-Assembly remap, the software will remap your data but will not emit HGVS expressions. If none of these apply, we may have a bug in our software, so please use the Support Center link to report the specific problem you are seeing, along with the HGVS expression you are using.
-
Is there an API for the remapping service?
Yes, there is an API for the remap service. It is provided as a Perl script that you can download from our FTP site: ftp://ftp.ncbi.nlm.nih.gov/pub/remap The API documentation is available at: https://0-www-ncbi-nlm-nih-gov.brum.beds.ac.uk/genome/tools/docs/api
-
Why do I sometimes get identical locations returned?
The second pass alignments are not guaranteed to be unique, and in especially complicated regions of the genome there may be multiple alignments to related regions. When the 'merge' option is on, these alignments are merged. Because the alignments are duplicated, the remap report appears to contain duplicate locations.
-
The alignments I want are not available. Can these be made available?
We are happy to try to provide additional assembly-assembly alignments when we can. Before you request alignments, check that the assemblies you want aligned are submitted to NCBI. To check this, go to https://0-www-ncbi-nlm-nih-gov.brum.beds.ac.uk/assembly and search for your assemblies of interest. If both assemblies are available, use the Support Center link at the bottom of the page and send a request for these assemblies to be aligned and made available in Remap. Be sure to include the assembly accession.version in addition to the assembly name so that we align the correct assemblies.
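Before sending a request, it can be worth checking that each identifier you list carries a version suffix. The following is a minimal sketch, assuming assembly accessions take the form GCA_ or GCF_ followed by nine digits, a dot, and a version number (as in GCF_000001405.13); the function name is hypothetical and is not part of any NCBI tool.

```python
import re

# Assumed pattern: GCA_ (GenBank) or GCF_ (RefSeq), nine digits, a dot, a version.
ACCESSION_VERSION = re.compile(r"GC[AF]_\d{9}\.\d+")


def has_accession_version(text: str) -> bool:
    """Return True if text looks like an assembly accession.version."""
    return ACCESSION_VERSION.fullmatch(text.strip()) is not None


print(has_accession_version("GCF_000001405.13"))  # True
print(has_accession_version("GCF_000001405"))     # False: version is missing
print(has_accession_version("GRCh37"))            # False: assembly name only
```

This only checks the shape of the identifier; it does not confirm that the assembly exists at NCBI.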
-
What is LRG?
LRG is the abbreviation for Locus Reference Genomic, a collaboration between the RefSeqGene group at NCBI and the European Bioinformatics Institute (EBI). LRG sequences are human genomic sequences to be used as reference standards for well characterized genes. Learn more about LRG.
-
Why don't my features map to/from the GRCh37 (hg19) mitochondria?
To safeguard against mis-mapping features, NCBI Remap does not map mitochondrial features when using GRCh37 (hg19) (GCF_000001405.13) as a source/target assembly. No mitochondrial sequence was included in the initial GenBank release of GRCh37 (hg19). Beginning with GRCh37.p2 (GCF_000001405.14), GenBank releases of the human reference genome assembly and NCBI Remap include J01415.2 (RefSeq NC_012920.1), the revised Cambridge Reference Sequence (rCRS). However, various sources distributed GRCh37 with mitochondrial sequences:
- UCSC: GRCh37 (hg19) includes NC_001807.4 (African (Yoruba) mitochondria).
- Ensembl: Release 56 included NC_001807.4. Subsequent releases (Release 57 onward) included NC_012920.1.
- RefSeq: Build37.1 included NC_001807.4
- 1000 Genomes: NC_012920.1 is included in the mapping target for the main project.
If you need to map data between NC_001807.4 and NC_012920.1, we recommend you generate an input file containing only mitochondrial features and use NCBI36 (MT=NC_001807.4) and GRCh38 (MT=NC_012920.1) as your source/target assemblies. If you are not sure which version of the mitochondrial sequence is used in your data, you can compare the sequence by BLAST to NC_012920.1 and NC_001807.4, or check the sequence lengths (NC_012920.1 is 16,569 bp, whereas NC_001807.4 is 16,571 bp).
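The length check described above can be sketched as follows. This is an illustrative example only, assuming a single-record FASTA held in a string; the function names are hypothetical, and a matching length is just a hint, so a BLAST comparison remains the more reliable test.

```python
# Lengths quoted in this FAQ.
KNOWN_MITO_LENGTHS = {
    16569: "NC_012920.1",  # revised Cambridge Reference Sequence (rCRS)
    16571: "NC_001807.4",  # African (Yoruba) mitochondria
}


def fasta_sequence_length(fasta_text: str) -> int:
    """Count sequence characters in a single-record FASTA string."""
    return sum(
        len(line.strip())
        for line in fasta_text.splitlines()
        if line.strip() and not line.startswith(">")
    )


def guess_mito_accession(length: int) -> str:
    """Map a sequence length to the accession it suggests, or 'unknown'."""
    return KNOWN_MITO_LENGTHS.get(length, "unknown")


# Synthetic stand-in sequence (not real mitochondrial data), 16,569 bases long.
record = ">chrM\n" + "A" * 16569 + "\n"
print(guess_mito_accession(fasta_sequence_length(record)))  # NC_012920.1
```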
-
Why do I sometimes see edits to REF and ALT bases in the output?
If you are using a Variant Call Format (VCF) file as your input, you may find edits to REF and ALT bases in your remapped output under two circumstances.
The first is a sequence difference between the source and target assemblies. The assembly to which a REF base refers differs between the input and output files: NCBI Remap produces output annotation files in which REF bases refer to the sequence in the target assembly. If a REF base differs between the source and target assemblies, the output VCF reports the target assembly base in the REF field. The corresponding ALT field in the output VCF is updated, with the source assembly REF base replacing or being appended to the ALT base that was provided in the input VCF, as appropriate.
The second is an error in the input VCF. If the base specified in the REF column of an input VCF is incorrect, the correct base is reported in the output VCF and the input base is added to the ALT column.
Note: Remap does not currently edit genotype information provided in the input file to reflect edits to the REF and ALT fields in the output file. This means that genotype information in edited rows may not be correct and should be reviewed prior to analysis. If you have selected VCF as your output file type, all NCBI Remap edits to the REF and ALT fields are reported using INFO tags. For more information about variant remapping and the INFO tags added by NCBI, please see the FAQ entry What INFO tags does NCBI Remap add to output VCF?, as well as the What is NCBI Remap page.
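Because genotype information is not updated in edited rows, it can help to list those rows before analysis. Below is a minimal sketch, assuming the output VCF is tab-delimited with INFO in the eighth column (as in the VCF specification) and that edits are marked with the REF_EDIT or REF_ERROR tags named in this FAQ. The function name and the sample rows are hypothetical.

```python
def edited_rows(vcf_text, tags=("REF_EDIT", "REF_ERROR")):
    """Return (chrom, pos, matching_tags) for data rows carrying any of the tags."""
    wanted = set(tags)
    flagged = []
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF data row
        # INFO entries are separated by ';' and are either KEY or KEY=VALUE.
        info_keys = {entry.split("=", 1)[0] for entry in fields[7].split(";")}
        hits = sorted(info_keys & wanted)
        if hits:
            flagged.append((fields[0], int(fields[1]), hits))
    return flagged


# Made-up rows for illustration; not real Remap output.
example = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t.\t.\tREMAP_ALIGN=FP",
    "chr1\t200\t.\tC\tT,G\t.\t.\tREMAP_ALIGN=FP;REF_EDIT",
])
print(edited_rows(example))  # [('chr1', 200, ['REF_EDIT'])]
```

To also catch files produced with the deprecated tags, pass them explicitly, for example `tags=("REF_EDIT", "REF_ERROR", "REF_UPDATE", "REF_LEFT_SHIFT")`.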
-
Does NCBI Remap left or right shift my variant coordinates?
If you are using a Variant Call Format (VCF) file as your input, NCBI Remap will left-shift variants prior to remapping them. Upon remapping, it will left-shift again with respect to the target assembly. Therefore, when using VCF as your input, all output files will contain left-shifted coordinates. This ensures output VCF meets file specifications. If you provide VCF as your input and specify HGVS as your output, please note that the HGVS will also contain left-shifted coordinates. At this time, NCBI does not provide an equivalent right-shifting function for input HGVS files. This is planned for a future release.
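To illustrate what left-shifting means in practice, here is a generic left-alignment sketch for a single REF/ALT pair. It is not NCBI Remap's implementation; it assumes VCF-style alleles (both non-empty, sharing an anchor base for indels), a 1-based position, and a reference sequence held in a string. The function name and example sequence are hypothetical.

```python
def left_shift(reference: str, pos: int, ref: str, alt: str):
    """Left-align one variant; returns (pos, ref, alt) with a 1-based pos."""
    if not ref or not alt or ref == alt:
        raise ValueError("ref and alt must be non-empty and different")
    start = pos - 1  # 0-based index of the first REF base
    if reference[start:start + len(ref)] != ref:
        raise ValueError("REF does not match the reference at pos")

    # While both alleles end in the same base, drop it. If that would empty
    # an allele, also prepend the preceding reference base, which moves the
    # variant one position to the left.
    while ref[-1] == alt[-1]:
        if len(ref) > 1 and len(alt) > 1:
            ref, alt = ref[:-1], alt[:-1]
        elif start > 0:
            start -= 1
            base = reference[start]
            ref, alt = base + ref[:-1], base + alt[:-1]
        else:
            break  # already at the start of the sequence

    # Trim redundant shared leading bases, keeping at least one base each.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        start += 1
    return start + 1, ref, alt


# A 'CA' deletion inside a CA repeat, given at its rightmost placement.
print(left_shift("GGCACACAT", 6, "ACA", "A"))  # (2, 'GCA', 'G')
# Substitutions are returned unchanged.
print(left_shift("GGCACACAT", 4, "A", "T"))    # (4, 'A', 'T')
```

The sketch handles one ALT allele at a time; rows with several ALT alleles would need all alleles shifted together.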
-
What INFO tags does NCBI Remap add to output VCF?
ID | Number | Type | Description
REMAP_ALIGN | 1 | String | Alignment type used for remapping (FP=first pass, SP=second pass)
REF_ERROR | 0 | Flag | REF base does not match source assembly
REF_EDIT | 0 | Flag | REF and ALT bases modified due to difference in REF base in source and target assemblies or left-shifting of input REF base
DEPRECATED: REF_UPDATE | 0 | Flag | REF and ALT bases modified due to difference in REF base in source and target assemblies
DEPRECATED: REF_LEFT_SHIFT | 0 | Flag | Position of REF base left-shifted