Create Protein Alignments using ProSplign

Introduction

This tutorial takes you through the steps to generate a protein-to-genomic sequence alignment. The underlying algorithm, ProSplign, was developed at NCBI to handle frameshifts and mRNA splicing events. Detailed documentation of this algorithm can be found at https://0-www-ncbi-nlm-nih-gov.brum.beds.ac.uk/sutils/static/prosplign/prosplign.html.

This tutorial assumes the user has already reviewed the Basic Operation tutorial.

Step 1: Select Sequences to Align

Open Genome Workbench and import a genomic range and a protein from GenBank: NC_000006.12:30699k-30718k, XP_011513306.1

Import Genomic Range
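For readers who want to check the coordinates outside the application, the range notation above can be expanded with a short script. This is a minimal sketch, not part of Genome Workbench: it assumes that the `k` suffix means thousands of bases and that positions are 1-based, and the helper names (`parse_position`, `parse_range`, `efetch_url`) are hypothetical. The URL it builds uses the public NCBI E-utilities `efetch` endpoint, which is a separate route to the same records and not necessarily how Genome Workbench retrieves them.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities endpoint for fetching sequence records.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def parse_position(text):
    """Convert '30699k' or '30699000' to an integer base position.

    Assumption: a trailing 'k' multiplies the number by 1000.
    """
    text = text.strip().lower().replace(",", "")
    if text.endswith("k"):
        return int(text[:-1]) * 1000
    return int(text)


def parse_range(spec):
    """Split 'ACCESSION:START-STOP' into (accession, start, stop)."""
    accession, _, span = spec.partition(":")
    start_text, _, stop_text = span.partition("-")
    return accession, parse_position(start_text), parse_position(stop_text)


def efetch_url(db, accession, start=None, stop=None):
    """Build a FASTA download URL; start/stop restrict the region."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    if start is not None and stop is not None:
        params["seq_start"] = start
        params["seq_stop"] = stop
    return f"{EFETCH}?{urlencode(params)}"


accession, start, stop = parse_range("NC_000006.12:30699k-30718k")
print(accession, start, stop)
print(efetch_url("nuccore", accession, start, stop))
print(efetch_url("protein", "XP_011513306.1"))
```

Under these assumptions the range covers 19,000 bases on NC_000006.12, which is a quick sanity check against what the Graphical Sequence View shows.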

Select the genomic range in the Project View and open it in Graphical Sequence View. Make sure that the Alignments track is visible.

Open Graphical Sequence View

Step 2: Generate the Alignment

Select both items in the Project View.

Select both items

Right-click the selected items, and then click Run Tool.

Click Run Tool

In the Run Tool dialog, in the Alignment Creation section, select the ProSPLIGN tool.

Select ProSplign Tool

Click Next.

Click next

ProSplign generates a pairwise alignment between a protein and a genomic sequence. Here you can select multiple genomic ranges or transcripts to be aligned to one protein. The General Options tab lets you set various options. You can choose the genomic sequence strand: ‘Plus’, ‘Minus’ or ‘Both’. For sequences that have no introns, uncheck the ‘With introns’ checkbox. The genetic code is determined automatically from the organism associated with the sequence, or you can select it manually. You can also set three scoring parameters: the frameshift cost, the gap opening cost, and the gap extension cost for one amino acid.

choose parameters
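To get a feel for how the three scoring parameters interact, the sketch below computes a penalty for a single gap under an affine scheme. It is an illustration only, not ProSplign's scoring code: the class and function names are hypothetical, the numeric values are placeholders rather than the dialog's documented defaults, and the handling of frameshifts (a flat cost whenever the gap length is not a multiple of three) is an assumed simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoringOptions:
    # Placeholder values for illustration; check the dialog for real defaults.
    frameshift_cost: int = 30
    gap_opening_cost: int = 10
    gap_extension_cost: int = 1  # charged once per amino acid (codon)


def gap_penalty(gap_nucleotides, opts=ScoringOptions()):
    """Penalty for one gap of the given length in nucleotides.

    Assumed convention: opening cost plus one extension cost per whole
    codon. A gap that is not a multiple of 3 shifts the reading frame,
    so it is charged the frameshift cost in place of the opening cost.
    """
    if gap_nucleotides <= 0:
        return 0
    codons, remainder = divmod(gap_nucleotides, 3)
    if remainder:
        return opts.frameshift_cost + opts.gap_extension_cost * codons
    return opts.gap_opening_cost + opts.gap_extension_cost * codons


for length in (3, 9, 1, 4):
    print(length, gap_penalty(length))
```

With these placeholder values, an in-frame gap of three codons costs less than a single-base frameshift, which is the intuition behind raising or lowering the frameshift cost relative to the gap costs.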

The Refinement Options tab lets you set options for post-processing the alignment. Refinement is turned on by default. To turn it off, clear the Refine the alignment checkbox. The other checkboxes control whether only the flank regions are removed and whether Ns are removed from the ends of good regions in the full alignment.

The flank positives and the total positives set the minimum percentages of positives that the final refined alignment must have. If the overall percentage of positives is less than the total positives, more bad pieces are removed. Any flank with a percentage of positives less than the flank positives is trimmed. Good regions shorter than the minimum length of good region are also trimmed.
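The flank-trimming rule can be pictured with a toy model. The sketch below is not the ProSplign refinement code: it assumes the alignment has already been split into pieces described only by their aligned length and number of positives, the function names are hypothetical, and the thresholds passed in the example are illustrative values.

```python
def percent_positives(pieces):
    """Overall percentage of positives across (length, positives) pieces."""
    total_length = sum(length for length, _ in pieces)
    if total_length == 0:
        return 0.0
    return 100.0 * sum(positives for _, positives in pieces) / total_length


def trim_flanks(pieces, flank_positives, min_good_length):
    """Drop leading and trailing pieces that fail the flank checks.

    A piece is treated as weak when it is shorter than min_good_length
    or its percentage of positives is below flank_positives.
    """
    def is_weak(piece):
        length, positives = piece
        if length <= 0 or length < min_good_length:
            return True
        return 100.0 * positives / length < flank_positives

    kept = list(pieces)
    while kept and is_weak(kept[0]):
        kept.pop(0)
    while kept and is_weak(kept[-1]):
        kept.pop()
    return kept


# (aligned length, number of positives) for four consecutive pieces
pieces = [(30, 12), (120, 100), (90, 80), (20, 8)]
refined = trim_flanks(pieces, flank_positives=55, min_good_length=59)
print(refined)
print(round(percent_positives(refined), 1))
```

In this example both outer pieces fall below the flank threshold and are trimmed, and the remaining pieces can then be compared against the total positives threshold.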

The minimum exon identity/positives represent the smallest percentage of exon identity/positives that may appear in the refined alignment for either a full or partial exon. The number of bases in the first and the last exon that will appear in the refined alignment will be at least the minimum flanking exon length.

To restore both the general and the refinement parameters to their default values, click Defaults.

When you are finished choosing your settings, click Finish. The generated alignment is added to the Project:

Alignment added to the project

Meanwhile, observe that the new alignment is displayed in the Graphical Sequence View in the Alignments track. A tooltip appears when you hover over it.

Step 3: Cancel the Alignment Creation

When you are not sure whether the protein aligns to the forward or the reverse strand of the genomic sequence, select ‘Both’ in the ProSPLIGN dialog. ProSplign generates alignments on both strands and retains the one that has a better match.

Better match alignment
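The idea of trying both strands and keeping the better one can be sketched in a few lines. This is a simplified illustration, not what ProSplign does internally: `pick_strand` is a hypothetical helper, and the scoring function used in the example (counting ATG triplets) is a deliberately trivial stand-in for a real alignment score.

```python
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def pick_strand(genomic, score):
    """Score both strands with the given function and keep the better one.

    `score` is any callable that maps a nucleotide string to a number;
    a real aligner would return an alignment score here.
    """
    candidates = {"Plus": genomic, "Minus": reverse_complement(genomic)}
    scored = {name: score(seq) for name, seq in candidates.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


def toy_score(seq):
    # Stand-in score: number of non-overlapping ATG triplets.
    return seq.count("ATG")


print(pick_strand("CATCATCAT", toy_score))
```

Here the plus strand contains no ATG triplet while its reverse complement, ATGATGATG, contains three, so the minus strand is kept.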

Click Next, and then Finish on the next page. The task of generating the protein alignment is listed in the Task View. Select the row and right-click to see the context menu. To cancel the task, click Cancel Task.

click Cancel Task

This interrupts the alignment creation process, and the application notifies the user that no alignments were created.

no alignments created

Current Version is 3.8.2 (released December 12, 2022)

Last updated: 2021-07-26T21:55:05Z