Generating and Viewing Sequence Overlap Alignment
Step 1: Introduction
This tutorial will take you through the steps involved in looking for an alignment between two DNA sequences. It uses the Find Overlaps tool, which is designed to look for dove-tail (end-to-end) alignments.
This tutorial assumes that the user has already reviewed at least the Basic Operation tutorial and has a basic knowledge of the program.
We will use Genome Workbench to review the resulting alignment and to visually inspect the annotations on the two aligned sequences.
Step 2: Selecting Sequences to Align
For this exercise we selected two clone sequences (AC040978.8, AC115836.5) that are part of the tiling path track for human chromosome 8 (NC_000008) and are known to overlap. First, let us add both sequences to a new Genome Workbench project. Click File => Open, select Data from GenBank in the dialog, paste AC040978.8, AC115836.5 into the Accession to Load box, and click Next and then Finish. Note: you can also add data to the project using the Search View (see Basic Operation tutorial, Step 3).
Now there is a 'New Project' in the Project Tree View. Let us create a folder called Sequences in our project and add both sequences to this folder.
Right-click (or Control-click on Mac OS) on the 'Data' label and choose 'New Folder' from the contextual menu. In the New Project Folder dialog that appears, name the folder Sequences.
Now let us put our sequences in the Sequences folder. Select the sequences, right-click, and choose Cut; then select the Sequences folder, right-click, and choose Paste. Alternatively, select the sequences and drag and drop them onto the folder.
You can rename projects by selecting the project (green folder named 'New Project' in the project tree), right-clicking, and selecting Properties. Rename this project to: AC040978.8_AC115836.5_alignment.
Step 3: Generating an Alignment
We will generate a specialized alignment for these two sequences - an overlap alignment. This alignment expresses the relationship seen between two clones assembled sequentially in a tiled BAC assembly.
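To make "dove-tail" concrete: an overlap alignment runs off one end of each sequence, and neither sequence is contained in the other. The sketch below checks that condition from alignment coordinates. It is an illustration only, not the Find Overlaps tool's logic; the `Hit` record, the 0-based half-open coordinates, and the `slop` tolerance are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical alignment record with 0-based, half-open coordinates.

    Coordinates on sequence 2 are always given on its plus strand;
    `minus` is True when sequence 2 aligns in the opposite orientation.
    """
    start1: int
    end1: int
    start2: int
    end2: int
    minus: bool = False


def is_dovetail(hit: Hit, len1: int, len2: int, slop: int = 0) -> bool:
    """Return True if the alignment is an end-to-end (dove-tail) overlap."""
    at_start1 = hit.start1 <= slop
    at_end1 = hit.end1 >= len1 - slop
    at_start2 = hit.start2 <= slop
    at_end2 = hit.end2 >= len2 - slop
    # A sequence aligned over its full length is contained in the other:
    # that is a containment, not a dove-tail.
    if (at_start1 and at_end1) or (at_start2 and at_end2):
        return False
    if hit.minus:
        # Opposite orientations: the same-named ends of the two sequences meet.
        return (at_end1 and at_end2) or (at_start1 and at_start2)
    # Same orientation: the end of one sequence overlaps the start of the other.
    return (at_end1 and at_start2) or (at_start1 and at_end2)


# Toy example: the last 100 bp of a 1,000 bp sequence match the first 100 bp of another.
print(is_dovetail(Hit(900, 1000, 0, 100), len1=1000, len2=500))  # True
```

In a tiled assembly, consecutive clones are expected to pass this kind of check, which is why a dove-tail alignment is the natural result for this pair.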
Select both sequences within the project tree. Then select Tools=> Run Tool from the main menu. In the Run Tool dialog, choose Find Overlap between DNA Sequences and click Next.
In the dialog that opens, select AC040978.8 as Sequence 1 and AC115836.5 as Sequence 2. Use the default alignment parameters for this alignment.
Then click Finish. When the alignment is complete, the result will appear in the AC040978.8_AC115836.5_alignment project in the Tool Result folder.
Note: The Find Overlaps tool first looks for a BLAST alignment between the component sequences, and if none is found, goes on to perform a banded global alignment. As the alignment is performed locally, repeat filtering is only available to external users if the repeats for the components have also been loaded locally.
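The Note mentions a banded global alignment. In general terms, this is a Needleman-Wunsch global alignment in which only the dynamic-programming cells within a fixed distance (the band) of a diagonal are filled, which saves time and memory when the two sequences are expected to align closely. The sketch below computes only the score, uses the main diagonal and made-up scoring parameters (match +1, mismatch -1, linear gap -2), and illustrates the technique rather than the Find Overlaps implementation.

```python
NEG_INF = float("-inf")


def banded_global_score(a: str, b: str, band: int,
                        match: int = 1, mismatch: int = -1, gap: int = -2) -> float:
    """Needleman-Wunsch score restricted to cells with |i - j| <= band.

    Uses a linear gap penalty and keeps only two rows of the DP matrix;
    cells outside the band stay at -inf and are never chosen.
    """
    n, m = len(a), len(b)
    if abs(n - m) > band:
        raise ValueError("band is too narrow to reach the final cell")
    # Row 0: leading gaps in `a`, only inside the band.
    prev = [NEG_INF] * (m + 1)
    for j in range(min(m, band) + 1):
        prev[j] = j * gap
    for i in range(1, n + 1):
        cur = [NEG_INF] * (m + 1)
        for j in range(max(0, i - band), min(m, i + band) + 1):
            if j == 0:
                cur[0] = i * gap  # leading gaps in `b`
                continue
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap in `b`
            left = cur[j - 1] + gap   # b[j-1] aligned to a gap in `a`
            cur[j] = max(diag, up, left)
        prev = cur
    return prev[m]


print(banded_global_score("ACGT", "AGT", band=1))  # 1
```

The band must be at least the length difference of the two sequences; otherwise the bottom-right cell, where every global alignment ends, lies outside the band.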
Step 4: Viewing the Alignment in Multi-pane Cross Alignment View
To view this alignment, right-click on it in the Project View and choose Open New View from the context menu. From the Open View dialog, choose Multi-pane Cross Alignment View. You will see a view like the one shown below. This viewer packs three views together: a Dot Matrix view of the alignment and two Graphical Views, one for each sequence. Each view can be resized by clicking and dragging its edges. Resize the panels to match the image shown. In this composite view, you should see a dot matrix view on the left and two graphical views stacked vertically on the right.
The dot matrix view shows a single diagonal line that represents the alignment. Sequence 1 (AC040978.8) is on the Y-axis and Sequence 2 (AC115836.5) is on the X-axis. The negative slope of the line indicates that the two sequences align in opposite orientations.
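Why does opposite orientation give a negative slope? A dot plot marks a dot wherever a short word from one sequence matches a word from the other. The sketch below, using a made-up toy sequence and word size k = 4, also compares each word of seq2 with the reverse complement of the words in seq1. When seq2 is the reverse complement of seq1, those reverse-strand dots fall on the line x + y = len(seq1) - k, so y decreases as x increases. This is a simple exact-word dot plot, not necessarily how Genome Workbench draws its Dot Matrix view.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def dot_points(seq1: str, seq2: str, k: int = 4):
    """Return (plus, minus) lists of (x, y) dots.

    x is a k-mer start on seq2 (X-axis), y a k-mer start on seq1 (Y-axis).
    `plus` holds same-strand word matches; `minus` holds matches between a
    seq1 word and the reverse complement of a seq2 word.
    """
    index = {}
    for y in range(len(seq1) - k + 1):
        index.setdefault(seq1[y:y + k], []).append(y)
    plus, minus = [], []
    for x in range(len(seq2) - k + 1):
        word = seq2[x:x + k]
        plus.extend((x, y) for y in index.get(word, []))
        minus.extend((x, y) for y in index.get(revcomp(word), []))
    return plus, minus


seq1 = "ATGGCGTACCTTAGGACTTCAGCA"
seq2 = revcomp(seq1)  # the same DNA, read in the opposite orientation
_, minus = dot_points(seq1, seq2)
# The true reverse-strand diagonal satisfies x + y = len(seq1) - k: slope -1.
diagonal = {(x, len(seq1) - 4 - x) for x in range(len(seq1) - 3)}
print(diagonal <= set(minus))  # True
```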
The top graphical window shows Sequence 1 as the master (anchor) sequence with Sequence 2 aligned beneath. In contrast, the bottom graphical window shows Sequence 2 as the master (anchor) sequence with Sequence 1 aligned beneath.
Depending on your personal track configuration, you might see a slightly different set of tracks in the graphical views on the right. In our example we show three tracks: the master (anchor) sequence, the alignment track, and the SNP (variation) track. If other features are annotated on the master (anchor) sequences, they can be displayed in the graphical views as separate tracks. To show or hide the available tracks, use the Context icon at the bottom or the Gear icon on the upper panel of either graphical window (see Basic Operation tutorial for more information).
A tooltip (pop-up window) with additional information opens when the mouse is held over any alignment or feature. In the image below, the tooltip shows alignment information, including percent coverage and identity, mismatches, gaps, and unaligned regions.
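Exact definitions of coverage and identity vary between tools, and the ones Genome Workbench uses in its tooltip are not stated here. As an illustration only, the sketch below assumes that identity is matches divided by all alignment columns (gaps included), that coverage is the fraction of each input sequence taking part in the alignment, and that a gap is a run of consecutive gap characters in one row. The alignment rows and sequence lengths are toy values.

```python
def alignment_stats(row1: str, row2: str, len1: int, len2: int) -> dict:
    """Summarize a pairwise alignment given as two equal-length gapped rows."""
    if len(row1) != len(row2):
        raise ValueError("alignment rows must have the same length")
    matches = mismatches = gap_opens = gap_bases = 0
    prev_state = None  # which row had a gap in the previous column, if any
    for a, b in zip(row1, row2):
        state = 1 if a == "-" else 2 if b == "-" else None
        if state is None:
            if a == b:
                matches += 1
            else:
                mismatches += 1
        else:
            gap_bases += 1
            if state != prev_state:
                gap_opens += 1  # a new run of gap characters starts here
        prev_state = state
    residues1 = sum(c != "-" for c in row1)
    residues2 = sum(c != "-" for c in row2)
    return {
        "matches": matches,
        "mismatches": mismatches,
        "gap_opens": gap_opens,
        "gap_bases": gap_bases,
        # Identity over all alignment columns, gaps included (one common convention).
        "identity": 100.0 * matches / len(row1),
        # Percentage of each input sequence that takes part in the alignment.
        "coverage1": 100.0 * residues1 / len1,
        "coverage2": 100.0 * residues2 / len2,
    }


print(alignment_stats("ACGT-ACGT", "ACCTTAC-T", len1=16, len2=8))
```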
Step 5: Taking a closer look
For a closer view of the alignment, double-click on the gray alignment bar in either graphical view of the Multi-pane view. This selects the alignment as an object, zooms both graphical views to the extent of the alignment, and highlights the selected alignment in the dot matrix view.
If you select a region in one of the graphical views, it is also highlighted in the dot matrix view, and vice versa.
Marks within the alignment bars in the graphical views indicate mismatches, insertions, and gaps. You can see these details by zooming in. In this example, zooming in on the lower graphical window reveals a two-base-pair indel in the alignment at Sequence 2 (AC115836.5) position 177,420. A known variation associated with it is also shown.
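Reading an indel position off an alignment means walking the alignment columns while keeping a coordinate counter for each sequence. The sketch below does this for two gapped rows. The function name, the 1-based coordinates, and the `step2` parameter (set to -1 when Sequence 2 is aligned in the opposite orientation, as in this tutorial) are assumptions for illustration, and the example rows are toy data, not the real clone sequences.

```python
def find_indels(row1: str, row2: str, start1: int = 1, start2: int = 1,
                step2: int = 1) -> list:
    """List gap runs in a pairwise alignment as (gapped_row, position, length).

    gapped_row is 1 or 2 (the row containing the gap characters); position is
    the 1-based coordinate, on the other sequence, of the first extra base in
    alignment order. Use step2=-1 when sequence 2 is aligned on its minus
    strand, so that its coordinates count down from start2.
    """
    pos1, pos2 = start1, start2  # coordinate of the next residue in each row
    runs, i, n = [], 0, len(row1)
    while i < n:
        if row1[i] != "-" and row2[i] != "-":
            pos1 += 1
            pos2 += step2
            i += 1
            continue
        gapped = 1 if row1[i] == "-" else 2
        row = row1 if gapped == 1 else row2
        j = i
        while j < n and row[j] == "-":
            j += 1
        length = j - i
        if gapped == 1:
            runs.append((1, pos2, length))  # the extra bases are in sequence 2
            pos2 += step2 * length
        else:
            runs.append((2, pos1, length))  # the extra bases are in sequence 1
            pos1 += length
        i = j
    return runs


print(find_indels("ACGT--ACGT", "ACGTTTACGT", start1=1, start2=1001))  # [(1, 1005, 2)]
```

For a minus-strand alignment, the reported position on Sequence 2 is the highest coordinate of the inserted bases, because coordinates count down in alignment order.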
Step 6: Additional Views: Alignment Span View
The Alignment Span Table View provides information about each segment of an alignment. For a discontinuous alignment (such as the one in this tutorial), each segment is shown in its own row. To see this view, select the alignment icon in the Project Tree, choose View => Open View, and choose Alignment Span View from the dialog. Alternatively, right-click on the alignment, choose Open View, and then choose Alignment Span View. (Note: to control how segments are merged into rows, right-click in the table, select Settings, and set the merging window threshold to a smaller size.)
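Each row of the span view corresponds to a segment of the alignment, and the merging window setting suggests that nearby segments can be combined into one row. Exactly how Genome Workbench applies this threshold is not described here, so the sketch below only assumes one plausible rule: consecutive ungapped blocks, given as hypothetical (start1, start2, length) tuples on the same strand, are merged when the unaligned stretch between them is no longer than the window on both sequences.

```python
def merge_spans(blocks, window):
    """Merge consecutive ungapped blocks into spans.

    blocks: list of hypothetical (start1, start2, length) tuples, 0-based,
            sorted by position, both sequences on the same strand.
    window: largest unaligned stretch (on either sequence) that is merged over.
    Returns (start1, end1, start2, end2) tuples with half-open ends.
    """
    spans = []
    for start1, start2, length in blocks:
        end1, end2 = start1 + length, start2 + length
        if spans:
            s1, e1, s2, e2 = spans[-1]
            # Bases skipped on each sequence between the previous span and this block.
            if start1 - e1 <= window and start2 - e2 <= window:
                spans[-1] = (s1, end1, s2, end2)
                continue
        spans.append((start1, end1, start2, end2))
    return spans


blocks = [(0, 0, 100), (100, 102, 50), (300, 400, 20)]
print(merge_spans(blocks, window=0))  # three rows: the 2 bp gap is not merged
print(merge_spans(blocks, window=5))  # two rows: the first two blocks merge
```

Under this assumption, a smaller window merges fewer segments, so more rows appear in the table.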
Broadcasting can be used between the span and multi-pane views. Select a row in the Alignment Span Table View and switch to the Multi-pane Cross Alignment View: the region selected in the span view is selected in all windows of the multi-pane view.
Step 7: Additional Views: Alignment Summary View
The Alignment Summary View is a tabular view that summarizes a set of alignments, with each row corresponding to one alignment. To see this view, select the alignment file in the Project Tree, choose View => Open View, and choose Alignment Summary View from the dialog. Alternatively, right-click on the alignment, choose Open View, and then choose Alignment Summary View. For our current overlap alignment, the view shows only a single row. The row includes a set of predefined alignment attributes for the query, the subject, and the alignment. Right-click on the header to see the list of all the attributes.
There are three main actions you can do in this view:
- Column sorting. The table can be sorted according to any column to allow users to find alignments of interest quickly.
- Alignment broadcasting. Since each row corresponds to one alignment, selecting any cell on that row will automatically broadcast that alignment to other views inside Genome Workbench. This feature makes it easier to connect data in this view with elements in other views.
- Query-based filtering. Alignments can be filtered based on any combination of column value comparisons using query-like language.
Any sequence that contains alignments can display an Alignment Summary View. Here are a few more examples that show different ways of opening the Alignment Summary View.
Example 1 shows alignments annotated on a sequence (NC_000019). Sometimes it is not immediately obvious that a sequence has an alignment since the alignment is not shown as a file in the project view. However, you can examine an alignment summary view for any molecule for which alignment tracks are present (i.e. the alignments (queries) are annotated on a sequence (subject)).
Steps:
- Import sequence NC_000019.
- Double click on the project item, and select Alignment Summary View.
Below is the Alignment Summary View, filtered using the Search box to show only alignments with mismatches (mismatches>0).
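Genome Workbench filters this table with its own query language. To illustrate only the idea behind a filter such as mismatches>0, the sketch below applies simple column comparisons to rows represented as dictionaries. The mini-syntax (conditions joined with " and "), the column names, and the row values are made up and much simpler than the real query language.

```python
import operator

# Hypothetical mini-language: conditions of the form "column<op>number",
# joined with " and ". Dicts keep insertion order, so the two-character
# operators are tried before ">", "<" and "=".
OPS = {">=": operator.ge, "<=": operator.le, "!=": operator.ne,
       ">": operator.gt, "<": operator.lt, "=": operator.eq}


def parse_condition(text: str):
    """Split one condition into (column, comparison function, number)."""
    for symbol, func in OPS.items():
        if symbol in text:
            column, value = text.split(symbol, 1)
            return column.strip(), func, float(value)
    raise ValueError(f"no comparison operator in {text!r}")


def filter_rows(rows: list, query: str) -> list:
    """Keep rows (dicts) that satisfy every condition in the query."""
    conditions = [parse_condition(part) for part in query.split(" and ")]
    return [row for row in rows
            if all(func(row[col], value) for col, func, value in conditions)]


rows = [
    {"query": "aln1", "mismatches": 0, "identity": 100.0},
    {"query": "aln2", "mismatches": 3, "identity": 99.1},
    {"query": "aln3", "mismatches": 12, "identity": 97.5},
]
print([r["query"] for r in filter_rows(rows, "mismatches>0")])  # ['aln2', 'aln3']
```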
Example 2 shows a set of selected alignments from the Graphical Sequence View.
Steps:
- Double click on the project item (NC_000019), and select Graphical View.
- Turn on the alignment track category if it is not already on (use the gear icon to open the Configure Tracks dialog, or use the Content menu icon at the bottom and select Alignments).
- Open any alignment track that has data for this chromosome, for example NG alignments.
- Hold the Shift key, then left-click and drag to select some alignments.
- Right click in the view to launch a context menu.
- Click on Open View --> Select Alignment Summary View.
- Rearrange the views so that the summary view and the graphical view are side by side.
To check alignment broadcasting, select an alignment in one view and confirm that the corresponding alignment is selected in the other view. You should see an image similar to the one below:
Example 3 shows alignments stored in a BLAST result (RID).
Steps:
- Go to the NCBI BLAST page, run a BLAST search for a sequence (e.g. http://tinyurl.com/3f6qywz), and note its RID.
- Import the RID into Genome Workbench (Click Open --> choose RIDs from NCBI Net Blast --> paste the RID).
- Double click on the project item, and select Alignment Summary View.
Note: a BLAST RID can also be opened in the other alignment views (Graphical View, Alignment Span View, Multiple Alignment View, Multi-pane Cross Alignment View). The image below shows a BLAST alignment in the Alignment Summary View broadcast with the Graphical View.
Step 8: Exporting Alignments
Alignments can be exported from Genome Workbench in several formats. For submission to NCBI, alignments should be exported in ASN.1 format.
In the graphical view, select an alignment and choose File=> Export from the main menu. A dialog box will open. From the list menu on the left side of this box, select ASN File. Choose one or more of the alignments (control (or shift) click for multiple selection). Choose a file name by clicking the small box labeled ... to select the file destination. Select Text for the ASN type and click Finish.
Step 9: Finished
Congratulations! You now know how to perform a basic alignment between two DNA sequences in order to find a dove-tail overlap. You have also learned several ways to view alignments, and how to export an alignment from Genome Workbench.
Current Version is 3.8.2 (released December 12, 2022)