Generating and Viewing a Sequence Overlap Alignment

Step 1: Introduction

This tutorial will take you through the steps involved in looking for an alignment between two DNA sequences. It uses the Find Overlaps tool, which is designed to look for dove-tail (end-to-end) alignments.
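To make the idea of a dove-tail overlap concrete, here is a minimal Python sketch, assuming error-free sequences. It looks for the longest suffix of one sequence that exactly equals a prefix of the other, and tries the second sequence in both orientations. The function names are hypothetical, and this is not how the Find Overlaps tool works: real clone overlaps contain mismatches and indels, which is why the tool relies on BLAST and banded alignment instead.

```python
# Hypothetical helpers illustrating a dove-tail (suffix-prefix) overlap.
# Exact matching only; real clone overlaps need an error-tolerant aligner.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_dovetail_overlap(a: str, b: str, min_len: int = 1) -> int:
    """Length of the longest suffix of `a` that exactly equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    a, b = a.upper(), b.upper()
    min_len = max(1, min_len)  # a zero-length slice a[-0:] would be all of `a`
    # Try the longest candidate first, so the first hit is the answer.
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def best_dovetail(a: str, b: str, min_len: int = 1) -> tuple[int, str]:
    """Check `b` as given ('+') and reverse-complemented ('-')."""
    plus = longest_dovetail_overlap(a, b, min_len)
    minus = longest_dovetail_overlap(a, reverse_complement(b), min_len)
    return (plus, "+") if plus >= minus else (minus, "-")


print(longest_dovetail_overlap("AAAACCGT", "CCGTTTTT"))  # 4
print(best_dovetail("AAAACCGT", "AAAAACGG"))              # (4, '-')
```

Checking both orientations matters here because, as the dot matrix later in this tutorial shows, the two example clones align in opposite orientations.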

This tutorial assumes that you have already worked through at least the Basic Operation tutorial and have a basic knowledge of the program.

We will use Genome Workbench to review the alignment and to visually inspect annotations on the two aligned sequences.

Step 2: Selecting Sequences to Align

For this exercise we selected two clone sequences (AC040978.8, AC115836.5) that are part of the tiling path track for human chromosome 8 (NC_000008) and are known to overlap. First, add both sequences to a new Genome Workbench project: click File => Open, select Data from GenBank in the dialog, paste AC040978.8, AC115836.5 into the Accession to Load box, and click Next and then Finish. Note: you can also add data to a project using the Search View (see Basic Operation tutorial, Step 3).

Open Data from GenBank dialog

Now there is a 'New Project' in the Project Tree View. Let us create a folder called Sequences in our project and add both sequences to this folder.

Right-click (or Control-click on Mac OS) the 'Data' label and choose 'New Folder' from the context menu. In the New Project Folder dialog that appears, name the folder Sequences.

Create new project name dialog

Now let us move our sequences into the Sequences folder. Select the sequences, right-click, and choose Cut; then select the Sequences folder, right-click, and choose Paste. Alternatively, select the sequences and drag and drop them onto the folder.

You can rename a project by selecting it (the green folder named 'New Project' in the project tree), right-clicking, and selecting Properties. Rename this project to AC040978.8_AC115836.5_alignment.

Project in Project View

Step 3: Generating an Alignment

We will generate a specialized alignment for these two sequences - an overlap alignment. This alignment expresses the relationship seen between two clones assembled sequentially in a tiled BAC assembly.
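In coordinate terms, two consecutive clones on a tiling path overlap dove-tail style when their placements intersect but neither clone contains the other. The Python sketch below checks this for clones placed on assembly coordinates. The `Clone` type and the coordinates are hypothetical values chosen for illustration, not the real placements of AC040978.8 and AC115836.5.

```python
from typing import NamedTuple, Optional, Tuple


class Clone(NamedTuple):
    """A clone placed on assembly coordinates (1-based, inclusive). Hypothetical."""
    accession: str
    start: int
    end: int


def overlap_interval(c1: Clone, c2: Clone) -> Optional[Tuple[int, int]]:
    """Return the shared (start, end) interval, or None if the clones do not meet."""
    lo = max(c1.start, c2.start)
    hi = min(c1.end, c2.end)
    return (lo, hi) if lo <= hi else None


def is_dovetail(c1: Clone, c2: Clone) -> bool:
    """True if the clones overlap but neither is contained in the other."""
    if overlap_interval(c1, c2) is None:
        return False
    contained = (c1.start >= c2.start and c1.end <= c2.end) or (
        c2.start >= c1.start and c2.end <= c1.end
    )
    return not contained


left = Clone("clone_A", 1, 180_000)         # hypothetical placement
right = Clone("clone_B", 150_001, 330_000)  # hypothetical placement
print(overlap_interval(left, right))  # (150001, 180000)
print(is_dovetail(left, right))       # True
```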

Select both sequences in the project tree. Then select Tools => Run Tool from the main menu. In the Run Tool dialog, choose Find Overlap between DNA Sequences and click Next.

Run Tool dialog Find Overlap selected

In the dialog that opens, select AC040978.8 as Sequence 1 and AC115836.5 as Sequence 2. Use the default alignment parameters for this alignment.

FindOverlap options

Then click Finish. When the alignment is complete, the result will appear in the AC040978.8_AC115836.5_alignment project in the Tool Result folder.

Find Overlap result in Project view

Note: The Find Overlaps tool first looks for a BLAST alignment between the component sequences and, if none is found, performs a banded global alignment. Because the alignment is computed locally, repeat filtering is available to external users only if the repeats for the components have also been loaded locally.
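To illustrate the fallback step, the following Python sketch computes a banded global (Needleman-Wunsch) alignment score: only dynamic-programming cells within `band` diagonals of the main diagonal are filled, which keeps the work roughly proportional to sequence length times band width. The scoring values (`match`, `mismatch`, `gap`) and the linear gap penalty are hypothetical choices for illustration; this tutorial does not describe the tool's actual parameters or implementation.

```python
def banded_global_score(a: str, b: str, band: int,
                        match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Global alignment score restricted to cells with |i - j| <= band.

    Uses a linear gap penalty and keeps only two DP rows in memory.
    """
    n, m = len(a), len(b)
    if abs(n - m) > band:
        raise ValueError("band is too narrow to reach the final cell")

    NEG = float("-inf")  # marks cells outside the band
    prev = [NEG] * (m + 1)
    for j in range(min(m, band) + 1):
        prev[j] = j * gap  # row 0: leading gaps in `a`

    for i in range(1, n + 1):
        cur = [NEG] * (m + 1)
        for j in range(max(0, i - band), min(m, i + band) + 1):
            if j == 0:
                cur[0] = i * gap  # column 0: leading gaps in `b`
                continue
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap       # a[i-1] aligned to a gap in `b`
            left = cur[j - 1] + gap  # b[j-1] aligned to a gap in `a`
            cur[j] = max(diag, up, left)
        prev = cur
    return int(prev[m])


print(banded_global_score("ACGT", "ACGT", band=1))  # 4
print(banded_global_score("ACGT", "AGT", band=1))   # 1
```

A band only works when the true alignment path stays close to the diagonal, which is a reasonable assumption when two sequences are already expected to overlap end to end.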

Step 4: Viewing the Alignment in Multi-pane Cross Alignment View

To view this alignment, right-click it in the Project View and choose Open New View from the context menu. In the Open View dialog, choose Multi-pane Cross Alignment View. You will see a view like the one shown below. This viewer combines three views: a Dot Matrix view of the alignment and two Graphical Views, one for each sequence. Each view can be resized by clicking and dragging its edges. Resize the panels to match the image, with the dot matrix view on the left and the two graphical views stacked vertically on the right.

The dot matrix view shows a single diagonal line that represents the alignment. Sequence 1 (AC040978.8) is on the Y-axis and Sequence 2 (AC115836.5) is on the X-axis. The negative slope of the line indicates that the two sequences align in opposite orientations, that is, one sequence aligns to the reverse complement of the other.

The top graphical window shows Sequence 1 as the master (anchor) sequence with Sequence 2 aligned beneath. In contrast, the bottom graphical window shows Sequence 2 as the master (anchor) sequence with Sequence 1 aligned beneath.

Multi-pane Cross Alignment View
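Why does an opposite-orientation match produce a line with negative slope? The Python sketch below builds a simple k-mer dot plot, a classic technique and not necessarily how Genome Workbench draws its dot matrix (which displays the computed alignment). If Sequence 2 is the reverse complement of Sequence 1, the k-mer at position y on Sequence 1 matches the reverse complement of the k-mer at x = len - k - y on Sequence 2, so the hits fall on the anti-diagonal y + x = len - k. The sequence and the value of k are arbitrary examples.

```python
_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMP)[::-1]


def dot_plot(seq_y: str, seq_x: str, k: int = 4):
    """Return (forward, reverse) sets of (y, x) k-mer hits.

    forward: seq_y[y:y+k] == seq_x[x:x+k]            (same orientation)
    reverse: seq_y[y:y+k] == revcomp(seq_x[x:x+k])   (opposite orientation)
    """
    forward, reverse = set(), set()
    for y in range(len(seq_y) - k + 1):
        word = seq_y[y:y + k]
        for x in range(len(seq_x) - k + 1):
            target = seq_x[x:x + k]
            if word == target:
                forward.add((y, x))
            if word == revcomp(target):
                reverse.add((y, x))
    return forward, reverse


seq1 = "ATGGCGTACCTTGAAC"
seq2 = revcomp(seq1)  # Sequence 2 in the opposite orientation
fwd, rev = dot_plot(seq1, seq2, k=4)
anti_diagonal = {(y, len(seq1) - 4 - y) for y in range(len(seq1) - 3)}
print(anti_diagonal <= rev)  # True: every anti-diagonal cell is a hit
```

As y increases, x = len - k - y decreases, which is exactly the negative slope seen in the dot matrix view.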

Depending on your track configuration, you may see a slightly different set of tracks in the graphical views on the right. In our example we show three tracks: the master (anchor) sequence, the alignment track, and the SNP (variation) track. Any other features annotated on the master (anchor) sequences can be displayed as separate tracks in the graphical views. To show or hide available tracks, use the Context icon at the bottom or the Gear icon on the upper panel of either graphical window (see Basic Operation tutorial for more information).

A tooltip (pop-up window) with additional information opens when the mouse hovers over any alignment or feature. In the image below, the tooltip shows alignment information, including percent coverage and identity, mismatches, gaps, and unaligned regions.

Multi-pane Cross Alignment View, tooltip shown
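The tooltip statistics can be derived from a gapped pairwise alignment. The Python sketch below assumes the alignment is given as two equal-length strings with '-' marking gaps; it computes identity over all alignment columns and coverage as aligned query bases divided by query length. These are common definitions but assumptions here: Genome Workbench may compute identity and coverage differently (for example, excluding gap columns), so treat the numbers as illustrative. The function name is hypothetical.

```python
def alignment_stats(aligned_q: str, aligned_s: str, query_len: int) -> dict:
    """Summarize a gapped pairwise alignment given as two equal-length strings."""
    if len(aligned_q) != len(aligned_s):
        raise ValueError("aligned strings must have the same length")

    matches = mismatches = gap_columns = gap_runs = 0
    in_gap = False
    for q, s in zip(aligned_q, aligned_s):
        if q == "-" or s == "-":
            gap_columns += 1
            if not in_gap:       # first column of a run of gap columns
                gap_runs += 1
            in_gap = True
        else:
            in_gap = False
            if q.upper() == s.upper():
                matches += 1
            else:
                mismatches += 1

    aligned_query_bases = len(aligned_q) - aligned_q.count("-")
    return {
        "matches": matches,
        "mismatches": mismatches,
        "gap_columns": gap_columns,
        "gaps": gap_runs,
        "identity_pct": 100.0 * matches / len(aligned_q),
        "coverage_pct": 100.0 * aligned_query_bases / query_len,
    }


print(alignment_stats("ACGT-ACGT", "ACGTTACCT", query_len=10))
```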

Step 5: Taking a closer look

For a closer view of the alignment, double-click the gray alignment bar in either graphical view of the Multi-pane view. This selects the alignment as an object, zooms both graphical views to the extent of the alignment, and highlights the selected alignment in the dot matrix view.

Multi-pane Cross Alignment View, zoomed

If you select a region in one of the graphical views, it is also highlighted in the dot matrix view, and vice versa.

Multi-pane Cross Alignment View, region selected
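Linked selection between the graphical views and the dot matrix amounts to mapping coordinates through the alignment. The Python sketch below assumes a simplified alignment model: a list of ungapped segments, each with a start on both sequences, a length, and the orientation of Sequence 2. The `Segment` type and the coordinates are hypothetical, and Genome Workbench's internal representation is not described in this tutorial. For an opposite-orientation segment, moving forward on Sequence 1 moves backward on Sequence 2, which is the negative slope seen in the dot matrix.

```python
from typing import List, NamedTuple, Optional, Tuple


class Segment(NamedTuple):
    """One ungapped aligned block (0-based coordinates). Hypothetical model."""
    start1: int   # first aligned base on Sequence 1
    start2: int   # lowest aligned coordinate on Sequence 2
    length: int
    strand2: str  # '+' same orientation, '-' opposite orientation


def map_position(pos1: int, segments: List[Segment]) -> Optional[int]:
    """Map a Sequence 1 position to Sequence 2, or None if it is unaligned."""
    for seg in segments:
        offset = pos1 - seg.start1
        if 0 <= offset < seg.length:
            if seg.strand2 == "+":
                return seg.start2 + offset
            return seg.start2 + seg.length - 1 - offset  # walk backwards
    return None


def map_range(start1: int, end1: int,
              segments: List[Segment]) -> Optional[Tuple[int, int]]:
    """Map an inclusive Sequence 1 range to the covering Sequence 2 range."""
    hits = [p for p in (map_position(x, segments) for x in range(start1, end1 + 1))
            if p is not None]
    return (min(hits), max(hits)) if hits else None


segments = [Segment(start1=100, start2=5000, length=50, strand2="-")]
print(map_position(100, segments))     # 5049
print(map_range(100, 109, segments))   # (5040, 5049)
```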

Marks within the alignment bars in the graphical views indicate mismatches, insertions and gaps. You can see these details by zooming in further. In this example, zooming in on the lower graphical window reveals a two-base-pair indel in the alignment at Sequence 2 (AC115836.5) position 177,420. A known variation associated with this indel is also shown.

Alignment at sequence level
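Locating an indel such as the one above means converting alignment columns back into sequence coordinates. The Python sketch below reports runs of gap columns together with their position on Sequence 2. It assumes a gapped alignment given as two equal-length strings, with Sequence 2 on the plus strand and `start2` being the coordinate of its first aligned base; because the sequences in this tutorial align in opposite orientations, real coordinates would additionally have to be counted backwards along Sequence 2. The function name and the output labels are hypothetical.

```python
def find_indels(aligned1: str, aligned2: str, start2: int = 1):
    """Return (position_on_seq2, length, kind) for each run of gap columns.

    kind 'insertion': bases present in Sequence 2 but not in Sequence 1;
                      position is the first inserted Sequence 2 base.
    kind 'deletion' : bases of Sequence 1 missing from Sequence 2;
                      position is the Sequence 2 base just before the gap.
    Assumes Sequence 2 is on the plus strand (coordinates increase left to right).
    """
    indels = []
    pos2 = start2 - 1            # coordinate of the last Sequence 2 base seen
    run_kind, run_len, run_pos = None, 0, None
    for c1, c2 in zip(aligned1, aligned2):
        if c2 != "-":
            pos2 += 1
        kind = "insertion" if c1 == "-" else "deletion" if c2 == "-" else None
        if kind != run_kind:     # a run of one kind ends, another may begin
            if run_kind is not None:
                indels.append((run_pos, run_len, run_kind))
            run_kind, run_len, run_pos = kind, 0, pos2
        if kind is not None:
            run_len += 1
    if run_kind is not None:
        indels.append((run_pos, run_len, run_kind))
    return indels


print(find_indels("ACGTAC", "AC--AC", start2=100))  # [(101, 2, 'deletion')]
print(find_indels("AC--AC", "ACGTAC", start2=1))    # [(3, 2, 'insertion')]
```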

Step 6: Additional Views: Alignment Span View

The Alignment Span Table View provides information about each segment of an alignment. For a discontinuous alignment (such as the one in this tutorial), each segment is shown in its own row. To open this view, select the alignment icon in the Project Tree, choose View => Open View, and choose Alignment Span View from the dialog. Alternatively, right-click the alignment, choose Open View, and then choose Alignment Span View. (Note: you can control the merging window by right-clicking in the table, selecting Settings, and setting the merging window threshold to a smaller size.)

Alignment Span Table view
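One way to picture the merging window setting: neighboring segments separated by a gap no larger than the threshold are reported as a single row. The Python sketch below applies that rule to segment intervals on one sequence. This is a hypothetical model of the setting, not Genome Workbench's documented algorithm (which may, for example, consider gaps on both sequences); the function name and intervals are illustrative.

```python
from typing import List, Tuple


def merge_segments(segments: List[Tuple[int, int]], window: int) -> List[Tuple[int, int]]:
    """Merge (start, end) intervals (inclusive) whose separating gap is <= window."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] - 1 <= window:
            # Gap to the previous segment is small enough: extend that row.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


segs = [(1, 100), (103, 200), (300, 400)]
print(merge_segments(segs, window=5))  # [(1, 200), (300, 400)]
print(merge_segments(segs, window=0))  # [(1, 100), (103, 200), (300, 400)]
```

Under this model, a smaller window merges fewer segments, so more individual rows appear in the table.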

Selection broadcasting works between the Alignment Span View and the Multi-pane Cross Alignment View. Select a row in the Alignment Span View, switch to the Multi-pane Cross Alignment View, and observe that the region selected in the span view is selected in all windows of the multi-pane view.

Broadcasting selection in Multi-pane Cross view

Step 7: Additional Views: Alignment Summary View

The Alignment Summary View is a tabular view that summarizes a set of alignments, with each row corresponding to one alignment. To open this view, select the alignment file in the Project Tree, choose View => Open View, and choose Alignment Summary View from the dialog. Alternatively, right-click the alignment, choose Open View, and then choose Alignment Summary View. For our current overlap alignment, the view shows only a single row. The row includes a set of predefined attributes for the query, subject, and alignment. Right-click the header to see the list of all attributes.

Alignment Table Columns

There are three main actions you can perform in this view:

  • Column sorting. The table can be sorted according to any column to allow users to find alignments of interest quickly.
  • Alignment broadcasting. Since each row corresponds to one alignment, selecting any cell on that row will automatically broadcast that alignment to other views inside Genome Workbench. This feature makes it easier to connect data in this view with elements in other views.
  • Query-based filtering. Alignments can be filtered based on any combination of column value comparisons using query-like language.
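The sorting and query-based filtering described above can be sketched in a few lines of Python. The mini query language here (conditions such as `mismatches>0`, joined with `and`) is a hypothetical stand-in: only the `mismatches>0` example comes from this tutorial, and Genome Workbench's actual filter syntax may differ. The rows and column names are made-up sample data.

```python
import operator
import re

_OPS = {">=": operator.ge, "<=": operator.le, "!=": operator.ne,
        ">": operator.gt, "<": operator.lt, "=": operator.eq}
# Two-character operators come first in the alternation so ">=" is not read as ">".
_COND = re.compile(r"^\s*(\w+)\s*(>=|<=|!=|>|<|=)\s*(\S+)\s*$")


def parse_condition(text: str):
    """Parse 'column op value' into (column, comparison function, value)."""
    m = _COND.match(text)
    if m is None:
        raise ValueError(f"cannot parse condition: {text!r}")
    column, op, raw = m.groups()
    try:
        value = float(raw)   # numeric comparison when possible
    except ValueError:
        value = raw          # otherwise compare as text
    return column, _OPS[op], value


def filter_rows(rows, query: str):
    """Keep rows satisfying every condition; conditions are joined by 'and'."""
    conditions = [parse_condition(part)
                  for part in re.split(r"\s+and\s+", query, flags=re.IGNORECASE)]
    return [row for row in rows
            if all(op(row[col], value) for col, op, value in conditions)]


rows = [  # hypothetical alignment summary rows
    {"query": "aln1", "mismatches": 0, "identity": 100.0},
    {"query": "aln2", "mismatches": 3, "identity": 99.1},
    {"query": "aln3", "mismatches": 1, "identity": 99.8},
]
hits = filter_rows(rows, "mismatches>0")
hits.sort(key=lambda r: r["identity"], reverse=True)  # column sorting
print([r["query"] for r in hits])  # ['aln3', 'aln2']
```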

Any sequence that contains alignments can display an Alignment Summary View. The following examples show different ways of opening it.

Example 1 shows alignments annotated on a sequence (NC_000019). It is not always obvious that a sequence has alignments, because they are not shown as a file in the Project View. However, you can open an Alignment Summary View for any molecule that has alignment tracks (i.e., alignments (queries) annotated on the sequence (subject)).

Steps:

  • Import sequence NC_000019.
  • Double click on the project item, and select Alignment Summary View.

Below is the Alignment Summary View, filtered in the Search box to show only alignments with mismatches (mismatches>0).

Filtered Alignment Table

Example 2 shows a set of selected alignments from the Graphical Sequence View.

Steps:

  • Double-click the project item (NC_000019) and select Graphical View.
  • Turn on the alignment track category if it is not already on (use the gear icon to open the Configure Tracks dialog, or use the Content menu icon at the bottom and select Alignments).
  • Open any alignment track that has data for this chromosome, for example NG alignments.
  • Hold the Shift key, then left-click and drag to select some alignments.
  • Right-click in the view to open a context menu.
  • Click Open View --> Alignment Summary View.
  • Rearrange the views so that the summary view and the graphical view are side by side.

To check alignment broadcasting, select an alignment in one view and confirm that the corresponding alignment is selected in the other view. You should see something similar to the image below:

Selected Alignment

Example 3 shows alignments stored in a BLAST result (RID).

Steps:

  • Go to the NCBI BLAST page, run a BLAST search for a sequence (e.g. http://tinyurl.com/3f6qywz), and note the RID.
  • Import the RID into Genome Workbench (Click Open --> choose RIDs from NCBI Net Blast --> paste the RID).
  • Double click on the project item, and select Alignment Summary View.

Note: a BLAST RID can also be opened in other alignment views (Graphical View, Alignment Span View, Multiple Alignment View, Multi-pane Cross Alignment View). The image below shows a BLAST alignment in the Alignment Summary View, with selection broadcast to the Graphical View.

BLAST Alignment for NM_002022
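As an alternative to the web page used in Example 3, an RID can also be obtained programmatically through NCBI's BLAST URL API, where a `CMD=Put` request submits a search and the response reports the RID on a `RID = ...` line. The Python sketch below only builds the request and parses a response; it does not contact NCBI. The response snippet is a made-up sample in the assumed format, and before sending real requests you should check the current BLAST URL API documentation and NCBI's usage guidelines. The helper names are hypothetical.

```python
import re
from urllib.parse import urlencode

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_put_request(query: str, program: str = "blastn", database: str = "nt"):
    """Return (url, POST body bytes) for a BLAST URL API CMD=Put submission.

    Nothing is sent here; passing both to urllib.request.urlopen(url, data=body)
    would submit the search.
    """
    body = urlencode({"CMD": "Put", "PROGRAM": program,
                      "DATABASE": database, "QUERY": query})
    return BLAST_URL, body.encode("ascii")


def extract_rid(response_text: str):
    """Pull the RID out of a CMD=Put response, or return None if absent."""
    m = re.search(r"^\s*RID = (\S+)", response_text, re.MULTILINE)
    return m.group(1) if m else None


url, body = build_put_request("NM_002022")
sample_response = "<!--QBlastInfoBegin\n    RID = EXAMPLE123\n    RTOE = 20\nQBlastInfoEnd\n-->"
print(extract_rid(sample_response))  # EXAMPLE123
```

The extracted RID is what you would paste into the RIDs from NCBI Net Blast dialog in Genome Workbench.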

Step 8: Exporting Alignments

Alignments can be exported from Genome Workbench in a couple of formats. For the purposes of submission to NCBI, alignments should be exported in ASN.1 format.

In the graphical view, select an alignment and choose File => Export from the main menu. In the dialog that opens, select ASN File from the list on the left. Choose one or more alignments (Control-click or Shift-click for multiple selection). Click the small box labeled ... to choose the destination file name. Select Text for the ASN type and click Finish.

Export alignment as ASN file

Step 9: Finished

Congratulations! You now know how to perform a basic alignment between two DNA sequences in order to find a dove-tail overlap. You have also learned several ways to view alignments, and how to export an alignment from Genome Workbench.

Current Version is 3.7.1 (released October 13, 2021)

Last updated: 2021-08-03T21:30:03Z