Basic Operation

Step 1: Introduction

This tutorial provides a broad overview of how to use Genome Workbench to analyze and display data. Before beginning, download and install Genome Workbench from the install page. If you do not have administrative privileges on your computer, install the program somewhere in your user home folder instead of the default location.

All illustrations in this tutorial apply fully to MS Windows 10. Unix/Linux users and users of other MS Windows versions may see minor differences in default settings, column order, window size, and other details.

Please download the project created and saved in this tutorial

Step 2: Starting up

When Genome Workbench starts up, the Main Application Window appears with several different panes. The Project Tree View is on the left and it is empty upon startup. This is where the data you load, the analyses you do, and the views you create will be stored.

Views (as the term is used in this and other tutorials) are the windows within Genome Workbench that show information about various aspects of the application and your data. Views provided by the application are available from the View drop-down menu; data views are created by users.

The Main Application Window contains many views and is fully configurable. To add a view that is not currently showing, select it from the View drop-down menu. To close a view, click the X in the right corner of its tab. You can click a view and drag it wherever you want it. You can also stack views so that they appear as folder tabs, like the Event View and the Task View shown below. When you click a view to move it, you will see icons showing its docking options; drop the view on the option you choose to dock it in that position. Genome Workbench remembers your configuration, so you only have to do this once.

Main application window

To start, let us search for some data in the public databases at NCBI. If you have several views open, select the Search View in the Main Application Window. The Search View provides a single interface for many of the most frequent kinds of searches done in Genome Workbench.

The Search View is accessible from the main menu as Tools => Search or View => Search, or from the toolbar by clicking the binoculars icon.

Let us now search for the gene superoxide dismutase in the Entrez Gene database. To do this, follow the steps below:

  • Make sure the Search Tool in the Search View says: Search NCBI Public Databases
  • Select Gene for the NCBI Database.
  • Enter the gene name superoxide dismutase in the search box to the right of the Select NCBI Database drop-down list.
  • Press the Enter key or click the Start button.

Search view options
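The same Entrez Gene search can also be expressed as an NCBI E-utilities ESearch request, which may be handy if you want to script the lookup outside Genome Workbench. The minimal sketch below only builds the request URL and sends nothing over the network; the helper name `build_esearch_url` and the `retmax` value are illustrative choices, not part of Genome Workbench.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (returns the IDs of matching records).
ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Return an ESearch URL for a text query against one Entrez database.

    Hypothetical helper for illustration: it only formats the URL.
    """
    params = {"db": db, "term": term, "retmax": retmax}
    # urlencode escapes the space in "superoxide dismutase" as '+'.
    return f"{ESEARCH_BASE}?{urlencode(params)}"


# The same search as in the Search View: "superoxide dismutase" in Gene.
url = build_esearch_url("gene", "superoxide dismutase")
```

Opening such a URL in a browser should return an XML list of Gene IDs; the Search View performs an equivalent lookup for you and presents the results as a table.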

Step 4: Search for Genes

You should see a set of results like those shown below. You can adjust the width of a column by dragging the divider between the column headings. Right-click in the column header to turn individual columns on or off.

Search result

Select the item in the list corresponding to the human gene (Organism is Homo sapiens and Label is SOD1; it may take a moment to find) and right-click it. Choose Add to Project from the context menu. You should see a dialog like the one below. The Entrez Gene record is an object that contains a wealth of information about the gene and its placement on various assemblies, all of which is available to Genome Workbench.

Select project for search result

We will use the defaults in this dialog, so click OK to create a new project. Later you will have the option to add results to an existing project or to change how they appear in the project tree.

Step 5: Viewing your data

Once the project is loaded, the main window should look somewhat like the image below. The Projects area should now contain an open blue notebook icon for Projects and, for the new project we just created, an open green notebook icon containing a Data folder.

Project created

Now let us look at this data. To open a view, select the data in the project tree, right-click, and choose Open New View. You can also choose View => Open New View from the menu bar, or double-click the data. When the dialog appears, choose Graphical Sequence View.

Open view dialog

Genome Workbench will now ask which sequence you wish to view. As mentioned earlier, the Entrez Gene object holds references to many sequences, covering many different placements of the gene, so you should see a dialog like the one below. We will start by looking at the placement of the gene on the reference chromosome. You can identify this sequence from the Description column in the dialog, or from the pattern of its accession:

  • Accessions beginning with NC_ are reference sequence chromosomes.
  • Accessions beginning with NT_ are reference sequence contigs.
  • Accessions beginning with NG_ are reference sequences that have had some degree of human curation.
  • Accessions beginning with NM_ are reference sequence mRNAs.
  • Accessions beginning with NP_ are reference sequence proteins.

Select molecule to view
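The prefix rules above are simple enough to express in a few lines, for example when sorting a list of accessions you have copied out of this dialog. This minimal sketch covers only the five prefixes listed in this tutorial; RefSeq defines other prefixes (for example XM_ and XP_ for predicted model records), which the sketch deliberately reports as unrecognized.

```python
# Accession prefixes and their meanings, as listed in this tutorial only.
REFSEQ_PREFIXES = {
    "NC_": "reference sequence chromosome",
    "NT_": "reference sequence contig",
    "NG_": "reference sequence with some degree of human curation",
    "NM_": "reference sequence mRNA",
    "NP_": "reference sequence protein",
}


def describe_accession(accession: str) -> str:
    """Classify an accession by its first three characters (e.g. "NC_")."""
    prefix = accession.strip()[:3].upper()
    return REFSEQ_PREFIXES.get(prefix, "unrecognized prefix")
```

For example, `describe_accession("NC_000021.9")` returns "reference sequence chromosome".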

Step 6: The Graphical View

Select the desired sequence and click the Finish button. Because we selected the chromosome record with the range NC_000021.9:31,659,693-31,668,931, the graphical view is zoomed to the SOD1 gene located in this region of chromosome 21.
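A range label such as the one quoted above (for the chromosome 21 accession NC_000021.9) packs an accession, a start, and a stop into one string. The sketch below splits such a label apart and computes the region length. It assumes the coordinates are 1-based and inclusive, as they are displayed, and it only accepts RefSeq-style two-letter accessions; `parse_range` and `SeqRange` are hypothetical names used for illustration.

```python
import re
from typing import NamedTuple

# RefSeq-style accession[.version]:start-stop, commas allowed in numbers.
_RANGE_RE = re.compile(
    r"^(?P<acc>[A-Z]{2}_\d+(?:\.\d+)?):(?P<start>[\d,]+)-(?P<stop>[\d,]+)$"
)


class SeqRange(NamedTuple):
    accession: str
    start: int  # first base (assumed 1-based)
    stop: int   # last base (assumed inclusive)

    @property
    def length(self) -> int:
        # Inclusive coordinates: both end bases count.
        return self.stop - self.start + 1


def parse_range(text: str) -> SeqRange:
    """Parse a label such as 'NC_000021.9:31,659,693-31,668,931'."""
    match = _RANGE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not an accession range: {text!r}")
    start = int(match["start"].replace(",", ""))
    stop = int(match["stop"].replace(",", ""))
    if stop < start:
        raise ValueError("stop must not precede start")
    return SeqRange(match["acc"], start, stop)
```

For the SOD1 range, `parse_range("NC_000021.9:31,659,693-31,668,931").length` evaluates to 9239 bases.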

Graphical view gene

The graphical view shows the public annotations on a sequence, using both color and arrangement to show their relationships. In the view above, different annotations (features) are shown in different colors:

  • Green bars represent genes.
  • Purple bars represent transcripts / mRNAs.
  • Red bars represent coding regions / proteins.

For more details on the colors and arrangements used, please see the Sequence Viewer legends document.

All features and data are displayed as separate tracks. By default, for a human chromosome, the Graphical View includes:

  • Sequence track, representing the reference sequence to which all other features and data are mapped
  • Scaffold track, showing the scaffolds used to assemble the chromosome sequence
  • Tiling Path track, containing a set of blue bars that represent the components used to assemble the scaffold sequences (larger genomic sequences are split into many smaller pieces and reassembled from these chunks; the blue bars show where the chunk boundaries are and approximately how much the chunks overlap)
  • Gene track, containing all annotated genes, RNAs, and CDSs
  • NCBI current annotation release track
  • Ensembl current annotation release track
  • Several alignment tracks
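The tiling-path description can be made concrete with a little arithmetic. If each component is placed on the chromosome as an interval, the overlap between two neighbouring components is the stretch of coordinates they share. The sketch below assumes 1-based, inclusive intervals, and the component names and coordinates in the usage are made up rather than taken from chromosome 21.

```python
from typing import List, NamedTuple, Tuple


class Component(NamedTuple):
    name: str   # hypothetical component identifier
    start: int  # first chromosome base covered (1-based)
    stop: int   # last chromosome base covered (inclusive)


def adjacent_overlaps(components: List[Component]) -> List[Tuple[str, str, int]]:
    """Return (left, right, overlap_length) for each neighbouring pair.

    A negative value -n means a gap of n bases between the two pieces;
    0 means they abut exactly.
    """
    ordered = sorted(components, key=lambda c: c.start)
    result = []
    for left, right in zip(ordered, ordered[1:]):
        # min() handles a component that lies entirely inside its neighbour.
        overlap = min(left.stop, right.stop) - right.start + 1
        result.append((left.name, right.name, overlap))
    return result
```

With components A (1 to 1000), B (900 to 2000), and C (2003 to 3000), the function reports a 101-base overlap between A and B and a value of -2 (a 2-base gap) between B and C.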

Many other tracks are available in Genome Workbench. To learn how to configure the track sets and track options of the Graphical Sequence View, see Step 9 of this tutorial.

There are several ways to zoom in and out in the graphical view. One way, shown below, uses a zoom slider. To show the zoom slider, press and hold the Z key. You can then zoom in and out by left-clicking and dragging the mouse up and down. The zoom level changes in real time.

Graphical view navigation 1

You can also pan the view left or right by left-clicking and dragging in that direction.

A third way to zoom in is called rectangular or regional zoom. It is available by holding down the R key, left-clicking, and dragging over a region. When you do this, you will see a view like the one below. Try this now to zoom in to a region around an exon as shown below.

Graphical view navigation 2

For the full set of navigation options and hot keys, see the Graphical View Navigation and Manipulation tutorial.

Step 8: Zoom in the Graphical View

The Graphical Sequence View adapts the level of detail to the zoom level: the further you zoom in, the more detail you see. In the image below, the view is zoomed all the way in to the actual sequence. The gray sequence bars across the top (the Sequence track) are now duplicated, showing both the forward and the reverse (complementary) strands of the chromosomal sequence.

If you select a coding region annotation (in the Genes tracks), you will see, beneath each amino acid residue, the letters of the codon that encodes that amino acid.

If you hover over an annotation, a tooltip shows additional information. The images below show two tooltips: one over the sequence, giving details about the organism and the position under the mouse pointer, and one over a GDS feature, showing information about the feature along with the links and tools available for it.

Zoomed to sequence 1

Zoomed to sequence 2
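The two sequence-level details described above, the complementary strand and the codon shown under each residue, follow simple rules. In the view, the lower strand is the base-by-base complement of the upper one; read in its own 5' to 3' direction, it is the reverse complement. The sketch below illustrates both rules and codon-by-codon translation. It assumes the standard genetic code (NCBI translation table 1), which applies to human nuclear genes, and it is not how Genome Workbench itself computes the display.

```python
from itertools import product
from typing import List

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq: str) -> str:
    """Lower strand as drawn under the upper strand: base-by-base complement."""
    return seq.translate(_COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Lower strand read in its own 5' to 3' direction."""
    return complement(seq)[::-1]


def codons(cds: str) -> List[str]:
    """Split a coding sequence into triplets, dropping a trailing partial codon."""
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def translate(cds: str) -> str:
    """Translate codon by codon; codons not in the table (e.g. with N) become 'X'."""
    return "".join(CODON_TABLE.get(c, "X") for c in codons(cds.upper()))
```

For example, `codons("ATGGCGACGAAG")` yields `["ATG", "GCG", "ACG", "AAG"]`, and `translate` maps them to `"MATK"`, one residue per codon, which mirrors the codon letters shown beneath each residue in the view.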

For more details on the graphical view legend please see the Sequence Viewer legends document.

Step 9: Configure the Graphical View

The graphical view supports a wide range of visual customizations that make the data easier to understand. The customization controls are highlighted in the image below. Individual track controls appear when you place the mouse over a track’s title bar.

Track controls

To customize the Graphical Sequence View, you can:

  • Select the set of tracks shown in the view.
    • Use the Gear icon or the Content option in the bottom panel. In many cases you may want to see just a particular track or set of tracks; the Configure tracks dialog or the Content drop-down menu lets you choose what to see. For example, the Genes option shows a subset of gene tracks, and each of these sub-tracks shows gene, mRNA, and coding region annotations.
    • To change the order of the tracks, click in a track's title bar and drag it.
  • Configure individual track settings.
    • Use the track panel in the track’s right corner (highlighted in the image above).
    • For Gene tracks, for example, you might want to hide the gene feature bar and see RNAs and CDSs merged into one line. To do this, select “Show RNAs” and “Show CDSs” in the Context drop-down list, and select “Merge all RNAs and Proteins” in the Layout Style drop-down list. Note that track options differ depending on the track data.
    • The individual track menu also has options to collapse/expand and close the track.
  • Adjust general view options (such as decorations and color).
    • Use the bottom panel to change:
      • Label position. By default, labels are placed beside features; the other options are top and no labels.
      • Feature Decoration. Annotations such as mRNA and coding region features can be displayed with a wide variety of decorations, such as circle or square anchors, arrow fletchings, and different kinds of arrowheads.
      • Size. For many displays a more compact view is preferable; to get one, choose the Compact size option.
      • Color. Multicolor is the default; grayscale is also available.

The image below shows an example of a customized view. We zoomed out to the whole-chromosome level, removed the Scaffolds and Tiling Path (Components) tracks, collapsed all alignment sub-tracks, collapsed all gene sub-tracks except the current NCBI annotation release sub-track, and added SNP tracks with the Clinical dbSNP sub-track expanded.

Track customized view 1

We also changed the layout style of the NCBI annotation release sub-track from the default (no merge) to “Merge all RNAs and proteins”. To see this change, you need to zoom in. The fastest way to return to the superoxide dismutase gene is to use the search box in the Graphical View: type the gene symbol SOD1 and click the binoculars icon next to it. Note that the RNA and CDS of this gene are now merged into one line that reflects the RNA and CDS structure with a color gradient.

Track customized view 2

Genome Workbench remembers the customized graphical view and opens this configuration the next time you view the same organism/molecule.

Step 10: Launching Tools from the Graphical View

In the graphical view, everything you see is selectable, and you can perform actions on whatever you select. To select an item, click it and it will be highlighted. For an overview of object selection and region selection, see the Graphical View Navigation and Manipulation tutorial.

Let us run a BLAST search on a sequence region in the graphical view. Select a region of the genome containing the SOD1 gene by clicking in the ruler above the sequence and dragging a gray rectangle over the gene model. Then choose Tools => Run Tool from the main menu (or from the right-click context menu), and the Run Tool dialog will appear. Select BLAST in the dialog. This tool submits the sequence to the NCBI BLAST service for alignment against a set of sequences (a database).

Run Tool dialog

When you choose the BLAST tool, you should see a dialog like the one below. More than one sequence may be listed in the dialog; choose the one whose label defines the SOD1 location on NC_000021.9. For this example, a few settings need to change. First, choose MegaBLAST from the Program drop-down list (if it is not selected by default). Second, for the subject/database, select NCBI database/nr Nucleotide collection (nt). Third, enter the Entrez query biomol mrna[prop], which restricts the search to molecules known to be mRNAs.

Tool blast options 1

Click Next.

BLAST has many parameters that can be changed. We will accept the defaults, except that we remove the checkmark from the “Link related hits together” checkbox.

Tool blast options 2

Select these options and click Finish. The results will be added to the existing project created in this tutorial.
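The choices made in these dialogs can also be written down as parameters of the NCBI BLAST URL API (Blast.cgi), which is one way to script a comparable search outside Genome Workbench. This sketch only builds the form-encoded body of a CMD=Put submission and does not send it. We believe the parameter names used here (PROGRAM, MEGABLAST, DATABASE, ENTREZ_QUERY, QUERY) match the public URL API, but treat them as assumptions and check the current BLAST documentation; Genome Workbench may submit its jobs differently. The helper name and the query text in the usage are placeholders.

```python
from urllib.parse import urlencode

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_blast_put_body(query: str) -> str:
    """Form-encode a MegaBLAST search against nt restricted to mRNAs.

    Hypothetical helper: nothing is sent; POST the result to BLAST_URL yourself.
    """
    params = {
        "CMD": "Put",                         # submit a new search
        "PROGRAM": "blastn",                  # nucleotide query vs. nucleotide db
        "MEGABLAST": "on",                    # use the MegaBLAST algorithm
        "DATABASE": "nt",                     # Nucleotide collection
        "ENTREZ_QUERY": "biomol mrna[prop]",  # keep only mRNA subjects
        "QUERY": query,                       # FASTA text or an accession
    }
    return urlencode(params)
```

The Entrez query ends up URL-escaped in the body as `biomol+mrna%5Bprop%5D`, since spaces and square brackets are not allowed verbatim in form data.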

Once the sequence has been submitted, it will be entered into the BLAST polling system for retrieval. The Task View will help you track your job's progress.

Blast running
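Polling simply means asking the service repeatedly whether the job is done, with a pause between requests. The network-free sketch below shows the idea: the `check_status` callable stands in for whatever actually queries BLAST, and the status strings WAITING and READY are assumptions modelled on the BLAST URL API's search-status responses. NCBI asks clients not to poll a single search more often than about once a minute, hence the default interval. This illustrates the general pattern, not Genome Workbench's own implementation.

```python
import time
from typing import Callable


def poll_until_done(
    check_status: Callable[[], str],
    interval_s: float = 60.0,
    max_polls: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call check_status() until it stops returning "WAITING".

    Returns the final status (e.g. "READY" or "FAILED"); raises TimeoutError
    if the job is still waiting after max_polls checks.
    """
    for attempt in range(max_polls):
        status = check_status()
        if status != "WAITING":
            return status
        if attempt + 1 < max_polls:
            sleep(interval_s)  # pause before asking again
    raise TimeoutError(f"still waiting after {max_polls} polls")
```

Passing `sleep` as a parameter keeps the function easy to test: a fake sleep can record the pauses instead of actually waiting.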

Step 11: Viewing Alignment Results

When the alignment job finishes, the results are added to the Graphical View automatically; they can be found among the Alignment sub-tracks. The alignment is also added to the Project Tree View:

Blast results in views

Incidentally, you can organize the items in the Data folder. If you select the folder and right-click (or control-click in Mac OS), you will find a context menu option to create a new folder. You can then cut and paste, or drag and drop, items into their new locations. As you collect more and more results in a project, these folders will help you keep track of what each set of results means.

Step 12: Manipulating Alignments

The graphical view shows the newly obtained BLAST alignments as a set of disconnected bars hanging beneath our gene models. By default, the alignment track's layout style is in adaptive mode, which shows a heatmap of hits in a single row; to see individual alignments, zoom in or select the “Show all” option for Layout Style, as in the image below.

Blast show all in GSV

This gives us some idea of which regions of the genome are likely to have coding potential, but it is also quite confusing: hits to different regions of the same accession are not connected. Genome Workbench provides a couple of tools to make this visualization easier to read.

Let us clean up the alignment with the Clean Up Alignments tool. Select the alignment you have just created in the Project View by left-clicking it. It is important to select the alignment, making it the active object in the system, because the tool works only on the whole alignment. Then choose Tools => Run Tool or click the tool icon; you can also right-click the alignment and select the Run Tool command.

The system will present the following screen.

Tool clean Up

Select the Clean Up Alignments tool by clicking next to it and click Next (if you select the tool by clicking on it you do not have to click the Next button).

Tool clean Up 2

Select the set of alignments you need (for this exercise we select all of them) and click Finish. The graphical view will change to look like the image below, and one or more new items will appear in the project tree. Hits to different regions of the same accession are now connected and shown as extended rows representing potential coding regions.

Clean Up result in views
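The core idea of connecting hits, treating all hits to the same subject accession as one row that spans the genomic region they cover, can be sketched as below. This is a simplified illustration, not the algorithm the Clean Up Alignments tool actually uses (which this tutorial does not describe). A real tool would also have to respect strand and the order of hits along the subject, and the accessions mRNA_A and mRNA_B mentioned in the usage note are made up.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Hit(NamedTuple):
    subject: str  # accession of the matched mRNA
    q_start: int  # first base of the hit on the genomic query
    q_stop: int   # last base of the hit on the genomic query (inclusive)


def connect_hits(hits: List[Hit]) -> Dict[str, Tuple[int, int, List[Hit]]]:
    """Group hits by subject and report the genomic span each group covers.

    Returns {subject: (span_start, span_stop, hits_sorted_by_start)}.
    """
    groups: Dict[str, List[Hit]] = defaultdict(list)
    for hit in hits:
        groups[hit.subject].append(hit)
    connected = {}
    for subject, members in groups.items():
        members.sort(key=lambda h: h.q_start)
        span_stop = max(h.q_stop for h in members)
        connected[subject] = (members[0].q_start, span_stop, members)
    return connected
```

Given three separate hits to mRNA_A at 100 to 250, 500 to 600, and 900 to 1000, the sketch reports a single row for mRNA_A spanning 100 to 1000, with the gaps between hits playing the role of introns.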

Step 13: Saving your project

To save your project, choose File => Save Projects (or Save Projects As) from the drop-down menu.

Save project 1

The system will present you with the options to name your project and to select the location.

Save project 2

Click Save. Your project is now saved under the name and in the location you selected, and it can be reopened later, copied, sent by e-mail, and so on.

Step 14: Summary

Congratulations, you have completed the first tutorial!

In this tutorial we examined several aspects of how Genome Workbench allows you to work with data, including:

  • Searching for data using NCBI public databases such as Entrez Gene.
  • Exploring projects in the project explorer portion of the Genome Workbench interface
  • Creating and navigating in views such as the Graphical Sequence View.
  • Running analyses such as creating alignments of sets of sequences.

In all of these explorations there are some common themes that will help you get the most out of Genome Workbench including:

  • Use of context menus. If you right click (or control-click in Mac OS) on a view or on a selection somewhere you will receive information about that object, including things that can be done to that object.
  • Customization. No single set of view parameters is right for everyone. Genome Workbench offers easy ways to customize each view so that it shows the information most relevant to you.
  • Selections. Things you see on the screen are selectable, whether they are annotations on sequence, rows of an alignment, or swaths of sequence themselves. Genome Workbench uses selections as inputs to analyses.

Current Version is 3.6.0 (released March 04, 2021)

Last updated: 2021-02-18T18:49:16Z