Working with Non-Public Data
Step 1: Introduction
This tutorial demonstrates two common scenarios for working with private data in Genome Workbench:
- You have created your own sequence and want to work with it in Genome Workbench
- You want to view your own data/annotation on a publicly available sequence
We will demonstrate how to use some Genome Workbench tools on data that is not found in the NCBI databases.
It is recommended that you complete the Basic Operation tutorial first.
Sample data you will need to complete this tutorial: BX530088_BX572102.
Step 2: Getting Started
For the first exercise we are going to do the following:
- Load a user-generated AGP file (from the sample data)
- Align some mRNAs to that AGP sequence with SPLIGN
- Create a FASTA file from the AGP
- BLAST that FASTA sequence to see what is related to it
- Run WindowMasker on that FASTA sequence (or part of it) to look for repetitive regions
Genome Workbench starts up and displays the main screen. Choose File=>Open from the main menu, select File Import on the left side of the dialog, and click the folder icon on the right to point to the file location. Genome Workbench understands many different file formats, including FASTA files with local IDs. For this step, choose BX530088_BX572102.comp.agp from the downloaded data files. Click Next to see the dialog for uploading a FASTA file. (Note: If your sequences are local, you will need to upload a FASTA file. In our current example, the sequences have been submitted to GenBank and have accession numbers, so their FASTA sequences will be retrieved automatically.) Click Next again to accept the defaults. Then click Finish to add the data file to a new project.
Now that your data is loaded, you can view it by selecting the data in the project tree, right-clicking, and choosing Open New View. Then choose Graphical View. The view is not very interesting at this zoom level, but you can zoom in to see the sequence.
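The AGP file loaded above describes how an assembled object is built from component sequences and gaps. The following Python sketch, based on our reading of the NCBI AGP specification (nine tab-separated columns; component types N and U denote gaps), shows how such a file could be read outside Genome Workbench, for example to check what it contains before loading it. It is an illustration only, not how Genome Workbench parses AGP; the class and function names and the sample rows (`scaffold_1`, `ctg_A`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Union

GAP_TYPES = {"N", "U"}  # AGP component types that denote gaps


@dataclass
class Component:
    obj: str          # column 1: assembled object name
    obj_beg: int      # column 2: start on the object (1-based)
    obj_end: int      # column 3: end on the object (inclusive)
    part_number: int  # column 4
    comp_type: str    # column 5: e.g. W, F, A, D
    comp_id: str      # column 6: component accession
    comp_beg: int     # column 7: start on the component
    comp_end: int     # column 8: end on the component
    orientation: str  # column 9: +, -, ?, 0 or na


@dataclass
class Gap:
    obj: str
    obj_beg: int
    obj_end: int
    part_number: int
    comp_type: str    # N (gap of specified size) or U (gap of unknown size)
    gap_length: int   # column 6
    gap_type: str     # column 7: e.g. scaffold, contig
    linkage: str      # column 8: yes or no
    evidence: str     # column 9: linkage evidence (may be empty)


def parse_agp(text: str) -> List[Union[Component, Gap]]:
    """Parse AGP text into Component and Gap records, skipping comments and blank lines."""
    rows: List[Union[Component, Gap]] = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 8:
            raise ValueError(f"too few columns in AGP line: {line!r}")
        fields += [""] * (9 - len(fields))  # tolerate a missing 9th column
        obj, beg, end = fields[0], int(fields[1]), int(fields[2])
        part, ctype = int(fields[3]), fields[4]
        if ctype in GAP_TYPES:
            rows.append(Gap(obj, beg, end, part, ctype,
                            int(fields[5]), fields[6], fields[7], fields[8]))
        else:
            rows.append(Component(obj, beg, end, part, ctype, fields[5],
                                  int(fields[6]), int(fields[7]), fields[8]))
    return rows


if __name__ == "__main__":
    # Hypothetical two-line AGP: one component followed by a gap.
    sample = (
        "scaffold_1\t1\t8\t1\tW\tctg_A\t1\t8\t+\n"
        "scaffold_1\t9\t18\t2\tN\t10\tscaffold\tyes\tpaired-ends\n"
    )
    for row in parse_agp(sample):
        print(row)
```

Keeping components and gaps as separate record types mirrors the way the AGP columns 6 to 9 change meaning depending on column 5.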
Step 3: Apply the tool (SPLIGN) to private data
Now let us align an mRNA to our sequence using the SPLIGN tool. SPLIGN (SPLiced Aligner) is a global alignment tool used in NCBI's annotation pipeline. Open NM_020137.3 from the GenBank database (File=>Open) and add it to the project.
Click Next and Finish. Both entries are now shown in the data folder.
Select both entries (SHIFT+left click in both MS Windows and Mac OS). With both entries selected, click Tools=>Run Tool to open the Tools dialog, choose SPLIGN, and click Next (if you click exactly on the SPLIGN text, you will be taken to the next screen without having to click Next).
Select BX530088... for the Genomic Sequence and NM_020137.3 for the Transcript Sequence. If you do not see both sections of the dialog, drag down the lower border of the dialog box.
Click Finish. Once the job has finished, the results are added to the existing project, and the SPLIGN alignment is displayed in the Graphical View as an alignment track.
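A spliced alignment such as the one SPLIGN produces places transcript segments (exons) on the genomic sequence, separated by introns. As a rough illustration of what the alignment track represents, the hypothetical helpers below derive intron intervals from exon coordinates and check for the canonical GT-AG splice dinucleotides. This is not SPLIGN's algorithm; it assumes plus-strand, 1-based inclusive coordinates and uses a made-up toy sequence.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive genomic coordinates (plus strand)


def introns_from_exons(exons: List[Interval]) -> List[Interval]:
    """Return the gaps between consecutive exons, i.e. the implied introns."""
    ordered = sorted(exons)
    introns: List[Interval] = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_end + 1:
            introns.append((prev_end + 1, next_start - 1))
    return introns


def splice_signals(genome: str, introns: List[Interval]) -> List[Tuple[str, str, bool]]:
    """For each intron, report its first and last two bases and whether they are GT-AG."""
    result = []
    for start, end in introns:
        donor = genome[start - 1:start + 1].upper()  # first two intron bases
        acceptor = genome[end - 2:end].upper()       # last two intron bases
        result.append((donor, acceptor, donor == "GT" and acceptor == "AG"))
    return result


if __name__ == "__main__":
    genome = "AAAAGTCCCCAGTTTT"  # hypothetical toy sequence
    exons = [(1, 4), (13, 16)]   # hypothetical exon coordinates
    introns = introns_from_exons(exons)
    print(introns, splice_signals(genome, introns))
```

Minus-strand alignments would need the genomic sequence reverse-complemented first; the sketch leaves that out for brevity.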
Step 4: Export a FASTA file
Select the data file we loaded previously in the Project Tree View. Right-click (control-click in Mac OS) on the selected data and choose Export. Select FASTA as the format, select a location, and give the file a name.
Click Finish.
Now open the FASTA file you have just created. Choose File=>Open. Select the file and click Next. Accept the default settings and click Next again. Choose to create a new project and click Finish.
Select the FASTA data in the Project Tree View and double-click it. From the Open View menu, choose Graphical View.
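Conceptually, exporting FASTA from an AGP object means concatenating the referenced component ranges (reverse-complemented when the orientation is `-`) and filling gaps with `N`. The sketch below illustrates that idea under stated assumptions: the component sequences are already available locally in a dictionary (in this tutorial Genome Workbench retrieves them from GenBank for you), AGP rows have been simplified to hypothetical tuples, and all names are illustrative. It is not Genome Workbench's exporter.

```python
from typing import Dict, List, Tuple, Union

# Simplified AGP rows (hypothetical format for this sketch):
#   ("component", comp_id, comp_beg, comp_end, orientation)
#   ("gap", gap_length)
Row = Union[Tuple[str, str, int, int, str], Tuple[str, int]]

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def assemble(rows: List[Row], components: Dict[str, str]) -> str:
    """Concatenate component ranges and N-filled gaps into one object sequence."""
    parts: List[str] = []
    for row in rows:
        if row[0] == "gap":
            parts.append("N" * row[1])
            continue
        _, comp_id, beg, end, orientation = row
        piece = components[comp_id][beg - 1:end]  # AGP coordinates are 1-based, inclusive
        if orientation == "-":
            piece = reverse_complement(piece)
        # other orientations (+, ?, 0, na) are used as given, i.e. treated as plus
        parts.append(piece)
    return "".join(parts)


def to_fasta(seq_id: str, seq: str, width: int = 60) -> str:
    """Format a sequence as FASTA with fixed-width lines."""
    lines = [f">{seq_id}"] + [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    components = {"ctg_A": "ACGTACGTAA", "ctg_B": "GGGCCC"}  # hypothetical sequences
    rows = [("component", "ctg_A", 1, 4, "+"), ("gap", 3), ("component", "ctg_B", 1, 3, "-")]
    # prints >scaffold_1, ACGT, NNNC, CC on separate lines
    print(to_fasta("scaffold_1", assemble(rows, components), width=4), end="")
```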
Step 5: Alignment (BLAST and Clean Up)
To run a BLAST alignment for the entire sequence, choose Run Tool (Tools=>Run Tool from the main menu, or right-click (control-click in Mac OS) and choose Run Tool). From the Run Tool dialog, choose BLAST Search. (Note: you can also run BLAST on a particular region. In this case, select the region of interest by clicking on the top ruler and dragging in either direction.)
Click Next.
In the BLAST Search dialog, ensure that you have selected the Nucleotide option, Nucleotide-Nucleotide (MegaBLAST) from the Program menu, and nr (Nucleotide collection (nt)) from the Database menu. Enter the search string biomol mrna[prop] into the Entrez Query field.
Click Next.
In the next dialog, accept the general parameters, check Filter low complexity regions, and select Human from the Species specific repeats for dropdown list.
Then click Finish. When BLAST finishes, the results are added to the corresponding project (New Project (1)). The analysis can take some time to return and present the results.
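For comparison, roughly the same search could be expressed as a request to NCBI's BLAST Common URL API. The sketch below only builds the request URL and does not submit it. The endpoint and parameter names (`CMD=Put`, `PROGRAM`, `MEGABLAST`, `DATABASE`, `QUERY`, `ENTREZ_QUERY`, `FILTER`) reflect our understanding of that API and are assumptions to verify against NCBI's documentation; the species-specific repeat filter chosen in the dialog is not reproduced here.

```python
from urllib.parse import urlencode

# Assumed endpoint and parameter names of NCBI's BLAST Common URL API; verify before use.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def blast_put_url(query: str, entrez_query: str = "biomol mrna[prop]") -> str:
    """Build (but do not send) a MegaBLAST submission URL mirroring the dialog settings."""
    params = {
        "CMD": "Put",                  # submit a new search
        "PROGRAM": "blastn",
        "MEGABLAST": "on",             # Nucleotide-Nucleotide (MegaBLAST)
        "DATABASE": "nt",              # Nucleotide collection (nt)
        "QUERY": query,                # FASTA text or an accession
        "ENTREZ_QUERY": entrez_query,  # restrict hits, here to mRNA records
        "FILTER": "L",                 # low-complexity filtering
    }
    return f"{BLAST_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(blast_put_url(">my_seq\nACGTACGTACGT"))  # hypothetical query
```

For long query sequences, sending the same parameters in a POST body is preferable to a very long GET URL.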
To see individual hits more clearly, we will apply the Clean Up Alignments tool to our BLAST alignment. This tool filters the hits and places all hits to the same accession in a separate row. Select the BLAST result in the Project View, open the Run Tool dialog, choose Clean Up Alignments, and click Next.
Accept the defaults in the next dialog and click Finish.
The cleaned-up BLAST result should appear in the Tools Result folder of the corresponding project (New Project (1)).
Zoom in to see individual hits, and open tooltips for more information about the hits and alignments.
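The exact filtering criteria of the Clean Up Alignments tool are not described here, but its display idea, one row per subject accession, can be sketched as simple grouping. In the hypothetical example below, hits below an arbitrary identity threshold are dropped, and the remaining hits are grouped by accession and ordered by query start. The `Hit` fields, the threshold, and the accessions are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Hit:
    subject: str     # accession of the database sequence that was hit
    q_start: int     # start of the hit on the query (1-based)
    q_end: int       # end of the hit on the query
    identity: float  # percent identity


def group_hits_by_subject(hits: List[Hit], min_identity: float = 0.0) -> Dict[str, List[Hit]]:
    """Drop weak hits, then return one row (list of hits) per subject, sorted by query start."""
    rows: Dict[str, List[Hit]] = defaultdict(list)
    for hit in hits:
        if hit.identity >= min_identity:
            rows[hit.subject].append(hit)
    return {acc: sorted(row, key=lambda h: h.q_start) for acc, row in rows.items()}


if __name__ == "__main__":
    hits = [  # hypothetical accessions and coordinates
        Hit("ACC_B", 500, 900, 99.0),
        Hit("ACC_A", 300, 400, 98.0),
        Hit("ACC_A", 10, 200, 97.5),
        Hit("ACC_C", 1, 50, 80.0),
    ]
    for acc, row in group_hits_by_subject(hits, min_identity=90.0).items():
        print(acc, [(h.q_start, h.q_end) for h in row])
```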
Step 6: WindowMasker
In this step we will use WindowMasker on the FASTA sequence to look for repetitive regions. First, download the mask data: select Tools=>WindowMasker Data. (Note: the WindowMasker path is not available to users outside NCBI.) In the dialog that appears, select human.tar.gz as the mask and click OK. A WindowMasker folder will be created automatically in the “GenomeWorkbench2” folder, and the data will be downloaded into it.
In the Graphical Sequence View, collapse the Cleaned alignment track and select a region by clicking on the ruler and dragging a selection around it.
Choose Tools=>Run Tool from the main menu. Select the Search/Find Repetitive Sequences with WindowMasker row and click Next (if you click exactly on the tool’s text, you will be taken to the next screen without having to click Next).
Ensure that our region of the sequence is selected, then select 9606 Homo sapiens from the Mask using parameters for dropdown list.
Note: If you have not downloaded them previously, the WindowMasker files can be downloaded via the Configure option of this dialog.
Click Next, choose a project to add the results to and click Finish. It can take some time for the job to complete.
The result is a histogram showing regions of repeats. If the histogram does not appear automatically, select the content menu at the bottom of the graphical view and choose Repeat Regions.
You can scroll and zoom just as you would in any other view.
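WindowMasker identifies repeats from k-mer (unit) frequencies: k-mers that occur unusually often are treated as repetitive, and windows dominated by them are masked. The toy sketch below follows that idea in a greatly simplified form. It counts k-mers in the query sequence itself (the real tool relies on genome-wide counts such as the human.tar.gz data downloaded above), scores each window by its mean k-mer count, and merges windows at or above a threshold into masked intervals. The function names, parameters, and threshold are hypothetical and are not WindowMasker's actual settings.

```python
from collections import Counter
from typing import List, Tuple


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def window_scores(seq: str, k: int, window: int) -> List[float]:
    """Score each run of `window` consecutive k-mers by its mean k-mer count."""
    seq = seq.upper()
    counts = kmer_counts(seq, k)
    per_kmer = [counts[seq[i:i + k]] for i in range(len(seq) - k + 1)]
    return [sum(per_kmer[i:i + window]) / window
            for i in range(len(per_kmer) - window + 1)]


def mask_intervals(seq: str, k: int, window: int, threshold: float) -> List[Tuple[int, int]]:
    """Merge windows scoring >= threshold into 0-based, half-open masked intervals."""
    span = window + k - 1  # number of bases covered by `window` consecutive k-mers
    intervals: List[Tuple[int, int]] = []
    for i, score in enumerate(window_scores(seq, k, window)):
        if score < threshold:
            continue
        start, end = i, i + span
        if intervals and start <= intervals[-1][1]:
            intervals[-1] = (intervals[-1][0], max(intervals[-1][1], end))
        else:
            intervals.append((start, end))
    return intervals


if __name__ == "__main__":
    seq = "GATC" + "A" * 12 + "CGTG"  # hypothetical sequence with a simple repeat
    print(mask_intervals(seq, k=3, window=3, threshold=10))  # [(4, 16)]
```

The masked interval covers exactly the run of A's, which is the kind of region the repeat histogram highlights.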
Step 7: Conclusion
There are multiple ways to use Genome Workbench, and this tutorial shows only a few simple examples. It should give you enough background to start exploring your data in new ways, with the privacy you need along with access to the public data you want. For more information on working with BAM and GFF3 files, refer to the Displaying new non-NCBI molecules with annotations tutorial.
Current Version is 3.8.2 (released December 12, 2022)