Searching in Genome Workbench

Introduction

The search functionality is available in the following views: Graphical Sequence View, Text View, Tree View, Generic Table View, Alignment Summary View, and VCF Table View. Each of these views has a search panel with a search box and a Find Mode dropdown list containing the search options available for that view.

This manual addresses four types of searches:

Graphical Sequence View: exact match search by feature, SNP rsID, position or range

Open Genome Workbench and click File => Open. In the Open dialog, select Data from GenBank, type NC_000001.11 in the “Accession to load” box, and click Next and then Finish. A new project is created and shown in the Project View. Right-click NC_000001.11 to open the context menu, select Open New View, and open the molecule in the Graphical Sequence View (GSV). You should see an image similar to the one below. For clarity, we used the Gear icon/Configure Tracks dialog to display only the Sequence track, the Gene track(s), and the Clinical SNP track; your track configuration may differ.

GSV NC 000001

The upper panel of the GSV contains the search controls: a search box, a binocular icon that starts the search, and the Find Mode dropdown list, which lets you choose between case-sensitive and case-insensitive matching. A query should not contain spaces.

The search option in GSV allows you to search corresponding tracks for:

  • feature, for example by gene name (AMY1B, tRNA-Asn), by part of the name (AMY, tRNA), or, if there is no gene name, by gene LOC number (LOC100996442). Note: searches by protein or RNA accessions (such as NM_, NP_, XP_, etc.) are not supported.
  • SNP rsID (rs201288184)
  • range (10M-20M, 130909-150040) and position (123456)
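To make the four query shapes above concrete, here is a minimal sketch of how a GSV-style query string could be sorted into rsID, range, position, or feature-name searches. The patterns are assumptions inferred only from the example queries on this page (for instance, that "M" means a factor of one million); this is not Genome Workbench's actual parser, and `classify` and `to_int` are hypothetical names.

```python
import re

# Hypothetical patterns inferred from the example queries above;
# not Genome Workbench's real query parser.
RSID = re.compile(r"rs\d+")
COORD = r"\d+(?:M)?"  # plain number, or a number with an "M" (million) suffix
RANGE = re.compile(rf"({COORD})-({COORD})")
POSITION = re.compile(r"\d+")


def to_int(coord: str) -> int:
    """Convert '10M' -> 10_000_000 and '130909' -> 130909."""
    if coord.endswith("M"):
        return int(coord[:-1]) * 1_000_000
    return int(coord)


def classify(query: str):
    """Return a tuple describing which kind of search the query requests."""
    if " " in query:
        raise ValueError("GSV queries should not contain spaces")
    if RSID.fullmatch(query):
        return ("snp", query)
    m = RANGE.fullmatch(query)
    if m:
        return ("range", to_int(m.group(1)), to_int(m.group(2)))
    if POSITION.fullmatch(query):
        return ("position", int(query))
    # Anything else is treated as a (partial) feature name, e.g. AMY1B or tRNA-Asn
    return ("feature", query)


for q in ["AMY1B", "tRNA-Asn", "LOC100996442", "rs201288184",
          "10M-20M", "130909-150040", "123456"]:
    print(q, "->", classify(q))
```

Note that the rsID and range checks run before the generic feature fallback, so a name such as tRNA-Asn, which contains a hyphen but no leading digits, still ends up as a feature search.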

Let’s type “AMY1” in the search box and click the binocular icon. The search returns three genes: AMY1A, AMY1B, and AMY1C. Click the binocular icon repeatedly to move the view to each search result in turn.

GSV gene search result

Note: The search returns the gene of interest in every Gene track configured in the view that contains the gene (even if that Gene track is collapsed). When you reach the last matched gene, you will see this pop-up message:

GSV popup message

To perform a case-insensitive search (for example, amy1 or Amy1), select “Do not match case” in the Find Mode dropdown list.
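The difference between the two Find Mode settings can be illustrated with a small sketch of substring matching on feature names. It assumes that "part of the name" means a substring anywhere in the name, and the gene list is made up for illustration; `find_features` is a hypothetical helper, not a Genome Workbench function.

```python
def find_features(names, query, match_case=True):
    """Return the names that contain `query` as a substring, in input order.

    match_case=False mimics the "Do not match case" Find Mode option
    (an assumption about its behavior, for illustration only).
    """
    if match_case:
        return [n for n in names if query in n]
    q = query.casefold()
    return [n for n in names if q in n.casefold()]


# Made-up list of gene names for illustration
genes = ["AMY2B", "AMY2A", "AMY1A", "AMY1B", "AMY1C", "RNPC3"]

print(find_features(genes, "AMY1"))                    # ['AMY1A', 'AMY1B', 'AMY1C']
print(find_features(genes, "amy1"))                    # []
print(find_features(genes, "amy1", match_case=False))  # ['AMY1A', 'AMY1B', 'AMY1C']
```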

Now let’s search the SNP track by rsID (rs201288184). The view zooms to the sequence level with this SNP selected.

GSV SNP search

You can try searching other feature tracks on your own. For example, search also works in Biological regions tracks when you search by a feature name (or part of a name) such as nucleotide_motif or enhancer.

Exact match search in the Text View

The exact match search is available in the Text View as well. Let’s select the region 103554675-103760572 on NC_000001.11 (this region includes the AMY gene cluster). To select it, hover over the upper ruler, press the left mouse button, and drag right or left. Then right-click to open the context menu, choose Open New View => Text View, choose the molecule with the range you just selected, and click Finish:

open location in TV

The Text View opens the selected region. Search options are similar to those in the Graphical Sequence View, with the additional option to search by sequence. Unlike in the GSV, search queries may contain several words separated by spaces, without quotes (for example, alpha-amylase 1A precursor).

Select Sequence from the Find Mode dropdown list, search for tttgttgaaaaatctg, and observe the sequence found.

TV search by sequence
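Conceptually, a sequence search scans the displayed sequence for every occurrence of the query string. The sketch below shows that idea under stated assumptions: it ignores case (the query above is lowercase), reports 1-based start positions, allows overlapping hits, and scans only the strand it is given. Whether the Text View also searches the reverse complement is not covered here, and `find_sequence` is a hypothetical name.

```python
def find_sequence(seq, query, ignore_case=True):
    """Return 1-based start positions of every (possibly overlapping) match.

    Only the given strand is scanned; reverse-complement matches are not
    considered in this sketch.
    """
    if ignore_case:
        seq, query = seq.upper(), query.upper()
    hits = []
    start = seq.find(query)
    while start != -1:
        hits.append(start + 1)               # convert 0-based index to 1-based
        start = seq.find(query, start + 1)   # +1 so overlapping hits are found
    return hits


# Toy sequence containing the query once, starting at position 4
print(find_sequence("ACGTTTGTTGAAAAATCTGACG", "tttgttgaaaaatctg"))  # [4]
print(find_sequence("AAAA", "AA"))                                  # [1, 2, 3]
```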

Advanced Search in Tree View and Table Views

The search functionality in the Tree View and in the table views (Generic Table View, Alignment Summary View, and VCF Table View) is more advanced.

We will perform advanced searches in the Generic Table View and the Tree View, using a sample tree in ASN format. The file is located at https://ftp.ncbi.nlm.nih.gov/toolbox/gbench/samples/tree_midpoint_root.asn.

For convenience please save the file to your local drive.

Open the sample file

To load the sample file into Genome Workbench as a new project, click File => Open, select File Import in the Open dialog, select the path to the sample file, and click Next and then Finish. A new project is created and shown in the Project View. Open this project in both the Tree View and the Generic Table View (for the Tree View we use Layout: Rectangle Cladogram and Zoom Behavior: Vertical).

Search controls and search properties

In both views you can see search controls in the upper bar. The controls include the Search text box and the search values history dropdown button, the String Matching options dropdown list, the Start search button, Search in progress indicator, Stop search button, Search filter button, All checkbox, and Previous/Next selection arrows.

Search and filtering controls

String matching options include:

  • Exact match
  • Wildcards
  • Regular expression
  • Phonetic

There is also a Case sensitive checkbox.

TreeView search panel

Our sample tree has many properties whose values can be searched. The easiest way to see all searchable properties and their values is to open the tree in the Generic Table View. This table can be sorted by column, and columns can be hidden or shown. Right-clicking the header of any column opens a list of all the tree's properties (column headers). All of them are checked by default; to hide a column, remove its checkmark.

Generic Table properties

The properties and their values are also shown in the tooltips for every node of the tree. To open the tooltip you need to hover over the node of interest in the Tree View.

TreeView tooltip properties

Exact match search

Exact match search is a string search that looks for a particular string in all property fields. The query can be a single word or several words separated by spaces.

For example, the queries Typhimurium and “serovar Typhimurium” both return the same result: 39 entities whose “scientific_name” property contains these words.
The queries “Salmonella bongori”, “GCF_000430145.1”, and “Gapless Chromosome” return 2, 1, and 43 entities, respectively.

The above queries work similarly to a Google search: the search scans all fields in all records of the data source and returns any entries in which the character string appears. Note that, unlike Google, only a single character string may be provided, so all words of the query must appear together in one field.

Queries that combine values from different fields are not valid for this type of search, for example “Salmonella scaffold” or “GCA_000430145.1 Typhimurium”.
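The single-field rule can be seen in a short sketch: each record is treated as a set of property fields, and a record matches only if the whole query string occurs inside one of them. The rows are made up (only the field names follow the sample tree), the case-insensitive default is an assumption standing in for the Case sensitive checkbox, and `exact_match` is a hypothetical helper.

```python
def exact_match(records, query, case_sensitive=False):
    """Return records in which the whole query occurs inside a single field."""
    def norm(s):
        return s if case_sensitive else s.casefold()

    q = norm(query)
    return [r for r in records if any(q in norm(str(v)) for v in r.values())]


# Made-up rows; only the field names follow the sample tree
records = [
    {"label": "leaf_A",
     "scientific_name": "Salmonella enterica subsp. enterica serovar Typhimurium str. A",
     "asm_level_txt": "Scaffold"},
    {"label": "leaf_B",
     "scientific_name": "Salmonella bongori str. B",
     "asm_level_txt": "Gapless Chromosome"},
]

print([r["label"] for r in exact_match(records, "serovar Typhimurium")])  # ['leaf_A']
print([r["label"] for r in exact_match(records, "Salmonella")])           # ['leaf_A', 'leaf_B']
# "Salmonella" and "Scaffold" sit in different fields, so nothing matches:
print([r["label"] for r in exact_match(records, "Salmonella scaffold")])  # []
```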

Let us perform a simple exact match search for “serovar Typhimurium”. All rows with the query string become selected (gray background). Scroll down to see selected rows.

GenericTableView exact match search not filtered

You can apply filtering to see only the rows with the feature of interest. Click on the Filter button (it becomes activated) and observe that now only 39 rows are shown.

GenericTableView exact match search filtered

While all 39 rows remain selected, let’s check broadcasting between the Generic Table View and the Tree View of the same project. Go to the Tree View you opened previously. Observe that one subbranch is selected. This subbranch includes all the Salmonella enterica serovar Typhimurium entries found in the Generic Table View. Note: the same search can also be performed in the Tree View window itself.

TreeView exact match search filtered

Now place the Tree View window next to the Generic Table View window so you can see both at once. Remove the checkmark from the All checkbox in the Generic Table View and use the Previous selection/Next selection buttons to jump between search result rows, selecting them one by one. Each time, observe that the corresponding subbranch is also selected in the Tree View (you might need to zoom in to see it clearly).

Generic Table TreeView broadcasting

Repeat the search for “serovar Typhimurium” in the Tree View. Apply filtering with the All checkbox selected and note that the tree now appears grayed out, with a compact branch selected. This branch represents serovar Typhimurium.

TreeView exact match filtered All

Now remove the checkmark from the All checkbox and use the Next/Previous arrow buttons to step through the filtered results one by one. You will see an image similar to the one below. Open the tooltip for the selected terminal node to confirm that the node represents serovar Typhimurium.

TreeView exact match filtered notAll

Wildcard search

Wildcards are used in search queries to stand in for other characters. This type of search is useful for finding data that match a specified pattern. The two most commonly used wildcards are the asterisk (*) and the question mark (?):

  • (*) Matches zero or more non-space characters
  • (?) Matches exactly one non-space character.

Let’s search for the query GCF_*.2. This search should return all accessions (used as labels in our example tree) with version 2. The two images below show the filtered result. In the Generic Table View, the search returned 13 rows, all labeled with version 2 accessions (GCF_*.2). In the Tree View, all selected branches carry these GCF_*.2 accessions.

GenericTableView wildcard search filtered

TreeView wildcard search filtered
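One way to picture the wildcard rules is to translate a pattern into a regular expression: * becomes "zero or more non-space characters", ? becomes "exactly one non-space character", and everything else is matched literally. The sketch assumes the pattern must match the whole field value; the accession-like labels are invented for illustration, and `wildcard_to_regex` and `wildcard_search` are hypothetical helpers.

```python
import re


def wildcard_to_regex(pattern):
    """Translate '*' -> zero or more non-space chars, '?' -> exactly one."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(r"\S*")
        elif ch == "?":
            parts.append(r"\S")
        else:
            parts.append(re.escape(ch))  # '.' and other symbols stay literal
    return re.compile("".join(parts))


def wildcard_search(values, pattern):
    """Return the values that match the whole pattern (an assumption)."""
    rx = wildcard_to_regex(pattern)
    return [v for v in values if rx.fullmatch(v)]


# Invented accession-like labels for illustration
labels = ["GCF_0000001.2", "GCF_0000002.1", "GCF_0000003.2",
          "GCA_0000004.2", "GCF_0000005.12"]

print(wildcard_search(labels, "GCF_*.2"))  # ['GCF_0000001.2', 'GCF_0000003.2']
```

Escaping the literal characters matters here: without it, the "." in GCF_*.2 would match any character instead of the version separator.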

Regular expression search

Searches based on multiple criteria, using SQL-like syntax, are also supported. Queries are made up of one or more connected true/false (Boolean) expressions. These expressions are built from comparison operators that compare fields in the data source to other fields or to values provided in the query. The following comparison operators work with character, numeric, or Boolean (true/false) values:

  • = (equals)
  • < (less than)
  • > (greater than)
  • <= (less than or equal to)
  • >= (greater than or equal to)
  • Like (equals where ‘?’ matches any single character and ‘*’ matches zero or more characters)
  • Between (check if a value is between two other values)
  • In (check if a value is equal to any in a list of values)

Any comparison in the query returns a Boolean value. A query can combine comparisons into more complex expressions using the following Boolean operators, which only work with Boolean values:

  • AND (True only if both expressions are true)
  • OR (True if either or both expressions are true)
  • XOR (True if one expression is true and the other one is false)
  • NOT (True if the associated expression is false)

Note that the keywords Like, Between, In, AND, OR, XOR, and NOT are not case sensitive: they can be lowercase, uppercase, or a mixture of both. Expressions can also be grouped with parentheses to build different logical expressions, for example:

A and (B or C) is True if A is true and B or C is true; (A and B) or C is True if both A and B are true or C is true

Let us try a regular expression search: first, select Regular expression as the String Matching option, then paste the query dist >0.1 AND scientific_name LIKE *ATCC* into the search box and click the search button. The search returns 20 entries that have ATCC numbers in their names and distances to their parents greater than 0.1.

GenericTableView reg exp search not filtered

TreeView reg exp search not filtered
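To show how such a query is evaluated row by row, the sketch below expresses two queries from this section, plus an XOR variant, as Python predicates. It assumes that LIKE must match the whole field value and is case sensitive. It also assumes that NOT binds to the comparison next to it, which is what the Bareilly example below implies. The rows and assembly method names are made up, and `like` is a hypothetical helper, not Genome Workbench's query engine.

```python
import re


def like(value, pattern):
    """SQL-like LIKE: '?' matches any single character, '*' zero or more."""
    regex = "".join(
        ".*" if c == "*" else "." if c == "?" else re.escape(c) for c in pattern
    )
    # fullmatch: the pattern must cover the whole field value
    return re.fullmatch(regex, str(value), flags=re.DOTALL) is not None


# Hypothetical rows; field names follow the sample tree, values are made up.
rows = [
    {"dist": 0.25, "scientific_name": "Salmonella enterica str. ATCC 1",
     "assembly_method": "SPAdes"},
    {"dist": 0.05, "scientific_name": "Salmonella enterica str. ATCC 2",
     "assembly_method": "Ray v2"},
    {"dist": 0.30, "scientific_name": "Salmonella enterica serovar Bareilly str. X",
     "assembly_method": "Ray v2"},
    {"dist": 0.30, "scientific_name": "Salmonella enterica serovar Bareilly str. Y",
     "assembly_method": "SPAdes"},
]

# dist >0.1 AND scientific_name LIKE *ATCC*
q1 = [r for r in rows if r["dist"] > 0.1 and like(r["scientific_name"], "*ATCC*")]

# NOT assembly_method LIKE Ray* AND scientific_name LIKE *Bareilly*
# (Python's `not` applies only to the first comparison, like NOT here)
q2 = [r for r in rows
      if not like(r["assembly_method"], "Ray*")
      and like(r["scientific_name"], "*Bareilly*")]

# dist >0.1 XOR scientific_name LIKE *ATCC*
# Python has no XOR keyword; for two booleans, `a != b` is equivalent.
q3 = [r for r in rows
      if (r["dist"] > 0.1) != like(r["scientific_name"], "*ATCC*")]

for name, result in [("q1", q1), ("q2", q2), ("q3", q3)]:
    print(name, [r["scientific_name"] for r in result])
```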

Here are a few more queries to try:

asm_level_txt = "Gapless Chromosome" OR asm_level_txt = Scaffold - returns 56 entries with Gapless Chromosome and Scaffold levels of assemblies.

scientific_name LIKE *Dublin* AND (asm_level_txt LIKE *Chromosome OR asm_level_txt = Scaffold) - returns 4 entries with serovar Dublin and Chromosome, Gapless Chromosome, and Scaffold levels of assemblies.

scientific_name LIKE *Dublin* AND asm_level_txt LIKE Chromosome – returns 1 entry with serovar Dublin and Chromosome level of assembly.

NOT assembly_method LIKE Ray* AND scientific_name LIKE *Bareilly* - returns 5 entries with serovar Bareilly and assembly methods not Ray.

More examples can be found in the Query Syntax manual.

Phonetic search

Phonetic search uses "Metaphone", a phonetic algorithm for indexing words by their pronunciation. It creates an approximate phonetic representation of each word and is used to match words and names that sound similar. The Metaphone algorithm is useful when the text being searched contains misspelled words.

For example, consider the scientific name “Salmonella enterica subsp. enterica serovar Tennessee str. TXSC_TXSC08-19” and a search for "Tennessee".

Phonetic search will match the following misspellings:

  • Tenese
  • Tennese
  • Tenesse
  • Tenesee

Phonetic searches with any of the misspelled queries above return the same three entities:

GenericTableView phonetic search filtered

TreeView phonetic search zoomed
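The core idea of phonetic matching is that both the query and each word in the data are reduced to a "sound key", and words whose keys are equal are treated as matches. The sketch below is a deliberately tiny stand-in for that idea, not the Metaphone algorithm: it applies only two Metaphone-style rules (collapse doubled letters, drop vowels after the first letter), whereas real Metaphone has many consonant-specific rules. The helper names and the sample names are hypothetical.

```python
def simple_phonetic_key(word):
    """Tiny Metaphone-inspired key (NOT full Metaphone): uppercase, keep
    letters only, collapse doubled letters, drop vowels after the first letter."""
    letters = [c for c in word.upper() if c.isalpha()]
    collapsed = []
    for c in letters:
        if not collapsed or collapsed[-1] != c:
            collapsed.append(c)
    if not collapsed:
        return ""
    head, tail = collapsed[0], collapsed[1:]
    return head + "".join(c for c in tail if c not in "AEIOU")


def phonetic_search(values, query):
    """Return values containing a word whose key equals the query's key."""
    qk = simple_phonetic_key(query)
    return [v for v in values
            if any(simple_phonetic_key(w) == qk for w in v.split())]


names = [
    "Salmonella enterica subsp. enterica serovar Tennessee str. TXSC_TXSC08-19",
    "Salmonella enterica subsp. enterica serovar Typhimurium",
]

for misspelling in ["Tenese", "Tennese", "Tenesse", "Tenesee"]:
    print(misspelling, simple_phonetic_key(misspelling),
          len(phonetic_search(names, misspelling)))
```

Every spelling above reduces to the same key as "Tennessee", which is why they all find the same record. Very short keys can also collide with unrelated words, a trade-off inherent to phonetic matching.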

Current Version is 3.6.0 (released March 04, 2021)


Last updated: 2021-03-10T19:12:13Z