Sequences

Add Sequences

The Add Sequences function adds new sequences to a project that is already in progress. For example, suppose the chromosomal sequences of a bacterial genome already exist in a project, but the plasmid sequences were not included. Use Add Sequences to add the plasmids to the project.

Sequences Open Objects Menu

The default file format setting is FASTA sequence files. This can be changed to NCBI ASN.1 files or FASTA alignment files. Use the browse function to find files to open, or simply drop the files in. Recently used files appear in the Recently used Files box for easy access.

If a sequence in the imported file has the same sequence identifier as an existing sequence, an error will occur:

Sequences Error

Select OK and a dialog box will open:

Sequences Id Problems Dialog

The top portion of the window displays all Sequence IDs from the existing sequences. The bottom portion displays the new Sequence IDs. In the example, the new Sequence ID display indicates that the problem is a duplicate ID. Type the new ID in the New Sequence ID box and click the recheck problems button.

Sequences Id Problems Corrected

When everything is correct, the Problems column (the last column) will be empty. At this point, click the accept button. The new sequences will be added to the end of the file. To end the process at any point without making changes, click the cancel button.
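
The duplicate-ID check described above can be approximated outside the application before importing a file. The following is a minimal sketch, not the Genome Workbench implementation: the function names `read_fasta_ids` and `find_id_problems` are hypothetical, and it assumes the sequence ID is the first whitespace-delimited token after `>` on each FASTA definition line.

```python
def read_fasta_ids(text):
    """Return the sequence IDs found in FASTA text, in file order.

    Assumption: the ID is the first token after '>' on a definition line.
    """
    ids = []
    for line in text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            ids.append(fields[0] if fields else "")
    return ids


def find_id_problems(existing_ids, new_ids):
    """Map each problematic new ID to a short description of the problem."""
    existing = set(existing_ids)
    seen = set()
    problems = {}
    for seq_id in new_ids:
        if seq_id in existing:
            problems[seq_id] = "duplicate of an existing ID"
        elif seq_id in seen:
            problems[seq_id] = "duplicate within the new file"
        seen.add(seq_id)
    return problems


# Hypothetical project: one chromosome already present, two plasmids to add.
existing = read_fasta_ids(">chr1 chromosome\nACGT\n")
incoming = read_fasta_ids(">chr1 plasmid A\nGGCC\n>pB plasmid B\nTTAA\n")
problems = find_id_problems(existing, incoming)
print(problems)  # {'chr1': 'duplicate of an existing ID'}
```

An empty result corresponds to an empty Problems column in the dialog.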

Edit Sequence

Edit Sequence 1

The Edit Sequence dialog is a useful tool for viewing the sequence and features and editing the sequence content and feature locations.

Edit Sequence 2

The Show menu controls the information displayed. For example, the user could choose to view reading frames or display the sequence complement below the sequence. The user could also choose to display features as labeled lines below the sequence or hide them. For coding regions, the On-the-fly option shows the protein translation calculated using the sequence underlying the coding region and the frame of the coding region. The protein currently associated with the coding region is also displayed, and the user can choose Mismatch to highlight the positions on the protein sequence that do not match the calculated translation.

Edit Sequence 3

The Edit->Find menu item launches a dialog that allows the user to search for sequence characters in the nucleotide sequence, the reverse complement, or in translated frames.

The “Go to:” text box enables the user to set the position of the cursor. This is useful for navigating to a specific position without scrolling. This control also allows the user to search for sequence characters – for example, searching for “atg” will move the cursor to the next instance of this codon. The “Select:” text box enables a user to select a range of sequence without using the mouse to drag the cursor. Selecting a sequence range is useful because the Annotate menu can be used to create a feature for the selected location.

The user can edit the sequence directly by clicking on the sequence and typing characters or using backspace or delete. When features are displayed, their locations can be adjusted by dragging the endpoints of their intervals. Note that when making changes in the Edit Sequence dialog, it is necessary to use the Commit button to apply changes to the data before adding new features with the Features menu or retranslating coding regions with the Retranslate button. The Cancel button will exit the dialog and discard any edits that have been made in the dialog.

Update Sequence

Update Sequence allows the user to upload a new sequence to replace an existing sequence and make appropriate changes to existing features. The dialog updates a single sequence at a time.

When Update Sequence is chosen, your computer’s open/choose file dialog opens. Choose the correct FASTA or ASN.1 file from the list of available files.

When the updated sequence file is successfully read, the Update Sequence dialog opens and displays four sections: Alignment, Sequence Update, Existing Features, and Options.

GWB Sequences Update Sequence dialog

  • Alignment describes and displays the existing sequence and updated sequence, showing differences in various colors.

  • Sequence Update indicates how the existing sequence will be changed. Replace is chosen by default, which replaces the entire existing sequence with the new sequence. It also allows:

    • No Change (Useful when the user only wants to import features from the update sequence and not change the nucleotide content of the existing sequence.)
    • Patch Aligned Region (Replaces only the aligned portion of the sequence.)
    • Extend 5’ (Adds new sequence to the 5' end of the existing sequence, but is disabled when the "Ignore alignment" box is unchecked and the alignment between the existing and update sequence does not indicate that the update sequence contains nucleotides that should be added to the 5' end.)
    • Extend 3’ (Adds new sequence to the 3' end of the existing sequence, but is disabled when the "Ignore alignment" box is unchecked and the alignment between the existing and update sequence does not indicate that the update sequence contains nucleotides that should be added to the 3' end.)
    • Ignore Alignment (Used when there is no good alignment between the existing and update sequences. When this box is checked, the Patch and No Change options are disabled and the Replace, Extend 5', and Extend 3' options are enabled.)
  • Existing Features chooses Do Not Remove by default, but also allows Remove Inside Aligned Area, Remove Outside Aligned Area, and Remove All as options for treating features on the existing sequence.

  • Options includes the choices Import All Features From Update Sequence and Annotation is Copied From an Earlier Version of the Same Nucleotide Accession.

The Update Sequence button will apply all choices made to the existing sequence and features and close the dialog.

The Cancel button will close the dialog and make no changes to the existing sequence and features.

The new sequence and any other related changes will be displayed in the Flat File view.

Remove Sequences

Remove Sequences opens a dialog that allows you to remove one or more sequences from the current group of sequences. All non-chosen/non-removed sequences remain in the current group and can continue to be processed.

GWB Sequences Remove Sequences Dialog

In the open dialog, the left window at the top lists the sequences in the current group by Filename/SequenceID SequenceID (sequence length). For example, a 4,321-nucleotide sequence with a SequenceID of SEQ1 would appear as: Filename1/SEQ1 SEQ1 (4321)

By clicking on a sequence listed in the left window to highlight it and then clicking on the [>>>] choice, you can move the chosen sequence to the right window. Clicking [Accept] at the bottom of the dialog will then remove the chosen sequence(s) from the current group.

GWB Sequences Remove Sequences Dialog Choose Seq

GWB Sequences Remove Sequences Dialog Remove Seq

If a sequence was incorrectly chosen and should be moved back to the left window to stay in the current group, highlight the sequence in the right window and click the [<<<] choice to move the sequence back.

Multiple sequences can be chosen and removed from the current group by using the options below the [<<<] and [>>>] choices:

  1. SequenceIDs that include or do not include specific text can be chosen by selecting a match type from the ‘Seq-id’ pull-down menu (Is one of, Contains, Does not Contain, Equals, etc.) and entering the desired SequenceID text in the free text box. Additional choices control whether upper and lower case and spaces are considered or ignored when matching SequenceIDs. The [Clear Constraint] choice removes all choices from this option.

  2. Sequences of certain lengths can be chosen by using the next two options: ‘Select sequences longer than’ and/or ‘Select sequences less than’ and entering a sequence length as number of nucleotides in the corresponding boxes.

  3. The buttons at the bottom of the dialog function as described:

    • Select – Moves the sequences identified from the current group to the list to be removed from the current group.
    • Select All – Moves all of the sequences in the current group to the list to be removed from the current group.
    • Unselect All – Moves all of the sequences listed to be removed back to the current group.
    • Accept – Confirms the choices of sequences to be removed from the current group, removes them, and closes the dialog.
    • Cancel – Closes the dialog without taking any action.
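
The selection options above amount to filtering the current group by SequenceID text and by length. The sketch below illustrates that idea only; the function `select_for_removal` and its parameters are hypothetical, only the Contains match type is modeled, and it assumes that when several constraints are supplied a sequence must satisfy all of them (the dialog may combine them differently).

```python
def select_for_removal(lengths, contains=None, ignore_case=False,
                       longer_than=None, shorter_than=None):
    """Return the IDs that satisfy every constraint that was supplied.

    lengths maps SequenceID -> sequence length in nucleotides.
    Assumption: constraints are combined with AND.
    """
    selected = []
    for seq_id, length in lengths.items():
        if contains is not None:
            haystack = seq_id.lower() if ignore_case else seq_id
            needle = contains.lower() if ignore_case else contains
            if needle not in haystack:
                continue
        if longer_than is not None and length <= longer_than:
            continue
        if shorter_than is not None and length >= shorter_than:
            continue
        selected.append(seq_id)
    return selected


# Hypothetical group of three sequences.
group = {"SEQ1": 4321, "SEQ2": 150, "ctg3": 90}

short = select_for_removal(group, shorter_than=200)
print(short)  # ['SEQ2', 'ctg3']

# "Accept": everything not selected stays in the current group.
remaining = {k: v for k, v in group.items() if k not in short}
print(remaining)  # {'SEQ1': 4321}
```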

Reverse Complement Sequences by Sequence ID

The orientation of an individual contig, plasmid, or chromosome does not matter to GenBank. Submitters, however, may prefer that sequences be in a particular orientation, for example so that all contigs are on the plus strand or so that certain genes are first in the genome. If it is necessary to reverse complement a sequence, that can be done with the Reverse Complement by SeqID task.

Sequences Reverse Complement SequencesDialog

The sequence or sequences to be reverse complemented can be selected in the top window. Hold down the Ctrl key when selecting multiple sequences. Alternatively, sequence IDs can be entered in the constraint box to select specific sequences. For example, a constraint matching all sequence IDs that contain F will select those three sequences in this demonstration record. Once the constraint has been entered, click the select button to mark those records to be changed. If all sequences should be reverse complemented, simply click the select all button to highlight all sequences. If something was selected incorrectly, simply click the unselect all button and start over.

When the list to change is correct, click accept to perform the action. This will create a nucleotide sequence that is reverse complemented but will not make any other changes. If there are features on the sequence and the Reverse features box is checked, the features will follow the reverse complement action, so the last gene will now be first, etc. If the features and sequence are on opposite strands, only one of the two boxes will need to be checked. In all cases, the sequence should be checked afterward to ensure that the changes were incorporated.
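
To make the effect on coordinates concrete, the sketch below reverse complements a sequence and remaps one feature interval so that it follows the sequence. It is an illustration under stated assumptions, not the application's code: the helper names are hypothetical, feature coordinates are taken to be 1-based and inclusive, and only A, C, G, T, and N are complemented (other characters are left unchanged).

```python
# Translation table for the plain bases; IUPAC ambiguity codes other
# than N are not handled in this sketch and pass through unchanged.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide string."""
    return seq.translate(COMPLEMENT)[::-1]


def flip_feature(start, stop, strand, seq_len):
    """Map a 1-based inclusive interval onto the reverse-complemented sequence."""
    new_start = seq_len - stop + 1
    new_stop = seq_len - start + 1
    new_strand = "-" if strand == "+" else "+"
    return new_start, new_stop, new_strand


seq = "ATGAAAC"
flipped = reverse_complement(seq)
print(flipped)  # GTTTCAT

# A plus-strand feature covering "ATG" at positions 1..3 ...
feature = flip_feature(1, 3, "+", len(seq))
print(feature)  # (5, 7, '-')
# ... now covers "CAT" at positions 5..7 on the minus strand.
```

A feature that was first on the original sequence ends up last after the flip, which matches the behavior described above for the last gene becoming the first.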

Trim Terminal Ns

Trim Terminal Ns removes one or more unknown nucleotides (represented in the sequence as ‘n’) from the 5’ or 3’ ends of all sequences in the current group.

Terminal Ns are sometimes added by tools that create an alignment of multiple sequences, to fill gaps at the ends of sequences when the sequences are not all the same length. However, GenBank prefers not to include terminal Ns in sequences, as they are considered ambiguous data that add no value to the sequence.

NNNNNNNNNNNNTGCGGGATTATTCATACCGTCCAACCATCGGGCGTACCTATGTGTACGACAATAAATTGGGTTGTGTTATCAAAAACGCCAAGCGCAAGAAGCACCTAGTCGA …

Using Trim Terminal Ns will remove the 5’ terminal Ns and result in:

TGCGGGATTATTCATACCGTCCAACCATCGGGCGTACCTATGTGTACGACAATAAATTGGGTTGTGTTATCAAAAACGCCAAGCGCAAGAAGCACCTAGTCGA…

When Trim Terminal Ns is used to remove 5’ or 3’ n’s, a pop-up dialog will report the sequence(s) and number of n’s removed.

GWB Sequences Trim Terminal Ns Results

If terminal Ns are not removed from a sequence to be submitted, they will be removed when the sequence is processed by GenBank.
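
The trimming operation itself is simple to reproduce, for example to check a sequence before loading it. This is a minimal sketch rather than the Genome Workbench implementation; the function name `trim_terminal_ns` is hypothetical, and both upper- and lower-case N are treated as unknown nucleotides.

```python
def trim_terminal_ns(seq):
    """Strip N's from both ends of a sequence.

    Returns (trimmed sequence, N's removed at 5' end, N's removed at 3' end).
    """
    without_5 = seq.lstrip("Nn")
    removed_5 = len(seq) - len(without_5)
    trimmed = without_5.rstrip("Nn")
    removed_3 = len(without_5) - len(trimmed)
    return trimmed, removed_5, removed_3


# Twelve 5' terminal N's, as in the example above; the internal N is kept.
seq = "N" * 12 + "TGCGGGATNATTC"
trimmed, n5, n3 = trim_terminal_ns(seq)
print(trimmed, n5, n3)  # TGCGGGATNATTC 12 0
```

The two counts correspond to the numbers reported in the pop-up dialog.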

Expand Known Gaps to Include Flanking Ns

After a file that contains gaps is submitted, the foreign contamination screen may detect contamination near a gap boundary. To keep the coordinate system the same, the contamination is often replaced by N’s in the sequence. However, these N’s then need to be incorporated into the neighboring gap. Rather than removing all gaps and adding them back, simply use the Expand Known Gaps to Include Flanking N’s button to adjust the gaps in the submission file to include this additional sequence. Always check that the estimated length of the gap has changed accordingly. Note that this task only works when the gap is of estimated length; it cannot be used for gaps of unknown length.
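
Conceptually, expanding a gap means growing its interval outward for as long as the neighboring characters are N's. The sketch below shows that idea on a plain string; it is an assumption-laden illustration, not the application's logic. The function `expand_gap` is hypothetical, and the gap is represented as a 0-based, half-open interval [start, stop).

```python
def expand_gap(seq, start, stop):
    """Grow the gap interval [start, stop) over directly adjacent N's."""
    while start > 0 and seq[start - 1] in "Nn":
        start -= 1
    while stop < len(seq) and seq[stop] in "Nn":
        stop += 1
    return start, stop


# A 10-nt estimated-length gap at [9, 19); three bases of contamination
# immediately upstream of it were masked with N's at [6, 9).
seq = "ACGTAC" + "N" * 3 + "N" * 10 + "GGTTA"
new_start, new_stop = expand_gap(seq, 9, 19)
print(new_start, new_stop, new_stop - new_start)  # 6 19 13
```

The new estimated length (13 in this example) is the value to verify after running the task.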

Add Linkage Evidence to All Gaps

WORK IN PROGRESS

Add Assembly Gaps to Sequence

Add Assembly Gaps to Sequence opens a dialog that allows the identification and description of gaps in the current sequence.

GWB Sequences Add Assembly Gaps To Sequence

What is a sequencing gap? A section of unknown sequence between sections of known sequence. The unknown sequence can be of known (estimated) length, based on alignment or other biology, or it can be of unknown length.

All unknown length gaps in a sequence should be represented by a string of exactly 100 internal N’s. The first choice in the dialog defaults to this description and will convert those gaps when Accept is clicked.

GWB Sequences Add Assembly Gaps To Sequence Unknown

All gaps (internal strings of N’s) of 101 N’s or longer will be converted to gaps of known length equal to the number of N’s. This is also a default setting.

GWB Sequences Add Assembly Gaps To Sequence Known
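
The two default rules described above can be sketched as a scan for internal runs of N's. This is only an illustration of the stated defaults, not the dialog's implementation: the function `classify_gaps` is hypothetical, intervals are 0-based and half-open, runs at the very start or end of the sequence are skipped as terminal N's, and runs shorter than 100 are left unclassified because the defaults described here do not cover them.

```python
import re


def classify_gaps(seq, unknown_len=100):
    """Return (start, stop, kind) for internal N runs under the default rules.

    Exactly `unknown_len` N's -> gap of unknown length.
    More than `unknown_len` N's -> gap of known length (the run length).
    """
    gaps = []
    for match in re.finditer(r"[Nn]+", seq):
        if match.start() == 0 or match.end() == len(seq):
            continue  # terminal N's are not internal gaps
        length = match.end() - match.start()
        if length == unknown_len:
            gaps.append((match.start(), match.end(), "unknown"))
        elif length > unknown_len:
            gaps.append((match.start(), match.end(), "known"))
        # Shorter runs are left alone in this sketch.
    return gaps


seq = "ACGT" + "N" * 100 + "ACGT" + "N" * 250 + "ACGT" + "N" * 5 + "AC"
gaps = classify_gaps(seq)
print(gaps)  # [(4, 104, 'unknown'), (108, 358, 'known')]
```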

Also by default, CDSs with intervals that include gaps will be adjusted when the gaps are converted.

GWB Sequences Add Assembly Gaps To Sequence Adjust CDS

Add linkage information to gaps allows Gap Type, Linkage and Linkage evidence to be set. Choose the appropriate Type from the pull-down menu and the corresponding Linkage and Evidence when required.

GWB Sequences Add Assembly Gaps To Sequence Linkage Choices

Choose Accept to apply the gaps that have been described or Cancel to close the dialog without any action.

Remove Gap Features

Remove Gap Features removes all of the assembly gap features in the submission file. If gaps were included in the submission file using Add Assembly Gaps to Sequence in Genome Workbench, or using the programs table2asn_gff or tbl2asn, this function can be used to remove them. Note that this does not change the underlying sequence and only reverts the sequence to its pre-gapped state. There will be errors regarding the presence of the N's if assembly gap features are not added back. Gaps can be added back using Add Assembly Gaps to Sequence with new settings.

For more information please see the full documentation for NCBI Genome Workbench Editing Package.

Current Version is 3.7.1 (released October 13, 2021)


Last updated: 2019-09-25T15:43:55Z