3D Macromolecular Structures
How to align a query protein to a similar sequence from a 3D structure
and interactively view sequence/structure relationships


The three-dimensional structures of biomolecules provide a wealth of information on their biological function and evolutionary relationships. Even if a 3D structure for your protein of interest has not yet been resolved, it is possible to align your query protein to a similar sequence from a 3D structure, then interactively view the sequence/structure relationships, as shown in the illustration. Use method 1 if the protein of interest already has a sequence record in the Entrez Protein database, and use method 2 if the protein sequence is not yet in that database. The additional notes near the bottom of the page provide step-by-step instructions on how to generate the view shown on this page and how to identify putative active site residues.


Method 1: If your protein of interest already has a sequence record in the Entrez Protein database:

  1. Open the Entrez Protein search page

  2. Retrieve your sequence record of interest, for example, human prostaglandin-endoperoxide synthase 1 isoform 1 precursor (accession NP_000953.2, gi 18104967), which is a product of the human PTGS1 gene (GeneID 5742).

  3. Scroll down to the "Related Information" menu in the right margin of the protein sequence record display and select "Related Structures (Summary)." This will retrieve protein sequences that have experimentally resolved 3D structures, and that were found by the Related Structures ("CBLAST") service to be similar to your query sequence. (The Molecular Modeling Database (MMDB) help document provides more information about the "Related Structures" links that are accessible from the right margin of protein sequence record displays.)

    The graphical summary displayed by CBLAST shows the conserved domains and conserved features/sites that were found on the protein query sequence, alignment footprints of the related structures relative to the query, and links that allow you to display the 3D structure and sequence alignment in Cn3D.

    As an example, view the Related Structures display for GI 18104967, human prostaglandin-endoperoxide synthase.

  4. A low-redundancy list of hits is shown by default. If desired, change the View menu option to All Similar MMDB to see the complete list of hits. Click on the thumbnail image of any structure to read more about it. (The 1PTH structure shown in the illustration is on the third page of hits for gi 18104967; see additional notes, below.)

  5. To view a sequence alignment of the query and a hit of interest, click on the pink bar that represents the alignment footprint in the graphic display of hits, or on the "view alignment" link in the last column of the table display.

  6. On the sequence alignment display, press the View Structure and Alignment in Cn3D button to open an interactive view of the sequence alignment and corresponding 3D structure. The Cn3D program must be present on your computer in order for the button to work; the program is free and takes only a few minutes to install.

  7. Once the Cn3D display is open, you can click on any amino acids from the retrieved structure, in either the 3D structure or the sequence alignment window, to highlight their location in both views and examine the sequence/structure relationship. The Cn3D Tutorial provides more information about using the program, and the third comment under additional notes, below, provides step-by-step instructions on how to generate the specific view shown in the illustration on this page.
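
The record lookup in steps 1-3 can also be approximated from a script. The following is a minimal sketch, not part of the web workflow described above: it only builds request URLs for the NCBI E-utilities `efetch` and `elink` endpoints and does not contact the network. The helper names `efetch_fasta_url` and `elink_structure_url` are hypothetical, and it is an assumption that the protein-to-structure links returned by `elink` match the "Related Structures" list shown in the browser.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_fasta_url(accession: str) -> str:
    """Build a URL that requests a protein record in FASTA format."""
    params = {
        "db": "protein",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


def elink_structure_url(accession: str) -> str:
    """Build a URL that asks for Structure records linked to a protein.

    Assumption: these links approximate the "Related Structures" list;
    the web display may filter or group hits differently.
    """
    params = {"dbfrom": "protein", "db": "structure", "id": accession}
    return f"{EUTILS_BASE}/elink.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # Accession taken from the example in step 2.
    print(efetch_fasta_url("NP_000953.2"))
    print(elink_structure_url("NP_000953.2"))
```

Opening either URL in a browser, or passing it to an HTTP client, should return the corresponding record or link set; viewing the alignment in Cn3D still requires the steps above.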

Method 2: If your sequence of interest is not yet in the Entrez Protein database:

  1. If you have a protein query sequence, open the Protein BLAST (blastp) page and Choose Search Set: Protein Data Bank proteins (pdb).

    If you have a nucleotide query sequence and want to compare its translation against protein sequences from resolved 3D structures, open the Translated BLAST (blastx) page and Choose Search Set: Protein Data Bank proteins (pdb).

  2. Enter your query sequence in FASTA format. If desired, adjust the algorithm parameters to make the search more or less stringent than the default, then press the blue BLAST button at the bottom of the query page to execute the search.

  3. If a BLAST hit is from the Protein Data Bank (PDB), the Protein BLAST results page displays a "Structure" link in the right margin of that hit's pairwise sequence alignment. Click on the "Structure" link beside any hit of interest to view that hit aligned to your query sequence on the CBLAST display. From there, you can follow steps 5 through 7 in Method 1, above, to open an interactive view of the query sequence and the structure in Cn3D.

    As an example, open the protein BLAST results page for GI 18104967, human prostaglandin-endoperoxide synthase, against the Protein Data Bank proteins (pdb). Notice the "Structure" link in the right margin of each pairwise alignment. Click on the "Structure" link to view the CBLAST display for GI 18104967.
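
Steps 1 and 2 of Method 2 can be sketched in code as follows. This is an illustrative outline only: it formats a sequence as FASTA and assembles the form fields for a submission to the NCBI BLAST URL API (`CMD=Put`), without sending anything. The function names are hypothetical, the short peptide in the usage example is made up, and the database name `pdb` is an assumption based on the search set label used on the web form.

```python
from urllib.parse import urlencode

# Address of the NCBI BLAST URL API (nothing is sent by this sketch).
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def to_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Format a raw sequence as FASTA: a '>' header plus wrapped lines."""
    # Remove any whitespace and line breaks, then normalize to upper case.
    residues = "".join(sequence.split()).upper()
    lines = [residues[i:i + width] for i in range(0, len(residues), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


def blast_put_params(fasta: str, nucleotide_query: bool = False) -> dict:
    """Assemble form fields for submitting a search against PDB proteins.

    blastp is used for a protein query and blastx for a nucleotide
    query, mirroring the two cases in step 1.
    """
    return {
        "CMD": "Put",
        "PROGRAM": "blastx" if nucleotide_query else "blastp",
        "DATABASE": "pdb",  # assumption: name of the PDB protein search set
        "QUERY": fasta,
    }


if __name__ == "__main__":
    # Made-up peptide, used only to show the formatting.
    fasta = to_fasta("my_query", "mktay iakqr\nQISFV", width=10)
    print(fasta)
    # URL-encoded form body that would be POSTed to BLAST_URL.
    print(urlencode(blast_put_params(fasta)))
```

A submission made this way returns a request ID that is later used to fetch the results; the "Structure" links described in step 3 are a feature of the web results page.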

Additional Notes:

  1. A query protein sequence can potentially retrieve many similar 3D-structure-based sequences. Some of the structures might be free proteins; others might be bound to another molecule, such as a chemical or another protein. The salient features of each structure are described in the publication(s) cited on its MMDB summary page, accessible by clicking on a structure's thumbnail image.

  2. The 1PTH structure shown in the illustration is on the third page of "Related Structure" hits for protein sequence gi 18104967, as of 20 January 2010 (if "View all similar MMDB sequences" is selected in the "Related Structures" display, rather than the default "Low redundancy" option). 1PTH is featured here because it shows a chemical, salicylic acid, blocking the channel that leads to the protein's active site. Though it is an ovine protein, it is homologous to the human protein and can therefore be helpful in elucidating the biological function of the human protein.

  3. Step-by-step instructions: The specific view shown in the illustration was generated by following steps 1-6 of method 1, then: (a) zooming in on the protein's active site region in the 3D structure window; (b) double-clicking on the salicylic acid to highlight it; (c) selecting Show/Hide -> Select by distance -> Residues Only -> 5 Angstroms to reveal the amino acids that are within that distance of the highlighted chemical (i.e., to identify putative active site residues); and (d) panning across the sequence data to see which amino acids were highlighted there.
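
The idea behind "Select by distance" in note 3 can be illustrated with a short sketch. It marks a residue as selected when any of its atoms lies within a cutoff (5 Angstroms here) of any atom of the highlighted chemical. This is an assumed reading of the operation, not Cn3D's actual implementation; the function name, the inclusive cutoff, and the toy coordinates are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]  # x, y, z in Angstroms


def residues_within(
    ligand_atoms: List[Point],
    residue_atoms: Dict[str, List[Point]],
    cutoff: float = 5.0,
) -> List[str]:
    """Return residues with at least one atom within `cutoff` of the ligand.

    Assumption: distance is measured atom to atom and the cutoff is
    inclusive; Cn3D may define the selection differently.
    """
    selected = []
    for residue, atoms in residue_atoms.items():
        if any(math.dist(a, l) <= cutoff for a in atoms for l in ligand_atoms):
            selected.append(residue)
    return selected


if __name__ == "__main__":
    # Toy coordinates invented for illustration; not taken from 1PTH.
    ligand = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    residues = {
        "res1": [(3.0, 0.0, 0.0), (9.0, 0.0, 0.0)],  # one atom 2 A away
        "res2": [(0.0, 6.0, 0.0)],                   # 6 A away at closest
        "res3": [(4.0, 3.0, 0.0)],                   # about 4.2 A away
    }
    print(residues_within(ligand, residues))
```

With real data, the coordinates would come from the structure record, and the selected residues would be candidates for the putative active site, as described in step (c).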

(PDB accession 1PTH; MMDB ID 50885)
Image of a human query protein sequence aligned to a homologous sheep sequence that has a resolved 3D structure, viewed in the free Cn3D software program. Click anywhere on the image for more information about the structure and for options to view it in Cn3D, where you can interactively examine sequence-structure relationships.
  In the example shown here, the human PTGS1 gene product is aligned to its sheep homolog, which has a known sequence and a resolved 3D structure. Salicylic acid is blocking the channel to the active site, where a heme cofactor is also shown. As a result, the protein can no longer convert its substrate, arachidonic acid, to prostaglandin G2, the precursor of prostaglandin H2; blocking this reaction reduces pain and inflammation. This inferred structural basis of aspirin activity is described in the corresponding publications, which are accessible as links from the structure record.
Revised 23 September 2016