NCBI News
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, US Department of Health and Human Services
Vol 14 No 1 of NCBI News





New Databases and Tools Target Influenza

Influenza virus infection is a major threat to public health in the United States, resulting in over 200,000 hospitalizations and 30,000 deaths each year. The Influenza Virus Genome Project [1] is providing researchers with a growing collection of virus sequences essential to the identification of the genetic determinants of influenza pathogenicity. NCBI provides online tools for the analysis of these and other influenza sequences in GenBank that allow researchers to:

Retrieve—viral genomic, gene encoding, or protein sequences and download them in a number of formats

Align—locally stored sequences with those in NCBI databases

Cluster—sequences for phylogenetic analysis using a variety of algorithms and weight matrices, constructing dendrograms from the result

Download—complete genomic sequences

Search—influenza sequences using BLAST®

An Example

The analysis of the coding region (CDS) of the hemagglutinin (HA) sequence for influenza A virus, GenBank® accession AY653200, serves as an example of the use of these tools to classify a new sequence. Prior to the analysis, the CDS portion of the sequence was downloaded in FASTA format using NCBI's Entrez, and the FASTA definition line was changed from:

>gi-50365728:29-1735 Influenza A virus (A/chicken/Jilin/9/2004(H5N1)) segment 4, complete sequence

to read:

>local chicken
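The header change above can also be scripted. The following is a minimal sketch, assuming a single-record FASTA string; the function name `rename_fasta_record` and the short placeholder sequence are illustrative assumptions, not part of the NCBI tools or the real AY653200 record.

```python
def rename_fasta_record(fasta_text: str, new_id: str) -> str:
    """Replace the definition line of a single-record FASTA string."""
    lines = fasta_text.strip().splitlines()
    if not lines or not lines[0].startswith(">"):
        raise ValueError("input does not start with a FASTA definition line")
    # Only the header changes; the sequence lines are kept as they are.
    return "\n".join([">" + new_id] + lines[1:]) + "\n"


# Placeholder record: the sequence below is made up for illustration.
example = (
    ">gi-50365728:29-1735 Influenza A virus segment 4\n"
    "ATGGAGAAAATAGTG\n"
    "CTTCTTTTTGCAATA\n"
)
renamed = rename_fasta_record(example, "local chicken")
print(renamed)
```

A short, distinctive identifier such as "local chicken" makes the uploaded sequence easy to spot in the alignment and tree displays that follow.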

Selection of influenza sequences for analysis

To begin, use the Database link from the Influenza Virus Resource page to reach the Query Builder shown in Fig. 1.


Figure 1. Query Builder for influenza sequences. Queries are built by making selections in three different sections of the form, labeled A, B, and C.

Check the 'Coding region' radio button, indicated in section A, to specify the type of sequence to retrieve.

From the menus in section B, select 'Influenza A', 'Avian', 'Asia', and 'HA' as the 'Virus Species', 'Host', 'Country/Region', and 'Segment', respectively. In addition, check 'Full-length sequences only' and restrict the search to H5N1 subtype sequences from the year 2005 using the check boxes and text fields in section C. Clicking on 'Add to Query Builder' will return the number of sequences that match, as shown in section D. Click on 'Get sequences' to generate the form shown in Fig. 2, containing a table of summaries for the 85 selected sequences.
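The selections in the form combine as a conjunction: a record is returned only if it satisfies every criterion. As a rough sketch of that logic, assuming a hypothetical `Record` type and made-up example records (neither reflects the actual database schema or its contents):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    # Hypothetical fields mirroring the form's menus and check boxes.
    name: str
    species: str
    host: str
    region: str
    segment: str
    subtype: str
    year: int
    full_length: bool


def matches(record: Record, **criteria) -> bool:
    """Return True only if every criterion equals the record's field."""
    return all(getattr(record, key) == value for key, value in criteria.items())


# Made-up records for illustration only.
records = [
    Record("example-1", "Influenza A", "Avian", "Asia", "HA", "H5N1", 2005, True),
    Record("example-2", "Influenza A", "Avian", "Asia", "HA", "H5N1", 2004, True),
    Record("example-3", "Influenza A", "Human", "Asia", "HA", "H5N1", 2005, True),
    Record("example-4", "Influenza A", "Avian", "Asia", "HA", "H5N1", 2005, False),
    Record("example-5", "Influenza A", "Avian", "Asia", "HA", "H5N1", 2005, True),
]

query = dict(species="Influenza A", host="Avian", region="Asia",
             segment="HA", subtype="H5N1", year=2005, full_length=True)
selected = [r for r in records if matches(r, **query)]
print([r.name for r in selected])
```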


Figure 2. Selection of sequences for further analysis. For brevity, only the first three of 85 selected entries are shown.

The table is sortable. The controls in section A have been used to sort the records by "Virus Name", after which 10 sequences from various hosts (3 goose, 1 quail, 2 duck, 2 chicken, 1 gull, 1 heron) have been selected for further analysis using the check boxes next to each entry; only the first two of the checked entries are visible in the figure. Using the button in section B, the FASTA sequence called "local chicken" has been uploaded, as indicated in section C.

Multiple sequence alignment

Click on 'Do multiple alignment' to align the "local chicken" sequence to the 10 selected database sequences using the multiple sequence alignment program MUSCLE [2]. The resulting alignment is shown in Fig. 3.
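A comparable alignment can be run locally with the MUSCLE command-line program. The sketch below assumes MUSCLE 3.x, which takes `-in` and `-out`; MUSCLE 5 uses `-align` and `-output` instead. The file names and the helper functions `build_muscle_command` and `align` are illustrative assumptions, and this is not how the web tool itself invokes MUSCLE.

```python
import shutil
import subprocess
from pathlib import Path


def build_muscle_command(in_fasta, out_fasta):
    """Build a MUSCLE 3.x command line (assumed flags: -in and -out)."""
    return ["muscle", "-in", str(in_fasta), "-out", str(out_fasta)]


def align(local_fasta, database_fasta, workdir):
    """Concatenate the local and database FASTA files, then align them."""
    workdir = Path(workdir)
    combined = workdir / "combined.fasta"
    aligned = workdir / "aligned.fasta"
    local_text = Path(local_fasta).read_text().rstrip("\n")
    combined.write_text(local_text + "\n" + Path(database_fasta).read_text())
    if shutil.which("muscle") is None:
        raise RuntimeError("muscle executable not found on PATH")
    subprocess.run(build_muscle_command(combined, aligned), check=True)
    return aligned


# Show the command that would be run; nothing is executed here.
command = build_muscle_command("combined.fasta", "aligned.fasta")
print(" ".join(command))
```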


Figure 3. Multiple sequence alignment for the "local chicken" HA sequence and 10 influenza HA coding sequences selected from the NCBI databases.

The portion of the alignment displayed, indicated in section A, begins near base 950 and ends near base 1040. Two major groups of sequences are evident, distinguished by non-synonymous base changes (sections B), one synonymous base change (section C), and a three-base deletion (section D).
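The distinction between these kinds of changes can be made concrete at the codon level: a synonymous change leaves the encoded amino acid unchanged, a non-synonymous change alters it, and an in-frame three-base deletion removes a codon. The following is a minimal sketch using the standard genetic code; the function name `classify_codon_change` and the example codons are illustrative and are not taken from the alignment in Fig. 3.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref: str, alt: str) -> str:
    """Classify the difference between two aligned codons."""
    ref, alt = ref.upper(), alt.upper()
    if "-" in ref or "-" in alt:
        return "indel"  # a gap in either codon marks an insertion or deletion
    if ref == alt:
        return "identical"
    same_residue = CODON_TABLE[ref] == CODON_TABLE[alt]
    return "synonymous" if same_residue else "non-synonymous"


print(classify_codon_change("GAA", "GAG"))  # both codons encode glutamate
print(classify_codon_change("GAA", "AAA"))  # glutamate versus lysine
print(classify_codon_change("AGA", "---"))  # a whole codon is missing
```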

Clustering and phylogenetic analysis

Click on 'Build a Tree' to invoke the setup page for phylogenetic analysis where the sequences may be selected for inclusion in the subsequent analysis using check boxes. Click on 'Phylogenetic Analysis' to display the next page where a clustering algorithm may be selected, and the tree built. The resulting dendrogram is shown in Fig. 4.


Figure 4. Dendrogram built using the Local Search Neighbor Joining method.

The dendrogram shows two clusters, as might be anticipated on the basis of the alignment of Fig. 3. Two influenza sequences from a goose host and one from a gull host lie in the first of these clusters, while three from a chicken host, including our "local chicken" sequence, two from a duck host, and one from a heron host are in the second cluster. An outlying sequence, branching from the base of the tree, came from a goose host in Mongolia. The dendrogram may be recomputed after adjusting several parameters. A 'non-linear' two-dimensional dot plot (not shown) that groups sequences to provide an overview of a large dataset may also be generated.
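Distance-based tree methods such as neighbor joining start from a matrix of pairwise distances between aligned sequences. The sketch below computes a simple p-distance (the fraction of differing sites, skipping gapped columns) for a toy alignment. It illustrates the general idea only; the distance measures and weight matrices offered by the NCBI tool may differ, and the sequences and function names here are made up.

```python
def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # columns with a gap in either sequence are skipped
        compared += 1
        differences += x != y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def distance_matrix(aligned: dict) -> dict:
    """Pairwise p-distances keyed by (name, name)."""
    names = list(aligned)
    return {(m, n): p_distance(aligned[m], aligned[n])
            for m in names for n in names}


# Toy alignment for illustration only.
toy = {
    "seq_a": "ATGGCA---TTG",
    "seq_b": "ATGGCAGGATTG",
    "seq_c": "ATGACAGGATTA",
}
matrix = distance_matrix(toy)
for (m, n), d in matrix.items():
    if m < n:
        print(m, n, round(d, 3))
```

Because gapped columns are skipped, seq_a and seq_b have a distance of zero even though one carries a three-base deletion; a measure that scores gaps would separate them.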

Phylogenetic comparisons of this type have provided valuable insight into the process of genomic reassortments in influenza that lead to influenza outbreaks [3].

—TT

[1] Ghedin E, et al. Large-scale sequencing of human influenza reveals the dynamic nature of viral genome evolution. Nature. 2005 Oct 20;437(7062):1162-6. Epub 2005 Oct 5. PMID: 16208317.

[2] Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004 Mar 19;32(5):1792-7. Print 2004. PMID: 15034147.

[3] Holmes EC, et al. Whole-genome analysis of human influenza A virus reveals multiple persistent lineages and reassortment among recent H3N2 viruses. PLoS Biol. 2005 Sep;3(9):e300. Epub 2005 Jul 26. PMID: 1602618.



NCBI News | Summer 2003 NCBI News