Download SARS-CoV-2 genomes

Download sequence and annotation for SARS-CoV-2 GenBank genomes by taxon or lineage

Get genome sequences, protein sequences, and annotation for SARS-CoV-2 GenBank genomes using the NCBI Datasets website, the datasets command-line tool, or the Python library. The selected genomes can be filtered by completeness, host organism, and geographic location.
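The filters above correspond to options of the datasets command-line tool. As a sketch, the hypothetical helper build_download_args below assembles a download command as an argument list that could be passed to subprocess.run. The --lineage, --host, --complete-only, and --filename options appear in the commands on this page; the --geo-location option is an assumption here, so confirm it with datasets download virus genome taxon --help before relying on it.

```python
from typing import List, Optional


def build_download_args(
    taxon: str = "SARS2",
    lineage: Optional[str] = None,
    host: Optional[str] = None,
    geo_location: Optional[str] = None,
    complete_only: bool = False,
    filename: Optional[str] = None,
) -> List[str]:
    """Assemble a 'datasets download virus genome taxon' command (hypothetical helper)."""
    args = ["datasets", "download", "virus", "genome", "taxon", taxon]
    if lineage:
        args += ["--lineage", lineage]  # Pango lineage, e.g. "P.1"
    if host:
        args += ["--host", host]  # host organism, e.g. "human"
    if geo_location:
        # Assumption: option name is --geo-location; verify with --help.
        args += ["--geo-location", geo_location]
    if complete_only:
        args.append("--complete-only")  # keep only complete genomes
    if filename:
        args += ["--filename", filename]  # name of the output zip file
    return args


if __name__ == "__main__":
    cmd = build_download_args(host="human", complete_only=True)
    print(" ".join(cmd))
    # To actually run it (requires the datasets CLI on PATH):
    #   import subprocess; subprocess.run(cmd, check=True)
```

With host="human" and complete_only=True, the printed command is the same as the one shown under "Download by taxon". Building a list rather than a single string avoids shell-quoting problems when the command is run through subprocess.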

For an overview of the downloaded package contents, see the NCBI Datasets SARS-CoV-2 Data Package description.
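Because a data package is an ordinary zip archive, its contents can also be listed with Python's standard-library zipfile module, without installing ncbi-datasets-pylib. A minimal sketch; list_package_contents is a hypothetical helper, and sars_cov2_dataset.zip is simply the file name used in the Python examples on this page.

```python
import os
import zipfile
from typing import List, Tuple


def list_package_contents(zip_path: str) -> List[Tuple[str, int]]:
    """Return (member name, uncompressed size in bytes) for each file in the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        # Skip directory entries; keep only actual files.
        return [(info.filename, info.file_size) for info in zf.infolist() if not info.is_dir()]


if __name__ == "__main__":
    path = "sars_cov2_dataset.zip"  # file name used in the Python examples on this page
    if os.path.exists(path):
        for name, size in list_package_contents(path):
            print(f"{size:>12,}  {name}")
```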

Download by SARS-CoV-2 lineage

Download SARS-CoV-2 GenBank genomes for specific lineages as classified by pangolin

  1. Visit the NCBI homepage.
  2. Enter the name of a SARS-CoV-2 lineage into the search box at the top of the page, for example, P.1.
  3. Click Search.
  4. In the Virus Classification box, click Download data package.
  5. Click Download.

With the datasets command-line tool:

datasets download virus genome taxon SARS2 --lineage P.1 --filename SARS-CoV-2-P.1.zip

To get started with the Python library, see the Datasets Python API reference documentation.

First, download the data package for the selected virus taxon using the virus_genome_download method from ncbi-datasets-pylib. This method takes the taxon name and, optionally, a pangolin lineage (the pangolin_classification parameter). Then open the zip file with the VirusDataset class in ncbi.datasets.package.dataset and print its catalog to list all of the included files.

from ncbi.datasets.openapi import ApiClient as DatasetsApiClient
from ncbi.datasets.openapi import ApiException as DatasetsApiException
from ncbi.datasets.openapi import VirusApi as DatasetsVirusApi

from ncbi.datasets.package import dataset

zipfile_name = "sars_cov2_dataset.zip"
pangolin_classification = "B.1.427"

with DatasetsApiClient() as api_client:
    virus_api = DatasetsVirusApi(api_client)
    try:
        print("Begin download of virus data package ...")
        virus_ds_download = virus_api.virus_genome_download(
            "SARS2",
            complete_only=True,
            include_annotation_type=["PROT_FASTA", "CDS_FASTA"],
            pangolin_classification=pangolin_classification,
            _preload_content=False,
        )

        with open(zipfile_name, "wb") as f:
            f.write(virus_ds_download.data)
        print(f"Download completed -- see {zipfile_name}")
    except DatasetsApiException as e:
        print(f"Exception when calling virus_genome_download: {e}\n")
        raise  # stop here: no valid package was written for the code below to open

# open the package zip archive so we can retrieve files from it
package = dataset.VirusDataset(zipfile_name)
# print the names and types of all files in the downloaded zip file
print(package.get_catalog())
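After printing the catalog, a quick count of the sequences in the package can be useful. The sketch below reads the zip archive directly with the standard-library zipfile module instead of VirusDataset. It assumes the sequences are stored in FASTA files whose names end in .fna (check the catalog output for the actual file names and adjust suffix); count_fasta_records is a hypothetical helper.

```python
import os
import zipfile
from typing import Dict


def count_fasta_records(zip_path: str, suffix: str = ".fna") -> Dict[str, int]:
    """Count FASTA records in every archive member whose name ends with suffix.

    Each FASTA record begins with a header line starting with '>', so counting
    those lines gives the number of sequences in each file.
    """
    counts: Dict[str, int] = {}
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if not name.endswith(suffix):
                continue
            n = 0
            # Stream the member line by line instead of loading it all into memory.
            with zf.open(name) as fh:
                for line in fh:
                    if line.startswith(b">"):
                        n += 1
            counts[name] = n
    return counts


if __name__ == "__main__":
    path = "sars_cov2_dataset.zip"  # file name used in the Python examples on this page
    if os.path.exists(path):
        for member, n in count_fasta_records(path).items():
            print(f"{member}: {n} sequences")
```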

Download by taxon

Download all SARS-CoV-2 GenBank genomes for samples collected from human hosts

  1. Visit the Datasets Coronavirus Genomes page.
  2. Find the SARS-CoV-2 row in the Taxonomy table.
  3. Click Dataset in the Download column.
  4. Name your file and select the checkboxes for the sequence files you want to download.
  5. Click Download.

With the datasets command-line tool:

datasets download virus genome taxon SARS2 --host human --complete-only

To get started with the Python library, see the Datasets Python API reference documentation.

First, download the data package for the selected virus taxon using the virus_genome_download method from ncbi-datasets-pylib, limiting the results to complete genomes (complete_only) from human hosts (host). Then open the zip file with the VirusDataset class in ncbi.datasets.package.dataset and print its catalog to list all of the included files.

from ncbi.datasets.openapi import ApiClient as DatasetsApiClient
from ncbi.datasets.openapi import ApiException as DatasetsApiException
from ncbi.datasets.openapi import VirusApi as DatasetsVirusApi

from ncbi.datasets.package import dataset

zipfile_name = "sars_cov2_dataset.zip"

with DatasetsApiClient() as api_client:
    virus_api = DatasetsVirusApi(api_client)
    try:
        print("Begin download of virus data package ...")
        virus_ds_download = virus_api.virus_genome_download(
            "SARS2",
            complete_only=True,
            host="human",
            include_annotation_type=["PROT_FASTA", "CDS_FASTA"],
            _preload_content=False,
        )

        with open(zipfile_name, "wb") as f:
            f.write(virus_ds_download.data)
        print(f"Download completed -- see {zipfile_name}")
    except DatasetsApiException as e:
        print(f"Exception when calling virus_genome_download: {e}\n")
        raise  # stop here: no valid package was written for the code below to open

# open the package zip archive so we can retrieve files from it
package = dataset.VirusDataset(zipfile_name)
# print the names and types of all files in the downloaded zip file
print(package.get_catalog())
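The Python scripts above write the downloaded bytes to disk and then open the file as a package. As a minimal sketch, assuming you want to confirm the file is an intact zip archive before opening it, the standard-library zipfile module can check it; is_valid_package is a hypothetical helper, not part of ncbi-datasets-pylib.

```python
import os
import zipfile


def is_valid_package(zip_path: str) -> bool:
    """Return True if zip_path is an existing, intact zip archive."""
    if not os.path.isfile(zip_path) or not zipfile.is_zipfile(zip_path):
        return False
    try:
        with zipfile.ZipFile(zip_path) as zf:
            # testzip() returns the name of the first member with a bad CRC, or None.
            return zf.testzip() is None
    except zipfile.BadZipFile:
        return False


if __name__ == "__main__":
    path = "sars_cov2_dataset.zip"  # file name used in the Python examples on this page
    print("package OK" if is_valid_package(path) else f"{path} is missing or corrupt")
```

A check like this could sit just before the dataset.VirusDataset(zipfile_name) call in either script.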

Download support for the R language is not yet available. Please check back for updates!
Generated October 22, 2021