Download SARS-CoV-2 protein sequence and annotation

Download sequence and annotation for selected SARS-CoV-2 proteins

Retrieve genome sequences, protein sequences, and annotation for selected SARS-CoV-2 proteins using the NCBI Datasets website, the command-line tool, or a programming-language client library. Results can be filtered by sequence completeness, host organism, and geographic location.

For an overview of the downloaded package contents, see the NCBI Datasets SARS-CoV-2 Data Package description.
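
Once a package has been downloaded, you can inspect its contents locally. The sketch below uses only the Python standard library (`zipfile`) to open the archive and count the records in each FASTA file it contains. It assumes that FASTA files are identified by the extensions `.faa` (protein) and `.fna` (nucleotide); check the package description for the actual file layout. The helper name `count_fasta_records` is hypothetical.

```python
import zipfile
from typing import Dict

# File extensions assumed to mark FASTA files inside the package
# (an assumption, not taken from the package specification).
FASTA_SUFFIXES = (".faa", ".fna")


def count_fasta_records(zip_path: str) -> Dict[str, int]:
    """Return {archive member name: number of FASTA records} for FASTA files in the zip."""
    counts: Dict[str, int] = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.endswith(FASTA_SUFFIXES):
                continue
            # Each FASTA record starts with a header line beginning with '>'.
            with archive.open(name) as handle:
                counts[name] = sum(1 for line in handle if line.startswith(b">"))
    return counts
```

For example, `count_fasta_records("sars_cov2_protein_dataset.zip")` returns one entry per FASTA file found in that archive.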

Download Selected Proteins

Download all Spike protein sequences for samples collected from human hosts

Using the website:

  1. Visit the Datasets SARS-CoV-2 protein page.
  2. Click S in the viral genome cartoon.
  3. Click Download.
  4. Name your file and select the checkboxes for the sequence files you want to download.
  5. Click Download.

Using the command-line tool:

datasets download virus protein S --host human --complete-only --exclude-gpff --exclude-pdb
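
If you want to script the command-line download, for example inside a larger pipeline, you can assemble the same command and run it from Python with `subprocess`. This minimal sketch assumes the `datasets` binary is installed and on your `PATH`. The helper names `protein_download_cmd` and `run_download` are hypothetical, and the sketch uses only the flags shown in the command above.

```python
import subprocess
from typing import List


def protein_download_cmd(
    protein: str,
    host: str = "human",
    complete_only: bool = True,
    exclude_gpff: bool = True,
    exclude_pdb: bool = True,
) -> List[str]:
    """Build the argument list for `datasets download virus protein` (hypothetical helper)."""
    cmd = ["datasets", "download", "virus", "protein", protein, "--host", host]
    if complete_only:
        cmd.append("--complete-only")
    if exclude_gpff:
        cmd.append("--exclude-gpff")
    if exclude_pdb:
        cmd.append("--exclude-pdb")
    return cmd


def run_download(protein: str) -> None:
    # check=True raises CalledProcessError if datasets exits with a non-zero status.
    subprocess.run(protein_download_cmd(protein), check=True)
```

Passing an argument list instead of a single shell string avoids shell quoting problems.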

Using Python:

For more information, see the Datasets Python API reference documentation.

In this example, we use the sars2_protein_download method from the ncbi-datasets-pylib package.

from ncbi.datasets.openapi import ApiClient as DatasetsApiClient
from ncbi.datasets.openapi import ApiException as DatasetsApiException
from ncbi.datasets.openapi import VirusApi as DatasetsVirusApi

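# Output path for the downloaded data package (a zip archive)
zipfile_name = "sars_cov2_protein_dataset.zip"

with DatasetsApiClient() as api_client:
    virus_api = DatasetsVirusApi(api_client)
    try:
        print("Begin download of virus protein data package ...")
        # Request complete Spike protein records from human hosts, including
        # protein and CDS FASTA files. _preload_content=False returns the raw
        # HTTP response, so the zip bytes can be written directly to disk.
        virus_protein_ds_download = virus_api.sars2_protein_download(
            "SPIKE",
            complete_only=True,
            host="human",
            include_annotation_type=["PROT_FASTA", "CDS_FASTA"],
            _preload_content=False,
        )

        # Write the zip archive bytes to the output file
        with open(zipfile_name, "wb") as f:
            f.write(virus_protein_ds_download.data)
        print(f"Download completed -- see {zipfile_name}")
    except DatasetsApiException as e:
        print(f"Exception when calling sars2_protein_download: {e}\n")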
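
The example above reads the whole response into memory through `.data` before writing it. For large packages, you can instead copy the response to disk in chunks. This sketch assumes that, with `_preload_content=False`, the returned object is a file-like response exposing `read()`. The helper `save_stream` is hypothetical and works with any object that has a `read(size)` method.

```python
import shutil
from typing import BinaryIO


def save_stream(response: BinaryIO, path: str, chunk_size: int = 1 << 20) -> int:
    """Copy a file-like response to `path` in chunks; return the number of bytes written."""
    with open(path, "wb") as out:
        # copyfileobj calls response.read(chunk_size) repeatedly until it returns b"".
        shutil.copyfileobj(response, out, chunk_size)
        return out.tell()
```

Under that assumption, `save_stream(virus_protein_ds_download, zipfile_name)` would replace the `open(...)`/`f.write(...)` lines while keeping memory use bounded by `chunk_size`.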

Using R:

Download support for the R language is not yet available. Please check back for updates!
Generated October 22, 2021