protein

download a SARS-CoV-2 protein dataset by protein name

Name

datasets download virus protein - download a SARS-CoV-2 protein dataset by protein name

Synopsis

datasets download virus protein <protein_name ...> [flags]

Description

Download a SARS-CoV-2 protein dataset by protein name. SARS-CoV-2 protein datasets include CDS and protein sequences, annotation, and a detailed data report. Datasets are downloaded as a zip file.

The default SARS-CoV-2 protein dataset includes the following files:

  • cds.fna (nucleotide coding sequences)
  • protein.faa (protein sequences)
  • protein.gpff (protein sequence and annotation in GenPept flat file format)
  • *.pdb (protein structures in PDB format)
  • data_report.jsonl (data report with viral metadata)
  • virus_dataset.md (README containing details on sequence file data content and other information)
  • dataset_catalog.json (a list of files and file types included in the dataset)
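
To confirm which of the files above ended up in a downloaded archive, one option is to list the zip contents. The sketch below assumes the standard `unzip` utility is installed; `list_dataset_files` is a hypothetical helper name, not part of the datasets tool, and the directory layout inside the archive is not specified here.

```shell
# Hypothetical helper: list the files inside a downloaded dataset archive.
# Defaults to ncbi_dataset.zip, the documented default for --filename.
list_dataset_files() {
  archive=${1:-ncbi_dataset.zip}
  if [ ! -f "$archive" ]; then
    echo "archive not found: $archive" >&2
    return 1
  fi
  # unzip -l prints the archive listing without extracting anything.
  unzip -l "$archive"
}
```

For example, `list_dataset_files SARS2-spike-dog.zip` would print the listing for an archive saved under that name.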

Refer to NCBI’s command line quickstart documentation for information about getting started with the command-line tools.

Allowed protein names:

  • ORF1ab
  • ORF1a
  • nsp1
  • nsp2
  • nsp3
  • nsp4
  • nsp5
  • nsp6
  • nsp7
  • nsp8
  • nsp9
  • nsp10
  • rdrp
  • nsp11
  • nsp13
  • nsp14
  • nsp15
  • nsp16
  • S
  • ORF3a
  • E
  • M
  • ORF6
  • ORF7a
  • ORF7b
  • ORF8
  • N
  • ORF10
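
A script that takes protein names from user input may want to check them against the list above before calling the tool. The following is a minimal sketch; `check_protein_names` is a hypothetical helper, and it assumes an exact, case-sensitive match, which may be stricter than what the tool itself accepts.

```shell
# Allowed names, copied from the list above.
ALLOWED_PROTEINS="ORF1ab ORF1a nsp1 nsp2 nsp3 nsp4 nsp5 nsp6 nsp7 nsp8 nsp9 nsp10 rdrp nsp11 nsp13 nsp14 nsp15 nsp16 S ORF3a E M ORF6 ORF7a ORF7b ORF8 N ORF10"

# Hypothetical helper: succeeds only if at least one name is given and
# every name appears in ALLOWED_PROTEINS (exact match, an assumption).
check_protein_names() {
  [ "$#" -gt 0 ] || return 1
  for name in "$@"; do
    # Reject empty names and names with non-alphanumeric characters,
    # so that an argument such as "E M" cannot match across two entries.
    case "$name" in
      ""|*[!A-Za-z0-9]*) echo "invalid protein name: $name" >&2; return 1 ;;
    esac
    case " $ALLOWED_PROTEINS " in
      *" $name "*) ;;
      *) echo "unknown protein name: $name" >&2; return 1 ;;
    esac
  done
}
```

A caller could then run `check_protein_names S E M N && datasets download virus protein S E M N --refseq`.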

Examples

  datasets download virus protein S --host dog --filename SARS2-spike-dog.zip
  datasets download virus protein S E M N --refseq --filename SARS2-structural-refseq.zip
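
The second example above places all four structural proteins in one archive. If one archive per protein is preferred, a loop can generate a separate command for each name. The sketch below only prints the commands so they can be reviewed before running; `print_per_protein_commands` and the `SARS2-<name>-refseq.zip` file name pattern are illustrative, not part of the tool.

```shell
# Print (not run) one download command per protein name, so that each
# protein would land in its own archive. Only flags documented on this
# page (--refseq, --filename) are used.
print_per_protein_commands() {
  for name in "$@"; do
    printf 'datasets download virus protein %s --refseq --filename SARS2-%s-refseq.zip\n' "$name" "$name"
  done
}
```

Running `print_per_protein_commands S E M N` prints four lines; piping that output to `sh` would execute them.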

Options

      --annotated               limit to annotated coronavirus genomes
      --api-key string          NCBI Datasets API Key
      --complete-only           limit to complete coronavirus genomes
      --exclude-cds             exclude cds.fna (CDS sequence file)
      --exclude-gpff            exclude protein.gpff (protein sequence and annotation in GenPept flat file format)
      --exclude-pdb             exclude *.pdb (protein structure files)
      --exclude-protein         exclude protein.faa (protein sequence file)
      --filename string         specify a custom file name for the downloaded dataset (default "ncbi_dataset.zip")
      --geo-location string     limit to coronavirus genomes isolated from a specified geographic location (continent, country or U.S. state)
  -h, --help                    help for protein
      --host string             limit to coronavirus genomes isolated from a specified host (NCBI Taxonomy ID, scientific or common name at any taxonomic rank)
      --no-progressbar          hide progress bar
      --refseq                  limit to RefSeq coronavirus genomes
      --released-since string   limit to coronavirus genomes released after a specified date (MM/DD/YYYY)
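
Because --released-since expects MM/DD/YYYY, scripts that store dates in ISO form (YYYY-MM-DD) need to convert them first. The sketch below does this with POSIX shell parameter expansion; `iso_to_us_date` is a hypothetical helper, and it checks only the digit pattern, not whether the date is a real calendar date.

```shell
# Hypothetical helper: convert YYYY-MM-DD to the MM/DD/YYYY form that
# --released-since documents. Validates the shape only, not the calendar.
iso_to_us_date() {
  case "$1" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
    *) echo "expected YYYY-MM-DD, got: $1" >&2; return 1 ;;
  esac
  year=${1%%-*}      # text before the first hyphen
  rest=${1#*-}       # text after the first hyphen (MM-DD)
  month=${rest%%-*}
  day=${rest#*-}
  printf '%s/%s/%s\n' "$month" "$day" "$year"
}
```

The result can be passed straight to the flag, for example `datasets download virus protein S --released-since "$(iso_to_us_date 2021-01-15)"`.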

Generated October 22, 2021