protein
Download a SARS-CoV-2 protein dataset by protein name
Name
datasets download virus protein - Download a SARS-CoV-2 protein dataset by protein name
Synopsis
datasets download virus protein <protein_name ...> [flags]
Description
Download a SARS-CoV-2 protein data package by protein name. SARS-CoV-2 protein
data packages include CDS and protein sequences, annotation, and a detailed data report.
Data packages are downloaded as a zip file.
The default SARS-CoV-2 protein data package includes the following files:
* cds.fna (nucleotide coding sequences)
* protein.faa (protein sequences)
* data_report.jsonl (data report with viral metadata)
* dataset_catalog.json (a list of files and file types included in the data package)
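For scripted use, the Python below is a minimal sketch that opens a downloaded data package with the standard-library zipfile module, finds the files listed above, parses data_report.jsonl as JSON Lines (one JSON object per line), and counts the FASTA records in protein.faa. It matches files by base name only, because the directory layout inside the zip is not described on this page, and it skips any file that is absent (for example, when --include leaves out a sequence type). The helper name summarize_package is hypothetical.

```python
import io
import json
import posixpath
import zipfile

# Files named in the default package description above.
EXPECTED_FILES = {"cds.fna", "protein.faa", "data_report.jsonl", "dataset_catalog.json"}


def summarize_package(zip_source):
    """Hypothetical helper: inspect a downloaded data package.

    zip_source is a file path, a binary file-like object, or raw bytes.
    Returns (found_files, report_records, protein_count). Files are matched
    by base name because the internal directory layout is an assumption.
    """
    if isinstance(zip_source, (bytes, bytearray)):
        zip_source = io.BytesIO(zip_source)  # allow packages already held in memory

    with zipfile.ZipFile(zip_source) as zf:
        # Map base name -> full member path for the files we recognize.
        found = {}
        for member in zf.namelist():
            base = posixpath.basename(member)
            if base in EXPECTED_FILES:
                found[base] = member

        # data_report.jsonl is JSON Lines: one JSON object per non-empty line.
        records = []
        if "data_report.jsonl" in found:
            text = zf.read(found["data_report.jsonl"]).decode("utf-8")
            records = [json.loads(line) for line in text.splitlines() if line.strip()]

        # In FASTA, every record begins with a '>' header line.
        protein_count = 0
        if "protein.faa" in found:
            text = zf.read(found["protein.faa"]).decode("utf-8")
            protein_count = sum(1 for line in text.splitlines() if line.startswith(">"))

    return set(found), records, protein_count
```

For example, files, records, n = summarize_package("SARS2-spike-dog.zip") would inspect the package produced by the first example below. The function makes no assumptions about the field names inside each data report record.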
Allowed protein names are: ORF1ab, ORF1a, nsp1, nsp2, nsp3, nsp4, nsp5, nsp6, nsp7, nsp8, nsp9, nsp10, rdrp, nsp11, nsp13, nsp14, nsp15, nsp16, S, ORF3a, E, M, ORF6, ORF7a, ORF7b, ORF8, N, ORF10
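Because the synopsis accepts one or more protein names, a wrapper script may want to reject unsupported names before calling the CLI. The sketch below assumes names must match the list above exactly; whether the CLI itself is case-insensitive is not stated on this page. ALLOWED_PROTEINS and check_protein_names are hypothetical names. Note that the list names the RNA-dependent RNA polymerase as rdrp and does not include nsp12, so this check rejects nsp12.

```python
# Allowed protein names, copied verbatim from the list above.
ALLOWED_PROTEINS = (
    "ORF1ab", "ORF1a", "nsp1", "nsp2", "nsp3", "nsp4", "nsp5", "nsp6",
    "nsp7", "nsp8", "nsp9", "nsp10", "rdrp", "nsp11", "nsp13", "nsp14",
    "nsp15", "nsp16", "S", "ORF3a", "E", "M", "ORF6", "ORF7a", "ORF7b",
    "ORF8", "N", "ORF10",
)


def check_protein_names(names):
    """Hypothetical helper: return the names unchanged if all are allowed.

    Uses an exact, case-sensitive comparison; raises ValueError listing
    every unsupported name otherwise.
    """
    names = list(names)
    unknown = [n for n in names if n not in ALLOWED_PROTEINS]
    if unknown:
        raise ValueError(f"unsupported protein name(s): {', '.join(unknown)}")
    return names
```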
Examples
datasets download virus protein S --host dog --filename SARS2-spike-dog.zip
datasets download virus protein rdrp --refseq --filename SARS2-rdrp-refseq.zip
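The two examples can also be assembled from a script. This is a minimal sketch, assuming you want an argument list suitable for Python's subprocess module rather than a hand-built shell string. build_command is a hypothetical helper that covers only the --host, --refseq and --filename flags used in the examples; the script prints both example command lines.

```python
import shlex


def build_command(proteins, host=None, refseq=False, filename=None):
    """Hypothetical helper: argv for `datasets download virus protein`."""
    if not proteins:
        raise ValueError("at least one protein name is required")
    argv = ["datasets", "download", "virus", "protein", *proteins]
    if host is not None:
        argv += ["--host", host]          # limit to a host species
    if refseq:
        argv.append("--refseq")           # limit to RefSeq genomes
    if filename is not None:
        argv += ["--filename", filename]  # custom name for the downloaded zip
    return argv


# Print the two documented examples as shell-quoted command lines.
print(shlex.join(build_command(["S"], host="dog", filename="SARS2-spike-dog.zip")))
print(shlex.join(build_command(["rdrp"], refseq=True, filename="SARS2-rdrp-refseq.zip")))
```

To actually run a download, pass the list to subprocess.run(argv, check=True), which requires the datasets executable on your PATH. Passing a list avoids shell quoting issues with host names or file names that contain spaces.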
Options
--annotated Limit to annotated genomes
--api-key string Specify an NCBI API key
--complete-only Limit to complete genomes
--debug Emit debugging info
--filename string Specify a custom file name for the downloaded data package (default "ncbi_dataset.zip")
--geo-location string Limit to coronavirus genomes isolated from a specified geographic location (continent, country or U.S. state)
--help Print detailed help about a datasets command
--host string Limit to virus genomes isolated from a specified host species
--include string(,string) Specify the virus sequence and report types to include in the data package
* cds: nucleotide coding sequences
* protein: amino acid sequences
* annotation: annotation report
* biosample: biosample report
* none: no sequence data, only primary data report
(default [protein])
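The string(,string) type suggests that several values are passed as one comma-separated argument. This is a minimal sketch, under that assumption, of building the flag from a list of types. include_args is a hypothetical helper, and rejecting none combined with other types is this sketch's own defensive choice, not documented CLI behavior.

```python
# Values listed for --include on this page.
INCLUDE_TYPES = ("cds", "protein", "annotation", "biosample", "none")


def include_args(types):
    """Hypothetical helper: build the --include flag from a list of types.

    Assumes multiple values are joined with commas and no spaces.
    Rejecting 'none' mixed with other types is this sketch's own choice.
    """
    types = list(types)
    if not types:
        raise ValueError("specify at least one type")
    unknown = [t for t in types if t not in INCLUDE_TYPES]
    if unknown:
        raise ValueError(f"unknown --include type(s): {', '.join(unknown)}")
    if "none" in types and len(types) > 1:
        raise ValueError("'none' cannot be combined with other types")
    return ["--include", ",".join(types)]
```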
--no-progressbar Hide progress bar
--refseq Limit to RefSeq coronavirus genomes
--released-after string Limit to coronavirus genomes released on or after a specified date (MM/DD/YYYY)
--updated-after string Limit to coronavirus genomes updated on or after a specified date (MM/DD/YYYY)
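Both date flags take a date in MM/DD/YYYY form and are inclusive ("on or after"). When dates come from other sources, such as ISO 8601 strings, converting them explicitly avoids day/month mix-ups. This is a minimal sketch using Python's datetime module; to_cli_date and date_filter_args are hypothetical helper names.

```python
from datetime import date


def to_cli_date(d):
    """Format a date as zero-padded MM/DD/YYYY, e.g. 01/05/2023."""
    return d.strftime("%m/%d/%Y")


def date_filter_args(released_after=None, updated_after=None):
    """Hypothetical helper: build the two date-filter flags from date objects."""
    argv = []
    if released_after is not None:
        argv += ["--released-after", to_cli_date(released_after)]
    if updated_after is not None:
        argv += ["--updated-after", to_cli_date(updated_after)]
    return argv


# Convert an ISO 8601 date string into the format the CLI expects.
print(date_filter_args(released_after=date.fromisoformat("2023-01-05")))
```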
--version Print version of datasets
Generated April 16, 2024