taxon

Request genome data by taxonomic ID or name. Allowed taxa are limited to those under Coronaviridae, e.g. sars2 or betacoronavirus.

Name

datasets download virus genome taxon - Request genome data by taxonomic ID or name. Allowed taxa are limited to those under Coronaviridae, e.g. sars2 or betacoronavirus.

Synopsis

datasets download virus genome taxon <taxon> [flags]

Description

Download a coronavirus genome dataset by taxon (NCBI Taxonomy ID, scientific or common name for any taxonomic group in the coronavirus family). Coronavirus genome datasets include genome, CDS and protein sequence, annotation and a detailed data report. Datasets are downloaded as a zip file.

The default coronavirus genome dataset includes the following files (if available):

  • genomic.fna (genomic sequences)
  • cds.fna (nucleotide coding sequences)
  • protein.faa (protein sequences)
  • protein.gpff (protein sequence and annotation in GenPept flat file format)
  • protein structures in PDB format
  • data_report.jsonl (data report with viral metadata)
  • virus_dataset.md (README containing details on sequence file data content and other information)
  • dataset_catalog.json (a list of files and file types included in the dataset)
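
To check which of the files listed above actually arrived, the archive can be listed without extracting it. The following is a minimal sketch, assuming the standard unzip utility is installed; list_dataset is a hypothetical helper name, and the default file name is the documented default for --filename.

```shell
# list_dataset: hypothetical helper, not part of the datasets CLI.
# Lists the contents of a downloaded dataset archive without extracting it.
list_dataset() {
  # $1: path to the zip file; falls back to the documented default name.
  zip_path=${1:-ncbi_dataset.zip}
  if [ ! -f "$zip_path" ]; then
    echo "no such file: $zip_path" >&2
    return 1
  fi
  # Assumes the standard unzip utility is available on PATH.
  unzip -l "$zip_path"
}
```

Run `list_dataset` with no argument to inspect ncbi_dataset.zip in the current directory, or pass the name given to --filename.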

Refer to NCBI’s command-line quickstart documentation for information about getting started with the command-line tools.

Examples

  datasets download virus genome taxon sars-cov-2 --host dog
  datasets download virus genome taxon coronaviridae --host "manis javanica"
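
The examples above each use a single filter. The sketch below wraps one request in a shell function and combines several of the flags documented under Options. It assumes these flags can be used together in one call; the function name download_dog_sars2 and the choice of filters are illustrative only.

```shell
# download_dog_sars2: hypothetical wrapper, not part of the datasets CLI.
# Requests complete SARS-CoV-2 genomes isolated from dogs, skips the
# protein structure files, and writes the archive to a chosen file name.
download_dog_sars2() {
  # $1: output zip file name (illustrative parameter)
  datasets download virus genome taxon sars-cov-2 \
    --host dog \
    --complete-only \
    --exclude-pdb \
    --filename "$1"
}
```

Calling `download_dog_sars2 sars2_dog.zip` would then issue the request; nothing is downloaded until the function is called.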

Options

      --annotated               limit to annotated coronavirus genomes
      --api-key string          NCBI Datasets API Key
      --complete-only           limit to complete coronavirus genomes
      --exclude-cds             exclude cds.fna (CDS sequence file)
      --exclude-gpff            exclude protein.gpff (protein sequence and annotation in GenPept flat file format)
      --exclude-pdb             exclude *.pdb (protein structure files)
      --exclude-protein         exclude protein.faa (protein sequence file)
      --exclude-seq             exclude genomic.fna (genomic sequence file)
      --filename string         specify a custom file name for the downloaded dataset (default "ncbi_dataset.zip")
      --geo-location string     limit to coronavirus genomes isolated from a specified geographic location (continent, country or U.S. state)
  -h, --help                    help for taxon
      --host string             limit to coronavirus genomes isolated from a specified host (NCBI Taxonomy ID, scientific or common name at any taxonomic rank)
      --include-gbff            include genomic.gbff (genome sequence and annotation in GenBank flat file format)
      --lineage string          limit to SARS-CoV-2 genomes classified as the specified lineage (variant) by pangolin using the pangoLEARN algorithm
      --no-progressbar          hide progress bar
      --refseq                  limit to RefSeq coronavirus genomes
      --released-since string   limit to coronavirus genomes released after a specified date (MM/DD/YYYY)
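
The --released-since flag takes a date in MM/DD/YYYY form, while scripts often carry dates as YYYY-MM-DD. A minimal conversion sketch follows; to_mmddyyyy is a hypothetical helper name, and it assumes well-formed input and does no validation.

```shell
# to_mmddyyyy: hypothetical helper, not part of the datasets CLI.
# Converts an ISO date (YYYY-MM-DD) to the MM/DD/YYYY form documented
# for --released-since. Input is assumed to be well formed.
to_mmddyyyy() {
  iso=$1
  year=${iso%%-*}     # text before the first '-'
  rest=${iso#*-}      # text after the first '-'
  month=${rest%%-*}   # text before the remaining '-'
  day=${rest#*-}      # text after the remaining '-'
  printf '%s/%s/%s\n' "$month" "$day" "$year"
}
```

The result can be passed straight to the flag, for example `--released-since "$(to_mmddyyyy 2021-01-31)"`.
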
Generated October 18, 2021