genome

download a coronavirus genome dataset by taxon

Name

datasets download virus genome - download a coronavirus genome dataset by taxon

Description

Download a coronavirus genome dataset including genome, CDS and protein sequence, annotation and a detailed data report. Coronavirus genome datasets are limited to the Coronaviridae family, including SARS-CoV-2. Datasets can be requested by taxon or by accession (see Commands) and are downloaded as a zip file.

The default coronavirus genome dataset includes the following files (if available):

  • genomic.fna (genomic sequences)
  • cds.fna (nucleotide coding sequences)
  • protein.faa (protein sequences)
  • data_report.jsonl (data report with viral metadata)
  • virus_dataset.md (README containing details on sequence file data content and other information)
  • dataset_catalog.json (a list of files and file types included in the dataset)
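
As a rough sketch of how a downloaded archive might be inspected, the shell functions below extract the zip and count the records in data_report.jsonl. They assume the `unzip` utility is installed and that the report is a JSON Lines file with one record per non-empty line. The function names and the `coronavirus_dataset` directory are hypothetical and not part of the datasets CLI. The report is located with `find` rather than by assuming a fixed path inside the archive.

```shell
# Hypothetical helpers; not part of the datasets CLI.

# Extract a downloaded dataset archive ($1) into a target directory ($2).
# Assumes the `unzip` utility is available.
extract_dataset() {
    unzip -o -q "$1" -d "$2"
}

# Print the path of the first data_report.jsonl found under directory $1.
find_data_report() {
    find "$1" -type f -name data_report.jsonl | head -n 1
}

# Count records in a JSON Lines file (one JSON object per non-empty line).
count_records() {
    grep -c '[^[:space:]]' "$1"
}
```

Typical use, once a dataset has been downloaded under the default file name:

  extract_dataset ncbi_dataset.zip coronavirus_dataset
  count_records "$(find_data_report coronavirus_dataset)"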

Refer to NCBI’s download and install documentation for information about getting started with the command-line tools.

Examples

  datasets download virus genome taxon sars-cov-2 --host dog
  datasets download virus genome taxon coronaviridae --host "manis javanica"
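
Several of the filters listed under Options can presumably be combined in one call. The sketch below wraps one such combination in a shell function. `run_or_print`, `download_complete_refseq` and the `DRY_RUN` variable are hypothetical helpers, not part of the datasets CLI, added so the command can be previewed without being executed. It assumes the options are accepted after the taxon subcommand, as `--host` is in the examples above.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run_or_print() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Download complete RefSeq genomes for a taxon into a named zip file.
# $1 = taxon name or ID (e.g. sars-cov-2), $2 = output file name.
download_complete_refseq() {
    run_or_print datasets download virus genome taxon "$1" \
        --complete-only --refseq --filename "$2"
}
```

Setting `DRY_RUN=1` before calling `download_complete_refseq sars-cov-2 sars2_refseq.zip` prints the assembled command instead of contacting NCBI, which is a convenient way to check the flags first.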

Options

      --annotated               limit to annotated coronavirus genomes
      --api-key string          NCBI Datasets API Key
      --complete-only           limit to complete coronavirus genomes
      --exclude-cds             exclude cds.fna (CDS sequence file)
      --exclude-protein         exclude protein.faa (protein sequence file)
      --exclude-seq             exclude genomic.fna (genomic sequence file)
      --filename string         specify a custom file name for the downloaded dataset (default "ncbi_dataset.zip")
      --geo-location string     limit to coronavirus genomes isolated from a specified geographic location (continent, country or U.S. state)
  -h, --help                    help for genome
      --host string             limit to coronavirus genomes isolated from a specified host (NCBI Taxonomy ID, scientific or common name at any taxonomic rank)
      --lineage string          limit to SARS-CoV-2 genomes classified as the specified lineage (variant) by pangolin using the pangoLEARN algorithm
      --no-progressbar          hide progress bar
      --refseq                  limit to RefSeq coronavirus genomes
      --released-since string   limit to coronavirus genomes released after a specified date (MM/DD/YYYY)
      --updated-since string    limit to coronavirus genomes updated after a specified date (MM/DD/YYYY)
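
The date filters expect MM/DD/YYYY, whereas dates are often stored in ISO 8601 form (YYYY-MM-DD). A minimal sketch of a conversion helper follows. `iso_to_mdy` is a hypothetical name, and the function checks only the shape of its input, not whether the calendar date is valid.

```shell
# Hypothetical helper: convert YYYY-MM-DD to the MM/DD/YYYY form
# expected by --released-since and --updated-since.
iso_to_mdy() {
    case "$1" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9])
            # Split on "-" and reorder the fields as month/day/year.
            echo "$1" | awk -F- '{ printf "%s/%s/%s\n", $2, $3, $1 }'
            ;;
        *)
            echo "iso_to_mdy: expected YYYY-MM-DD, got '$1'" >&2
            return 1
            ;;
    esac
}
```

It could then be used inline when building a download command:

  datasets download virus genome taxon sars-cov-2 --released-since "$(iso_to_mdy 2022-01-15)"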

Commands


accession

Request genome data by accession.

taxon

Request genome data by taxonomic ID or name. Allowed taxa are limited to those under Coronaviridae, e.g. sars2 or betacoronavirus.

Generated December 6, 2022