accession
Request genome data by accessions.
Name
datasets download virus genome accession - Request genome data by accessions.
Synopsis
datasets download virus genome accession <accession ...> [flags]
Description
Download a coronavirus genome dataset by nucleotide accession. A coronavirus genome data package includes genomic, CDS, and protein sequences, annotation, and a detailed data report. Datasets are downloaded as a zip file.
The default coronavirus genome dataset includes the following files (if available):
- genomic.fna (genomic sequences)
- cds.fna (nucleotide coding sequences)
- protein.faa (protein sequences)
- data_report.jsonl (data report with viral metadata)
- virus_dataset.md (README containing details on sequence file data content and other information)
- dataset_catalog.json (a list of files and file types included in the dataset)
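Because the files above are included only if available, it can help to check which ones a given download actually contains. The following is a minimal sketch, not part of the datasets tool: the function name check_dataset_files and the directory name ncbi_dataset_out are assumptions made for illustration, and the search is recursive so that it does not depend on the internal layout of the zip file.

```shell
# Report which of the documented files exist anywhere under an extracted
# dataset directory. check_dataset_files is a hypothetical helper.
check_dataset_files() {
  dir="$1"
  for name in genomic.fna cds.fna protein.faa data_report.jsonl \
              virus_dataset.md dataset_catalog.json; do
    # find searches recursively; head keeps only the first match
    if [ -n "$(find "$dir" -type f -name "$name" -print 2>/dev/null | head -n 1)" ]; then
      echo "present: $name"
    else
      echo "missing: $name"
    fi
  done
}

# Example usage after a download (ncbi_dataset_out is an assumed directory name):
#   unzip ncbi_dataset.zip -d ncbi_dataset_out
#   check_dataset_files ncbi_dataset_out
```

A file reported as missing may have been excluded with one of the --exclude-* flags, or may simply not be available for the requested accessions.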
Refer to NCBI’s download and install documentation for information about getting started with the command-line tools.
Examples
datasets download virus genome accession NC_045512.2
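The basic example can be combined with the flags listed under Options. The sketch below wraps one such combination in a small function that skips cds.fna and protein.faa and writes to a custom file name. The function name download_genomic_only, the DRY_RUN switch, and the output file name are assumptions for illustration only; the flags themselves are taken from the option list on this page.

```shell
# Download only genomic.fna plus the report files for one accession.
# DRY_RUN=1 prints the command instead of running it (hypothetical switch).
download_genomic_only() {
  accession="$1"
  out_file="$2"
  # Build the command line from flags documented on this page
  set -- datasets download virus genome accession "$accession" \
    --exclude-cds --exclude-protein --filename "$out_file"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Example usage (sars2_genomic_only.zip is an assumed file name):
#   DRY_RUN=1 download_genomic_only NC_045512.2 sars2_genomic_only.zip
```

Printing the command first is a cheap way to confirm the flag combination before starting a download.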
Options
--annotated               limit to annotated coronavirus genomes
--api-key string          NCBI Datasets API Key
--complete-only           limit to complete coronavirus genomes
--exclude-cds             exclude cds.fna (CDS sequence file)
--exclude-protein         exclude protein.faa (protein sequence file)
--exclude-seq             exclude genomic.fna (genomic sequence file)
--filename string         specify a custom file name for the downloaded dataset (default "ncbi_dataset.zip")
--geo-location string     limit to coronavirus genomes isolated from a specified geographic location (continent, country or U.S. state)
-h, --help                help for accession
--host string             limit to coronavirus genomes isolated from a specified host (NCBI Taxonomy ID, scientific or common name at any taxonomic rank)
--input-file string       read a list of nucleotide accessions from a text file - file should have 1 identifier per row and no spaces or quotes
--lineage string          limit to SARS-CoV-2 genomes classified as the specified lineage (variant) by pangolin using the pangoLEARN algorithm
--no-progressbar          hide progress bar
--refseq                  limit to RefSeq coronavirus genomes
--released-since string   limit to coronavirus genomes released after a specified date (MM/DD/YYYY)
--updated-since string    limit to coronavirus genomes updated after a specified date (MM/DD/YYYY)
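The --input-file option expects one identifier per row with no spaces or quotes. A minimal sketch of a pre-flight check for such a file follows. The function name validate_accession_file and the file name accessions.txt are assumptions for illustration, and the check only enforces the format rule stated above; it does not verify that the accessions exist.

```shell
# Check that an accession list has one identifier per row and contains
# no spaces or quotes. validate_accession_file is a hypothetical helper.
validate_accession_file() {
  file="$1"
  # Flag empty rows and rows containing whitespace or quote characters;
  # grep -n prints the offending rows with their line numbers to stderr.
  if grep -n -E "^\$|[[:space:]\"']" "$file" >&2; then
    echo "invalid: fix the rows listed above" >&2
    return 1
  fi
  return 0
}

# Example usage (accessions.txt is an assumed file name; passing --input-file
# without positional accessions is an assumption about how the flag is used):
#   printf '%s\n' NC_045512.2 MN908947.3 > accessions.txt
#   validate_accession_file accessions.txt &&
#     datasets download virus genome accession --input-file accessions.txt
```

Rows with Windows line endings are also flagged, because a trailing carriage return counts as whitespace.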