download a genome dataset by taxon (NCBI Taxonomy ID, scientific or common name at any tax rank)
datasets download genome taxon - download a genome dataset by taxon (NCBI Taxonomy ID, scientific or common name at any tax rank)
datasets download genome taxon <taxon> [flags]
Download a genome dataset by taxon (NCBI Taxonomy ID, scientific or common name at any tax rank). Genome datasets include genome, transcript and protein sequences, annotation, and a detailed data report. Datasets are downloaded as a zip file.
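The default genome dataset includes the following files (if available):
  genomic.fna             genomic sequence
  rna.fna                 transcript sequence
  protein.faa             protein sequence
  cds_from_genomic.fna    genomic CDS sequence
  genomic.gff             GFF3 annotation
  plus the detailed data report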
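Each default file is controlled by one of the --exclude-* flags, and two optional files by the --include-* flags. The mapping can be sketched as a bash case statement. The function name flag_for_file is hypothetical, and matching on the end of the file name is an assumption, because the folder layout inside the zip is not described on this page.

```bash
#!/usr/bin/env bash
# Sketch: report which download flag controls a file found in a genome
# dataset zip. flag_for_file is a hypothetical helper; suffix matching is an
# assumption, since the archive's internal paths are not documented here.
flag_for_file() {
  local name=${1##*/}   # keep only the base name of the path
  case "$name" in
    # Check the CDS file first: its name also ends in "genomic.fna".
    *cds_from_genomic.fna) echo "--exclude-genomic-cds" ;;
    *genomic.fna)          echo "--exclude-seq" ;;
    *rna.fna)              echo "--exclude-rna" ;;
    *protein.faa)          echo "--exclude-protein" ;;
    *genomic.gff)          echo "--exclude-gff3" ;;
    *genomic.gtf)          echo "--include-gtf" ;;
    *genomic.gbff)         echo "--include-gbff" ;;
    *)                     echo "(not controlled by a file flag)" ;;
  esac
}

flag_for_file protein.faa   # prints --exclude-protein

# Hypothetical usage on a downloaded archive (requires Info-ZIP unzip):
#   unzip -Z1 ncbi_dataset.zip | while read -r f; do
#     printf '%s\t%s\n' "$f" "$(flag_for_file "$f")"
#   done
```

The ordering inside the case statement matters: bash takes the first matching pattern, so the more specific CDS pattern has to come before the general genomic.fna pattern.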
Refer to NCBI’s command line quickstart documentation for information about getting started with the command-line tools.
Examples:
  datasets download genome taxon human --chromosomes 21
  datasets download genome taxon "bos taurus"
  datasets download genome taxon 10116 --exclude-seq --exclude-gff3
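A multi-word name such as "bos taurus" is quoted because the shell splits unquoted words into separate arguments, so <taxon> would receive only "bos". The difference can be shown in bash with a small helper; count_args is hypothetical and exists only for this demonstration.

```bash
#!/usr/bin/env bash
# Sketch: show how the shell splits an unquoted taxon name.
# count_args is a hypothetical helper that prints its argument count.
count_args() { echo "$#"; }

count_args bos taurus     # prints 2: "bos" and "taurus" are separate arguments
count_args "bos taurus"   # prints 1: a single <taxon> argument
count_args 10116          # prints 1: a numeric Taxonomy ID needs no quotes
```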
Flags:
  -a, --annotated               only include genomes with annotation
      --api-key string          NCBI Datasets API Key
      --assembly-level string   restrict assemblies to a comma-separated list of one or more of: chromosome, complete_genome, contig, scaffold
      --assembly-source string  restrict assemblies to refseq or genbank only
      --chromosomes strings     limit to a specified, comma-delimited list of chromosomes (default [all])
      --dehydrated              download a dehydrated zip archive including the data report and locations of data files (use the rehydrate command to retrieve data files)
      --exclude-genomic-cds     exclude cds_from_genomic.fna (genomic CDS file)
      --exclude-gff3            exclude genomic.gff (GFF3 annotation file)
      --exclude-protein         exclude protein.faa (protein sequence file)
      --exclude-rna             exclude rna.fna (transcript sequence file)
      --exclude-seq             exclude genomic.fna (genomic sequence file)
      --filename string         specify a custom file name for the downloaded dataset (default "ncbi_dataset.zip")
  -h, --help                    help for taxon
      --include-gbff            include genomic.gbff (GenBank flat file sequence and annotation), if available
      --include-gtf             include genomic.gtf (GTF annotation file), if available
      --no-progressbar          hide progress bar
      --reference               limit to reference and representative (GCF_ and GCA_) assemblies
      --released-before string  only include genomes that have been released before a specified date (MM/DD/YYYY)
      --released-since string   only include genomes that have been released after a specified date (MM/DD/YYYY)
      --search strings          only include genomes that have the specified text in the searchable fields: species and infraspecies, assembly name and submitter. To provide multiple strings, include '--search' multiple times
      --tax-exact-match         exclude sub-species when a species-level taxon is specified
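Several flags above have specific value formats: --assembly-level takes a comma-separated list, the date filters expect MM/DD/YYYY, and --search is repeated once per term. The bash sketch below assembles such an argument list without running anything. build_taxon_args and its parameter order are hypothetical; only the flag names and value formats come from the flag list.

```bash
#!/usr/bin/env bash
# Sketch: build (but do not run) arguments for `datasets download genome taxon`.
# build_taxon_args is hypothetical; flag names/formats follow the flag list.
#   $1   taxon (NCBI Taxonomy ID or name)
#   $2   assembly levels as a comma-separated list, or "" to skip
#   $3   release cut-off as YYYY-MM-DD, or "" to skip
#   $4.. search terms; each becomes its own --search option
build_taxon_args() {
  if (( $# < 3 )); then
    echo "usage: build_taxon_args TAXON LEVELS SINCE [TERM...]" >&2
    return 2
  fi
  local taxon=$1 levels=$2 since=$3
  shift 3
  local -a args=(download genome taxon "$taxon")

  if [[ -n $levels ]]; then
    args+=(--assembly-level "$levels")   # e.g. chromosome,complete_genome
  fi

  if [[ -n $since ]]; then
    # --released-since expects MM/DD/YYYY, so reorder an ISO-style date.
    # The regex checks the shape only, not whether the date exists.
    local iso='^([0-9]{4})-([0-9]{2})-([0-9]{2})$'
    if [[ $since =~ $iso ]]; then
      args+=(--released-since "${BASH_REMATCH[2]}/${BASH_REMATCH[3]}/${BASH_REMATCH[1]}")
    else
      echo "expected YYYY-MM-DD, got: $since" >&2
      return 1
    fi
  fi

  local term
  for term in "$@"; do
    args+=(--search "$term")   # one --search option per term
  done

  printf '%s\n' "${args[@]}"   # print one argument per line
}

# Prints the argument list only; nothing is downloaded.
build_taxon_args "bos taurus" chromosome,complete_genome 2020-01-31 Hereford
```

To run the assembled command in bash 4 or later, read the lines back into an array with `mapfile -t args < <(build_taxon_args ...)` and then call `datasets "${args[@]}"`. Printing one argument per line keeps multi-word values such as "bos taurus" intact, provided no value contains a newline.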