accession
download a genome dataset by NCBI Assembly or BioProject accession
Name
datasets download genome accession - download a genome dataset by NCBI Assembly or BioProject accession
Synopsis
datasets download genome accession <accession ...> [flags]
Description
Download a genome dataset by NCBI Assembly or BioProject accession. Genome data packages include genomic, transcript, and protein sequences, annotation, and a detailed data report. Each dataset is downloaded as a single zip file.
The default genome dataset includes the following files (if available):
- genomic.fna (genomic sequences)
- rna.fna (transcript sequences)
- protein.faa (protein sequences)
- genomic.gff (genome annotation in gff3 format)
- data_report.jsonl (data report with genome assembly and annotation metadata)
- dataset_catalog.json (a list of files and file types included in the dataset)
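As a minimal sketch of checking a download against the file list above, the shell function below reports which of the default files are present in an extracted package. The function name report_dataset_files is hypothetical and not part of the datasets tool. It also assumes that file names in a real package may carry a prefix (for example an assembly name), so it matches on the end of the file name.

```shell
#!/bin/sh
# report_dataset_files DIR
# Report which default file types exist anywhere under DIR.
# Assumption: the zip archive was already extracted into DIR.
report_dataset_files() {
  dir="$1"
  for name in genomic.fna rna.fna protein.faa genomic.gff \
              data_report.jsonl dataset_catalog.json; do
    # Match on the file name suffix; skip cds_from_genomic.fna so it is
    # not mistaken for genomic.fna.
    if [ -n "$(find "$dir" -type f -name "*$name" ! -name 'cds_from_*' 2>/dev/null | head -n 1)" ]; then
      echo "found   $name"
    else
      echo "missing $name"
    fi
  done
}
```

A possible use, assuming the default archive name: run `unzip ncbi_dataset.zip -d my_genomes` and then `report_dataset_files my_genomes`. Files reported as missing may simply be unavailable for that assembly or excluded by an --exclude option.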
Refer to NCBI’s download and install documentation for information about getting started with the command-line tools.
Examples
datasets download genome accession GCF_000001405.40 --chromosomes X,Y --exclude-gff3 --exclude-rna
datasets download genome accession GCA_003774525.2 GCA_000001635 --chromosomes X,Y,Un.9
datasets download genome accession PRJNA289059 --exclude-seq
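For large genomes, the --dehydrated option listed under Options can be combined with the rehydrate command. The sketch below shows one way the three steps might be chained. The run helper, the DRY_RUN variable, and the directory name human_dehydrated are hypothetical and exist only for illustration. With DRY_RUN=1 (the default here) each command is printed rather than executed, so the sequence can be reviewed before any data is transferred.

```shell
#!/bin/sh
# DRY_RUN=1 prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

ACCESSION="GCF_000001405.40"   # accession taken from the examples above
OUTDIR="human_dehydrated"      # hypothetical output directory name

# 1. Download a dehydrated archive (data report and file locations only).
run datasets download genome accession "$ACCESSION" --dehydrated --filename "$OUTDIR.zip"

# 2. Extract the archive into its own directory.
run unzip "$OUTDIR.zip" -d "$OUTDIR"

# 3. Retrieve the actual data files into the extracted directory.
run datasets rehydrate --directory "$OUTDIR"
```

Setting DRY_RUN=0 executes the commands, which requires the datasets and unzip tools and network access.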
Options
-a, --annotated only include genomes with annotation
--api-key string NCBI Datasets API Key
--assembly-level string restrict assemblies to a comma-separated list of one or more of: chromosome, complete_genome, contig, scaffold
--assembly-source string restrict assemblies to refseq or genbank only
--chromosomes strings limit to a specified, comma-delimited list of chromosomes (default [all])
--dehydrated download a dehydrated zip archive including the data report and locations of data files (use the rehydrate command to retrieve data files).
--exclude-genomic-cds exclude cds_from_genomic.fna (genomic cds file)
--exclude-gff3 exclude genomic.gff (gff3 annotation file)
--exclude-protein exclude protein.faa (protein sequence file)
--exclude-rna exclude rna.fna (transcript sequence file)
--exclude-seq exclude genomic.fna (genomic sequence file)
--filename string specify a custom file name for the downloaded dataset (default "ncbi_dataset.zip")
-h, --help help for accession
--include-gbff include genomic.gbff (GenBank flat file sequence and annotation), if available
--include-gtf include genomic.gtf (gtf annotation file), if available
--inputfile string read a list of NCBI Assembly accessions from a file to use as input
--no-progressbar hide progress bar
--reference limit to reference and representative (GCF_ and GCA_) assemblies
--released-before string only include genomes that have been released before a specified date (MM/DD/YYYY)
--released-since string only include genomes that have been released after a specified date (MM/DD/YYYY)
--search strings only include genomes that have the specified text in the searchable fields: species and infraspecies, assembly name and submitter. To provide multiple strings, '--search' can be included multiple times
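The options above can be combined in one invocation. As a hedged sketch, the script below writes a list of accessions to a file and passes it with --inputfile together with a source filter and a custom file name. The one-accession-per-line file format, the file names accessions.txt and refseq_genomes.zip, and the RUN_DOWNLOAD variable are assumptions made for illustration. By default the script only prints the command.

```shell
#!/bin/sh
# One accession per line (assumed input format for --inputfile).
cat > accessions.txt <<'EOF'
GCF_000001405.40
GCA_003774525.2
EOF

# Build the argument list once so it can be printed or executed.
# With --assembly-source refseq, GenBank-only (GCA_) assemblies in the
# list are expected to be filtered out.
set -- datasets download genome accession \
  --inputfile accessions.txt \
  --assembly-source refseq \
  --exclude-rna \
  --filename refseq_genomes.zip

if [ "${RUN_DOWNLOAD:-0}" = "1" ]; then
  "$@"        # performs the download (needs the datasets CLI and network access)
else
  echo "$*"   # default: only print the command
fi
```

Running the script with RUN_DOWNLOAD=1 set in the environment executes the printed command instead of echoing it.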
Generated March 28, 2023