Download a genome data package
Download an NCBI Datasets genome data package, including sequences, annotation and a detailed data report, using the NCBI Datasets command-line tools.
Genome data packages can be downloaded by NCBI Taxonomy ID or taxonomic name, NCBI Assembly accession, or NCBI BioProject accession.
This How-to guide works best for smaller downloads (< 5 animal genomes or < 500 prokaryote genomes). For larger downloads, try our How-to for large downloads.
Using a taxonomic name
Run the following command to download all human genomes by taxon name:
datasets download genome taxon human --filename human_dataset.zip
Using an Assembly accession
Run the following command to download the human reference genome, GRCh38, by NCBI Assembly accession:
datasets download genome accession GCF_000001405.40 --filename human_GRCh38_dataset.zip
Using a BioProject accession
Get data for assembled genomes belonging to an NCBI BioProject, for example, the Sanger 25 Genomes Project, PRJEB33226:
datasets download genome accession PRJEB33226 --filename sanger_bioproject_dataset.zip
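The three cases above differ only in the subcommand (taxon or accession) and the identifier. The following is a minimal sketch of a helper that picks the subcommand from the identifier and prints the resulting command without running it. The function names are hypothetical, and the prefix check is an assumption: it treats identifiers starting with GCF_ or GCA_ as Assembly accessions, identifiers starting with PRJ as BioProject accessions, and everything else as a taxon.

```shell
#!/bin/sh
# Pick the `datasets download genome` subcommand for an identifier.
# Assumption: Assembly accessions start with GCF_ or GCA_, BioProject
# accessions start with PRJ, and anything else is a taxon name or ID.
genome_subcommand() {
  case "$1" in
    GCF_*|GCA_*|PRJ*) echo "accession" ;;
    *)                echo "taxon" ;;
  esac
}

# Print (rather than run) the download command for an identifier.
# $1 is the identifier, $2 is the output zip file name.
print_download_command() {
  id="$1"
  out="$2"
  echo "datasets download genome $(genome_subcommand "$id") $id --filename $out"
}

print_download_command human human_dataset.zip
print_download_command GCF_000001405.40 human_GRCh38_dataset.zip
print_download_command PRJEB33226 sanger_bioproject_dataset.zip
```

The helper only prints the commands, so they can be reviewed before being run. Multi-word taxonomic names would need to be quoted when the printed command is executed.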
Choosing which data files to include in the data package
Genome data packages contain genome sequences and metadata by default. You can also add other data files, or include only metadata, by passing --include with one or more comma-separated terms. For a full list of available data files, see the datasets reference.
Here are a few examples of using the --include flag to choose which data files to include in the data package.
Get genome and protein sequences for the human reference genome:
datasets download genome taxon human --reference --include genome,protein
Get genome, transcript, CDS and protein sequences for the human reference genome:
datasets download genome taxon human --reference --include genome,rna,cds,protein
Get a data package with only the genome assembly data report (metadata):
datasets download genome taxon human --reference --include none
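When the list of data files comes from a script or a configuration value, the --include argument can be assembled from separate terms. The sketch below uses a hypothetical helper, include_value, that joins its arguments with commas, and it prints the command instead of running it. Only the terms used in the examples above (genome, rna, cds, protein, none) are assumed to be valid.

```shell
#!/bin/sh
# Join all arguments with commas, e.g. "genome protein" -> "genome,protein".
# The subshell keeps the IFS change from leaking into the caller.
include_value() {
  ( IFS=,; echo "$*" )
}

terms="genome rna cds protein"
# $terms is intentionally unquoted so that it splits into separate arguments.
echo "datasets download genome taxon human --reference --include $(include_value $terms)"
```

The helper does not validate the terms; an unsupported term would only be rejected when the printed command is actually run.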
Filtering by genome assembly properties
When downloading a genome data package by taxon, Assembly accession or BioProject accession, you can filter the results by genome assembly properties, including the following:
- reference status
- annotation status
- assembly level
- year released
- infraspecies name
- assembly name
- submitter name
Get data for the human reference genome:
datasets download genome taxon human --reference
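Get data for annotated human genomes only:
datasets download genome taxon human --annotated
Get data for human genomes with an assembly level of complete:
datasets download genome taxon human --assembly-level complete
Get data for human genomes released after 01/01/2020:
datasets download genome taxon human --released-after 01/01/2020
Get data for human genomes that match the search text 'T2T Consortium' (for example, in the submitter name):
datasets download genome taxon human --search 'T2T Consortium'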
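The filters above are shown one at a time. The sketch below assumes that several filter flags can be combined in a single command, which this guide does not state explicitly, and builds the flag string from optional settings. The build_filters function and the output file name are hypothetical, and the command is printed rather than run.

```shell
#!/bin/sh
# Build a string of filter flags from optional settings.
# Assumption: the filter flags can be combined in one command.
# $1: "yes" to add --reference
# $2: assembly level (e.g. "complete"), or "" to skip
# $3: release date (e.g. "01/01/2020"), or "" to skip
build_filters() {
  flags=""
  if [ "$1" = "yes" ]; then
    flags="$flags --reference"
  fi
  if [ -n "$2" ]; then
    flags="$flags --assembly-level $2"
  fi
  if [ -n "$3" ]; then
    flags="$flags --released-after $3"
  fi
  # Strip the leading space left by the first appended flag.
  echo "${flags# }"
}

filters=$(build_filters no complete 01/01/2020)
echo "datasets download genome taxon human $filters --filename human_filtered.zip"
```

Printing the command first makes it easy to check which filters are active before starting a download.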