Download a genome data package

Download an NCBI Datasets genome data package using the NCBI Datasets command-line tools

Download an NCBI Datasets genome data package, including sequences, annotation, and a detailed data report.

Genome data packages can be downloaded by NCBI Taxonomy ID or taxonomic name, NCBI Assembly accession, or NCBI BioProject accession.

This how-to guide works best for smaller downloads (fewer than 5 animal genomes or fewer than 500 prokaryote genomes). For larger downloads, see the how-to guide for large downloads.

Using a taxonomic name

Run the following command to download all human genomes by taxon name:

datasets download genome taxon human --filename human_dataset.zip

Using an Assembly accession

Run the following command to download the human reference genome, GRCh38, by NCBI Assembly accession:

datasets download genome accession GCF_000001405.40 --filename human_GRCh38_dataset.zip

Using a BioProject accession

Run the following command to download data for the assembled genomes belonging to an NCBI BioProject, for example the Sanger 25 Genomes Project, PRJEB33226:

datasets download genome accession PRJEB33226 --filename sanger_bioproject_dataset.zip
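
The three download methods above differ only in the subcommand (taxon or accession) and the identifier. As a rough sketch, a small wrapper could choose the subcommand from the shape of the identifier. The function names genome_subcommand and download_genome, the DRY_RUN variable, and the accession patterns (a GCA_ or GCF_ prefix for Assembly accessions, PRJ plus two letters for BioProject accessions) are assumptions made for this illustration and are not part of the datasets tool.

```shell
# Hypothetical helper (not part of the datasets CLI): pick the subcommand
# from the identifier's shape. The patterns are a simplifying assumption.
genome_subcommand() {
  case "$1" in
    GCA_[0-9]*|GCF_[0-9]*) echo accession ;;   # Assembly accession
    PRJ[A-Z][A-Z][0-9]*)   echo accession ;;   # BioProject accession
    *)                     echo taxon ;;       # taxonomic name or Taxonomy ID
  esac
}

# Hypothetical wrapper: $1 = identifier, $2 = output zip file.
# With DRY_RUN=1 it only prints the command instead of running it.
download_genome() {
  subcommand=$(genome_subcommand "$1")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "datasets download genome $subcommand $1 --filename $2"
  else
    datasets download genome "$subcommand" "$1" --filename "$2"
  fi
}

# Preview the commands for the three examples above without downloading.
DRY_RUN=1
download_genome human human_dataset.zip
download_genome GCF_000001405.40 human_GRCh38_dataset.zip
download_genome PRJEB33226 sanger_bioproject_dataset.zip
```

With DRY_RUN=1 the wrapper prints the same three commands shown in the sections above. Setting DRY_RUN=0 would run them, which requires the datasets tool to be installed and a network connection.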

Choosing which data files to include in the data package

Genome data packages contain genome sequences and metadata by default. Use --include with one or more comma-separated terms to add other data files, or to include only metadata in the data package. For a full list of available data files, see the datasets reference. The following examples show how to use the --include flag.

Get genome and protein sequences for the human reference genome:

datasets download genome taxon human --reference --include genome,protein

Get genome, transcript, CDS and protein sequences for the human reference genome:

datasets download genome taxon human --reference --include genome,rna,cds,protein

Get a data package with only the genome assembly data report (metadata):

datasets download genome taxon human --reference --include none
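
When the list of data files is assembled in a script, the comma-separated value for --include can be built from separate terms. The following is a minimal sketch: include_arg is a hypothetical helper, and only terms that appear in the examples above are used.

```shell
# Hypothetical helper: join its arguments with commas,
# e.g. "genome rna" becomes "genome,rna".
include_arg() {
  # The subshell keeps the IFS change local to this function.
  ( IFS=,; echo "$*" )
}

include=$(include_arg genome rna cds protein)

# Print the resulting command; remove "echo" to run the download.
echo datasets download genome taxon human --reference --include "$include"
```

The printed command matches the transcript, CDS and protein example above.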

Filtering by genome assembly properties

When downloading a genome data package by taxon, Assembly accession, or BioProject accession, you can filter the results by genome assembly properties, including the following:

  • reference status
  • annotation status
  • assembly level
  • year released
  • infraspecies name
  • assembly name
  • submitter name

Get data for the human reference genome:

datasets download genome taxon human --reference

Get data for annotated human genomes:

datasets download genome taxon human --annotated

Get data for human genomes with the Assembly level of "complete genome" (all chromosomes are gapless):

datasets download genome taxon human --assembly-level complete

Get data for human genomes released after January 1, 2020:

datasets download genome taxon human --released-after 01/01/2020

Get data for human genomes submitted by the T2T Consortium:

datasets download genome taxon human --search 'T2T Consortium'
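
The filters above are shown one at a time, but several can be given in the same command to narrow the results further. The sketch below assumes that the flags from the preceding examples are accepted together, and the output file name human_filtered.zip is made up for illustration. It only prints the command so that it can be checked before any download starts.

```shell
# Filter flags copied from the individual examples above.
filters="--annotated --assembly-level complete --released-after 01/01/2020"

# human_filtered.zip is an illustrative file name.
cmd="datasets download genome taxon human $filters --filename human_filtered.zip"

# Print the command for review; run it with: eval "$cmd"
echo "$cmd"
```
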
Generated April 19, 2024