accession

download a genome dataset by NCBI Assembly or BioProject accession

Name

datasets download genome accession - download a genome dataset by NCBI Assembly or BioProject accession

Synopsis

datasets download genome accession <accession ...> [flags]

Description

Download a genome dataset by NCBI Assembly or BioProject accession. Genome data packages include genome, transcript, and protein sequences, annotation, and a detailed data report. Each dataset is downloaded as a single zip file.

The default genome dataset includes the following files (if available):

  • genomic.fna (genomic sequences)
  • rna.fna (transcript sequences)
  • protein.faa (protein sequences)
  • genomic.gff (genome annotation in gff3 format)
  • data_report.jsonl (data report with genome assembly and annotation metadata)
  • dataset_catalog.json (a list of files and file types included in the dataset)
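Because each file is included only if available, it can help to check which default files actually arrived. The sketch below scans an already-unzipped package directory. The function name `report_default_files` is hypothetical. It makes no assumption about the archive's internal layout, but it does assume a file may appear either under the exact name listed above or with a prefix ending in an underscore, and it skips cds_from_genomic.fna so that file is not mistaken for genomic.fna.

```shell
# Hypothetical helper: report which default dataset files exist under a directory.
# Assumes the archive was already extracted, e.g. `unzip ncbi_dataset.zip -d pkg`.
report_default_files() {
  dir="$1"
  for name in genomic.fna rna.fna protein.faa genomic.gff data_report.jsonl dataset_catalog.json; do
    # Search recursively; accept "NAME" or "<prefix>_NAME" (the prefix form is an assumption),
    # but ignore cds_from_genomic.fna, which is a separate file type.
    hit=$(find "$dir" -type f \( -name "$name" -o -name "*_$name" \) ! -name 'cds_from_*' -print 2>/dev/null | head -n 1)
    if [ -n "$hit" ]; then
      echo "present: $name"
    else
      echo "missing: $name"
    fi
  done
}
```

Calling `report_default_files pkg` prints one `present:` or `missing:` line per default file, which makes the effect of the `--exclude-*` flags easy to confirm.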

Refer to NCBI’s download and install documentation for information about getting started with the command-line tools.

Examples

  datasets download genome accession GCF_000001405.40 --chromosomes X,Y --exclude-gff3 --exclude-rna
  datasets download genome accession GCA_003774525.2 GCA_000001635 --chromosomes X,Y,Un.9
  datasets download genome accession PRJNA289059 --exclude-seq
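For large requests, the `--dehydrated` flag described under Options can be combined with a later rehydration step. The following is a minimal sketch of that workflow, not an official recipe. The helper names `run` and `fetch_dehydrated` are hypothetical, and the use of `datasets rehydrate --directory` is an assumption that is not documented on this page. Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Sketch: download a dehydrated package, unpack it, then fetch the data files.
# Assumption: `datasets rehydrate --directory DIR` is the rehydrate command
# referred to by the --dehydrated option.
fetch_dehydrated() {
  accession="$1"
  outdir="$2"
  run datasets download genome accession "$accession" --dehydrated --filename "$outdir.zip" &&
    run unzip -q "$outdir.zip" -d "$outdir" &&
    run datasets rehydrate --directory "$outdir"
}
```

For example, `DRY_RUN=1 fetch_dehydrated GCF_000001405.40 human` prints the three commands without contacting NCBI, which is a cheap way to review them before a long download.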

Options

  -a, --annotated                only include genomes with annotation
      --api-key string           NCBI Datasets API Key
      --assembly-level string    restrict assemblies to a comma-separated list of one or more of: chromosome, complete_genome, contig, scaffold
      --assembly-source string   restrict assemblies to refseq or genbank only
      --chromosomes strings      limit to a specified, comma-delimited list of chromosomes (default [all])
      --dehydrated               download a dehydrated zip archive including the data report and locations of data files (use the rehydrate command to retrieve data files)
      --exclude-genomic-cds      exclude cds_from_genomic.fna (genomic cds file)
      --exclude-gff3             exclude genomic.gff (gff3 annotation file)
      --exclude-protein          exclude protein.faa (protein sequence file)
      --exclude-rna              exclude rna.fna (transcript sequence file)
      --exclude-seq              exclude genomic.fna (genomic sequence file)
      --filename string          specify a custom file name for the downloaded dataset (default "ncbi_dataset.zip")
  -h, --help                     help for accession
      --include-gbff             include genomic.gbff (GenBank flat file sequence and annotation), if available
      --include-gtf              include genomic.gtf (gtf annotation file), if available
      --inputfile string         read a list of NCBI Assembly accessions from a file to use as input
      --no-progressbar           hide progress bar
      --reference                limit to reference and representative (GCF_ and GCA_) assemblies
      --released-before string   only include genomes that have been released before a specified date (MM/DD/YYYY)
      --released-since string    only include genomes that have been released after a specified date (MM/DD/YYYY)
      --search strings           only include genomes that have the specified text in the
                                 searchable fields: species and infraspecies, assembly name, and submitter.
                                 To provide multiple strings, repeat '--search' once per string
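The `--released-before` and `--released-since` options expect dates as MM/DD/YYYY, which is easy to get wrong when dates are stored in ISO form. As a small illustration, the hypothetical helper `iso_to_mdy` below converts YYYY-MM-DD to the expected form. It only checks the shape of the input, not whether the date is a valid calendar date.

```shell
# Hypothetical helper: convert an ISO date (YYYY-MM-DD) to MM/DD/YYYY.
iso_to_mdy() {
  case "$1" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
    *) echo "expected YYYY-MM-DD, got: $1" >&2; return 1 ;;
  esac
  year=${1%%-*}      # text before the first hyphen
  rest=${1#*-}       # text after the first hyphen (MM-DD)
  month=${rest%%-*}
  day=${rest#*-}
  echo "$month/$day/$year"
}

# Usage (requires the datasets CLI; PRJNA289059 is taken from the examples above):
# datasets download genome accession PRJNA289059 --released-since "$(iso_to_mdy 2020-01-01)"
```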

Generated March 28, 2023