Download and install

Install and use the NCBI Datasets command line tools

The NCBI Datasets command line tools are datasets and dataformat.

Use datasets to download biological sequence data across all domains of life from NCBI.

Use dataformat to convert metadata from JSON Lines format to other formats.

[Figure: Datasets schema diagram]

Note: The NCBI Datasets command line tools are updated frequently to add new features, fix bugs, and enhance usability. Command syntax is subject to change. Please check back often for updates.

Install NCBI Datasets command line tools

The NCBI Datasets command line tools are available on multiple platforms.

System            Architecture
Linux             AMD64
macOS             Universal
Windows (64-bit)  AMD64
Linux             ARM64
Linux             ARM (32-bit)

Install using conda

The NCBI Datasets command line tools are available as a conda package, which includes both datasets and dataformat.

First, create a conda environment: conda create -n ncbi_datasets

Then, activate your new environment: conda activate ncbi_datasets

Finally, install the datasets conda package: conda install -c conda-forge ncbi-datasets-cli
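
After the three steps above, both tools should be reachable from the activated environment. The following is a minimal sketch of one way to confirm that. The helper name check_tools is made up for this illustration and is not part of the NCBI tools; it relies only on the standard shell built-in command -v.

```shell
# Report whether each named command can be found on the PATH.
# Returns non-zero if any of them is missing.
# (check_tools is a hypothetical helper, not part of NCBI Datasets.)
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# With the ncbi_datasets environment active, both tools should be found.
check_tools datasets dataformat ||
    echo "activate the environment or repeat the install step"
```

The same check applies to the curl-based installs, provided the directory holding the downloaded binaries is on the PATH.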

Install using curl

Linux

Download datasets: curl -o datasets 'https://ftp.ncbi.nlm.nih.gov/pub/datasets/command-line/LATEST/linux-amd64/datasets'
Download dataformat: curl -o dataformat 'https://ftp.ncbi.nlm.nih.gov/pub/datasets/command-line/LATEST/linux-amd64/dataformat'
Make them executable: chmod +x datasets dataformat

macOS

Download datasets: curl -o datasets 'https://ftp.ncbi.nlm.nih.gov/pub/datasets/command-line/LATEST/mac/datasets'
Download dataformat: curl -o dataformat 'https://ftp.ncbi.nlm.nih.gov/pub/datasets/command-line/LATEST/mac/dataformat'
Make them executable: chmod +x datasets dataformat

Windows

Download datasets: curl -o datasets.exe "https://ftp.ncbi.nlm.nih.gov/pub/datasets/command-line/LATEST/win64/datasets.exe"
Download dataformat: curl -o dataformat.exe "https://ftp.ncbi.nlm.nih.gov/pub/datasets/command-line/LATEST/win64/dataformat.exe"
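
The three curl recipes above differ only in the platform directory and the .exe suffix, so the choice can be scripted. The sketch below is one possible way to do that, not an official installer. The function names datasets_platform_dir and datasets_url are invented for this example, the uname patterns used for Windows shells are an assumption, and only the three directories shown above (linux-amd64, mac, win64) are covered because the ARM download URLs are not listed here.

```shell
# Map an OS name and CPU architecture (as printed by `uname -s` and
# `uname -m`) to the platform directory used in the URLs above.
# Hypothetical helper; covers only the directories shown in this guide.
datasets_platform_dir() {
    os="$1"
    arch="$2"
    case "$os" in
        Linux)
            case "$arch" in
                x86_64|amd64) echo "linux-amd64" ;;
                *) echo "architecture not covered by this sketch: $arch" >&2
                   return 1 ;;
            esac
            ;;
        Darwin)
            echo "mac" ;;            # universal binary
        MINGW*|MSYS*|CYGWIN*)        # assumed names for Windows shells
            echo "win64" ;;
        *)
            echo "system not covered by this sketch: $os" >&2
            return 1 ;;
    esac
}

# Build the download URL for one tool ("datasets" or "dataformat").
datasets_url() {
    tool="$1"
    dir="$2"
    base="https://ftp.ncbi.nlm.nih.gov/pub/datasets/command-line/LATEST"
    case "$dir" in
        win64) echo "$base/$dir/$tool.exe" ;;
        *)     echo "$base/$dir/$tool" ;;
    esac
}

# Example with fixed arguments; nothing is downloaded here.
dir="$(datasets_platform_dir Linux x86_64)"
datasets_url datasets "$dir"
# prints https://ftp.ncbi.nlm.nih.gov/pub/datasets/command-line/LATEST/linux-amd64/datasets
```

To use it on the current machine, pass "$(uname -s)" and "$(uname -m)" to datasets_platform_dir and hand the resulting URL to curl -o as in the recipes above.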

Use the datasets tool to download biological data

For example, the following command downloads an NCBI Datasets Gene Data Package, including sequences and metadata, for a set of NCBI GeneIDs.

Command

datasets download gene gene-id 1,2,3,9,10,11,12,13,14,15,16,17 --filename example_gene_data_package.zip
unzip -Z1 example_gene_data_package.zip

Output

Downloading: example_gene_data_package.zip    48.8kB done
README.md
ncbi_dataset/data/gene.fna
ncbi_dataset/data/rna.fna
ncbi_dataset/data/protein.faa
ncbi_dataset/data/data_report.jsonl
ncbi_dataset/data/data_table.tsv
ncbi_dataset/data/dataset_catalog.json
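
A listing like the one above can also be checked automatically, which may be useful in scripts that download many packages. The sketch below assumes that the sequence files and the data report named in the example output are the ones you care about. The function name check_gene_package_listing is made up for this illustration.

```shell
# Read a zip listing on stdin (one path per line, as printed by
# `unzip -Z1`) and report which of the expected files are absent.
# Hypothetical helper; the expected paths are taken from the output above.
check_gene_package_listing() {
    listing="$(cat)"
    status=0
    for path in \
        ncbi_dataset/data/gene.fna \
        ncbi_dataset/data/rna.fna \
        ncbi_dataset/data/protein.faa \
        ncbi_dataset/data/data_report.jsonl
    do
        # -x matches whole lines only, -F treats the path as a literal string
        if ! printf '%s\n' "$listing" | grep -qxF "$path"; then
            echo "missing: $path"
            status=1
        fi
    done
    return "$status"
}

# Demonstration on a shortened listing; with a real package, pipe in
#   unzip -Z1 example_gene_data_package.zip
printf '%s\n' README.md ncbi_dataset/data/gene.fna |
    check_gene_package_listing || echo "package is incomplete"
```

Reading the listing from stdin keeps the check independent of unzip itself, so it can be tried without a downloaded package.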

Use the dataformat tool to convert data reports to other formats

A data package downloaded through NCBI Datasets services contains a data report in JSON Lines format. The dataformat command line tool converts the data report to formats that are more convenient for browsing: either a tab-delimited tabular format (.tsv) or an Excel spreadsheet (.xlsx).

For example, the gene data package downloaded through the previous example contains a gene data report. The following dataformat command converts the report to a TSV file.

Command

dataformat tsv gene --fields gene-id,symbol,transcript-name --package example_gene_data_package.zip | head -n 10

Output

NCBI GeneID	Symbol	Transcript Transcript Name
2	A2M	transcript variant 2
2	A2M	transcript variant 4
2	A2M	transcript variant 1
2	A2M	transcript variant X1
2	A2M	transcript variant 3
...
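
Because the TSV output has one row per transcript, standard text tools can summarize it. As a small sketch, the awk pipeline below counts transcript rows per gene. The function name count_transcripts is made up for this illustration, and it assumes the first two columns are the GeneID and the symbol, as in the output above.

```shell
# Count transcript rows per gene in dataformat TSV output read from stdin.
# Assumes column 1 is the GeneID and column 2 the symbol.
# (count_transcripts is a hypothetical helper, not part of dataformat.)
count_transcripts() {
    awk -F '\t' '
        NR == 1 { next }                      # skip the header row
        { count[$1 "\t" $2]++ }               # key on GeneID and symbol
        END { for (key in count) print key "\t" count[key] }
    ' | sort
}

# Demonstration on the rows shown in the output above.
printf '%s\t%s\t%s\n' \
    'NCBI GeneID' 'Symbol' 'Transcript Transcript Name' \
    2 A2M 'transcript variant 2' \
    2 A2M 'transcript variant 4' \
    2 A2M 'transcript variant 1' \
    2 A2M 'transcript variant X1' \
    2 A2M 'transcript variant 3' |
    count_transcripts
# prints: 2	A2M	5
```

With a real package, replace the printf demonstration with the dataformat command shown earlier, without the head filter, and pipe its output into count_transcripts.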

Generated August 11, 2022