Download a gene ortholog data package

Download a gene ortholog dataset for a gene using the datasets command-line tool.

Gene ortholog metadata and FASTA sequence are available as an NCBI Datasets Gene Data Package.

Gene Orthology at NCBI

At NCBI, gene orthologs are calculated for vertebrate and insect genes. Gene orthologs for most vertebrates are calculated in comparison to human. For fish, orthologs are calculated separately, based on comparison to zebrafish. For insects, orthologs are calculated in comparison to Drosophila melanogaster.

Types of orthology data available through NCBI Datasets

Orthologs are most easily retrieved through the NCBI Datasets command-line tool.

  1. Ortholog summaries describe these ortholog datasets in JSON format and can be retrieved using the summary ortholog command.

  2. Ortholog datasets are downloadable zip files including sequence data, a data table and a data report for all calculated orthologs of the query gene. Sequence data includes gene, transcript and protein sequences. Ortholog datasets are retrieved using the download ortholog command.

Getting ortholog summaries

The datasets summary ortholog command prints a summary of an ortholog dataset, including metadata for all calculated gene orthologs of a query gene. The ortholog summary can be requested by NCBI Gene ID, gene symbol or RefSeq nucleotide or protein accession. The summary is returned in JSON format.

When requesting an ortholog summary by gene symbol, you can also specify a species name or species-level Taxonomy ID using the --taxon flag. If no species is provided, ortholog summaries for human genes will be returned.

For example, the following datasets commands request ortholog summaries by Gene ID and by gene symbol:

datasets summary ortholog gene-id 59272
datasets summary ortholog symbol gapdh --taxon mouse
datasets summary ortholog symbol gapdh --taxon mouse --taxon-filter mammals

See the additional documentation for converting JSON (or JSON lines, using the --as-json-lines flag) to a tabular format.
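Because the summary is printed to standard output, it can be redirected to a file for later processing. The following is a minimal sketch, assuming the datasets tool is installed and on your PATH; the function name save_ortholog_summary and the output file name are hypothetical, chosen only for illustration.

```shell
# Hypothetical helper (not part of the datasets tool): save the ortholog
# summary for an NCBI Gene ID to a JSON file.
save_ortholog_summary() {
    gene_id="$1"
    out_file="$2"
    # Assumption: the datasets command-line tool is installed and on PATH.
    if ! command -v datasets >/dev/null 2>&1; then
        echo "datasets tool not found on PATH" >&2
        return 127
    fi
    # The summary is written to standard output, so redirect it to a file.
    datasets summary ortholog gene-id "$gene_id" > "$out_file"
}

# Usage: save_ortholog_summary 59272 orthologs-59272.json
```

The check for the tool runs before the redirection, so no empty output file is created when datasets is missing.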

Downloading gene ortholog data packages

The datasets download ortholog command downloads an ortholog dataset including sequence data, a data table and a data report for all calculated orthologs of the query gene. Sequence data includes gene, transcript and protein sequences. Datasets are downloaded as a zip file.

Ortholog datasets can be requested by NCBI Gene ID, gene symbol or RefSeq transcript or protein accession.

As with summary requests, when requesting an ortholog dataset by gene symbol you can also specify a species name or species-level Taxonomy ID using the --taxon flag. If no species is provided, data for human genes will be returned as a gene data package.

For example, the following datasets commands download a gene ortholog data package:


datasets download ortholog gene-id 59272
datasets download ortholog symbol gapdh --taxon mouse
datasets download ortholog accession NM_000492.4 --filename cftr-ortho.zip
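Once a package has been downloaded, its contents can be inspected with a standard zip utility. The sketch below assumes the unzip program is available; the function name unpack_ortholog_package and the destination directory are hypothetical, and the exact file layout inside the archive is not assumed here.

```shell
# Hypothetical helper: list and then extract a downloaded ortholog package.
unpack_ortholog_package() {
    zip_file="$1"
    dest_dir="$2"
    if [ ! -f "$zip_file" ]; then
        echo "no such file: $zip_file" >&2
        return 1
    fi
    # List the archive contents without extracting anything.
    unzip -l "$zip_file" || return 1
    # Extract quietly into the destination directory, overwriting old files.
    unzip -q -o "$zip_file" -d "$dest_dir"
}

# Usage, with the file name from the last example above:
# unpack_ortholog_package cftr-ortho.zip cftr-ortho
```

Listing before extracting gives a quick check that the download completed and shows where the sequence files, data table and data report sit in the archive.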

To convert the data report contained in the package to tabular format, see the documentation for the dataformat tool.

Generated October 22, 2021