dbGaP Download Guide

Introduction

The database of Genotypes and Phenotypes (dbGaP) was developed to archive and distribute the data and results from studies investigating the interaction of genotype and phenotype in humans.

The following guide will outline the configuration of the SRA Toolkit for use with protected data from dbGaP.
Detailed information regarding the usage of individual tools in the SRA Toolkit can be found on the tool-specific documentation pages.

Starting with SRA Toolkit version 2.10.2, there are several important changes:

  • You no longer need to import the NGC file to the configuration
  • The NGC file will need to be specified as part of the command line every time you run a tool
  • For SRA Runs, you no longer have an option to create a cart, but will need to use a list of Run accessions
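Since carts are no longer available for SRA Runs, one way to work from a list of Run accessions is a small shell loop that passes the NGC file to prefetch for each accession. The following is only a sketch under stated assumptions: the function name download_runs, the DRY_RUN variable, and the one-accession-per-line list format are hypothetical choices for illustration, not part of the SRA Toolkit.

```shell
# Hypothetical helper (not part of the SRA Toolkit): run prefetch once per
# accession listed in a plain-text file, one accession per line.
# Blank lines and lines starting with '#' are skipped.
# Setting DRY_RUN to any non-empty value prints the commands instead of running them.
download_runs() {
    local ngc_file="$1"
    local accession_list="$2"
    local acc
    while IFS= read -r acc || [ -n "$acc" ]; do
        case "$acc" in
            ''|'#'*) continue ;;
        esac
        # The NGC file has to be supplied on every invocation.
        # stdin is redirected so the tool cannot consume the accession list.
        ${DRY_RUN:+echo} prefetch --ngc "$ngc_file" "$acc" </dev/null || return 1
    done < "$accession_list"
}
```

With a list saved as, say, runs.txt, calling download_runs your_file.ngc runs.txt would download each Run in turn and stop at the first failure; setting DRY_RUN=1 beforehand only prints the commands that would be run.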

Prerequisites

  • The latest release of the SRA Toolkit must be installed.
  • You will need to run vdb-config -i a single time to generate the basic configuration setup. No options need to be set to use toolkit version 2.10.4.
  • Users who wish to access controlled-access data must first apply for approval. Please review the process at the Authorized Access Portal.
  • Once granted access to a project, the PI may log in and click the "get dbGaP repository key" link next to the project to download the repository key. This file should be closely guarded.
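Because the workflow in this guide depends on the installed toolkit version (2.10.2 or later), it can help to compare version numbers numerically rather than as plain strings, since 2.9.6 sorts after 2.10.2 in a naive text comparison. The sketch below assumes a sort implementation that supports the -V (version sort) option, as GNU coreutils does; the function name version_ge is hypothetical, and the installed version is assumed to be read from the tool's own version output and passed in by hand.

```shell
# Hypothetical helper: succeed (exit status 0) when version $1 is greater than
# or equal to version $2. Both arguments are dotted numeric versions, e.g. 2.10.4.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
    # After a version sort, the smaller version comes first.
    # If that first line equals $2, then $1 is at least $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example: choose the workflow for an installed version entered by hand.
installed_version="2.10.4"
if version_ge "$installed_version" "2.10.2"; then
    echo "Pass the NGC file on the command line of every tool."
else
    echo "Use the older 2.9.6 configuration workflow."
fi
```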

For users that do not yet have an approved project, the test key prj_phs710EA_test.ngc is available for accessing a copy of 1000 Genomes data from NCBI. Downloading this key will allow users to test their toolkit configuration on encrypted data that is consented for public access.

Downloading with NGC for use on any server

NCBI introduced the JWT cart to download data from our cloud storage to your cloud storage, but if you are downloading dbGaP data to your local server/computer, you will need to continue using the NGC file.

Downloading the data

To download the data, run the following command:

prefetch --ngc your_file.ngc SRR1234567

This will create a file named something like SRR1234567_dbgap_#####.sra. Before decrypting the data, rename the Run file by removing the '_dbgap_#####' part of the name:

mv SRR1234567_dbgap_#####.sra SRR1234567.sra

Then run fasterq-dump on the renamed file, providing the NGC file on the command line again:

fasterq-dump --ngc your_file.ngc SRR1234567.sra
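When several Runs have been downloaded, the rename step can be scripted so the project number does not have to be typed by hand. The following is a minimal sketch, assuming downloaded files follow the SRR1234567_dbgap_#####.sra naming pattern described above; the function name strip_dbgap_suffix is hypothetical.

```shell
# Hypothetical helper: print the file name with the "_dbgap_<project>" part
# removed, e.g. SRR1234567_dbgap_12345.sra -> SRR1234567.sra.
# Names without that part are printed unchanged.
strip_dbgap_suffix() {
    local file="$1"
    local base="${file%.sra}"               # drop the .sra extension
    printf '%s.sra\n' "${base%_dbgap_*}"    # drop the trailing _dbgap_... part
}

# Possible usage (commented out so the sketch has no side effects):
#   for f in *_dbgap_*.sra; do
#       mv "$f" "$(strip_dbgap_suffix "$f")"
#   done
```

The helper only computes the new name; the actual mv and the subsequent fasterq-dump call with --ngc remain the same as in the commands shown above.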

Downloading phenotype files with NGC

Similar to downloading protected SRA Runs, downloading phenotype files has also changed since Toolkit version 2.10.2.

In the dbGaP File Selector (available from your project's authorized access page), select at least one file to activate the Cart file button, then use that button to download the cart file.

To download the data, run the following command:

prefetch --ngc your_file.ngc cart_prj#####_###.krt

To decrypt the data, run vdb-decrypt on the downloaded file, providing the NGC file on the command line again:

vdb-decrypt --ngc your_file.ngc enc_file.xml
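The two phenotype steps can be chained so that decryption only runs if the download succeeds. This is a sketch only: the function name fetch_phenotype_files and the DRY_RUN variable are hypothetical, and the path passed for decryption is whatever file the download actually produced (enc_file.xml in the example above).

```shell
# Hypothetical wrapper around the two commands shown above.
#   $1: NGC repository key file
#   $2: cart file saved from the dbGaP File Selector
#   $3: downloaded encrypted file to decrypt
# Setting DRY_RUN to any non-empty value prints the commands instead of running them.
fetch_phenotype_files() {
    local ngc_file="$1"
    local cart_file="$2"
    local encrypted_path="$3"
    # Stop here if the download fails, so nothing stale gets decrypted.
    ${DRY_RUN:+echo} prefetch --ngc "$ngc_file" "$cart_file" || return 1
    ${DRY_RUN:+echo} vdb-decrypt --ngc "$ngc_file" "$encrypted_path"
}
```

Running it with DRY_RUN=1 first is a cheap way to confirm that both commands receive the same NGC file before any data is transferred.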

The SRA Toolkit version 2.9.6 visual configuration

For users who cannot upgrade to a newer 2.10 version:

The SRA Toolkit version 2.9.6 visual configuration

Accessing dbGaP Data on the Cloud

dbGaP Cloud Access


Contact SRA

Contact SRA staff for assistance at sra@ncbi.nlm.nih.gov

Support Center

Last updated: 2020-05-18T21:08:22Z