dbGaP Download Guide

Introduction

The database of Genotypes and Phenotypes (dbGaP) was developed to archive and distribute the data and results from studies investigating the interaction of genotype and phenotype in humans.

This guide outlines how to configure the SRA Toolkit for use with protected data from dbGaP.
Detailed information on the usage of individual tools in the SRA Toolkit can be found on the tool-specific documentation pages.

Prerequisites

  • Users must have the latest release of the SRA Toolkit installed.
  • Users who wish to access controlled-access data must first apply for approval. Please review the process at the Authorized Access Portal.
  • Once granted access to a project, the PI may log in and click the "get dbGaP repository key" link next to the project to download the repository key. This file should be closely guarded.
Get dbGAP Repository Key
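
One way to guard the repository key on a shared system is to restrict its file permissions so that only the owner can read it. The sketch below is an illustration only: the function name protect_key and the example path are hypothetical, and it assumes a POSIX shell.

```shell
# Restrict a downloaded repository key so only its owner can read or write it.
# protect_key is a hypothetical helper, not part of the SRA Toolkit.
protect_key() {
    key_file="$1"
    if [ ! -f "$key_file" ]; then
        echo "Repository key not found: $key_file" >&2
        return 1
    fi
    chmod 600 "$key_file"
}

# Example (hypothetical download location):
# protect_key "$HOME/Downloads/prj_phs710EA_test.ngc"
```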

For users that do not yet have an approved project, the test key prj_phs710EA_test.ngc is available for accessing a copy of 1000 Genomes data from NCBI. Downloading this key will allow users to test their toolkit configuration on encrypted data that is consented for public access.

Visual SRA Toolkit configuration

1. Open the Toolkit configuration panel

Go to the bin subdirectory of the SRA Toolkit and run the following command:
./vdb-config -i

dbGAP SRA Toolkit Figure 1

2. Open the File Navigation Dialog

  • Review any settings presented to ensure they are correct.
dbGAP SRA Toolkit Figure 2
  • Press Tab to reach the Import Repository Key button, then press Enter or Space.
dbGAP SRA Toolkit Figure 3

3. Navigate to the location of the downloaded .ngc file

dbGAP SRA Toolkit Figure 4

4. Select .ngc file

Press Tab to get to the file list and select the .ngc file.

dbGAP SRA Toolkit Figure 5

5. Confirm selection

Press Tab to reach the OK button, then press Enter or Space.

dbGAP SRA Toolkit Figure 5

6. Confirm import

You will be prompted to confirm the import.

dbGAP SRA Toolkit Figure 6

Then, the import will be confirmed.

dbGAP SRA Toolkit Figure 7

7. Change download location

You will be asked whether to change the location where the project's files will be stored. Genomics datasets are quite large, and you may need hundreds of GB of free space, so available space is the primary concern when choosing the location. A workspace is the directory that contains all the data and analysis for a project. Each dbGaP project must have its own workspace, separate from other protected project workspaces.

dbGAP Storage example
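
Before choosing a workspace location, it may help to check how much free space the target filesystem has. The following is a minimal sketch using the standard df and awk utilities. The helper names free_gb and has_space, the example path, and the 200 GB threshold are assumptions for illustration, not part of the SRA Toolkit.

```shell
# Print the free space, in whole GB, on the filesystem holding a directory.
free_gb() {
    # df -P prints one line per filesystem; -k reports 1024-byte blocks.
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Succeed only if the directory has at least the requested number of GB free.
has_space() {
    dir="$1"
    needed_gb="$2"
    [ "$(free_gb "$dir")" -ge "$needed_gb" ]
}

# Example (hypothetical workspace location and size requirement):
# has_space "$HOME/ncbi" 200 || echo "Not enough free space for this project" >&2
```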


dbGAP SRA Toolkit Figure 8

If you choose Yes, the file navigation dialog will open (see below). If you already know the path to the directory, you may use the Goto button to enter that path directly. Once you have entered or navigated to the correct directory, press Tab to reach the OK button and return to the previous screen.

dbGAP SRA Toolkit Figure 9

8. Exit the Toolkit configuration panel

Once you are back at the main screen, press Tab to reach the Exit button, then press Enter or Space.

dbGAP SRA Toolkit Figure 10

Command line SRA Toolkit configuration

Use the vdb-config command-line tool to import the repository key:
vdb-config --import prj_phs710EA_test.ngc
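
For scripted setups, the import can be wrapped in a few sanity checks. This is a sketch only: import_key is a hypothetical helper, and the only toolkit call it makes is the vdb-config --import command shown above.

```shell
# Import a repository key after checking that the key and the tool both exist.
# import_key is a hypothetical wrapper; it is not provided by the SRA Toolkit.
import_key() {
    key_file="$1"
    if [ ! -f "$key_file" ]; then
        echo "Repository key not found: $key_file" >&2
        return 1
    fi
    if ! command -v vdb-config >/dev/null 2>&1; then
        echo "vdb-config is not on the PATH; run this from the toolkit's bin subdirectory" >&2
        return 2
    fi
    vdb-config --import "$key_file"
}

# Example, using the test key from this guide:
# import_key prj_phs710EA_test.ngc
```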

Accessing Encrypted dbGaP Data

Important: To access a project's data, change directory ("cd") to the project's workspace. Once there, all of the project's data is available to you. You do not need (or even want) to decrypt it manually.

The SRA Toolkit vdb-decrypt program supports decryption of dbGaP phenotype and genotype files. SRA files, however, do not need to be decrypted, and the utility will ignore attempts to do so.
vdb-decrypt <encrypted_file>

Important: The SRA Toolkit will only decrypt and download project files when executed from within the project's workspace directory. The examples below use the test key prj_phs710EA_test.ngc, run from the default import location for the repository key, and assume the SRA Toolkit is available in the user's path.

Information about fastq-dump and the options used in the example can be found on the tool's documentation page.
~/ncbi/dbGaP-0 $ fastq-dump -Z -X 5 SRR1219902

Information about sam-dump and the options used in the example can be found on the tool's documentation page.
~/ncbi/dbGaP-0 $ sam-dump --aligned-region 15:28196787-28197287 SRR1219902
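
Because the tools must run from inside the workspace, a script can change into that directory in a subshell before calling them, leaving the caller's working directory untouched. The helper name run_in_workspace below is hypothetical; the workspace path and commands in the usage comments are the ones from the examples above.

```shell
# Run a command from inside a project workspace without changing the
# caller's current directory. run_in_workspace is a hypothetical helper.
run_in_workspace() {
    workspace="$1"
    shift
    if [ ! -d "$workspace" ]; then
        echo "Workspace directory not found: $workspace" >&2
        return 1
    fi
    # The parentheses start a subshell, so the cd does not leak out.
    ( cd "$workspace" && "$@" )
}

# Examples, assuming the test key was imported to its default location:
# run_in_workspace "$HOME/ncbi/dbGaP-0" fastq-dump -Z -X 5 SRR1219902
# run_in_workspace "$HOME/ncbi/dbGaP-0" sam-dump --aligned-region 15:28196787-28197287 SRR1219902
```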

Important: If you encounter issues decrypting your data, please visit the troubleshooting page before contacting the SRA.

Accessing dbGaP Data on Amazon Cloud


Contact SRA

Contact SRA staff for assistance at sra@ncbi.nlm.nih.gov

Last updated: 2020-03-24T17:57:48Z