SRA in the Cloud

Overview

Sequence Read Archive (SRA) data is available on the Google Cloud Platform (GCP) and Amazon Web Services (AWS) clouds. All publicly available, unassembled read data and authorized-access human data are available for access and compute through these cloud providers.

Are you cloud-curious?

There are several benefits to working with SRA data in the cloud:

  • Access to original submitted data files
  • Faster download speed
  • Unlimited concurrent downloads from our cloud buckets to your buckets

Accessing SRA data in the cloud requires setting up an instance with your cloud provider.

You can perform cloud-native searches for data using Athena on AWS or BigQuery on Google Cloud. With Athena and BigQuery, you can:

  • Write your own SQL to search for your specific data sets
  • Get search results in seconds, at very low cost
  • Calculate statistics on the available data from SRA
  • Access the data using multiple API libraries

Search for data

BigQuery (in the Google Cloud)

BigQuery provides fast, programmatic access to SRA metadata and supports a large collection of client libraries.
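As an illustration, the Python sketch below queries SRA metadata through the google-cloud-bigquery client library. It assumes the public table ID `nih-sra-datastore.sra.metadata` and the column names `acc`, `organism`, `assay_type` and `mbases`; check both against the current schema before relying on them. Filter values are passed as named query parameters rather than pasted into the SQL text. Running the query needs the google-cloud-bigquery package and GCP credentials, so the library is imported only inside the function that executes it.

```python
# Minimal sketch: query SRA metadata in BigQuery with named parameters.
# Assumed (not confirmed here): the table ID and the column names used below.
SRA_METADATA_TABLE = "nih-sra-datastore.sra.metadata"


def build_sra_query(organism, assay_type, limit=100):
    """Return (sql, params) selecting runs for one organism and assay type."""
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    sql = (
        "SELECT acc, organism, assay_type, mbases\n"
        f"FROM `{SRA_METADATA_TABLE}`\n"
        "WHERE organism = @organism AND assay_type = @assay_type\n"
        f"LIMIT {int(limit)}"
    )
    # Values are bound by name (@organism, @assay_type) at execution time.
    params = {"organism": organism, "assay_type": assay_type}
    return sql, params


def run_sra_query(organism, assay_type, limit=100):
    """Execute the query and return each row as a dict (needs GCP access)."""
    from google.cloud import bigquery  # optional dependency, imported lazily

    sql, params = build_sra_query(organism, assay_type, limit)
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter(name, "STRING", value)
            for name, value in params.items()
        ]
    )
    client = bigquery.Client()  # uses Application Default Credentials
    rows = client.query(sql, job_config=job_config).result()
    return [dict(row.items()) for row in rows]


if __name__ == "__main__":
    sql, params = build_sra_query("Homo sapiens", "WGS", limit=10)
    print(sql)
    print(params)
```

Because BigQuery stores data by column, selecting only the columns you need, as above, also limits the bytes scanned by each query.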

Athena (in AWS)

Athena provides fast, programmatic access to SRA metadata and supports a large collection of client libraries.
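Athena queries run asynchronously: you start a query, poll its status, and then page through the results. The Python sketch below shows that flow. It assumes you have already created an Athena table over the SRA metadata; the database name, table name and S3 output location you pass in are your own (any used here are hypothetical placeholders). `client` is expected to be a boto3 Athena client, for example `boto3.client("athena", region_name="us-east-1")`, but any object with the same three methods works, which makes the function easy to test offline.

```python
import time

# States after which Athena will not change the query's status again.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_athena_query(client, sql, database, output_location,
                     poll_seconds=1.0, timeout_seconds=300.0):
    """Start a query, wait for it to finish, and return rows as dicts."""
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Poll the execution status until Athena reports a terminal state.
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = client.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            break
        if time.monotonic() > deadline:
            raise TimeoutError(f"query {query_id} still {state}")
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        reason = status["QueryExecution"]["Status"].get("StateChangeReason", "")
        raise RuntimeError(f"query {query_id} {state}: {reason}")

    # Collect every page of results; a NextToken means more pages follow.
    rows, kwargs = [], {"QueryExecutionId": query_id}
    while True:
        page = client.get_query_results(**kwargs)
        rows.extend(page["ResultSet"]["Rows"])
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token

    def values(row):
        return [cell.get("VarCharValue") for cell in row["Data"]]

    # For a SELECT, the first returned row holds the column names.
    if not rows:
        return []
    header = values(rows[0])
    return [dict(zip(header, values(row))) for row in rows[1:]]
```

Athena returns every value as a string (`VarCharValue`), so numeric columns such as base counts need converting after the call.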

NCBI's Entrez search engine

Download/Access the Data

SRA Toolkit lets you create next-generation sequencing files in your desired format, in your own cloud bucket. You can also download originally submitted files for some data sets.
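A common SRA Toolkit workflow downloads a run with `prefetch` and then converts it to FASTQ with `fasterq-dump`. The Python sketch below wraps those two commands. It assumes both tools from a recent toolkit are on PATH, that `prefetch` places the download in the current directory, and that `fasterq-dump` picks up that local copy; confirm the flags used (`--outdir`, `--threads`, `--split-files`) with `fasterq-dump --help` for your installed version. It writes FASTQ locally; copying the output to your bucket is a separate step.

```python
import re
import shutil
import subprocess
from pathlib import Path

# SRA run accessions: SRR (NCBI), ERR (EBI) or DRR (DDBJ) followed by digits.
RUN_ACCESSION = re.compile(r"[SED]RR\d+")


def build_commands(accession, threads=4):
    """Return argument lists for the download and FASTQ-conversion steps."""
    if not RUN_ACCESSION.fullmatch(accession):
        raise ValueError(f"not a run accession: {accession!r}")
    # Assumed: recent prefetch versions download into ./<accession>/.
    prefetch_cmd = ["prefetch", accession]
    fasterq_cmd = [
        "fasterq-dump", accession,   # assumed to reuse the prefetched copy
        "--outdir", "fastq",         # FASTQ output goes to ./fastq/
        "--threads", str(threads),   # number of worker threads
        "--split-files",             # each read of a spot in its own file
    ]
    return prefetch_cmd, fasterq_cmd


def fetch_and_convert(accession, workdir, threads=4):
    """Run both steps inside `workdir`; raises if either tool fails."""
    for tool in ("prefetch", "fasterq-dump"):
        if shutil.which(tool) is None:
            raise FileNotFoundError(f"{tool} not found on PATH")
    Path(workdir).mkdir(parents=True, exist_ok=True)
    for cmd in build_commands(accession, threads):
        subprocess.run(cmd, cwd=workdir, check=True)
```

Building the argument lists separately from running them keeps the commands easy to inspect, and passing a list (not a shell string) to `subprocess.run` avoids shell-quoting problems.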

To download dbGaP data from the cloud, you need both the most recent version of the SRA Toolkit and a JWT file instead of an NGC file.
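The small Python check below guards against passing an NGC file where a JWT file is expected when building a cloud download command. It is a sketch under a stated assumption: that `prefetch` takes the JWT file through a `--perm` option; verify the option name with `prefetch --help` for your toolkit version. The function name, accession and file names are hypothetical placeholders.

```python
from pathlib import Path


def dbgap_cloud_prefetch_command(accession, jwt_file):
    """Build a prefetch command for dbGaP data in the cloud (hypothetical helper).

    Assumes `--perm` is the prefetch option that takes the JWT file.
    """
    path = Path(jwt_file)
    suffix = path.suffix.lower()
    if suffix == ".ngc":
        # In the cloud, the JWT file replaces the NGC file.
        raise ValueError("in the cloud, use the JWT file instead of the NGC file")
    if suffix != ".jwt":
        raise ValueError(f"expected a .jwt file, got {path.name!r}")
    return ["prefetch", "--perm", str(path), accession]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on the cloud instance.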

Files that the SRA Toolkit cannot access can be delivered directly to your AWS or GCP bucket through the Cloud Data Delivery service.

SRA on YouTube: Tutorials

Engage

NCBI wants your feedback on SRA in the Cloud. Contact sra@ncbi.nlm.nih.gov with questions or if you would like to provide input on new functionality.


Last updated: 2022-10-05T18:18:50Z