Search in BigQuery

Overview

SRA has deposited its metadata into BigQuery to give the bioinformatics community programmatic access to these data. You can now search across the entire SRA by sequencing methodology and sample attributes. NCBI is piloting this service in BigQuery so that users can benefit from elastic scaling and parallel execution of queries. BigQuery has a large collection of client libraries that can be used within your workflow. You can also interact with it in a web browser.

The BigQuery resource contains tables for SRA metadata and for computed metadata on SRA runs.

Tables

The list of tables can be found here: SRA cloud-based tables.
 

Please read about the SRA Taxonomy Analysis Tool to learn how the taxonomy analysis is carried out.

The Basics of SQL

A basic SQL query has three parts, or clauses:

  • SELECT: Identifies which columns from the selected table(s) to show. The * indicates "all columns".
  • FROM: Identifies the table(s) to query.
  • WHERE: Sets filters on the query. It can also join tables by matching columns that are identical in both tables.

[Figure: BigQuery web console, with the query size estimate outlined in red]

The outlined area shows how much data your SQL query will search, so that you can estimate the cost of running the query before you run it.
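A similar estimate is available from the command line. The sketch below assumes the bq tool from the Google Cloud SDK is installed and authenticated. Its --dry_run flag validates a query and reports how many bytes it would process without running it. The helper names estimate_query and bytes_to_gib are illustrative only and are not part of bq.

```shell
# Report how many bytes a query would process, without running it.
# Assumes the bq CLI is installed and authenticated.
estimate_query() {
  bq query --nouse_legacy_sql --dry_run "$1"
}

# Convert a byte count to GiB (1 GiB = 1073741824 bytes), two decimals.
bytes_to_gib() {
  awk -v b="$1" 'BEGIN { printf "%.2f\n", b / 1073741824 }'
}

# Example (not run here; requires bq):
#   estimate_query 'SELECT acc FROM `nih-sra-datastore.sra.metadata` WHERE organism = "Homo sapiens"'
#   bytes_to_gib 5368709120
```

Pass the byte count reported by the dry run to bytes_to_gib, then compare the result with the pricing that applies to your own Google Cloud project.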

Basic example query

Select all columns from the table called "nih-sra-datastore.sra.metadata" where records have the organism "Homo sapiens":

SELECT *
FROM `nih-sra-datastore.sra.metadata`
WHERE organism = 'Homo sapiens'
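Because the amount of data searched depends on which columns a query reads, naming only the columns you need is usually cheaper than SELECT *. The following is a minimal sketch of that idea for the command line. organism_query is a hypothetical helper that only builds the SQL text. The columns acc, organism, and center_name are taken from the examples on this page. The organism name is inserted into the query as-is, so it must not contain double quotes.

```shell
# Build a query that returns a few named columns for one organism.
# Arguments: organism name, row limit (defaults to 10).
organism_query() {
  organism="$1"
  limit="${2:-10}"
  printf 'SELECT acc, organism, center_name FROM `nih-sra-datastore.sra.metadata` WHERE organism = "%s" LIMIT %d' "$organism" "$limit"
}

# Example (not run here; requires the bq CLI):
#   bq query --nouse_legacy_sql "$(organism_query 'Homo sapiens' 5)"
```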

Example queries for web UI

Search for records of an adult female pipefish:

SELECT *
FROM `nih-sra-datastore.sra.metadata` as s
WHERE organism = 'Syngnathus scovelli'
  AND ('sex_calc', 'female') IN UNNEST(s.attributes)
  AND ('dev_stage_sam', 'Adult') IN UNNEST(s.attributes)
LIMIT 10
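The attributes column holds a repeated list of key/value pairs, which is why the query above tests pairs with IN UNNEST(...). To see which keys are available for an organism, you can flatten the attributes and count them. The sketch below assumes the key field is named k (check the table schema before relying on this). attribute_keys_query is a hypothetical helper that only builds the SQL text.

```shell
# Build a query that counts how often each attribute key occurs
# for one organism. The field name k is assumed from the table schema.
attribute_keys_query() {
  organism="$1"
  printf 'SELECT a.k AS attr_key, COUNT(*) AS n FROM `nih-sra-datastore.sra.metadata` AS s, UNNEST(s.attributes) AS a WHERE s.organism = "%s" GROUP BY attr_key ORDER BY n DESC' "$organism"
}

# Example (not run here; requires the bq CLI):
#   bq query --nouse_legacy_sql "$(attribute_keys_query 'Syngnathus scovelli')"
```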

Find public human data sets using this query (the LIMIT clause returns only the first 10 records):

SELECT *
FROM `nih-sra-datastore.sra.metadata`
WHERE organism = 'Homo sapiens' AND consent = 'public'
LIMIT 10

Command line examples

Get up to ten thousand rows of metadata from public SRA records (the cross join returns one row per attribute of each run):

bq --format=csv query --nouse_legacy_sql --max_rows=10000 'SELECT acc,center_name,data FROM `nih-sra-datastore.sra.metadata` cross join UNNEST(attributes) as data where consent = "public" and acc like "SRR%"'

Create an accession_list.txt file containing a list of accessions that you can use with the SRA Toolkit to download the data:

bq --format=csv query --nouse_legacy_sql --max_rows=10000 'SELECT acc FROM `nih-sra-datastore.sra.metadata` WHERE consent = "public" and acc like "SRR%"' | sed '2 d' > accession_list.txt
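Depending on the bq version and flags, the CSV output can contain a header row or status text in addition to the accessions. A more defensive sketch keeps only lines that look like run accessions before passing the list to the SRA Toolkit. The function name keep_run_accessions and the pattern (SRR, ERR, or DRR followed by digits) are assumptions made for illustration. The prefetch option --option-file reads accessions from a file.

```shell
# Keep only lines that look like run accessions (SRR/ERR/DRR + digits),
# dropping headers, blank lines, and status text. Carriage returns are
# stripped first so that CRLF line endings do not defeat the match.
keep_run_accessions() {
  tr -d '\r' | grep -E '^[SED]RR[0-9]+$'
}

# Example (not run here; requires bq and the SRA Toolkit):
#   bq --format=csv query --nouse_legacy_sql --max_rows=10000 \
#     'SELECT acc FROM `nih-sra-datastore.sra.metadata` WHERE consent = "public" AND acc LIKE "SRR%"' \
#     | keep_run_accessions > accession_list.txt
#   prefetch --option-file accession_list.txt
```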

Contact SRA

Contact SRA staff for assistance at sra@ncbi.nlm.nih.gov


Last updated: 2020-09-17T13:59:09Z