Search in Athena

Overview

SRA has deposited its metadata into Athena to give the bioinformatics community programmatic access to this data. You can search across the entire SRA by sequencing methodology and sample attributes. Because the queries run in Athena, users benefit from its elastic scaling and parallel query execution. Athena has a large collection of client libraries that you can use within your workflow, and you can also interact with it in a web browser.
 

The Athena resource contains tables for SRA metadata and computed metadata on SRA runs. It also contains metadata on SRA aligned reads, including taxonomic content and BLAST results.
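Athena's client libraries can run the same queries from a script. The sketch below assumes Python with the AWS SDK for Python (boto3) and configured AWS credentials. The function name `run_athena_query`, the database name and the S3 result location are hypothetical placeholders. The function submits a query with `start_query_execution`, polls `get_query_execution` until the query finishes, and then pages through `get_query_results`.

```python
import time


def run_athena_query(sql, database, output_location, client=None, poll_seconds=1.0):
    """Run one Athena query and return every result row as a list of strings.

    database:        the name you chose for your local database (<db_name>)
    output_location: an S3 URI where Athena may write query results
    client:          an Athena client; when omitted, one is created with boto3
    """
    if client is None:
        import boto3  # AWS SDK for Python; requires configured AWS credentials

        client = boto3.client("athena")

    # Submit the query; Athena runs it asynchronously and returns an ID.
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Poll until the query is no longer queued or running.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        status = execution["QueryExecution"]["Status"]
        if status["State"] not in ("QUEUED", "RUNNING"):
            break
        time.sleep(poll_seconds)
    if status["State"] != "SUCCEEDED":
        reason = status.get("StateChangeReason", "")
        raise RuntimeError(f"query {query_id} ended as {status['State']}: {reason}")

    # Page through the results; for a SELECT, the first row holds the column names.
    rows = []
    request = {"QueryExecutionId": query_id}
    while True:
        page = client.get_query_results(**request)
        for row in page["ResultSet"]["Rows"]:
            # Cells without a VarCharValue (e.g. NULLs) become None.
            rows.append([cell.get("VarCharValue") for cell in row["Data"]])
        if not page.get("NextToken"):
            return rows
        request["NextToken"] = page["NextToken"]
```

For example, `run_athena_query("SELECT * FROM metadata WHERE organism = 'Homo sapiens' LIMIT 10", database="my_sra_db", output_location="s3://my-bucket/athena-results/")` would run the basic example query shown later on this page. Both `my_sra_db` and the bucket are hypothetical. The optional `client` argument lets you pass a preconfigured client, or a stand-in object when testing.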

Tables

The list of SRA cloud-based tables can be found here: SRA cloud-based tables.
 

Detailed descriptions of the tables contained in Athena can be found here: SRA Aligned Read Format Table Definitions.
 

Please read about the SRA Taxonomy Analysis Tool to learn how the analysis is carried out.

The Basics of SQL

A basic SQL query has three parts, or clauses:

  • SELECT: Identifies which columns from the selected table(s) to show. The * indicates "all columns".
  • FROM: Identifies the table(s) to query.
  • WHERE: Sets filters on the query. When FROM lists more than one table, WHERE can also join them by requiring a column the tables share to have the same value in each.
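The WHERE clause can join tables as well as filter them. The sketch below shows how the three clauses fit together by running a query against small in-memory stand-ins for the metadata and tax_analysis tables, using Python's built-in sqlite3 module. The table contents and the run accessions RUN1/RUN2 are hypothetical. It is also an assumption that both tables share an acc column. SQLite is used here only because it runs locally. The same SELECT/FROM/WHERE structure applies in Athena, with <db_name>. prefixed to each table name.

```python
import sqlite3

# Tiny in-memory stand-ins for two SRA tables (hypothetical rows and columns).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE metadata (acc TEXT, organism TEXT, consent TEXT);
    CREATE TABLE tax_analysis (acc TEXT, name TEXT, total_count INTEGER);
    INSERT INTO metadata VALUES
        ('RUN1', 'human gut metagenome', 'public'),
        ('RUN2', 'Homo sapiens', 'public');
    INSERT INTO tax_analysis VALUES
        ('RUN1', 'Bacteroides', 120),
        ('RUN1', 'Escherichia', 3),
        ('RUN2', 'Homo sapiens', 5000);
    """
)

# SELECT chooses the columns, FROM lists both tables, and WHERE joins them on
# the shared acc column while also filtering on total_count.
query = """
    SELECT m.acc, m.organism, t.name, t.total_count
    FROM metadata m, tax_analysis t
    WHERE m.acc = t.acc
      AND t.total_count > 10
    ORDER BY m.acc, t.name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each remaining tax_analysis row is paired with the metadata row that has the same acc. The Escherichia row is dropped by the total_count filter.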

In Athena, the table name (e.g., metadata) is defined by NCBI, but the database name (<db_name> in all examples) is defined by the user. You choose this name when you create the Glue crawler or create the database manually. In every query, replace the <db_name> tag with the name you chose for your local database.
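When you run the examples from a script, you can substitute the placeholder programmatically. The helper below is hypothetical. Its restriction to lowercase letters, digits and underscores is a conservative assumption meant to keep arbitrary text out of the query. It is not a statement of Athena's exact naming rules.

```python
import re

# Conservative pattern for a database name (an assumption, not Athena's full rule set).
DB_NAME_PATTERN = re.compile(r"[a-z0-9_]+")


def fill_db_name(template: str, db_name: str) -> str:
    """Replace every <db_name> placeholder in an example query with db_name."""
    if not DB_NAME_PATTERN.fullmatch(db_name):
        raise ValueError(f"unexpected database name: {db_name!r}")
    return template.replace("<db_name>", db_name)


example = "SELECT * FROM <db_name>.metadata WHERE organism = 'Homo sapiens' LIMIT 10"
print(fill_db_name(example, "sra_db"))
```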

Basic example query

Select all columns (indicated by '*') from the table <db_name>.metadata where the organism value is "Homo sapiens". The limit 10 clause caps the result at 10 rows; remove it from any example to retrieve the full result set.

SELECT *
FROM <db_name>.metadata
WHERE organism = 'Homo sapiens'
limit 10

Example queries for the web UI

Search for records of the Gulf pipefish, Syngnathus scovelli:

SELECT *
FROM "<db_name>"."metadata"
WHERE organism = 'Syngnathus scovelli'
limit 10

Find all the public human data sets:

SELECT *
FROM "<db_name>"."metadata"
WHERE organism = 'Homo sapiens' AND consent='public'
limit 10

Build a local taxonomic tree for a metagenomic data set by ordering its rows by ileft and ilevel:

SELECT *
FROM "<db_name>"."tax_analysis"
WHERE acc = 'SRR2046458' ORDER BY ileft, ilevel
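Once the rows for a run are retrieved, you can turn this ordering into an indented tree. The sketch below makes two assumptions. First, ileft is a nested-set (pre-order) index, so sorting by it places every taxon after its ancestors. Second, ilevel gives the taxon's depth. For the authoritative column definitions, see the SRA Taxonomy Analysis Tool documentation. The function name and the sample rows are hypothetical, not real output for SRR2046458.

```python
def render_tax_tree(rows):
    """Render (ileft, ilevel, name, total_count) tuples as an indented tree.

    Assumes ileft is a nested-set (pre-order) index and ilevel is the depth,
    so sorting by (ileft, ilevel) lists each parent before its children.
    """
    if not rows:
        return ""
    top = min(row[1] for row in rows)  # the shallowest level gets no indentation
    lines = []
    for ileft, ilevel, name, total_count in sorted(rows, key=lambda r: (r[0], r[1])):
        lines.append(f"{'  ' * (ilevel - top)}{name} ({total_count})")
    return "\n".join(lines)


# Hypothetical rows in arbitrary order, shaped like ileft, ilevel, name, total_count.
rows = [
    (3, 2, "Proteobacteria", 40),
    (1, 0, "root", 100),
    (8, 1, "Viruses", 10),
    (2, 1, "Bacteria", 90),
    (4, 3, "Gammaproteobacteria", 25),
]
print(render_tax_tree(rows))
```

With these sample rows, the sort puts root first, then the Bacteria lineage, then Viruses. Each name is indented two spaces per level below the shallowest one.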

Search for SRA Runs by taxonomic name:

SELECT *
FROM "<db_name>"."tax_analysis"
WHERE name = 'Sarbecovirus' AND total_count > 1
limit 10

Find all SRA aligned read contigs that have taxonomy ID 2697049 (SARS-CoV-2), coverage greater than 100x and a total length greater than 15,000 bases:

SELECT *
FROM "<db_name>"."contigs"
WHERE tax_id = '2697049' AND coverage > 100 AND length > 15000
limit 10

Find all BLASTn hits for SRA aligned read contigs where the taxonomy ID of the hit is 2697049 (SARS-CoV-2), the percent identity is greater than 99% and the hit length is greater than 25,000 bases:

SELECT *
FROM "<db_name>"."blastn"
WHERE tax_id = '2697049' AND pident > 99 AND length > 25000
limit 10

Contact SRA

Contact SRA staff for assistance at sra@ncbi.nlm.nih.gov


Last updated: 2020-10-05T19:40:49Z