SARS-CoV-2 Variant Calling Pipeline

Overview

To support investigation of viral evolution during the pandemic and after the introduction of vaccines, variants are identified relative to the SARS-CoV-2 RefSeq record for each processed run. The processing pipelines described below have been validated via collaboration with stakeholders in the NIH ACTIV TRACE initiative to maximize consistency across similar pipelines and across sequencing technologies. Widely used, public-domain tools are employed in the process to ensure repeatability.

To support ease of access via Amazon Athena, each VCF is first converted to SPDI format and then to Parquet format. The Parquet format supports direct queries in Athena, so users can identify runs containing specific SARS-CoV-2 variants. The variant calls themselves, in both VCF and SPDI format, are also available via the AWS ODP. The pipeline that generates all these artifacts processes new data every 6 hours, and updates to the AWS ODP bucket are made once per day.
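
As an illustration of the VCF-to-SPDI step, the following is a minimal sketch and not the pipeline's actual converter. It assumes simple, already-normalized records and relies only on the fact that SPDI (Sequence:Position:Deletion:Insertion) uses a 0-based position while VCF POS is 1-based. The function names are hypothetical, and the sample record (A to G at position 23403 of NC_045512.2) is used purely as an example input.

```python
def vcf_fields_to_spdi(chrom: str, pos: int, ref: str, alt: str) -> str:
    """Build an SPDI string from VCF CHROM, POS, REF and one ALT allele.

    Assumption: the record is already normalized. No trimming of shared
    bases or other canonicalization is attempted here.
    """
    if pos < 1:
        raise ValueError("VCF POS is 1-based and must be >= 1")
    # SPDI positions are 0-based, so shift the 1-based VCF POS by one.
    return f"{chrom}:{pos - 1}:{ref}:{alt}"


def vcf_line_to_spdi(line: str) -> list[str]:
    """Convert one tab-delimited VCF data line to a list of SPDI strings.

    A VCF line may list several comma-separated ALT alleles; each one
    becomes its own SPDI expression.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alts = fields[:5]
    return [vcf_fields_to_spdi(chrom, int(pos), ref, alt)
            for alt in alts.split(",")]


if __name__ == "__main__":
    record = "NC_045512.2\t23403\t.\tA\tG\t.\tPASS\t."
    print(vcf_line_to_spdi(record))
```

A production converter would also need to handle header lines, symbolic alleles, and allele normalization, none of which this sketch covers.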

Below, find detailed descriptions of the variant calling procedure used for each sequencing technology:

Illumina Variant Calling Pipeline
Oxford Nanopore Variant Calling Pipeline
PacBio Variant Calling Pipeline

Contact SRA

Contact SRA staff for assistance at sra@ncbi.nlm.nih.gov

Last updated: 2022-07-21T12:39:49Z