A Galaxy-based bioinformatics pipeline for optimised, streamlined microsatellite development from Illumina next-generation sequencing data

Conserv Genet Resour. 2016;8(4):481-486. doi: 10.1007/s12686-016-0570-7. Epub 2016 Aug 2.

Abstract

Microsatellites are useful tools for ecologists and conservation biologists, but are taxon-specific and traditionally expensive and time-consuming to develop. New methods using next-generation sequencing (NGS) have reduced these problems, but the plethora of software available for processing NGS data may cause confusion and difficulty for researchers new to the field of bioinformatics. We developed a bioinformatics pipeline for microsatellite development from Illumina paired-end sequences, which is packaged in the open-source bioinformatics tool Galaxy. This optimises and streamlines the design of a microsatellite panel and provides a user-friendly graphical user interface. The pipeline utilises existing programs along with our own novel program and wrappers to: quality-filter and trim reads (Trimmomatic); generate sequence quality reports (FastQC); identify potentially amplifiable microsatellite loci (Pal_finder); design primers (Primer3); assemble pairs of reads to enhance marker amplification success rates (PANDAseq); and filter optimal loci (Pal_filter). The complete pipeline is freely available for use via a pre-configured Galaxy instance, accessible at https://palfinder.ls.manchester.ac.uk.
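
As a rough illustration of what identifying microsatellite loci involves, the sketch below scans a single read for perfect tandem repeats with a regular expression. It is not the algorithm used by Pal_finder or Pal_filter; the function name `find_microsatellites`, the motif-length range (2 to 6 bp), and the minimum of six repeats are assumptions chosen only for this example.

```python
import re


def find_microsatellites(seq, min_motif=2, max_motif=6, min_repeats=6):
    """Return sorted (start, motif, repeat_count) tuples for perfect tandem repeats.

    Hypothetical helper for illustration; thresholds are assumed, not taken
    from the pipeline described above.
    """
    seq = seq.upper()
    hits = []
    for k in range(min_motif, max_motif + 1):
        # A k-bp motif followed by at least (min_repeats - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "ATAT", "AA"),
            # so each locus is reported under its shortest motif only.
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


if __name__ == "__main__":
    read = "GGC" + "AT" * 8 + "CCGTA"
    print(find_microsatellites(read))  # [(3, 'AT', 8)]
```

A real workflow would additionally need flanking sequence on both sides of each repeat for primer design, which is the role Primer3 plays in the pipeline.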

Keywords: Galaxy; Illumina; Microsatellite isolation; Next-generation sequencing; PANDAseq; Pal_filter; Pal_finder; SSRs; Seq-SSR; Trimmomatic.