pblat: a multithread blat algorithm speeding up aligning sequences to genomes

BMC Bioinformatics. 2019 Jan 15;20(1):28. doi: 10.1186/s12859-019-2597-8.

Abstract

Background: blat is a widely used sequence alignment tool. It is especially useful for aligning long sequences and for gapped mapping, which other fast sequence mappers designed for short reads cannot perform properly. However, blat is single-threaded, and mapping whole-genome or whole-transcriptome sequences to reference genomes can take days, making it unsuitable for large-scale sequencing projects and iterative analysis. Here, we present pblat (parallel blat), a parallelized blat algorithm with multithread and cluster computing support that rapidly fine-maps large-scale DNA/RNA sequences against genomes.

Results: pblat takes advantage of modern multicore processors, and its run time decreases significantly as the number of threads increases. pblat uses almost the same amount of memory as blat. The results generated by pblat are identical to those generated by blat. pblat is easy to install and runs on Linux and Mac OS systems. In addition, we provide a cluster version of pblat (pblat-cluster) that runs on computing clusters with MPI support.
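
The three properties reported in the Results (shorter run time with more threads, nearly unchanged memory use, and output identical to the single-threaded run) can be illustrated with a minimal sketch. It is not pblat's actual implementation, which the abstract does not describe. It assumes one common design: a single read-only index shared by all threads, queries divided into contiguous chunks, and per-chunk results concatenated in input order. All names here (build_index, align_one, split_contiguous, align_parallel) and the toy k-mer lookup standing in for alignment are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def build_index(reference, k=4):
    """Map each k-mer of the reference to its start positions (toy index)."""
    index = {}
    for pos in range(len(reference) - k + 1):
        index.setdefault(reference[pos:pos + k], []).append(pos)
    return index


def align_one(name, seq, index, k=4):
    """Toy stand-in for alignment: positions hit by the query's first k-mer."""
    return (name, tuple(index.get(seq[:k], ())))


def align_chunk(chunk, index, k=4):
    """Align every (name, sequence) pair in one chunk, keeping input order."""
    return [align_one(name, seq, index, k) for name, seq in chunk]


def split_contiguous(items, parts):
    """Split items into `parts` contiguous slices of near-equal size."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def align_parallel(queries, index, threads=4, k=4):
    """Align queries with `threads` workers that share one read-only index."""
    chunks = split_contiguous(queries, threads)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Executor.map yields results in the order of `chunks`, so the
        # concatenated output matches a single-threaded run.
        per_chunk = pool.map(lambda c: align_chunk(c, index, k), chunks)
        return [hit for hits in per_chunk for hit in hits]
```

Because the index is built once and only read by the workers, memory use in this sketch does not grow with the thread count beyond per-thread working state. The ordered merge makes the output independent of the number of threads. In CPython, threads do not speed up CPU-bound work of this kind, so the sketch shows only the structure of the approach, not its performance.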

Conclusion: pblat is open source and freely available to non-commercial users. It is easy to install and easy to use. pblat and pblat-cluster can facilitate high-throughput mapping of large-scale genomic and transcript sequences to reference genomes with both high speed and high precision.

Keywords: Cluster computing; Genome annotation; Parallel computing; Sequence alignment.

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Genome, Human*
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA / methods*
  • Software*