Knotty: efficient and accurate prediction of complex RNA pseudoknot structures

Bioinformatics. 2018 Nov 15;34(22):3849-3856. doi: 10.1093/bioinformatics/bty420.

Abstract

Motivation: The computational prediction of RNA secondary structure by free energy minimization has become an important tool in RNA research. However in practice, energy minimization is mostly limited to pseudoknot-free structures or rather simple pseudoknots, not covering many biologically important structures such as kissing hairpins. Algorithms capable of predicting sufficiently complex pseudoknots (for sequences of length n) used to have extreme complexities, e.g. Pknots has O(n6) time and O(n4) space complexity. The algorithm CCJ dramatically improves the asymptotic run time for predicting complex pseudoknots (handling almost all relevant pseudoknots, while being slightly less general than Pknots), but this came at the cost of large constant factors in space and time, which strongly limited its practical application (∼200 bases already require 256 GB space).

Results: We present a CCJ-type algorithm, Knotty, that handles the same comprehensive pseudoknot class of structures as CCJ with improved space complexity of Θ(n3+Z)-due to the applied technique of sparsification, the number of 'candidates', Z, appears to grow significantly slower than n4 on our benchmark set (which include pseudoknotted RNAs up to 400 nt). In terms of run time over this benchmark, Knotty clearly outperforms Pknots and the original CCJ implementation, CCJ 1.0; Knotty's space consumption fundamentally improves over CCJ 1.0, being on a par with the space-economic Pknots. By comparing to CCJ 2.0, our unsparsified Knotty variant, we demonstrate the isolated effect of sparsification. Moreover, Knotty employs the state-of-the-art energy model of 'HotKnots DP09', which results in superior prediction accuracy over Pknots.

Availability and implementation: Our software is available at https://github.com/HosnaJabbari/Knotty.

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

  • Algorithms
  • Nucleic Acid Conformation
  • RNA / chemistry*
  • Sequence Analysis, RNA
  • Software*

Substances

  • RNA