On the Search for Retrotransposons: Alternative Protocols to Obtain Sequences to Learn Profile Hidden Markov Models

J Comput Biol. 2018 May;25(5):517-527. doi: 10.1089/cmb.2017.0219. Epub 2018 Jan 3.

Abstract

Profile hidden Markov models (pHMMs) have been used to search for transposable elements (TEs) in genomes. When pHMMs are trained to search for TEs of the retrotransposon class, the conventional protocol uses the entire internal nucleotide region of these elements as representative sequences. To further explore the potential of pHMMs in such searches, we propose five alternative protocols for obtaining the sets of representative TE sequences. In this study, we focus on the Bel-PAO, Copia, Gypsy, and DIRS superfamilies of the retrotransposon class. We compared the pHMMs produced by all six protocols. The test results show that, for each TE superfamily, the pHMMs of at least two of the proposed protocols performed better than those of the conventional protocol, and that the number of correct predictions provided by the conventional protocol can be improved by combining its results with those of one or more of the alternative protocols.
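The idea of considering the results of several protocols together can be illustrated with a minimal sketch. It assumes that each protocol's pHMM search yields a set of predicted element identifiers and that "considering together" means taking the union of those sets; the abstract does not specify the combination rule. The function names, identifiers, and toy data below are hypothetical and are not taken from the paper.

```python
def combine_predictions(conventional, alternatives):
    """Return the union of the conventional protocol's hits and the hits
    of each alternative protocol (assumed combination rule)."""
    combined = set(conventional)
    for hits in alternatives:
        combined |= set(hits)
    return combined


def count_correct(predicted, known):
    """Count predicted identifiers that are also in the known (annotated) set."""
    return len(set(predicted) & set(known))


# Hypothetical toy data: identifiers of annotated TEs and of predicted hits.
known = {"TE1", "TE2", "TE3", "TE4"}
conventional = {"TE1", "TE2", "X9"}   # hits from the conventional protocol
alt_a = {"TE2", "TE3"}                # hits from one alternative protocol
alt_b = {"TE4", "X7"}                 # hits from another alternative protocol

combined = combine_predictions(conventional, [alt_a, alt_b])
correct_conventional = count_correct(conventional, known)
correct_combined = count_correct(combined, known)
```

In this toy case the union recovers more annotated elements than the conventional set alone. A union also accumulates every protocol's incorrect hits, so in practice the gain in correct predictions would have to be weighed against the added false positives.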

Keywords: profile hidden Markov models; retrotransposons; transposable elements.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Drosophila melanogaster / genetics*
  • Evolution, Molecular
  • Genome*
  • Markov Chains*
  • Retroelements*

Substances

  • Retroelements