Combination of measures distinguishes pre-miRNAs from other stem-loops in the genome of the newly sequenced Anopheles darlingi

BMC Genomics. 2010 Sep 29:11:529. doi: 10.1186/1471-2164-11-529.

Abstract

Background: Computational efforts to enumerate the full set of miRNAs of an organism have been limited by a strong reliance on precursor conservation and feature similarity. However, miRNA precursors may arise anew or be lost over the evolutionary history of a species, and a newly sequenced genome may be too evolutionarily distant from other genomes for an adequate comparative analysis. In addition, learning intricate classification rules purely from features shared by currently known miRNA precursors may perpetuate an identification bias rather than provide a sound means of telling true miRNAs from other genomic stem-loops.

Results: We show that annotated pre-miRNAs in the genomes of Drosophila melanogaster and Anopheles gambiae are strongly biased towards robust stem-loops, and we propose a scoring scheme for precursor candidates that combines four robustness measures. We also identify several homologs of known pre-miRNAs in the newly sequenced Anopheles darlingi genome and show that most are found amongst the top-scoring precursor candidates. Finally, we compare the performance of our approach against two single-genome pre-miRNA classification methods.

Conclusions: We present a strategy to sieve through the vast number of stem-loops found in metazoan genomes in search of pre-miRNAs, significantly reducing the set of candidates while retaining most known miRNA precursors. The approach makes no use of conservation data and relies solely on properties derived from our knowledge of miRNA biogenesis.
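
Illustrative sketch (not taken from the paper): the abstract does not name the four robustness measures or say how they are combined, so the following Python is only one plausible reading of "combine several measures, keep the top-scoring candidates, and check how many known precursors survive". The measure names (m1 to m4), the rank-averaging rule, the keep_fraction parameter, and the toy values are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def rank_fractions(values: Sequence[float]) -> List[float]:
    """For each value, return the fraction of all values that are <= it.

    Higher is assumed to mean "more robust"; tied values get the same fraction.
    """
    n = len(values)
    return [sum(1 for other in values if other <= v) / n for v in values]


def combined_scores(measures: Dict[str, Sequence[float]]) -> List[float]:
    """Average the per-measure rank fractions (an assumed combination rule)."""
    per_measure = [rank_fractions(vals) for vals in measures.values()]
    return [sum(column) / len(per_measure) for column in zip(*per_measure)]


def sieve(ids: Sequence[str], scores: Sequence[float], keep_fraction: float) -> List[str]:
    """Keep the top-scoring fraction of candidates (at least one)."""
    ranked = sorted(zip(ids, scores), key=lambda pair: pair[1], reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return [candidate_id for candidate_id, _ in ranked[:n_keep]]


def retained_fraction(known: Sequence[str], kept: Sequence[str]) -> float:
    """Fraction of known precursors that survive the sieve."""
    known_set = set(known)
    return len(known_set & set(kept)) / len(known_set)


# Toy data: five hypothetical stem-loop candidates, four hypothetical measures.
candidate_ids = ["c1", "c2", "c3", "c4", "c5"]
toy_measures = {
    "m1": [0.9, 0.2, 0.8, 0.1, 0.5],
    "m2": [0.8, 0.3, 0.9, 0.2, 0.4],
    "m3": [0.7, 0.1, 0.6, 0.3, 0.5],
    "m4": [0.9, 0.4, 0.7, 0.2, 0.3],
}

scores = combined_scores(toy_measures)
kept = sieve(candidate_ids, scores, keep_fraction=0.4)
print(kept, retained_fraction(["c1", "c5"], kept))
```

Averaging ranks rather than raw values avoids having to put the measures on a common scale, which is why it is used here; a weighted sum or a threshold on each measure would be equally compatible with the abstract's wording.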

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Anopheles / genetics*
  • Databases, Nucleic Acid
  • Genome, Insect / genetics*
  • Genomics / methods*
  • MicroRNAs / chemistry*
  • MicroRNAs / genetics*
  • Nucleic Acid Conformation*
  • ROC Curve
  • Sequence Analysis, DNA / methods*

Substances

  • MicroRNAs