miRAW: A deep learning-based approach to predict microRNA targets by analyzing whole microRNA transcripts

PLoS Comput Biol. 2018 Jul 13;14(7):e1006185. doi: 10.1371/journal.pcbi.1006185. eCollection 2018 Jul.

Abstract

MicroRNAs (miRNAs) are small non-coding RNAs that regulate gene expression by binding to partially complementary regions within the 3'UTR of their target genes. Computational methods play an important role in target prediction and assume that the miRNA "seed region" (nt 2 to 8) is required for functional targeting, but typically identify only ∼80% of known bindings. Recent studies have highlighted a role for the entire miRNA, suggesting that a more flexible methodology is needed. We present a novel approach for miRNA target prediction based on Deep Learning (DL) which, rather than incorporating any knowledge (such as seed regions), investigates the entire miRNA and 3'UTR mRNA nucleotides to learn an uninhibited set of feature descriptors related to the targeting process. We collected more than 150,000 experimentally validated Homo sapiens miRNA:gene targets and cross-referenced them with different CLIP-Seq, CLASH and iPAR-CLIP datasets to obtain ∼20,000 validated miRNA:gene exact target sites. Using this data, we implemented and trained a deep neural network (composed of autoencoders and a feed-forward network) able to automatically learn features describing miRNA-mRNA interactions and assess functionality. Predictions were then refined using information such as site location or site accessibility energy. In a comparison using independent datasets, our DL approach consistently outperformed existing prediction methods, recognizing the seed region as a common feature in the targeting process, but also identifying the role of pairings outside this region. Thermodynamic analysis also suggests that site accessibility plays a role in targeting but that it cannot be used as a sole indicator for functionality. Data and source code are available at: https://bitbucket.org/account/user/bipous/projects/MIRAW.
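
To make the contrast in the abstract concrete, the following minimal Python sketch shows (a) the conventional seed-based rule, which scans a 3'UTR for a perfect Watson-Crick match to miRNA nucleotides 2 to 8, and (b) a plain one-hot encoding of a whole sequence, the kind of seed-agnostic input a neural network could consume. This is not the miRAW implementation: the function names, the one-hot layout, and the toy 3'UTR string are assumptions made for illustration only. The miRNA used is the commonly cited hsa-let-7a-5p sequence.

```python
from typing import List

# Illustrative sketch only; NOT the miRAW code. All names here are hypothetical.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
ALPHABET = "ACGU"  # assumed column order for the one-hot encoding


def to_rna(seq: str) -> str:
    """Upper-case a sequence and convert DNA-style T to U."""
    return seq.upper().replace("T", "U")


def seed_match_site(mirna: str, start: int = 2, end: int = 8) -> str:
    """mRNA motif (5'->3') perfectly complementary to miRNA nt start..end (1-based, inclusive)."""
    seed = to_rna(mirna)[start - 1:end]
    # The target pairs antiparallel to the miRNA, so take the reverse complement.
    return "".join(COMPLEMENT[nt] for nt in reversed(seed))


def find_seed_matches(mirna: str, utr: str) -> List[int]:
    """0-based start positions of perfect seed matches in a 3'UTR."""
    motif = seed_match_site(mirna)
    utr = to_rna(utr)
    return [i for i in range(len(utr) - len(motif) + 1)
            if utr[i:i + len(motif)] == motif]


def one_hot(seq: str) -> List[List[int]]:
    """Encode every nucleotide as a 4-element indicator vector (A, C, G, U)."""
    return [[int(nt == base) for base in ALPHABET] for nt in to_rna(seq)]


if __name__ == "__main__":
    let7a = "UGAGGUAGUAGGUUGUAUAGUU"   # hsa-let-7a-5p
    toy_utr = "AAACUACCUCAGGG"          # made-up 3'UTR fragment
    print(seed_match_site(let7a))        # CUACCUC
    print(find_seed_matches(let7a, toy_utr))  # [3]
    print(len(one_hot(let7a)))           # 22
```

A seed-only scan of this kind misses any functional site that lacks a perfect nt 2 to 8 match, which is the limitation the abstract attributes to seed-based methods. Encoding the full miRNA and candidate site leaves it to the model to decide which positions matter.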

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • 3' Untranslated Regions / genetics
  • Binding Sites
  • Computer Simulation*
  • Datasets as Topic
  • Deep Learning*
  • Gene Expression Regulation / genetics
  • Gene Targeting*
  • Humans
  • MicroRNAs / genetics*
  • MicroRNAs / metabolism
  • Neural Networks, Computer
  • RNA, Messenger / genetics*
  • Reproducibility of Results
  • Thermodynamics

Substances

  • 3' Untranslated Regions
  • MicroRNAs
  • RNA, Messenger

Grants and funding

Albert Pla received funding from the European Union Seventh Framework Program (FP7-PEOPLE-2013-COFUND) under grant agreement 609020 - Scientia Fellows. Albert Pla was also awarded an NVIDIA GPU through the NVIDIA GPU grant program. Xiangfu Zhong received funding from the Helse Sør Øst project 2016122. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.