Density distribution of gene expression profiles and evaluation of using maximal information coefficient to identify differentially expressed genes

PLoS One. 2019 Jul 17;14(7):e0219551. doi: 10.1371/journal.pone.0219551. eCollection 2019.

Abstract

Assumptions about the probability density distribution of the data strongly influence the design of a new statistical method. Based on an analysis of a group of real gene expression profiles, this study reveals that the primary density distributions of the real profiles are normal/log-normal and t distributions, accounting for 80% and 19%, respectively. Guided by these distributions, we generated a series of simulated data sets to assess a novel statistical method, the maximal information coefficient (MIC), more comprehensively. The results show that MIC not only ranks in the top tier in overall performance at identifying differentially expressed genes, but also exhibits better adaptability and excellent noise immunity compared with the existing methods.
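
The simulation-based assessment described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the sample sizes, effect size, degrees of freedom, and the function names (`draw`, `welch_t`, `auc`, `simulate`) are all assumptions made for illustration, and an absolute Welch t-statistic stands in for the per-gene score, since computing MIC itself would need an external library. The sketch draws expression values from the distribution families named in the abstract and summarizes how well the score separates differentially expressed genes using the area under the ROC curve.

```python
import math
import random
import statistics


def draw(dist, n, shift, rng):
    """Draw n values from one distribution family; `shift` moves its location."""
    if dist == "normal":
        return [rng.gauss(shift, 1.0) for _ in range(n)]
    if dist == "lognormal":
        # shift is applied to the mean of the underlying normal
        return [rng.lognormvariate(shift, 1.0) for _ in range(n)]
    if dist == "t":
        df = 5  # assumed degrees of freedom, not taken from the paper
        # Student t = Z / sqrt(chi2_df / df), with chi2_df ~ Gamma(df / 2, scale 2)
        return [
            shift + rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(df / 2, 2.0) / df)
            for _ in range(n)
        ]
    raise ValueError(f"unknown distribution: {dist}")


def welch_t(a, b):
    """Welch t-statistic; a stand-in score here, not MIC."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def auc(scores, labels):
    """Probability that a true positive outscores a true negative (ties count 0.5)."""
    pos = [s for s, is_pos in zip(scores, labels) if is_pos]
    neg = [s for s, is_pos in zip(scores, labels) if not is_pos]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def simulate(dist, n_genes=200, n_samples=20, de_fraction=0.2, shift=1.5, seed=0):
    """Simulate two-group data for n_genes genes and return the AUC of the score."""
    rng = random.Random(seed)
    n_de = int(n_genes * de_fraction)
    scores, labels = [], []
    for gene in range(n_genes):
        is_de = gene < n_de  # the first n_de genes are truly differentially expressed
        control = draw(dist, n_samples, 0.0, rng)
        case = draw(dist, n_samples, shift if is_de else 0.0, rng)
        scores.append(abs(welch_t(control, case)))
        labels.append(is_de)
    return auc(scores, labels)


if __name__ == "__main__":
    for family in ("normal", "lognormal", "t"):
        print(family, round(simulate(family), 3))
```

To compare methods in the spirit of the abstract, one would swap `welch_t` for another per-gene score (for example an MIC implementation) and compare the resulting AUC values across distribution families and noise levels.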

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Area Under Curve
  • Bacteria
  • Computational Biology / methods*
  • Computer Simulation
  • Gene Expression Profiling*
  • Humans
  • Linear Models
  • Models, Statistical
  • Plants
  • Probability
  • Reproducibility of Results

Grants and funding

The research is supported by the National Natural Science Foundation of China (under Grant No. 31660321).