Random forest-based modelling to detect biomarkers for prostate cancer progression

Clin Epigenetics. 2019 Oct 22;11(1):148. doi: 10.1186/s13148-019-0736-8.

Abstract

Background: The clinical course of prostate cancer (PCa) is highly variable, demanding an individualized approach to therapy. Overtreatment of indolent PCa cases, which likely do not progress to aggressive stages, may be associated with severe side effects and considerable costs. These could be avoided by utilizing robust prognostic markers to guide treatment decisions.

Results: We present a random forest-based classification model to predict aggressive behaviour of prostate cancer. DNA methylation changes between PCa cases with good or poor prognosis (discovery cohort with n = 70) were used as input. DNA was extracted from formalin-fixed tumour tissue, and genome-wide DNA methylation differences between the two groups were assessed using Illumina HumanMethylation450 arrays. For the random forest-based modelling, the discovery cohort was randomly split into a training set (80%) and a test set (20%). Our methylation-based classifier demonstrated excellent performance in discriminating prognosis subgroups in the test set (Kaplan-Meier survival analyses with log-rank p value < 0.0001). The area under the receiver operating characteristic curve (AUC) for the sensitivity analysis was 95%. Using the ICGC cohort of early- and late-onset prostate cancer (n = 222) and the TCGA PRAD cohort (n = 477) for external validation, AUCs for sensitivity analyses were 77.1% and 68.7%, respectively. Cancer progression-related DNA hypomethylation was frequently located in 'partially methylated domains' (PMDs), large-scale genomic areas with progressive loss of DNA methylation linked to mitotic cell division. We selected several candidate genes with differential methylation in gene promoter regions for additional validation at the protein expression level by immunohistochemistry in > 12,000 tissue micro-arrayed PCa cases. Loss of ZIC2 protein expression was associated with poor prognosis and correlated with significantly shorter time to biochemical recurrence. The prognostic value of ZIC2 proved to be independent of established clinicopathological variables including Gleason grade, tumour stage, nodal stage and prostate-specific antigen.
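The two evaluation steps named above, a random 80%/20% split and an AUC, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `split_train_test` and `roc_auc`, the fixed seed and the toy scores are assumptions made for illustration, and the random forest itself is not shown. The scores stand in for whatever class probabilities a trained classifier would output.

```python
import random


def split_train_test(samples, test_fraction=0.2, seed=0):
    """Randomly split samples into (training, test) lists.

    Hypothetical helper: the seed and the rounding rule are assumptions,
    not details taken from the abstract.
    """
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case
    (label 1, e.g. poor prognosis) receives a higher score than a
    randomly chosen negative case (label 0); ties count as 0.5.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # A discovery cohort of 70 cases yields 56 training and 14 test cases.
    train, test = split_train_test(range(70))
    print(len(train), len(test))

    # Toy labels and scores (invented for illustration only).
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

Read this way, an AUC of 95% means that in about 95 of 100 such case pairs the classifier ranks the poor-prognosis case above the good-prognosis case.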

Conclusions: Our results highlight the prognostic relevance of methylation loss in PMD regions, as well as of several candidate genes not previously associated with PCa progression. Our robust and externally validated PCa classification model, either directly or via protein expression analyses of the identified top-ranked candidate genes, will support the clinical management of prostate cancer.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Aged
  • Biomarkers, Tumor / genetics
  • Biomarkers, Tumor / metabolism
  • DNA Methylation*
  • Disease Progression
  • Down-Regulation
  • Epigenomics / methods*
  • Gene Expression Profiling / methods*
  • Gene Expression Regulation, Neoplastic
  • Humans
  • Male
  • Middle Aged
  • Models, Theoretical
  • Neoplasm Grading
  • Nuclear Proteins / genetics*
  • Nuclear Proteins / metabolism*
  • Precision Medicine
  • Prognosis
  • Promoter Regions, Genetic
  • Prostatic Neoplasms / genetics*
  • Prostatic Neoplasms / metabolism
  • Prostatic Neoplasms / pathology
  • Survival Analysis
  • Tissue Array Analysis
  • Transcription Factors / genetics*
  • Transcription Factors / metabolism*

Substances

  • Biomarkers, Tumor
  • Nuclear Proteins
  • Transcription Factors
  • ZIC2 protein, human