Combining multidimensional genomic measurements for predicting cancer prognosis: observations from TCGA

Brief Bioinform. 2015 Mar;16(2):291-303. doi: 10.1093/bib/bbu003. Epub 2014 Mar 13.

Abstract

With accumulating research on the interconnections among different types of genomic regulation, researchers have found that multidimensional genomic studies outperform one-dimensional studies in multiple respects. Among the many sources of multidimensional genomic data, The Cancer Genome Atlas (TCGA) provides the public with comprehensive profiling data on >30 cancer types, making it an ideal test bed for conducting and comparing different analyses. In this article, the goal is to apply several existing methods to associate multidimensional genomic measurements with cancer outcomes, in particular prognosis, with special focus on the predictive power of genomic signatures. We use clinical data and four types of genomic measurement (mRNA gene expression, DNA methylation, microRNA and copy number alterations) collected by TCGA for breast invasive carcinoma, glioblastoma multiforme, acute myeloid leukemia and lung squamous cell carcinoma. To accommodate the high dimensionality, we extract important features using Principal Component Analysis, Partial Least Squares and the Least Absolute Shrinkage and Selection Operator (Lasso), which are representative, extensively adopted dimension reduction and variable selection techniques, and then fit Cox survival models with the combined important features. We calibrate the predictive power of each type of genomic measurement for the prognosis of the four cancer types and find that the results vary across cancers. Our analysis also suggests that, for most of the cancers in our study and for the adopted methods, there is no substantial improvement in prediction when other genomic measurements are added after gene expression and clinical covariates have been included in the model. This is consistent with findings that molecular features measured at the transcription level affect clinical outcomes more directly than those measured at the DNA/epigenetic level.
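
The "reduce dimension, then fit a Cox model" workflow described in the abstract can be illustrated with a minimal sketch. It is not the authors' code. Everything in it is an assumption made for illustration: the data are simulated rather than taken from TCGA, only one measurement type and one principal component are used, the function names (`first_pc_scores`, `fit_cox_1d`, `concordance_index`, `simulate`) are hypothetical, the Cox model is fitted with a simple damped Newton iteration, and predictive power is summarized in-sample with Harrell's concordance index, which the abstract does not name as its measure.

```python
import math
import random


def first_pc_scores(X, iters=100):
    """Scores of centered samples on the first principal component,
    found by power iteration on X^T X (illustrative, not optimized)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        scores = [sum(r[j] * v[j] for j in range(p)) for r in Xc]
        w = [sum(Xc[i][j] * scores[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(a * a for a in w))
        v = [a / norm for a in w]
    return [sum(r[j] * v[j] for j in range(p)) for r in Xc]


def standardize(v):
    """Rescale a list of numbers to mean 0 and unit sample variance."""
    m = sum(v) / len(v)
    sd = math.sqrt(sum((a - m) ** 2 for a in v) / (len(v) - 1))
    return [(a - m) / sd for a in v]


def fit_cox_1d(x, time, event, iters=50):
    """Maximize the Cox partial likelihood for a single covariate
    (assumes no tied event times) with a damped Newton iteration."""
    n = len(x)
    beta = 0.0
    for _ in range(iters):
        w = [math.exp(beta * xi) for xi in x]
        grad = hess = 0.0
        for i in range(n):
            if not event[i]:
                continue
            risk_set = [j for j in range(n) if time[j] >= time[i]]
            s0 = sum(w[j] for j in risk_set)
            s1 = sum(w[j] * x[j] for j in risk_set)
            s2 = sum(w[j] * x[j] ** 2 for j in risk_set)
            grad += x[i] - s1 / s0
            hess -= s2 / s0 - (s1 / s0) ** 2
        if hess == 0.0:
            break
        step = max(-1.0, min(1.0, grad / hess))  # clamp to avoid overshooting
        beta -= step
        if abs(step) < 1e-8:
            break
    return beta


def concordance_index(risk, time, event):
    """Harrell's C: fraction of comparable pairs in which the subject
    with the earlier observed event has the higher risk score."""
    concordant = total = 0.0
    for i in range(len(risk)):
        if not event[i]:
            continue
        for j in range(len(risk)):
            if time[j] > time[i]:
                total += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / total


def simulate(n=150, p=10, seed=0):
    """Hypothetical data: p noisy features driven by one latent factor
    that also sets the hazard; independent exponential censoring."""
    rng = random.Random(seed)
    X, time, event = [], [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        X.append([z + 0.5 * rng.gauss(0, 1) for _ in range(p)])
        t = rng.expovariate(math.exp(z))   # event time, hazard exp(z)
        c = rng.expovariate(0.3)           # censoring time
        time.append(min(t, c))
        event.append(t <= c)
    return X, time, event


X, time, event = simulate()
pc1 = standardize(first_pc_scores(X))
beta = fit_cox_1d(pc1, time, event)
risk = [beta * s for s in pc1]
c_index = concordance_index(risk, time, event)
print(f"beta = {beta:.3f}, in-sample C-index = {c_index:.3f}")
```

A fuller analysis in the spirit of the abstract would extract several components per measurement type, combine them with clinical covariates in a multivariable Cox model, and assess prediction on held-out data; an established survival analysis library would normally replace the hand-written solver shown here.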

Keywords: The Cancer Genome Atlas (TCGA); cancer prognosis; multidimensional genomic study; prediction.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Brain Neoplasms / genetics
  • Breast Neoplasms / genetics
  • Carcinoma, Squamous Cell / genetics
  • Computational Biology
  • Databases, Genetic / statistics & numerical data
  • Female
  • Genomics / statistics & numerical data*
  • Glioblastoma / genetics
  • Humans
  • Least-Squares Analysis
  • Leukemia, Myeloid, Acute / genetics
  • Lung Neoplasms / genetics
  • Male
  • Neoplasms / genetics*
  • Neoplasms / mortality
  • Principal Component Analysis
  • Prognosis
  • Proportional Hazards Models