A naive Bayes algorithm for tissue origin diagnosis (TOD-Bayes) of synchronous multifocal tumors in the hepatobiliary and pancreatic system

Int J Cancer. 2018 Jan 15;142(2):357-368. doi: 10.1002/ijc.31054. Epub 2017 Oct 16.

Abstract

Synchronous multifocal tumors are common in the hepatobiliary and pancreatic system, but because their histological features are similar, oncologists have difficulty identifying their precise tissue clonal origin through routine histopathological methods. To address this problem and assist in more precise diagnosis, we developed a computational approach for tissue origin diagnosis based on the naive Bayes algorithm (TOD-Bayes) using widely available RNA-Seq data. Large tissue-specific RNA-Seq data sets were first obtained from The Cancer Genome Atlas (TCGA), and ∼1,000 feature genes were used to train and validate the TOD-Bayes algorithm. The accuracy of the model was >95% based on tenfold cross-validation on the TCGA data. A total of 18 clinical cancer samples (including six negative controls) with definitive tissue origin were subsequently used for external validation, and 17 of the 18 samples (94.4%) were classified correctly. Furthermore, we included as case studies seven tumor samples taken from two individuals with synchronous multifocal tumors across tissues, for whom efforts to make a definitive primary cancer diagnosis by traditional diagnostic methods had failed. Using our TOD-Bayes analysis, the two clinical test cases were diagnosed as pancreatic cancer (PC) and cholangiocarcinoma (CC), respectively, in agreement with their clinical outcomes. Based on our findings, we believe that the TOD-Bayes algorithm is a powerful novel methodology for accurately identifying the tissue origin of synchronous multifocal tumors of unknown primary cancers using RNA-Seq data, and an important step toward precision medicine in cancer diagnosis and treatment.
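As an illustration of the general technique named in the abstract, the following is a minimal sketch of a Gaussian naive Bayes classifier evaluated with tenfold cross-validation. It is not the published TOD-Bayes implementation: the Gaussian likelihood, the variance floor, the tissue labels, and the synthetic "expression" values are all assumptions made for this example, and the function names (`fit_gaussian_nb`, `predict`, `cross_val_accuracy`, `make_toy_data`) are hypothetical.

```python
import math
import random
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-3):
    """Estimate a log prior and per-gene mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Floor the variance so a constant gene cannot cause division by zero.
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, var_floor)
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log posterior for one sample."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, means, variances) in model.items():
        score = log_prior
        # Naive Bayes assumption: genes are independent given the class,
        # so per-gene log likelihoods are simply summed.
        for v, m, s2 in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def cross_val_accuracy(X, y, k=10, seed=0):
    """Shuffle samples, split into k folds, and return pooled accuracy."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_gaussian_nb([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(model, X[i]) == y[i] for i in fold)
    return correct / len(X)


def make_toy_data(n_per_class=40, n_genes=20, seed=1):
    """Synthetic log-scale 'expression' data; NOT real TCGA values."""
    rng = random.Random(seed)
    tissues = ["liver", "bile_duct", "pancreas"]  # hypothetical labels
    X, y = [], []
    for t_idx, tissue in enumerate(tissues):
        for _ in range(n_per_class):
            # Every third gene is shifted upward for one tissue,
            # mimicking tissue-specific feature genes.
            row = [
                rng.gauss(5.0 + (3.0 if g % 3 == t_idx else 0.0), 1.0)
                for g in range(n_genes)
            ]
            X.append(row)
            y.append(tissue)
    return X, y


X, y = make_toy_data()
accuracy = cross_val_accuracy(X, y, k=10)
print(f"10-fold CV accuracy on toy data: {accuracy:.3f}")
```

With real data, `X` would hold one row per sample of normalized expression values for the selected feature genes and `y` the known tissue of origin. Working in log space avoids numerical underflow when the likelihoods of roughly a thousand genes are multiplied together.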

Keywords: RNA-Seq; hepatobiliary and pancreatic system; naive Bayes algorithm; synchronous multifocal tumors; tissue origin.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Bayes Theorem*
  • Biliary Tract Neoplasms / diagnosis*
  • Biliary Tract Neoplasms / genetics
  • Biomarkers, Tumor / genetics*
  • Cell Lineage / genetics*
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Liver Neoplasms / diagnosis*
  • Liver Neoplasms / genetics
  • Neoplasms, Multiple Primary / diagnosis*
  • Neoplasms, Multiple Primary / genetics
  • Pancreatic Neoplasms / diagnosis*
  • Pancreatic Neoplasms / genetics
  • Prognosis

Substances

  • Biomarkers, Tumor