Robust in-silico identification of cancer cell lines based on next generation sequencing

Oncotarget. 2017 May 23;8(21):34310-34320. doi: 10.18632/oncotarget.16110.

Abstract

Cancer cell lines (CCLs) are important tools for cancer researchers worldwide. However, handling of cancer cell lines is error-prone, and critical errors such as misidentification and cross-contamination occur more often than is acceptable. Because CCLs today are very often sequenced (partly or entirely) as part of the studies performed, we developed Uniquorn, a computational method that reliably identifies CCL samples based on variant profiles derived from whole exome or whole genome sequencing. Notably, Uniquorn requires neither a particular sequencing technology nor a particular downstream analysis pipeline, but works robustly across different NGS platforms and analysis steps. We evaluated Uniquorn by comparing more than 1900 CCL profiles from three large CCL libraries, including 1585 duplicates, against each other. In this setting, our method achieves a sensitivity of 97% and a specificity of 99%. Errors are strongly associated with low-quality mutation profiles. The R package Uniquorn is freely available as a Bioconductor package.
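
To make the evaluation setting concrete, the following Python sketch compares toy variant profiles pairwise and derives sensitivity and specificity from the resulting calls. It is an illustration only and not Uniquorn's algorithm: the Jaccard similarity, the 0.5 threshold, the sample naming scheme "library/cell_line", and all variant strings are assumptions invented for this example.

```python
from itertools import combinations

# Hypothetical toy profiles (synthetic data, not from any real library).
# Keys follow an assumed "library/cell_line" naming scheme; each variant
# is encoded as "chrom:pos:ref>alt".
PROFILES = {
    "libA/CL1": {"chr1:1000:A>T", "chr2:2000:C>G", "chr3:3000:G>A", "chr4:4000:T>C"},
    "libB/CL1": {"chr1:1000:A>T", "chr2:2000:C>G", "chr3:3000:G>A"},
    "libA/CL2": {"chr5:5000:G>A", "chr6:6000:C>T"},
    "libB/CL2": {"chr5:5000:G>A", "chr6:6000:C>T", "chr7:7000:T>C"},
    "libA/CL3": {"chr8:8000:C>A", "chr9:9000:C>G"},
}


def jaccard(a, b):
    """Share of variants common to both profiles (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cell_line(sample_id):
    """Extract the cell line label from an assumed 'library/cell_line' id."""
    return sample_id.split("/", 1)[1]


def evaluate(profiles, threshold=0.5):
    """Compare all sample pairs and count correct and incorrect match calls."""
    tp = fp = tn = fn = 0
    for s1, s2 in combinations(sorted(profiles), 2):
        predicted = jaccard(profiles[s1], profiles[s2]) >= threshold
        actual = cell_line(s1) == cell_line(s2)  # true duplicate pair?
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    return {
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        # Sensitivity: fraction of true duplicate pairs that were identified.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: fraction of non-duplicate pairs correctly rejected.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


if __name__ == "__main__":
    print(evaluate(PROFILES))
```

Raising the threshold in this toy setting trades sensitivity for specificity: an incomplete profile that shares only some variants with its duplicate falls below a stricter cutoff. This is one way to picture how a low-quality profile could cause an identification error.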

Keywords: DNA-sequencing; cancer cell lines; cell line-identification; data-heterogeneity and incompleteness; next-generation sequencing.

MeSH terms

  • Cell Line, Tumor*
  • Computational Biology / methods*
  • Computer Simulation
  • Genetic Variation*
  • High-Throughput Nucleotide Sequencing / methods
  • Humans
  • Neoplasms / genetics
  • Sequence Analysis, DNA / methods
  • Software