Data interpretation in breath biomarker research: pitfalls and directions

J Breath Res. 2012 Sep;6(3):036007. doi: 10.1088/1752-7155/6/3/036007. Epub 2012 Aug 2.

Abstract

Most--if not all--potential diagnostic applications in breath research involve different marker concentrations rather than unique breath markers which only occur in the diseased state. Hence, data interpretation is a crucial step in breath analysis. To avoid artificial significance in breath testing every effort should be made to implement method validation, data cross-testing and statistical validation along this process. The most common data analysis related problems can be classified into three groups: confounding variables (CVs), which have a real correlation with both the diseased state and a breath marker but lead to the erroneous conclusion that disease and breath are in a causal relationship; voodoo correlations (VCs), which can be understood as statistically true correlations that arise coincidentally in the vast number of measured variables; and statistical misconceptions in the study design (SMSD). CV: Typical confounding variables are environmental and medical history, host factors such as gender, age, weight, etc and parameters that could affect the quality of breath data such as subject breathing mode, effects of breath sampling and effects of the analytical technique itself. VC: The number of measured variables quickly overwhelms the number of samples that can feasibly be taken. As a consequence, the chances of finding coincidental 'voodoo' correlations grow proportionally. VCs can typically be expected in the following scenarios: insufficient number of patients, (too) many measurement variables, the use of advanced statistical data mining methods, and non-independent data for validation. SMSD: Non-prospective, non-blinded and non-randomized trials, a priori biased study populations or group selection with unrealistically high disease prevalence typically represent misconception of study design. In this paper important data interpretation issues are discussed, common pitfalls are addressed and directions for sound data processing and interpretation are proposed.

MeSH terms

  • Bias
  • Biomarkers / analysis*
  • Breath Tests / methods*
  • Confounding Factors, Epidemiologic
  • Data Interpretation, Statistical*
  • Humans
  • Research Design

Substances

  • Biomarkers