Log-transformation and its implications for data analysis

Shanghai Arch Psychiatry. 2014 Apr;26(2):105-9. doi: 10.3969/j.issn.1002-0829.2014.02.009.

Abstract

The log transformation is widely used in biomedical and psychosocial research to deal with skewed data. This paper highlights serious problems in this classic approach for dealing with skewed data. Despite the common belief that the log transformation can decrease the variability of data and make data conform more closely to the normal distribution, this is usually not the case. Moreover, the results of standard statistical tests performed on log-transformed data are often not relevant for the original, non-transformed data. We demonstrate these problems by presenting examples that use simulated data. We conclude that, if used at all, data transformations must be applied very cautiously. We recommend that in most circumstances researchers abandon these traditional methods of dealing with skewed data and instead use newer analytic methods that do not depend on the distribution of the data, such as generalized estimating equations (GEE).
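The two claims above can be illustrated with a small simulation. The sketch below is not the authors' own example (their simulations are not reproduced in this abstract); the distributions, sample size, seed, and the helper `skewness` are all assumptions chosen for illustration. Case 1 takes data that are already symmetric (uniform on (0, 1]) and shows that the log transformation makes them both more variable and more skewed. Case 2 draws two log-normal groups with the same mean on the log scale but different spreads: for log-normal data the original-scale mean is exp(mu + sigma^2 / 2), so a comparison of log-scale means says little about the original-scale means.

```python
import math
import random
import statistics


def skewness(xs):
    """Third central moment divided by the cubed population standard deviation."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)


rng = random.Random(0)  # fixed seed (an arbitrary choice) for reproducibility
n = 100_000             # illustrative sample size

# Case 1: symmetric data on (0, 1]; the log makes it MORE variable and skewed.
# Theory: Var(U) = 1/12, skew(U) = 0; Var(log U) = 1, skew(log U) = -2.
u = [1.0 - rng.random() for _ in range(n)]  # values in (0, 1], so log is defined
log_u = [math.log(x) for x in u]

var_u = statistics.pvariance(u)
var_log_u = statistics.pvariance(log_u)
skew_u = skewness(u)
skew_log_u = skewness(log_u)

print(f"variance: raw={var_u:.3f}  log={var_log_u:.3f}")
print(f"skewness: raw={skew_u:.3f}  log={skew_log_u:.3f}")

# Case 2: two log-normal groups with equal log-scale mean, unequal spread.
# Theory: E[X] = exp(mu + sigma^2 / 2), so the original-scale means differ.
mu = 0.0
group_a = [rng.lognormvariate(mu, 0.5) for _ in range(n)]
group_b = [rng.lognormvariate(mu, 1.5) for _ in range(n)]

mean_log_a = statistics.fmean(math.log(x) for x in group_a)
mean_log_b = statistics.fmean(math.log(x) for x in group_b)
mean_a = statistics.fmean(group_a)
mean_b = statistics.fmean(group_b)

print(f"log-scale means:      a={mean_log_a:.3f}  b={mean_log_b:.3f}")
print(f"original-scale means: a={mean_a:.3f}  b={mean_b:.3f}")
```

In this sketch a test on the log scale would compare two nearly identical means, while the original-scale means differ by a factor of roughly e (the ratio exp(1.125) / exp(0.125)). Equality of log-scale means corresponds to equality of geometric means, not arithmetic means, which is one reason results on transformed data may not carry over to the original data.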


Keywords: hypothesis testing; log-normal distribution; normal distribution; outliers; skewness.

Grants and funding

This research was supported in part by the Novel Biostatistical and Epidemiologic Methodology grants from the University of Rochester Medical Center Clinical and Translational Science Institute Pilot Awards Program.