Distributions of p-values smaller than .05 in psychology: what is going on?

Chris H J Hartgerink; Robbie C M van Aert; Michèle B Nuijten; Jelte M Wicherts; Marcel A L M van Assen

doi:10.7717/peerj.1935

PeerJ. 2016 Apr 11:4:e1935. doi: 10.7717/peerj.1935. eCollection 2016.

Authors

Chris H J Hartgerink¹, Robbie C M van Aert¹, Michèle B Nuijten¹, Jelte M Wicherts¹, Marcel A L M van Assen²

Affiliations

¹ Department of Methodology and Statistics, Tilburg University, Tilburg, The Netherlands.
² Department of Methodology and Statistics, Tilburg University, Tilburg, The Netherlands; Department of Sociology, Utrecht University, Utrecht, The Netherlands.

Abstract

Previous studies provided mixed findings on peculiarities in p-value distributions in psychology. This paper examined 258,050 test results across 30,710 articles from eight high-impact journals to investigate whether p-values just below .05 are peculiarly prevalent (i.e., form a bump) in the psychological literature, and whether any such bump has increased over time. We indeed found evidence for a bump just below .05 in the distribution of exactly reported p-values in the journals Developmental Psychology, Journal of Applied Psychology, and Journal of Personality and Social Psychology, but the bump did not increase over the years and disappeared when using recalculated p-values. We found clear and direct evidence for the questionable research practice (QRP) "incorrect rounding of p-value" (John, Loewenstein & Prelec, 2012) in all psychology journals. Finally, we also investigated monotonic excess of p-values, an effect of certain QRPs that has been neglected in previous research, and developed two measures to detect it by modeling the distributions of statistically significant p-values. Using simulations and applying the two measures to the retrieved test results, we argue that, although one of the measures suggests the use of QRPs in psychology, it is difficult to draw general conclusions concerning QRPs based on modeling of p-value distributions.
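As an illustration of how a bump just below .05 might be tested, the following is a minimal sketch of a caliper-style test, which compares the number of p-values in two adjacent, equally wide bins below .05 with a one-sided binomial test. The function name `caliper_test`, the bin edges (.04 to .045 and .045 to .05), and the null proportion of 0.5 are assumptions chosen for illustration; they are not taken from the paper, whose actual bin choices may differ.

```python
from math import comb


def caliper_test(p_values, lower=(0.04, 0.045), upper=(0.045, 0.05)):
    """Caliper-style check for an excess of p-values just below .05.

    Hypothetical helper: bin edges and the 0.5 null proportion are
    illustrative assumptions, not the paper's exact specification.
    Returns (n_lower, n_upper, one_sided_p).
    """
    # Count p-values in the bin further from .05 (lo < p <= hi).
    n_lower = sum(lower[0] < p <= lower[1] for p in p_values)
    # Count p-values in the bin adjacent to .05 (lo < p < hi, so p = .05 is excluded).
    n_upper = sum(upper[0] < p < upper[1] for p in p_values)

    n = n_lower + n_upper
    if n == 0:
        # No observations in either bin: no evidence either way.
        return n_lower, n_upper, 1.0

    # One-sided binomial test: P(X >= n_upper) with X ~ Binomial(n, 0.5).
    # A small value suggests more p-values just below .05 than in the
    # neighbouring bin, i.e. a bump.
    tail = sum(comb(n, k) for k in range(n_upper, n + 1))
    return n_lower, n_upper, tail / 2 ** n


if __name__ == "__main__":
    # Made-up p-values, for demonstration only.
    example = [0.041, 0.042, 0.046, 0.047, 0.048, 0.049, 0.0495, 0.0499, 0.2, 0.01]
    print(caliper_test(example))
```

Under this sketch, a null proportion of 0.5 is conservative when true effects exist, because p-value distributions for true effects are right-skewed and would place more results in the lower bin than in the upper one.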

Keywords: Caliper test; Data peeking; NHST; QRP; p-values.

Grants and funding

The preparation of this article was supported by Grants 406-13-050 (Robbie C.M. van Aert) and 016-125-385 (Jelte M. Wicherts) from the Netherlands Organization for Scientific Research (NWO). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.