Conference Object

Comparison of the performance of methods to assess publication bias in real data

Author(s) / Creator(s)

Niemeyer, Helen
van Aert, Robbie C. M.
Schmid, Sebastian
Uelsmann, Dominik
Knaevelsrud, Christine
Schulte-Herbrueggen, Olaf

Abstract / Description

Background: Publication bias is widespread across many scientific disciplines, including psychology. If only published studies are included in a meta-analysis of psychotherapy research, the efficacy of interventions may be overestimated. However, the treatments in evidence-based psychotherapy are selected mainly on the basis of published rather than unpublished research. The presence and impact of publication bias in psychotherapy research remain largely unknown. Posttraumatic stress disorder (PTSD) is a highly distressing and common condition, and various forms of psychological interventions for treating PTSD have been investigated in a large number of studies. A comprehensive statistical assessment of publication bias in meta-analyses of psychotherapeutic treatments for PTSD has not been conducted, even though a considerable number of statistical methods for investigating the presence and impact of publication bias have been developed in recent years.

Objectives: We compare the performance of six state-of-the-art publication bias methods on a large-scale data set by re-analyzing all meta-analyses that investigate the efficacy of psychotherapeutic interventions for PTSD.

Research question: We aim to investigate the extent of publication bias in all meta-analyses on the efficacy of psychotherapeutic treatment for PTSD and to compare the performance of methods to assess publication bias. A comparison on real data is not as straightforward as in simulation studies, since the true amount of publication bias is unknown in real data. Hence, the performance of each method is examined by comparing it with the other included methods.

Method: We screened the databases PsycINFO, Psyndex, PubMed, and the Cochrane Database for all published and unpublished meta-analyses in English or German up to 5 September 2015. In addition, a snowball search was conducted. Meta-analyses were required to meet the following inclusion criteria: 1) a psychotherapeutic intervention was evaluated; 2) the intervention aimed at reducing subclinical or clinical PTSD; 3) a summary effect size was provided. We included only data sets for which the null hypothesis of a homogeneous true effect size was not rejected, because the statistical methods become biased if the true effect sizes are heterogeneous. This hypothesis was tested by means of the Q-test and quantified with I². Moreover, we excluded all data sets that included five or fewer trials, as methods to detect publication bias are underpowered when the number of studies is too small. We included the following methods to test whether publication bias was present in a meta-analysis: Egger's regression test, the rank-correlation test, the test of excess significance (TES), and p-uniform's publication bias test. Four methods were included to estimate the effect size and to test the null hypothesis of no effect: traditional meta-analysis, trim and fill, PET-PEESE, and p-uniform. The degree of agreement among the methods was examined by means of Loevinger's H, because the publication bias tests and the tests of the null hypothesis of no effect each yield a dichotomous decision (statistically significant or not).
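To make the detection methods concrete, here is a minimal Python sketch of Egger's regression test and a simplified version of the Begg-Mazumdar rank-correlation test. It assumes `yi` (effect sizes), `sei` (standard errors), and `vi` (sampling variances) are NumPy arrays from a single data set; the function names are hypothetical, and the study itself would have used dedicated meta-analysis software rather than hand-rolled code like this.

```python
# Illustrative sketches of two publication bias tests (not the study's code).
import numpy as np
import statsmodels.api as sm
from scipy.stats import kendalltau

def eggers_test(yi, sei):
    """Egger's regression test: regress standardized effect sizes on
    precision; an intercept that differs from zero signals small-study
    effects, which may reflect publication bias."""
    z = yi / sei                      # standardized effect sizes
    precision = 1.0 / sei             # inverse standard errors
    fit = sm.OLS(z, sm.add_constant(precision)).fit()
    return fit.params[0], fit.pvalues[0]   # intercept and its p-value

def rank_correlation_test(yi, vi):
    """Begg & Mazumdar's rank-correlation test: Kendall's tau between
    standardized deviations from the fixed-effect mean and the variances."""
    w = 1.0 / vi
    mu = np.sum(w * yi) / np.sum(w)   # fixed-effect summary estimate
    v_star = vi - 1.0 / np.sum(w)     # variance of (yi - mu)
    t_star = (yi - mu) / np.sqrt(v_star)
    return kendalltau(t_star, vi)     # (tau, p-value)
```

PET-PEESE, one of the correction methods, conditions a variance-based estimator (PEESE) on a standard-error-based test (PET). A sketch under the same assumptions; the switching rule shown here uses a conventional two-sided test of the PET intercept, which need not match the exact rule applied in the study:

```python
def pet_peese(yi, sei, alpha=0.05):
    """PET-PEESE: weighted least squares of effect size on the standard
    error (PET); if the PET intercept differs from zero, refit on the
    sampling variance (PEESE) and report that intercept instead."""
    w = 1.0 / sei**2
    pet = sm.WLS(yi, sm.add_constant(sei), weights=w).fit()
    if pet.pvalues[0] < alpha:        # evidence of a genuine nonzero effect
        peese = sm.WLS(yi, sm.add_constant(sei**2), weights=w).fit()
        return peese.params[0]        # PEESE-corrected estimate
    return pet.params[0]              # PET-corrected estimate
```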
Results: The literature search resulted in 7,647 hits including duplicates. Screening reduced this number to 502 meta-analyses, of which 83 dealt with the efficacy of psychotherapeutic interventions for PTSD and were included. These meta-analyses comprised a total of 2,131 data sets, of which 93 data sets from 24 meta-analyses fulfilled all inclusion criteria. The included data sets contained a median of 7 studies. Since publication bias tests have low statistical power when the number of effect sizes in a meta-analysis is small, many of the data sets were not well suited for detecting publication bias. The median number of statistically significant effect sizes in the data sets was 3. Of all methods, Egger's regression test detected publication bias most often, in 17 data sets (18.3%). At most two methods detected publication bias in the same data set, which occurred in 4 data sets (4.3%). Loevinger's H varied between -.075 and 1. For the test of no effect, Loevinger's H varied between .668 and 1. When estimating effect sizes corrected for publication bias, estimates of PET-PEESE in particular were closer to zero than those of traditional meta-analysis, and the standard deviation of the estimates of PET-PEESE and p-uniform was larger than that of traditional meta-analysis and trim and fill. The mean difference in effect size estimate between PET-PEESE and the traditional meta-analytic estimate was -0.108 (SD = 0.886). P-uniform was applied to a subset of 72 data sets, because the method requires at least one statistically significant study in a data set. The mean difference in effect size estimate between p-uniform and traditional meta-analysis was 0.002 (SD = 0.355). Estimates of PET-PEESE were especially unrealistic when a data set contained a small number of effect sizes combined with little variation in the standard errors of the primary studies. P-uniform's estimates were unrealistically large or small when a small number of statistically significant effect sizes was observed with p-values just below the α-level.

Conclusions: Our study is the first to apply a multitude of publication bias methods to a large-scale real data set. The publication bias tests did not reach the same conclusion in the majority of the data sets, which would be unlikely if extreme publication bias were present. No clear indications of overestimated effect sizes were observed when comparing the effect size estimates of traditional meta-analysis with those of the methods that correct for publication bias. However, the assessments may have lacked statistical power to detect publication bias in psychotherapy research. Moreover, the conclusion regarding the statistical significance of the test of no effect often changed when correcting for publication bias with PET-PEESE and p-uniform compared to traditional meta-analysis. This is at least partly caused by the less precise effect size estimates of these methods, since the effect size estimates corrected for publication bias did not provide strong evidence of overestimation caused by publication bias. Future research is needed to study the convergence and divergence of publication bias tests as a function of publication bias and the number of primary studies in a meta-analysis.
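The agreement figures above (Loevinger's H between -.075 and 1) can be reproduced in principle from pairs of dichotomous test decisions: for two binary variables, H is the covariance of the two indicators divided by the maximum covariance their margins allow. A minimal sketch, assuming `x` and `y` are 0/1 NumPy arrays of decisions across data sets; the data below are made up for illustration.

```python
import numpy as np

def loevinger_h(x, y):
    """Loevinger's H for two binary variables: covariance divided by the
    maximum covariance attainable given the marginal proportions.
    Assumes non-degenerate margins (neither variable is constant)."""
    p, q = x.mean(), y.mean()
    p11 = np.mean(x * y)              # proportion where both decisions are 1
    max_cov = min(p, q) - p * q       # largest covariance the margins allow
    return (p11 - p * q) / max_cov

# Hypothetical decisions of two publication bias tests on eight data sets:
# H = 1 here because whenever x flags bias, y does too (no "Guttman errors").
x = np.array([1, 0, 0, 1, 0, 0, 1, 0])
y = np.array([1, 0, 0, 1, 0, 1, 1, 0])
print(loevinger_h(x, y))  # -> 1.0
```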

Persistent Identifier

https://hdl.handle.net/20.500.12034/2033
https://doi.org/10.23668/psycharchives.2401

Date of first publication

2019-03-14

Is part of

Open Science 2019, Trier, Germany

Publisher

ZPID (Leibniz Institute for Psychology Information)

Citation

Niemeyer, H., Van Aert, R. C. M., Schmid, S., Uelsmann, D., Knaevelsrud, C., & Schulte-Herbrueggen, O. (2019, March 14). Comparison of the performance of methods to assess publication bias in real data. ZPID (Leibniz Institute for Psychology Information). https://doi.org/10.23668/psycharchives.2401
  • PsychArchives acquisition timestamp
    2019-04-03T13:11:56Z
  • Made available on
    2019-04-03T13:11:56Z
  • Language of content
    eng
  • Dewey Decimal Classification number(s)
    150
  • DRO type
    conferenceObject
  • Visible tag(s)
    ZPID Conferences and Workshops