[MUSIC] In this lecture we'll talk about publication bias. You would hope that researchers publish all the data they collect, regardless of whether their results are statistically significant and support their hypothesis. It turns out that this is not the case, and if you do research, it's important to recognize publication bias in the literature.

One of the biggest problems with the over-reliance on p-values in the scientific literature is their use as a threshold for publication. People think that if a p-value is smaller than 0.05, the results are worth publishing. But if the p-value is higher than 0.05, they often do not submit these papers for publication. As a consequence, if we plot the p-value distribution in the scientific literature, we see the sort of curve we explained in previous lectures. You can see it here: the red curve illustrates this. And then at the threshold of 0.05, the vertical red line, we see a huge drop. You might think that this curve should continue on, and it should, but what's indicated here is a large missing section of the scientific literature. It's as if a big monster came by and bit a part out of all the p-values that should be there. This part is missing due to publication bias. The studies exist, but we just don't have access to them.

In the scientific literature, this is known as the file drawer problem. If people perform a study and observe a non-significant result, a p-value higher than 0.05, they're less likely to submit this result for publication. Instead, they store the data in their file drawer. Nowadays, this would probably be a folder on your computer somewhere.

As a consequence, if you look at the published literature, most published results confirm the hypothesis. In psychology, more than 90% of published papers support the hypothesis that they set out to test. Some people have even joked that if we're so good at predicting what will happen, why do we bother collecting data at all? If 90% of the studies we perform support the hypothesis, then let's just skip the data collection and assume that whenever we have a good idea, it is supported by the data. Of course, this can't be true; it's not what's going on. We just don't have access to all the non-significant results and all the results that do not support the hypothesis.

There are some reasons why people are hesitant to submit non-significant results for publication. Null results are difficult to interpret. It might be that there's just no effect. It's also possible that the study you designed simply wasn't very good: maybe there is a true effect, but you need to study it in a better way, with a changed study design. So people might try again and again. If you take this to the extreme and only submit for publication the one significant result out of 20 different tries, then in this extreme situation you're publishing only false positives.

There is random variation in the studies we perform. We know this, and publication bias is a good example of what it can lead to. If we publish results, then sometimes there will be an extreme finding that shows a very strong positive effect, and sometimes there will be a finding that shows a very strong negative effect. As long as we publish all available evidence, we can do a meta-analysis and conclude that, on average, there's really nothing going on.
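To make this concrete, here is a minimal simulation sketch (my own illustration, not from the lecture), assuming simple two-group t-tests with no true effect and an arbitrary 25 participants per group. When only significant results are "published", the resulting literature consists entirely of false positives, with strong effects in both directions:

```python
# Minimal sketch of the file drawer problem: many studies test a true
# null effect, but only p < .05 results survive the publication filter.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_studies, n_per_group = 10_000, 25  # assumed values for illustration

published_d = []
for _ in range(n_studies):
    a = rng.normal(0, 1, n_per_group)  # true effect is exactly zero
    b = rng.normal(0, 1, n_per_group)
    t, p = stats.ttest_ind(a, b)
    if p < 0.05:  # publication filter: only significant results survive
        pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
        published_d.append((a.mean() - b.mean()) / pooled_sd)

published_d = np.array(published_d)
print(f"published: {len(published_d)} of {n_studies} studies (~5%)")
print(f"mean |d| in the published literature: {np.abs(published_d).mean():.2f}")
# Every published effect is a false positive, split between strong
# positive and strong negative effects; the full set averages ~0.
```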
But because we only publish these extreme findings, significant results in either direction, you end up with a scientific literature where everything both causes cancer and prevents cancer. Let's say you study the amount of wine people drink and the likelihood that they will get cancer. There will be a study that says there's a strong positive association: if you drink more wine, you are more likely to get cancer. There will also be studies that say that if you drink more wine, you're less likely to get cancer. All in all, there's just nothing going on, but people are publishing flukes, false positives, on either side. If we had access to the entire literature, we could see that this is just a null effect with random variation around zero.

Now, as long as a research area doesn't share all its results, it's not a quantitative science. This is a strong statement, but really, if you want to express the evidence that's in the data, you need to have access to all the data. If you only have a subset of the available evidence, you cannot make a good statement about what's going on. So if you want to be a quantitative science, you need to publish all data. There are some recent initiatives where people try to convince researchers to empty their file drawers. These are online databases where people can report previously unpublished results, often non-significant findings, so that the scientific community has access to all the data that has been collected.

It's important to realize that there can be over 200 published studies with significant p-values without there being a true effect. As long as there are thousands of people examining the same research question, and all these researchers perform multiple studies a year, then after a couple of years we end up with a literature that contains exclusively false positives. Of course, this is a very extreme situation, but it can happen.

Publication bias cannot be corrected in a meta-analysis, but it can be detected. You can see that it's present, and whenever publication bias is present in a meta-analysis, you should be more careful in interpreting the meta-analytic effect size estimate. In this graph, we see what's known as a funnel plot. We have simulated a large number of studies where the true effect size is a Cohen's d of 0.66. We see a vertical dotted line at the true effect size, and all the individual studies fall more or less around this true effect. On the vertical axis, we see the standard error. You can think of this as the sample size: the further we go up in the graph, the bigger the sample size becomes, and at the bottom we have studies with very small sample sizes. The gray pyramid indicates the area within which studies are not statistically significant. If a dot falls in the white area, it is statistically significant, with a p-value smaller than 0.05, but if it falls within the gray pyramid, it's not statistically significant. In these simulated studies, the pattern is exactly as you would expect if there's a true effect and no publication bias.
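As a rough sketch of how such an unbiased funnel plot can be simulated (my own illustration, not the lecture's code, assuming equal group sizes drawn between 10 and 150 and the usual large-sample approximation for the standard error of Cohen's d):

```python
# Sketch of a funnel plot: studies scattered around a true d = 0.66,
# with standard error on an inverted vertical axis (big studies on top).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
true_d, k = 0.66, 200
n = rng.integers(10, 150, k)               # per-group sample sizes (assumed range)
se = np.sqrt(2 / n + true_d**2 / (4 * n))  # approximate SE of Cohen's d
d_obs = rng.normal(true_d, se)             # each study's observed effect

fig, ax = plt.subplots()
ax.scatter(d_obs, se, s=12)
ax.axvline(true_d, linestyle=":")  # vertical dotted line at the true effect
# The "gray pyramid": effects within +/- 1.96 SE of zero are non-significant.
se_grid = np.linspace(0, se.max(), 100)
ax.fill_betweenx(se_grid, -1.96 * se_grid, 1.96 * se_grid, color="0.85")
ax.invert_yaxis()  # small SE (big samples) at the top, as in the lecture
ax.set_xlabel("Cohen's d")
ax.set_ylabel("standard error")
plt.show()
```

With no selection applied, the dots fill both the white area and the gray pyramid, which is the healthy pattern described above.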
But let's take a look at a meta-analysis where the meta-analytic effect size was exactly the same, but where publication bias is present. This data comes from a real meta-analysis by Hagger et al. (2010) examining ego depletion. We see that the dots are distributed in a peculiar way: they fall very close to the edge of the gray pyramid. It's almost as if people were throwing these dots from the side against the gray pyramid and they stuck exactly on the border. Of course, there is still some variation around it, but there is a surprising number of studies very close to this border. This is an illustration of publication bias: there's some selection going on. We're missing studies that should fall within the gray pyramid, and a lot of the studies we do have sit right on the border of the funnel plot.

As I said, we cannot correct for publication bias, but we can detect it. One of the tests we can do is a trim-and-fill analysis. The basic idea is that certain studies are missing, and we can fill in these missing studies based on some assumptions. In this case, we see a lot of open circles, which are studies that are assumed to exist and are filled in to the existing meta-analysis. A trim-and-fill analysis will give you an adjusted effect size, but it's very tricky to interpret, it's probably not correct, and the adjustment might not be enough. So you can use a trim-and-fill analysis to say: look, there is bias in this meta-analysis. But you cannot use the adjusted effect size estimate as an indication of what the true effect size would have been had there been no bias.

Another way to take a look at what the true effect size might be, removing any bias, is by only looking at the biggest studies. This is called a cumulative meta-analysis. You start with the biggest study available in your set of studies in the meta-analysis and calculate the effect size; then you add the second biggest, the third biggest, and so on, and you look at the effect size estimate after adding a number of really large studies. In this plot, you can see that the effect size estimate slowly changes: it becomes bigger, and bigger, and bigger, as the studies become smaller. This is problematic. Ideally, we would see that the largest studies all center, more or less, around the true effect size. But even in this case, in this literature on ego depletion, the largest studies already contain some bias.

Another technique you see in the published literature to detect bias is known as Fail-safe N. It calculates how many studies with no true effect you would need before the meta-analytic effect is no longer statistically significant. It's a somewhat peculiar measure, and even the person who originally came up with it no longer recommends using it. So don't use this in your own meta-analysis.

A more fruitful approach might be meta-regression. In meta-regression, you draw a regression line through the studies in your meta-analysis. If you think about the funnel plot we saw earlier, it's basically a regression line where, instead of each individual participant, each study is now a data point. You can use the end point of the regression line, where the standard error is zero, to estimate the true effect size even when bias is present. Use these techniques with care, but techniques such as Egger's regression might give you an indication of the true effect size after correcting for bias.
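As a minimal sketch of these last two ideas (my own illustration, not from the lecture; the function names are made up, and the simple fixed-effect, inverse-variance weighting is an assumption):

```python
# Hedged sketch: cumulative meta-analysis and Egger-style meta-regression.
import numpy as np
import statsmodels.api as sm

def cumulative_meta(d_obs, se):
    """Running fixed-effect estimate, adding studies from largest to smallest."""
    order = np.argsort(se)                     # smallest SE = biggest study first
    d = np.asarray(d_obs, float)[order]
    w = 1 / np.asarray(se, float)[order] ** 2  # inverse-variance weights
    return np.cumsum(w * d) / np.cumsum(w)     # estimate after each added study

def egger_regression(d_obs, se):
    """Weighted regression of effect size on standard error.

    The intercept is the predicted effect at SE = 0 (an infinitely large
    study); a slope clearly different from zero signals funnel-plot
    asymmetry, consistent with publication bias.
    """
    se = np.asarray(se, float)
    X = sm.add_constant(se)  # columns: intercept, SE
    fit = sm.WLS(np.asarray(d_obs, float), X, weights=1 / se**2).fit()
    return fit.params[0], fit.params[1], fit.pvalues[1]
```

If the cumulative estimates keep creeping upward as smaller studies are added, that matches the problematic pattern described above, and an intercept well below the naive meta-analytic estimate points in the same direction.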
Of course, we already talked about p-curve analysis. You can complement your traditional meta-analysis and bias-detecting techniques such as meta-regression or trim-and-fill with a p-curve analysis. If the p-curve analysis says that there is evidential value, and the bias-correcting techniques all show that after correcting for bias there's reason to assume there's still a true effect, then you might conclude that the available evidence is quite reliable and supports the hypothesis.

Let's assume you perform a set of four studies, each with 80% power, which is quite reasonable. In this case, the probability of observing exclusively significant results is 0.8 × 0.8 × 0.8 × 0.8, which is about 41%. So if you perform four studies with 80% power in each study, the probability of finding exclusively significant results is only 41%. Most four-study papers with 80% power should therefore have at least one or two non-significant results. Given that non-significant studies should be expected, why not just publish them when they happen?

One way to publish results regardless of their significance level is, of course, to preregister them. If you decide that the study is worthwhile to do and the reviewers agree, they will give you an in-principle acceptance after reviewing your study design. If you then follow the plan and perform the study as detailed, you can publish the results regardless of the significance level and prevent publication bias. There are now some journals that accept preregistered reports. Comprehensive Results in Social Psychology is one; Cortex is another journal that started accepting preregistered reports very early on. If you're collecting data that's very difficult to collect, that takes a lot of hard work, and you really want this data to reach the scientific literature, you can consider submitting to these journals.

If you perform a set of studies and observe some non-significant results that you decide are not worth discussing in detail in the methods and results sections, then at the very least discuss these results in the discussion. This is really the place to say: these are some other studies we tried; we did not observe anything, but we think we have very good reasons for setting this data aside, whatever those reasons are. And if you've discussed them, share the data. Make it available online and let people draw their own conclusions.

In this lecture, we've talked about publication bias and how to detect it. I really think that publication bias is one of the biggest challenges that the scientific community needs to address in the next few years. As long as there is strong publication bias, it's impossible to have a quantitative science, so if we decide to fix one thing in science, it should really be this. [MUSIC]