[MUSIC] In this lecture, we will talk about likelihoods. Likelihoods are a way to express the relative evidence for one hypothesis over another hypothesis. They can be very useful in themselves, but they also underlie a lot of Bayesian statistics, as we'll see in later lectures. A likelihood is a function of a parameter given the data. So when you've observed some data, you can plot the accompanying likelihood function and check how likely each hypothesis you might have is. In this example, we'll mainly focus on the binomial likelihood function. This is the easiest situation: it's like flipping a coin and then counting the number of heads that you have observed. This is the binomial likelihood function. I won't explain it in detail here, but there's an accompanying assignment in which we'll dive into the function and calculate some likelihoods for different observed numbers of heads. For now, let's take a look at the situation where we flip a coin ten times. Eight of these ten times, we find that the coin comes up heads. We can calculate the likelihood of theta, where theta is the Greek letter for the parameter under a given hypothesis: the true long-run probability of heads. In this case, we might assume that the true long-run proportion of heads is 0.8. For a theta of 0.8, we can calculate that the likelihood is 0.30. But we can also calculate the likelihoods for all sorts of different hypotheses. Let's say you have the idea that the true long-run proportion of heads is 70%, or a theta of 0.7. Based on the data that we have observed - eight out of ten coin flips turned up heads - the likelihood of the true population parameter being 0.7 is 0.23. The likelihood of a theta of 0.6 is 0.12. So you can see it becomes less and less likely the farther away you go from the observed data.
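The numbers above can be reproduced directly from the binomial likelihood function, L(theta) = C(n, k) * theta^k * (1 - theta)^(n - k), with k = 8 heads in n = 10 flips. A minimal sketch in Python (the function name is just illustrative):

```python
from math import comb

def binom_likelihood(theta, heads=8, flips=10):
    """Binomial likelihood of theta given the observed coin flips."""
    return comb(flips, heads) * theta**heads * (1 - theta)**(flips - heads)

for theta in (0.8, 0.7, 0.6):
    print(f"L(theta={theta}) = {binom_likelihood(theta):.2f}")
# L(theta=0.8) = 0.30, L(theta=0.7) = 0.23, L(theta=0.6) = 0.12
```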
Now these numbers are not very clear by themselves, so let's take a look at the likelihood function. In this graph, you can see the likelihood function plotted. On the vertical axis, we have plotted the likelihood, and on the horizontal axis, we have plotted theta, ranging from 0 to 1. The endpoints of this range correspond to never observing heads (a theta of 0) and always observing heads (a theta of 1). Now based on the data we have collected, we already know that neither extreme is possible, because eight out of ten flips turned out to be heads: we have observed both heads and tails. So the extremes are ruled out, and you can see that the likelihood there is 0. The most likely value of the true population parameter - the long-run proportion of heads - based on the data that we have is 0.8, because this is exactly the proportion we have observed. So this is the maximum likelihood. Based on the data in our hands, we can see that 0.8 is the most likely true population parameter. But some of the values just around it are also still quite plausible; a value of 0.6, for example, is still reasonably plausible as well. These types of likelihood curves were introduced by Ronald Fisher. I already said he was a really, really smart guy. He came up with likelihoods when he was only 22 years old and actually a third-year undergraduate student, so this sort of testifies to his genius, I would say. Sometimes I wonder what I have been doing the last couple of years - definitely not inventing something like this that's a foundation of modern statistics. Now, we can use the likelihood under the null hypothesis and the likelihood under the alternative hypothesis to calculate something known as the likelihood ratio. With a likelihood ratio, we take the relative evidence for the one hypothesis, the null, and the other hypothesis, the alternative, and calculate the odds of one over the other.
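That the curve peaks at the observed proportion can be checked numerically, for instance by evaluating the likelihood on a grid of theta values - a sketch, with an arbitrary grid resolution:

```python
from math import comb

def binom_likelihood(theta, heads=8, flips=10):
    return comb(flips, heads) * theta**heads * (1 - theta)**(flips - heads)

# evaluate the likelihood on a grid of theta values between 0 and 1
grid = [i / 100 for i in range(101)]
mle = max(grid, key=binom_likelihood)
print(mle)  # 0.8: the observed proportion of heads maximizes the likelihood
```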
Let's look at our example, where we had eight heads out of ten coin flips. Now we need to compare two hypotheses, and you're free to choose any two values of theta that you want. But one very logical value would be 50%, or 0.5: the hypothesis that this is a perfectly balanced, fair coin. This is the value that's plotted at 0.5 on the horizontal axis. And you can compare this with any hypothesis you had in advance. The alternative hypothesis I've chosen here is the value 0.8, which happens to be exactly the value we've observed in the data: eight out of ten coin flips. So we're comparing the alternative hypothesis that the true population parameter is 0.8 - that 80% of the time this coin will come up heads - with the null hypothesis, the value of 0.5, the idea that this is a perfectly balanced, fair coin. Now we can already see in the graph that the likelihood is much higher at 0.8 than it is at 0.5. But what we want to know is: how much more likely? The likelihood ratio gives us an idea of this. It's simply the likelihood at 0.8 divided by the likelihood at 0.5, and in this case, this ratio is 6.87. So it's considerably more likely that the true population parameter is 0.8 than that it is 0.5. Now, we can test any values that we want. Let's say that in advance we still had the same alternative hypothesis - that this was a coin that would come up heads 80% of the time - and we still compare it to the hypothesis that this is a fair coin. The same two hypotheses are being compared, but now we have observed data where the coin comes up heads four times out of ten flips. We can see that the shape of the likelihood function is now very different, and in this case, based on the observed data, the function peaks at 0.4. We can still compare the ratio of the likelihoods at 0.5 and 0.8, and now we see that the value at 0.5 is much more likely than the value at 0.8.
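Both likelihood ratios can be computed with the same binomial likelihood as before; a minimal sketch:

```python
from math import comb

def likelihood(theta, heads, flips):
    return comb(flips, heads) * theta**heads * (1 - theta)**(flips - heads)

# 8 heads out of 10: relative evidence for theta = 0.8 over theta = 0.5
lr_8 = likelihood(0.8, 8, 10) / likelihood(0.5, 8, 10)
# 4 heads out of 10: relative evidence for theta = 0.5 over theta = 0.8
lr_4 = likelihood(0.5, 4, 10) / likelihood(0.8, 4, 10)
print(round(lr_8, 2), round(lr_4, 2))  # 6.87 37.25
```

Note that for four heads out of ten, the evidence for the fair coin over the 0.8 coin is much stronger than the 6.87 we found in the first case.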
So the hypothesis that this is a fair coin is now much more likely than the hypothesis that this is an unfair coin with a true population parameter of 0.8. Now when you calculate likelihood ratios, there are two conventional cutoffs, 8 and 32, which are considered moderately strong evidence and strong evidence, respectively. So in the previous examples, we could see that in the first case, we did not yet have moderately strong evidence for one hypothesis over the other. In the latter case, when we observed four out of ten flips coming up heads, we had very strong evidence for the fair-coin hypothesis compared to the alternative - that this was a coin that comes up heads 80% of the time. So you see that likelihood ratios express relative evidence for the alternative hypothesis compared to the null hypothesis. It's important to realize that both of these hypotheses might be quite unlikely. Let's look at an example. Here we flip the coin 100 times, and 50 out of the 100 times, the coin comes up heads. Now, we are comparing two hypotheses: one being that the true population parameter is 30% heads in the long run, so a theta of 0.3; the alternative being that the true population parameter is a theta of 0.8, so we'll see 80% heads in the long run. When we compare these two hypotheses against each other, we can see that the hypothesis that the true population parameter is 0.3 is massively more likely than the hypothesis that it is 0.8, and we find a ridiculously high likelihood ratio. However, we can also see that both these hypotheses are wrong. So even though the relative evidence for the one hypothesis compared to the other is extremely convincing, we didn't find the true population parameter, which is actually just 0.5 in this case. So it's important to keep in mind that likelihood ratios are relative evidence.
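To see just how extreme that ratio gets, note that the binomial coefficient appears in both likelihoods and cancels out of the ratio, so it can be computed directly - a sketch:

```python
# 50 heads out of 100 flips; compare theta = 0.3 against theta = 0.8.
# The binomial coefficient C(100, 50) cancels out of the ratio.
lr = (0.3**50 * 0.7**50) / (0.8**50 * 0.2**50)
print(f"{lr:.2e}")  # roughly 8e5: overwhelming relative evidence for 0.3
# ...even though the data (50% heads) fit theta = 0.5 far better than either
```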
Now we can compare the likelihood under a null hypothesis with a theta of 0.05 and an alternative hypothesis with a theta of 0.8. One way in which you can apply these likelihood ratios is when you think about the outcomes of studies. If you perform a study, the outcome is sort of like a coin flip: you either find a statistically significant result, or you don't. So there are two options, and this is a binomial probability. How often will you find a significant result, and how often will you find a non-significant result? Well, it depends on whether the null hypothesis is true or the alternative hypothesis is true. When the null hypothesis is true, we know that the probability of finding a significant result equals your alpha level, which is typically 0.05, or 5%. So a theta of 0.05 represents the null hypothesis in these likelihood ratios. We can compare this with the probability of finding significant results when there is a true effect, and in this case, this probability equals the statistical power of your test. Let's for now put it at 0.8, so 80% of the time you'll find a significant result because you had 80% power. Now we can use the likelihood function to compare the probability of the significant results we observed under the null hypothesis with that under the alternative hypothesis. Let's say that we perform three studies in a row. There are a number of outcomes that could happen, of course. None of the studies could come up significant. You could find only one significant effect, or two. Or, if you're really lucky, three significant results out of three studies. Now, let's calculate the likelihood of finding two out of three significant results. When the null hypothesis is true, this is pretty straightforward. The probability of finding a significant effect when there's no true effect is the alpha level, so 5% of the time. For two significant studies, we multiply 5% by 5%.
And then we multiply by the probability of finding no significant effect. Well, if we have a 5% alpha level, the probability of finding no significant effect - when there's no true effect - is 95%. So if we multiply these probabilities, 5% times 5% times 95%, we get the value 0.0024. All right, so what happens if there is a true effect, and we find two out of three significant findings? Let's assume that we have 80% power. You never really know the true power, but you can check different levels of power in the likelihood function when it's plotted. For now, let's take a look at 80%. So, we find two significant results and one non-significant result. If the alternative hypothesis is true, this equals an 80% probability of finding a significant result, times an 80% probability of finding a significant result, times a 20% probability of making a Type II error. If we now multiply these probabilities, we get 0.128. So we can calculate the relative likelihood of observing these data - two out of three significant results - when either the null hypothesis is true or the alternative hypothesis is true. Based on these numbers, you can already see that this outcome is much more likely when there is a true effect than when there is no true effect. If we calculate the likelihood ratio, we see that these data are actually 54 times more likely under the alternative hypothesis than under the null hypothesis. It's possible to find this pattern of results when there is no true effect, but it's massively more probable that you'll find this pattern when there is a true effect. We can plot this likelihood function - because you might not be convinced that 80% power is very likely in this case. If you look at this graph, we again see the likelihood function, now plotted for two out of three significant results. I've indicated two points: the 0.05 theta on the left and the 0.8 theta on the right.
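The whole calculation fits in a few lines; a minimal sketch:

```python
alpha, power = 0.05, 0.80

# two significant results and one non-significant result
lik_null = alpha * alpha * (1 - alpha)   # 0.05 * 0.05 * 0.95 = 0.002375
lik_alt = power * power * (1 - power)    # 0.80 * 0.80 * 0.20 = 0.128
lr = lik_alt / lik_null
print(round(lr))  # 54: the data favor the alternative hypothesis
```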
Now, the 0.05 is the Type I error rate, or your alpha, so that's pretty fixed. But maybe you feel you might have had lower power in your study. You can look at the relative likelihood of the outcome - two out of three studies - for any value of theta that you like. The likelihood is, of course, highest at the theta matching exactly the result we have observed: two out of three significant results, or about 0.67. For any of these values, you can see that as long as you assume a decent level of power - something higher than 0.5 - it's much more likely that there is a true effect than that there is no true effect, even though we only observed two out of three statistically significant results. Now, this is an important realization that you can get by looking at these likelihood functions. Multiple studies should give mixed results when the alternative hypothesis is true, and even when you have a decent level of power, this will happen quite often. What's the probability of finding three significant results in a row when you only have 80% power? I'm saying only, but that's actually a pretty decent and recommended level of power, right? We can do the math - it's quite easy - 0.8 times 0.8 times 0.8 gives you a probability of 0.512. So about 50% of the time, when you have 80% power and the alternative hypothesis is true, you'll find three significant results in a row. That means that almost 50% of the time, you will get some version of mixed results: either two out of three, one out of three, or if you're unlucky, even zero out of three, even though there is a true effect. The more extreme outcomes won't happen so often, but all these outcomes are possible, and the probability of mixed results is almost 50%. So this is important to realize: when you're doing lines of research, you will not find consistently significant results in all of your studies, and this is perfectly fine. This is exactly what you should expect.
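We can check this by computing the full binomial distribution of the number of significant results out of three studies at 80% power - a sketch:

```python
from math import comb

power, n_studies = 0.80, 3

# probability of exactly k significant results out of n studies
probs = [comb(n_studies, k) * power**k * (1 - power)**(n_studies - k)
         for k in range(n_studies + 1)]

p_all = probs[3]       # all three significant: 0.8**3 = 0.512
p_mixed = 1 - p_all    # at least one non-significant result: 0.488
print(round(p_all, 3), round(p_mixed, 3))  # 0.512 0.488
```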
Now, in this graph I've plotted a number of curves that are relevant for a situation where we perform four studies. When we perform four studies, we can observe zero significant results, or one, or two, or three, or four. Each of these curves plots one of these likelihood functions, given the data that we've observed. What I want to point out here is, again, the substantial probability of finding mixed results. We can see this in the curves for one, two, and three out of four significant results, where the likelihood is highest for mixed results - that's actually this area. So when we perform four studies, and the theta - which here means the power that we have - is somewhere between 0.2 and 0.8, so between 20% and 80% power, then finding mixed results is actually the most probable outcome of a line of four studies. Now we've seen that likelihood ratios allow you to express the relative evidence for the null hypothesis compared to the alternative hypothesis, and this can be very useful in itself. For example, we applied this to the probability of finding significant or non-significant results in a set of studies, and we've seen that it's actually quite likely that you'll observe a mix of significant and non-significant results, even when the alternative hypothesis is true. Likelihoods also underlie Bayesian statistics, so knowing likelihoods is a good stepping stone to Bayesian statistics, which we'll cover in future lectures. [MUSIC]