So far in this course we've talked about making decisions on hypotheses based on p-values. If a p-value is small, we get to reject the null hypothesis, and we call such a result a statistically significant result. In this video we're going to talk about statistical significance versus practical significance, because while sometimes they go hand in hand, that is not always the case.

Before we go much further into the discussion of the concept, we're going to use a practice problem to illustrate what we're talking about. Think about two scenarios. All else is held equal, meaning everything else is the same about these two scenarios; however, in one scenario we have a sample size of 100 and in the other scenario we have a sample size of 10,000. The question is: will the p-value be lower if n is equal to 100 or if n is equal to 10,000? You can think about this as two separate hypothesis tests. We have the same null hypothesis and the same alternative hypothesis. Our sample means are exactly the same. Our null values are obviously the same, because they're driven by the hypotheses. And our standard deviations are exactly the same as well, but the sample sizes are different. So which one of these sample sizes is going to yield a lower p-value?

If the sample size is high, then what that's going to affect first and foremost is the standard error. For the simple mean case, for example, we find the standard error as s divided by the square root of n. So if you increase your sample size, your standard error is going to shrink. Within a hypothesis testing framework, the standard error shows up in the denominator of our test statistic. When we're calculating the test statistic, say the z statistic, we take our point estimate, say our sample mean, minus our null value, which comes from the null hypothesis, and we divide by the standard error. So if I increase n, then, as we said, the standard error is going to go down.
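To make the calculation just described concrete, here is a minimal Python sketch. The function name is mine, not from the video; it simply implements the z statistic as (point estimate minus null value) divided by the standard error:

```python
import math

def z_statistic(xbar, mu0, s, n):
    """One-sample z statistic: (point estimate - null value) / standard error."""
    se = s / math.sqrt(n)      # standard error shrinks as n grows
    return (xbar - mu0) / se
```

Holding everything else fixed and increasing only n makes `se` smaller, so the same numerator is divided by a smaller denominator and the statistic grows.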
And if your denominator goes down, your test statistic is going to go up. The test statistic going up basically means that, if you're thinking about the standard normal curve, your z scores are going to land closer to the tails rather than the center. And if your z scores are closer to the tails, then the p-values, which are those tail areas, are going to get smaller and smaller. In other words: if you increase n, the standard error decreases, the test statistic increases, and the p-value decreases. So the answer here is n equal to 10,000.

We can also illustrate this point mathematically. Let's make up some data. We're going to say that our x bar is 50 and our sample standard deviation is 2. Our null hypothesis is that mu is equal to 49.5, and the alternative hypothesis is that mu is greater than 49.5. The null value I've chosen here is intentionally very close to the sample mean, and we're going to talk about that in a moment. If I want to calculate the z score when the sample size is 100, the calculation goes as follows: the sample mean, 50, minus the null value, 49.5, divided by the standard error, which is the standard deviation, 2, divided by the square root of n. Working through the math, the z score turns out to be 2.5. If on the other hand I'm calculating it for a sample size of 10,000, everything stays the same except the standard error, and going through that calculation gives me a z score of 25. So in the first scenario we have a z score of 2.5: not a small value, close to the tails, but not all the way out there. In the second scenario, with a z score of 25, our p-value is bound to be something very tiny, approximately zero. We never claim that a p-value is exactly equal to zero.
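The two scenarios can be checked numerically with only the standard library. The upper-tail probability is computed here via the complementary error function, a standard identity for the normal distribution (the video itself doesn't show this step):

```python
import math

def z_statistic(xbar, mu0, s, n):
    se = s / math.sqrt(n)
    return (xbar - mu0) / se

def upper_tail_p(z):
    """P(Z > z) for a standard normal, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

for n in (100, 10_000):
    z = z_statistic(50, 49.5, 2, n)
    print(f"n={n}: z={z}, p={upper_tail_p(z)}")
```

For n = 100 this gives z = 2.5 and a p-value of about 0.0062; for n = 10,000 it gives z = 25 and a p-value on the order of 10^-138 — tiny, but, as the video notes, never exactly zero.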
But with a z score of 25, we know from the 68-95-99.7% rule that almost all of the observations under the normal curve lie within three standard deviations of the mean, so a z score of 25 basically means a p-value of almost zero. In other words, a highly statistically significant finding. However, is it practically significant? When we're thinking about practical significance, we focus on the effect size. Remember, we define the effect size as the difference between your point estimate and your null value; in the calculation of the test statistic, that's the numerator. In both instances we have the exact same effect size, and it's a small effect size, to be fair. So even though we have a small effect size, which may not be practically significant, we are able to find a statistically significant result simply by inflating our sample size. And remember that the sample size is something the researcher has control over, because after all, you get to decide how many observations you want to sample. Sure, there's going to be a bound based on how many resources you have, but at the end of the day that's the human-controlled part of a study. So when you see highly statistically significant results, make sure that you have a critical eye, and inquire whether the effect size is reported and what the sample size is as well. And not only should you inquire about these things, but if you are reporting highly statistically significant results, it's always a good idea to let your readers know your effect size and your sample size, so that the discussion makes clear whether a statistically significant finding is also practically significant. So to summarize: real differences between the point estimate and the null value are easier to detect with large samples.
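The point that the effect size stays fixed while significance is driven by n can be seen directly in a short sketch: the numerator never changes between the two scenarios, yet z grows by a factor of the square root of the sample-size ratio.

```python
import math

xbar, mu0, s = 50, 49.5, 2
effect = xbar - mu0            # 0.5: identical for both sample sizes

zs = {}
for n in (100, 10_000):
    zs[n] = effect / (s / math.sqrt(n))

# effect size never changed, but z grew by sqrt(10_000 / 100) = 10
print(f"effect={effect}, z(100)={zs[100]}, z(10000)={zs[10_000]}")
```

Multiplying n by 100 multiplies z by 10, with no change at all in the practical size of the difference being detected.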
However, very large samples will result in statistical significance even for tiny differences between the sample mean and the null value, our effect size, even when the difference is not practically significant. So in order to make sure that your findings don't suffer from this problem of being statistically significant but not practically significant, oftentimes we do some a priori analysis before the actual data collection to figure out, based on characteristics of the variable being studied, how many observations to collect. It is highly recommended that researchers either do this themselves or consult with statisticians to figure out how many observations to sample before they actually go out and collect the data. Because the last thing you want is to find out, after you have already put in the resources to collect some data, that you either don't have enough observations or have too many. This brings to mind a quote from a famous statistician, R.A. Fisher: "To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination. He may be able to say what the experiment died of."
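One conventional way to do this a priori analysis for a one-sample z-test is the standard power-analysis formula n ≥ ((z_alpha + z_beta) · sigma / delta)², where delta is the smallest difference you'd consider practically meaningful. The video doesn't derive this formula, so treat the sketch below as an assumed, textbook-standard approach rather than the instructor's method; the function name is mine.

```python
import math
from statistics import NormalDist

def required_n(delta, sigma, alpha=0.05, power=0.80):
    """Smallest n for a one-sided one-sample z-test to detect a true
    difference of `delta` at significance level `alpha` with the given power.
    Standard power-analysis formula: n >= ((z_alpha + z_beta) * sigma / delta)^2."""
    z_a = NormalDist().inv_cdf(1 - alpha)   # ~1.645 for alpha = 0.05
    z_b = NormalDist().inv_cdf(power)       # ~0.842 for 80% power
    return math.ceil(((z_a + z_b) * sigma / delta) ** 2)
```

With the numbers from this video's example (delta = 0.5, sigma = 2), this gives a planned sample size of 99 — deciding up front that 0.5 is the difference worth detecting tells you roughly 100 observations suffice, and collecting 10,000 would mostly buy significance for differences too small to matter.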