In this video, we're going to talk about hypothesis testing. We're going to start with hypothesis testing via a confidence interval. We're then going to introduce formal hypothesis testing using p-values, and talk about one- and two-sided tests. Let's start with the hypotheses. First is the null hypothesis, usually denoted H0. This is often either a skeptical perspective or a claim to be tested. In the null hypothesis, we set the parameter of interest equal to some value. The second is the alternative hypothesis. Usually denoted HA, this hypothesis represents an alternative claim under consideration, and is often represented by a range of possible parameter values. Therefore, in the alternative hypothesis, we claim that the parameter of interest is either less than, greater than, or not equal to the same null value from the null hypothesis. Note that in the hypothesis testing framework, the skeptic will not abandon the null hypothesis unless the evidence in favor of the alternative hypothesis is strong enough for her to reject H0 in favor of HA. Earlier we calculated a 95% confidence interval for the average number of exclusive relationships college students have been in to be 2.7 to 3.7. Based on this confidence interval, do these data support the hypothesis that college students on average have been in more than three exclusive relationships? We start with setting our hypotheses. Our null hypothesis is that mu is equal to 3. College students have been in three exclusive relationships on average. Our alternative hypothesis is that mu is greater than 3. College students have been in more than three exclusive relationships on average. Usually the notation is sufficient, so if you're doing some practice problems, you don't necessarily have to write it out in English as well. However, at least try saying it to yourself, because spelling out what mu means, that is, what the parameter of interest is, is really important and is often overlooked.
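In notation, the two hypotheses for this example can be written as follows. This is only a typeset version of what was just said, with mu standing for the population average number of exclusive relationships:

```latex
H_0: \mu = 3 \qquad\qquad H_A: \mu > 3
```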
Our interval spans from 2.7 to 3.7, and the null value of 3 is actually included in the interval. The interval says that any value within it could conceivably be the true population mean. Therefore we cannot reject the null hypothesis in favor of the alternative. This is a quick and dirty approach for hypothesis testing. However, it doesn't tell us the likelihood of certain outcomes under the null hypothesis. In other words, it does not tell us the p-value, based on which we can make decisions about the hypotheses. Before we proceed though, note that the hypotheses are always about the population parameters and never about the sample statistics. We would never hypothesize about x bar in a hypothesis test, but we might hypothesize about mu, because we don't know what mu is, whereas we know exactly what x bar is. The p-value, as you remember, is the probability of the observed or a more extreme outcome, given that the null hypothesis is true. So in context of the data we've been working with, this is the probability of x bar being greater than 3.2, that is, greater than the observed average, given that the true population mean is 3. That's coming from our null hypothesis. Since we are assuming the null hypothesis to be true, we can use that to construct the sampling distribution based on the central limit theorem. We have x bar that's nearly normally distributed with mean 3 and standard error 0.246. We calculated the standard error before as well, and the 3 simply comes from the null hypothesis, since we're assuming that it is true. The next thing we want to do before we jump into calculating any sort of probabilities or p-values is to always draw a picture. So let's draw the curve and shade the area of interest corresponding to the p-value. The z score can be calculated as the observed sample mean minus the null value divided by the standard error of the sample mean, which comes out to be 0.81.
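Written out with the numbers from this example, the calculation looks like this. The symbol mu with a subscript 0 for the null value is a notational assumption added here, not something named in the video:

```latex
\bar{x} \sim N(\text{mean} = 3,\ SE = 0.246)
\qquad
z = \frac{\bar{x} - \mu_0}{SE} = \frac{3.2 - 3}{0.246} \approx 0.81
```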
This score actually has a special name, it's a test statistic, because it's the statistic that we're going to use to find the p-value for our hypothesis test. For now, we're sticking with z-scores, but later on in the course we're going to work with distributions other than the normal distribution and introduce other test statistics as well into our hypothesis testing framework. So our ultimate goal is still to find the shaded area, which corresponds to the p-value for this hypothesis test. This should be something that you've done numerous times at this point in the course. So if you're not sure how to find the shaded area, I would strongly recommend that you review some of the past lectures on normal distributions where we played with the applet, we used R, or we used a table to find the normal probabilities. In this case, the answer comes out to be 0.209, so the p-value for this test is 0.209. We just used the test statistic to calculate the p-value, which is the probability of observing data at least as favorable to the alternative hypothesis as our current data set, if in fact the null hypothesis were true. If the p-value is low, and by low we mean lower than the significance level alpha, which we usually set at 0.05, but we can certainly change it as well, we say that it would be very unlikely to observe the data if the null hypothesis were true, and therefore we reject the null hypothesis. If on the other hand the p-value is high, so some value higher than alpha, we say that it is indeed likely to observe the data even if the null hypothesis were true. And hence, we would not reject the null hypothesis. All of this information that we're seeing on this slide, from summary statistics to the z-score to the p-value, is stuff that we've seen previously in this video. What we want to do next though is to finally make a call. Our p-value is 0.209, and since it's high, or at least higher than 5%, we do not reject the null hypothesis.
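If you would rather check these numbers in code than with the applet or a table, here is a minimal sketch in Python using only the standard library. The course itself uses R, so the choice of Python and the variable names are assumptions made for illustration. The z-score is rounded to two decimals to mimic a table lookup.

```python
from statistics import NormalDist

# Numbers taken from the example in this video.
x_bar = 3.2        # observed sample mean
null_value = 3.0   # value of mu under the null hypothesis
se = 0.246         # standard error of x bar

# Test statistic: (point estimate - null value) / standard error.
z = (x_bar - null_value) / se
z_rounded = round(z, 2)  # rounded as if we were reading a normal table

# HA is "mu > 3", so the p-value is the upper-tail area beyond z.
p_value = 1 - NormalDist().cdf(z_rounded)

# Decision at the usual 5% significance level.
alpha = 0.05
reject_null = p_value < alpha

print(f"z = {z_rounded:.2f}, p-value = {p_value:.3f}, reject H0: {reject_null}")
```

Changing alpha changes only the last comparison, not the p-value itself.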
What does that mean in context of this question? Our null hypothesis was that college students on average have been in three exclusive relationships, and the alternative was that that number is something greater than 3. In this case we fail to reject the null hypothesis and say that even though we observed a sample mean slightly above 3, there is not enough evidence to reject the null hypothesis that sets the population average number of exclusive relationships college students have been in to 3. That was easy, but what does the p-value actually mean? If in fact college students have been in three exclusive relationships on average, which is the equivalent of saying if in fact the null hypothesis is true, there is a 21% chance that a random sample of 50 college students would still yield a sample mean of 3.2 or higher. Since this is a pretty high probability, we think that a sample mean of 3.2 or more exclusive relationships is likely to happen simply by chance. We talked briefly about how we make these decisions as well. Since our p-value is high, or in other words, higher than 5%, we fail to reject the null hypothesis. These data do not provide convincing evidence that college students have been in more than three relationships on average. The difference between the null value of three relationships and the observed sample mean of 3.2 relationships can plausibly be attributed to chance, or sampling variability. Often, instead of looking for a divergence from the null hypothesis in a specific direction, so either a greater than or less than sign, we might be interested in divergence in any direction. We call such hypothesis tests two-sided, or you might also hear them referred to as two-tailed. The definition of a p-value is the same regardless of whether we do a one- or a two-sided test.
However, the calculation becomes slightly different and ever so slightly more complicated, since we need to consider outcomes at least as extreme as the observed outcome in both directions away from the mean. If we actually wanted to do a two-sided hypothesis test with the existing data that we had, our p-value would now be the probability of x bar being greater than 3.2 or, since we want to consider the other direction as well, x bar being less than 2.8, given that the null hypothesis says that the true population mean is 3. So we can draw our curve and shade both tails. How did we come up with 2.8? It is the same distance from 3 as 3.2 is: we just subtracted 0.2 from 3 and arrived at 2.8. So if you're looking for the cutoff values for these two-sided hypothesis tests, all you need to do is to travel the same distance away from the null value in both directions. To find this probability, we had already seen that the upper tail was 0.209. Since this is a symmetric distribution, the lower tail will also be 0.209. And therefore our p-value is simply going to be the probability on the upper tail plus the probability on the lower tail, which comes out to be just twice what we have in one tail, roughly 41.8%. To recap, how do we do hypothesis testing for a single mean? There are a bunch of steps that we've gone through, and it will be useful to collect them in one slide so that you can refer back to them later. First, we always set the hypotheses. Our null hypothesis sets a population parameter equal to some null value. In this case we're focusing on means only, so I'll give the examples about means, so that's a mu, not an x bar. And the alternative hypothesis either has a less than, greater than, or not equal to sign, and, again, compares the parameter of interest to the same null value that we have. The second step is to calculate the point estimate. If my parameter of interest is a population mean, my point estimate is going to be a sample mean.
So in this case, if I had the raw data, I would be calculating x bar from it before I proceed further. Next, we check the conditions. Remember that this is a method that, once again, relies on the central limit theorem. So the conditions that we see here are going to be very similar to the conditions that we've seen with the central limit theorem or the confidence intervals we've done before. The first one is independence, so sampled observations must be independent of each other with respect to the variable of interest. If we have an observational study, we want a random sample. If we have an experiment, we want random assignment. In addition, if we're sampling without replacement, we want to make sure that our sample isn't too large compared to our population. As a rule of thumb, our sample size n should be less than 10% of our population. The second condition is about the sample size or skew. n needs to be greater than or equal to 30 to be able to use the techniques that we've learned so far as is. We're going to talk about what happens with smaller samples later. And actually, n should probably be even larger if the population distribution is very skewed. So we always also want to try to take a look at a histogram, a box plot, a normal probability plot, whatever you can think of that would tell you something about the skewness of the data and might give us some insight into what type of a population it might be coming from. Next, we want to draw the sampling distribution, shade the p-value, and calculate the test statistic. Note that it's really, really important that you draw your distribution, because if you do and you shade your p-value, not only are you going to better understand why you're going through the mechanics that you're going through, but your work is also going to be much less prone to errors. The test statistic of interest here is a z-score.
And we calculate it as our point estimate, the sample mean, minus the null value for the population mean, divided by the standard error of our point estimate, or standard error of x bar. And we know that based on the central limit theorem, that value is simply s over square root of n, or, if you happen to know the population standard deviation, sigma over square root of n. Lastly, we want to make a decision and interpret it in context. That's really, really important. We want to interpret everything that we see in context of the data we're working with or our research question. If the p-value is small, we reject the null hypothesis and conclude that the data provide convincing evidence for the alternative. Note that we're not trying to look for evidence for the null hypothesis at any point. It's always the alternative that we're interested in. The null hypothesis is what we get to either reject or get stuck with. If the p-value is large, you might indeed get stuck with your null hypothesis. You would fail to reject it and determine that the data do not provide convincing evidence for the alternative hypothesis.
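To tie the recap together, here is a rough sketch of the calculation and decision steps as one small Python function. The function name z_test_mean, its parameters, and the "greater" / "less" / "two-sided" labels are all made up for this illustration and do not come from the course materials or from any library. It assumes the conditions have already been checked and that the standard error is already known.

```python
from statistics import NormalDist


def z_test_mean(x_bar, null_value, se, alternative="two-sided", alpha=0.05):
    """Hypothetical helper: z test for a single mean.

    Returns the z-score, the p-value, and whether H0 is rejected at level alpha.
    """
    # Test statistic: (point estimate - null value) / standard error.
    z = (x_bar - null_value) / se
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

    if alternative == "greater":      # HA: mu > null value, upper tail
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":       # HA: mu < null value, lower tail
        p_value = std_normal.cdf(z)
    elif alternative == "two-sided":  # HA: mu != null value, both tails
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

    return z, p_value, p_value < alpha


# The example from this video: x bar = 3.2, null value = 3, SE = 0.246.
z, p_one_sided, reject_one = z_test_mean(3.2, 3.0, 0.246, alternative="greater")
_, p_two_sided, reject_two = z_test_mean(3.2, 3.0, 0.246, alternative="two-sided")

print(f"z = {z:.2f}")
print(f"one-sided p-value = {p_one_sided:.3f}, reject H0: {reject_one}")
print(f"two-sided p-value = {p_two_sided:.3f}, reject H0: {reject_two}")
```

Because this sketch does not round the z-score before looking up the tail area, its p-values come out near 0.208 and 0.416, a hair below the 0.209 and 0.418 quoted in the video, which are based on z rounded to 0.81. The decision is the same either way: we fail to reject the null hypothesis.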