[SOUND] You might have heard about Volkswagen intentionally creating tests that misrepresented the emissions from its diesel cars. However, you might not have known how they were caught. A researcher at West Virginia University had a grant to study diesel cars. This included diesel cars from VW and others, like BMW. However, the sample from Volkswagen would not give emissions that fell within the confidence interval, and this caught the attention of the researchers. When they repeated their tests and were not able to replicate VW's results, it became apparent that VW must have misrepresented its results. Going against a giant car company is never easy, but the statistics were not something they could ignore. That's when they went to the United States Environmental Protection Agency, and the rest, as they say, is history.
This is not the only time that something like this has happened. Just recently, another car company, Mitsubishi Motors, admitted to falsifying fuel economy tests. Finding that a claim or a result is not true is not limited to the auto industry. Lance Armstrong was a celebrated cyclist who had won several consecutive Tours de France, until finally there was a blood test that could detect banned substances in his blood samples. That resulted in him being banned from competing in all sports and stripped of all of his achievements. Of course, he's not the only sports figure to have been found guilty of doping. More disturbing is when we see falsification of data about our drugs, their efficacy or their side effects, as when falsified data was used in an Alzheimer's disease study involving major pharmaceutical firms. Or maybe someone you know, or even you yourself, had some kind of medical test, and the results eventually turned out to be wrong.
So what do all these examples have in common? Well, since we're learning about statistical models, it must be something about statistics. That's your first clue.
Second clue? Well, that's the title of this module. All of these examples illustrate the use of hypothesis testing, which builds on what you learned in the first course. It is an extension of the confidence interval and thus falls under inferential statistics. Hypothesis testing is an important step in every scientific study. In every scientific study, you identify a problem; study the problem through a literature search, some preliminary testing, and so on; formulate a hypothesis; conduct an experiment; analyze the results; and reach a conclusion. Our focus in this module will be on how to formulate a hypothesis, analyze the results, and reach a conclusion.
The hypothesis testing process begins with formulating the hypothesis, so let's begin with that. There are two types of hypotheses: the null hypothesis and the alternate hypothesis. Together, they cover all possibilities. The null hypothesis, denoted by H0, is what you expect to happen. For example, the researcher at West Virginia University was expecting to get the results that VW had reported. The alternate hypothesis, denoted by Ha or H1, is everything else that can happen. In VW's case, the emissions were higher than the car's manufacturer had reported. So once you formulate the null hypothesis, the alternate is just its complement. The null and alternate hypotheses must be mutually exclusive, meaning both cannot be true, and collectively they must cover all possibilities. The null hypothesis is a belief that we try to reject using sample evidence.
One issue that's getting attention in the United States is the cost of higher education. One recent article in the Wall Street Journal claims that the average debt for the class of 2015 is $35,000. Many colleges are trying to reduce this burden by offering financial aid to their students, and they may want to highlight to prospective students that their students graduate with less debt than the national average. But first, this belief that "my college does better" must be tested.
One way is to conduct a hypothesis test to check whether the graduates of a university like Good Deal University are better off or not. So we have two claims here. Claim one is that the graduates of this university, Good Deal University, are no different from all other graduates, so the average debt for these graduates is also $35,000. Basically, there is no reason to believe that graduates leave this particular university with a different debt than graduates of any other. This is the current belief, and thus it's our null hypothesis. Claim two is the one that Good Deal University is making, that its graduates leave with less debt. This claim needs to be tested against the null hypothesis, and thus it is the alternate hypothesis. To debunk the null hypothesis, Good Deal University needs to provide data that contradicts the current belief in favor of what it believes is the case for its graduates.
Now let's see how we show these two competing claims in notation. It may be easier to start with the alternate hypothesis, which is that the average debt is less than $35,000. Now recall that the null hypothesis and the alternate hypothesis must be mutually exclusive and collectively cover all possibilities. This means that the null hypothesis must be stated as: the average debt is $35,000 or more.
Imagine this scenario. A chocolate manufacturing plant is producing bags of chocolate that are labeled at 300 grams. Quality control takes samples of 50 bags to check for accuracy. This is a typical quality control example that happens across industries. Engineers provide specifications, and the job of the quality engineer is to make sure that the specs are met. So every time quality control checks are done, you're conducting hypothesis testing. Here, our current belief is that the bags are filled correctly, and the alternate is that they're not.
We have a one-tail test when the null hypothesis allows the specified value or any value on one side of it, either larger or smaller.
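Written in symbols, with mu standing for the population mean, the student debt and chocolate bag examples might look like the sketch below. The layout is an assumption, since the transcript does not show how the notation was presented; the values and inequality directions are the ones stated above.

```latex
\begin{align*}
\text{Student debt (dollars):} \quad & H_0:\ \mu \ge 35{,}000 & H_a:\ \mu < 35{,}000 \\
\text{Chocolate bags (grams):} \quad & H_0:\ \mu = 300 & H_a:\ \mu \ne 300
\end{align*}
```

In both rows the equals sign sits in the null hypothesis, and the two statements together cover every possible value of mu.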
The student debt example was a one-tail test: the null hypothesis allows for $35,000 or any value greater than $35,000. A two-tail test, on the other hand, has a null hypothesis that states a specific value for the population parameter, as in the example where the weight of the bags is set at 300 grams.
Let's do this example. We believe that the freshmen entering our university have an average ACT score of more than 24. That's our current belief. Has it remained the same, or has it changed? If you look here, the current belief is that mu, the population average, is greater than 24. However, since the equality is missing, we will make this our alternate hypothesis, and the null will be H0: mu is less than or equal to 24. This is because the equals sign must appear in the null hypothesis. This is a necessary part of formulating the null and alternate hypotheses.
When you're testing the mean, you have three possible sets of hypotheses. One is in the form of a two-tail test: the mean is equal to some hypothesized mean, or it's not. The other two are the two forms of the one-tail test. Because the equals sign is always in the null hypothesis, most statistical software programs will only ask for the alternate hypothesis, with three choices: not equal, less than, and greater than. Based on what you pick, the null will be inferred. Sometimes people get lazy and write the null hypothesis of a one-tail test with only the equals sign. Again, this is technically incorrect, but it is done because, based on the alternate and the fact that the null and the alternate cover all possibilities, everyone knows that a one-tail test will be conducted.
So now let's practice. Consider the following. A company's website promotes to its potential job applicants that its employees are so happy that, on average, they stay with the company for 25 years. This claim has been made for many years.
The company has hired a new HR director, who wants to make sure that the claim is still true. What are the null and the alternate hypotheses here? This is a test about the population mean. The claim, and so the null hypothesis, is that the average length of tenure is 25 years, and the alternate is that the average has changed. Thus, this is a hypothesis test about the population mean, and it's a two-tail test.
Now let's look at another example. A manager is considering moving her retail store to a new location where stores similar to hers are believed to have higher customer traffic. The monthly increase in rent means that her customer traffic must increase by at least 20% should she move to this location. Clearly, if customer traffic does not increase, the additional sales needed to offset the additional cost are unlikely to be there. In this example, the question is about the population proportion. Stating the null and the alternate hypotheses for a population proportion follows the same principles we followed for the population mean. If p represents the percent increase in customer traffic, then this store owner is going to test the belief that customer traffic will go up by at least 20%, which means the alternate is that the increase is below 20%. Remember that the owner will decide against this move only if the customer traffic increases by less than 20%. This is another example of a one-tail test. You may find it easier to state the alternate hypothesis first and then, based on that, write the null.
Here's another example for a population proportion. A nutrition supplement claims that 80% of its customers report an improved sense of physical well-being. A newspaper investigator is wondering whether that is so. Based on the claim, we can say that p represents the proportion of the population using the supplement who report an improved sense of physical well-being. Based on this, one could write the null hypothesis as p = 0.8.
The alternate would be that p is not equal to 0.80, and thus this will be a two-tail test.
Now let's reword the example we just saw. This time, a nutrition supplement claims that at least 80% of its customers report an improved sense of physical well-being. The newspaper reporter is now going to investigate this claim. In this case, the null hypothesis will be that p is greater than or equal to 0.80, and the alternate will be that p is less than 0.80, which makes this a one-tail test.
Now let's look at this problem once more. This time the nutrition supplement claims 80% as the proportion of customers who report an improved sense of well-being. The reporter thinks this is too high and would like to study it. Now the reporter is putting forward an alternate hypothesis, a theory that this proportion is less than 80%. Then the null has to be that p is greater than or equal to 0.80. Through these three ways of thinking about the same example, I hope you see that thinking about the research question should guide you in developing the null and alternate hypotheses.
All scientific studies start by asking a question about a variable of interest. We start the testing by formulating the question in the form of hypotheses, which gives us the null hypothesis and the alternate hypothesis. The null hypothesis is the statement that is believed to be true. The data collection and analysis are based on the null hypothesis. If the data collected contradict the null hypothesis, then we will reject the null hypothesis. We will learn how to do this in the other lessons in this module. [SOUND]
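The point that software only asks for the alternate hypothesis, and infers the null from it, can be sketched in a few lines of Python. This is a minimal illustration, not any particular package's behavior: the function names `state_hypotheses` and `tails` are hypothetical, and the option names "two-sided", "less", and "greater" are borrowed from the convention used by SciPy's `alternative` argument. The examples are the ones discussed in this lesson.

```python
# Hypothetical helper for illustration only; not part of any statistics library.
# Each choice of alternate maps to (sign used in the null, sign used in the alternate).
SIGNS = {
    "two-sided": ("=", "!="),
    "less": (">=", "<"),
    "greater": ("<=", ">"),
}


def state_hypotheses(parameter, value, alternative):
    """Return the (null, alternate) statements implied by the chosen alternate."""
    if alternative not in SIGNS:
        raise ValueError("alternative must be 'two-sided', 'less', or 'greater'")
    null_sign, alt_sign = SIGNS[alternative]
    # The equals sign always ends up in the null hypothesis.
    null = f"H0: {parameter} {null_sign} {value}"
    alternate = f"Ha: {parameter} {alt_sign} {value}"
    return null, alternate


def tails(alternative):
    """A 'not equal' alternate gives a two-tail test; the other two are one-tail."""
    return "two-tail" if alternative == "two-sided" else "one-tail"


# Examples from the lesson: (description, parameter, hypothesized value, alternate).
examples = [
    ("Student debt", "mu", 35000, "less"),
    ("Chocolate bags", "mu", 300, "two-sided"),
    ("ACT score", "mu", 24, "greater"),
    ("Employee tenure", "mu", 25, "two-sided"),
    ("Supplement, 'at least 80%' claim", "p", 0.8, "less"),
]

for label, parameter, value, alternative in examples:
    null, alternate = state_hypotheses(parameter, value, alternative)
    print(f"{label}: {null}   {alternate}   ({tails(alternative)})")
```

Choosing the alternate first and letting the null follow mirrors the advice given earlier: state the alternate hypothesis, then write the null as its complement.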