In this video, we're going to use what we learned about the Central Limit Theorem for proportions, namely the nearly normal shape of the sampling distribution and its standard error, to calculate confidence intervals for unknown population proportions. Two scientists want to know if a certain drug is effective against high blood pressure. The first scientist wants to give the drug to 1,000 people with high blood pressure and see how many of them experience lower blood pressure levels. The second scientist wants to give the drug to 500 people with high blood pressure and not give the drug to another 500 people with high blood pressure, and see how many in both groups experience lower blood pressure levels. Which is the better way to test this drug? You've all been exposed to principles of experimental design by now, and we know that controlling is important when we're running experiments. So the answer should be that the second design, where 500 get the drug and 500 don't, and the group that doesn't get the drug acts as the control group, is the better one. This question was posed to 670 Americans in the General Social Survey in 2010, and 99 of them said that all 1,000 should get the drug. So we're going to categorize these people as those with bad intuition about experimental design. And 571 of them said that 500 should get the drug and 500 should not, and we're going to label these people as those with good intuition about experimental design. Our goal is to estimate what percent of Americans have good intuition about experimental design. Our parameter of interest here is the proportion of all Americans who have good intuition about experimental design, and we're going to denote this unknown population parameter p, for population proportion. Our point estimate is the proportion of sampled Americans who have good intuition about experimental design. We're going to denote this p-hat, and this is our known sample proportion. In fact, that is 571 divided by 670, the total sample size, which is roughly 85%.
When it comes to estimating an unknown population parameter, we know that it always follows the same structure: the point estimate plus or minus a margin of error. In this case our point estimate is our sample proportion, p-hat. And our margin of error can be calculated as z star, our critical value, times the standard error of p-hat. So, once again, the only new concept here is how to calculate the standard error for the sample proportion. And in fact, we were already introduced to this when we talked about the Central Limit Theorem for proportions. So, to calculate the standard error of a proportion for a confidence interval, we would use the formula that's based on the Central Limit Theorem: the square root of p-hat times one minus p-hat, over n. Remember that when we initially introduced this formula, we used p instead of p-hat in the calculation of the standard error. But we also said that if you don't know p, the true population parameter, you would plug in your sample proportion, and in fact, in most instances, we don't know what the true population parameter is. That's why we're calculating a confidence interval in the first place. Going back to the data that we had, the General Social Survey found that 571 out of 670 sampled Americans, that's roughly 85%, answered the question on experimental design correctly. We are asked to estimate, using a 95% confidence interval, the proportion of all Americans who have good intuition about experimental design. Before we can go ahead and calculate the confidence interval, we need to make sure that the conditions for inference have been met. The first condition to remember is independence. And that relies on a random sample, and on less than 10% of the population being sampled. So we have 670 Americans, which is definitely less than 10% of all Americans, and we know that the General Social Survey samples randomly.
Therefore, we can assume that whether one American in the sample has good intuition about experimental design is independent of whether another does. The second condition is about the sample size. And remember that when we're dealing with categorical variables and proportions, we check this as the success-failure condition. So we need to make sure that we have at least ten successes and ten failures in our sample. The sample size overall is large, so we should be good here, but let's take a look real quick. We have 571 successes and, as the remainder, 670 minus 571, 99 failures. So we didn't even have to go through the n times p-hat route here, because we already know the number of successes and failures, and we know that both of these numbers are indeed greater than ten. Therefore, since the success-failure condition is met, we can assume that the sampling distribution of the proportion is nearly normal. Now that we have all of our building blocks, we can actually calculate our confidence interval. Remember that a confidence interval is calculated as p-hat plus or minus z star times the standard error of p-hat. In this case our p-hat was 0.85, for a 95% confidence interval z star is 1.96, and we calculate the standard error as the square root of 0.85 times 0.15 divided by 670. That yields a standard error of 0.0138, which multiplied by 1.96 yields a margin of error of roughly 0.027, or 2.7%. Therefore, the confidence interval overall is between 0.823 and 0.877. So we can interpret this as: we are 95% confident that 82.3% to 87.7% of all Americans have good intuition about experimental design. Not too shabby. Next, let's talk about what happens if we want to adjust our margin of error. The margin of error for this previous confidence interval was 2.7%. If, for a new confidence interval based on a new sample, we wanted to reduce the margin of error to 1% while keeping the confidence level the same, at least how many respondents should we sample?
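Before we answer that question, the interval we just calculated can be reproduced in a few lines of code. This is a minimal Python sketch and not something shown in the video. The function name proportion_ci is made up for illustration, and the sketch assumes the same rounded values used above: a p-hat of 0.85 and a z star of 1.96.

```python
import math

def proportion_ci(p_hat, n, z_star=1.96):
    """Return (standard error, margin of error, lower, upper) for a proportion.

    Hypothetical helper for illustration; z_star=1.96 is the 95% critical
    value quoted in the lecture.
    """
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of p-hat
    me = z_star * se                          # margin of error
    return se, me, p_hat - me, p_hat + me

# Counts from the General Social Survey question
successes, n = 571, 670
failures = n - successes

# Success-failure condition: at least 10 successes and 10 failures
conditions_met = successes >= 10 and failures >= 10

# The lecture rounds p-hat to 0.85, so we do the same here
se, me, lower, upper = proportion_ci(0.85, n)

print(f"failures = {failures}, conditions met = {conditions_met}")
print(f"SE = {se:.4f}, ME = {me:.3f}, CI = ({lower:.3f}, {upper:.3f})")
```

If you plug in the unrounded p-hat, 571 divided by 670, the endpoints shift slightly in the third decimal place, so small differences from the numbers above are just rounding.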
We worked through an example like this one when we were working with means as well. Remember, we have a desired margin of error, we know our confidence level, and we know everything else associated with calculating the standard error except our sample size. So the desired margin of error is 1%, and we know that this is going to be equal to 1.96 times the square root of 0.85 times 0.15 divided by n. I'm using the p-hat from the previous study because I already know something about this population, so I might as well use it here. So the only unknown in this equation is our sample size. In order to get the sample size out of this equation, I can first square both sides and then rearrange to solve for n, which gives n equals 1.96 squared times 0.85 times 0.15, divided by 0.01 squared. Note that here I'm using the decimal version, 0.01, as opposed to 1 for 1%, and this yields a sample size of 4,898.04. However, remember that we need to round this number up, even though under the usual rounding rules we would round it down. Because what this is saying is that in order to ensure a maximum 1% margin of error, we are going to need 4,898.04 people. Since we can't have 0.04 of a person, we would say that we are going to need at least 4,899 people in our sample. We can see that to bring our margin of error down from 2.7% to 1%, we're going to have to increase our sample size a lot. And remember that that's because the sample size appears under the square root sign in the calculation of the margin of error. So you're going to need to increase your sample size by a lot before you can actually start reaping the benefits. Let's make one more point about calculating the required sample size for a desired margin of error. Remember that this is the formula for the margin of error: it's a critical value times the standard error. If there is a previous study that we can rely on for the value of p-hat in this formula, we would use that in the calculation of the required sample size.
This is what we did previously, where we plugged in 0.85 for p-hat, which came from the previous study, the General Social Survey. If not, then we're going to use 0.5 for our p-hat. There are two reasons why we do this. One, if you don't know any better and this is a categorical variable with two outcomes, a success and a failure, 50-50 is a pretty good guess. Two, using 0.5, or 50%, for p-hat gives the most conservative estimate, in other words, the highest possible sample size. And we like being conservative when it comes to estimating minimum required sample sizes, because we definitely don't want to make a mistake and have to redo our sampling.
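To see both cases side by side, here is a minimal Python sketch of the sample size calculation. It is not from the video, and the function name required_sample_size is made up for illustration. It assumes the same z star of 1.96 and solves the margin of error formula for n, rounding up at the end.

```python
import math

def required_sample_size(margin_of_error, p_hat=0.5, z_star=1.96):
    """Smallest n whose margin of error is at most margin_of_error.

    Hypothetical helper for illustration. p_hat defaults to 0.5, the
    conservative choice when no previous study is available.
    """
    n_exact = z_star ** 2 * p_hat * (1 - p_hat) / margin_of_error ** 2
    # Round to 6 decimals first so floating-point noise (for example
    # 9604.000000000002) does not push the result up by one, then round up.
    return math.ceil(round(n_exact, 6))

# With p-hat from the previous study (the General Social Survey)
n_with_prior = required_sample_size(0.01, p_hat=0.85)

# With no prior information, fall back on p-hat = 0.5
n_conservative = required_sample_size(0.01)

print(n_with_prior)    # 4899, matching the calculation above
print(n_conservative)  # 9604
```

Notice that the 0.5 case asks for nearly twice as many respondents as the 0.85 case, which is what we mean by the conservative estimate: p-hat times one minus p-hat is largest when p-hat is 0.5.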