A conceptual and interpretive public health approach to some of the most commonly used methods from basic statistics.


From the course by Johns Hopkins University

Statistical Reasoning for Public Health 1: Estimation, Inference, & Interpretation

135 ratings


From the lesson

Module 3A: Sampling Variability and Confidence Intervals

Understanding sampling variability is the key to defining the uncertainty in any given sample-based estimate from a single study. In this module, sampling variability is explicitly defined and explored through simulations. The resulting patterns from these simulations will give rise to a mathematical result that is the underpinning of all statistical interval estimation and inference: the central limit theorem. This result will be used to create 95% confidence intervals for population means, proportions and rates from the results of a single random sample.

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

So in this next set of lectures, we will use the results from the Central Limit Theorem regarding the theoretical sampling distribution, and our ability to estimate characteristics of this sampling distribution from a single data sample, to create an interval that incorporates the uncertainty in our sample-based estimate as it estimates the underlying, unknown, true population value, sometimes called a parameter, like a population mean or proportion. And these intervals that we'll be creating are called confidence intervals.

Okay, in this section, we'll begin our quest to estimate confidence intervals for single population parameters from single samples from the aforementioned population, by focusing on confidence intervals for a population mean.

So hopefully, upon completion of this lecture section, you'll be able to explain how the Central Limit Theorem, aka the CLT, sets the groundwork for computing a confidence interval for some unknown population parameter, be it a mean, proportion or incidence rate, using the results from a single sample of data from the population. You'll also be able to estimate a 95% confidence interval for a population mean based on the results of a single sample from the population, and to estimate intervals with other levels of confidence, 99% or 90% for example, for a population mean based on the results of a single sample.

So let's just revisit the CLT, and it can never hurt to go over this a couple more times, it's such a powerful result.

Recall the CLT states that if all possible random samples of the same size, we'll call it N, were taken from the same population, and a summary statistic were computed for each sample, be it a mean, proportion, or incidence rate, whatever was appropriate for the type of data we had, and then the distribution of the summary statistic values across these samples was plotted, let's say as a histogram of all estimates across all samples of the same size, then this distribution, this histogram, would approximate a normal distribution. This is my approximation by hand here.

It would be well-described by a normal distribution, so we can put a curve on top. The center would be at whatever the truth was, the true mean, proportion or incidence rate.

So, sorry for that curved line, but it gives a little bit of spontaneity to this presentation. We know that on a normal curve, most of the values fall within plus or minus two standard units. We'll call them standard errors here to indicate that we're talking about uncertainty or variability in sample estimates across multiple random samples. So most of the estimates would fall within two standard errors of the truth, either above it (plus two standard errors) or below it (minus two standard errors).

So most of the samples we could get just by chance from a population of interest will have an estimate that falls within plus or minus two standard errors of this unknown truth. Only about 5% of the samples we could get, in total, would have an estimate that falls farther than two standard errors from the truth. So how does that help us, though? Because in research, we're only going to take one sample from each population under study. So how can this Central Limit Theorem help us in research? Well, let's think about this for a minute.
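To make the "about 95% within two standard errors" claim concrete, here is a minimal simulation sketch. The population is made up (a right-skewed set of 20,000 values), and the function name `fraction_within_two_se` is a hypothetical helper, not something from the course materials. It draws many random samples of the same size and counts how often the sample mean lands within two true standard errors of the true mean.

```python
import random
import statistics


def fraction_within_two_se(population, n, num_samples, seed=0):
    """Fraction of sample means within 2 true standard errors of the true mean."""
    rng = random.Random(seed)
    true_mean = statistics.fmean(population)
    # True standard error of a sample mean: population SD / sqrt(n).
    true_se = statistics.pstdev(population) / n ** 0.5
    hits = 0
    for _ in range(num_samples):
        sample = rng.choices(population, k=n)  # random sample, with replacement
        if abs(statistics.fmean(sample) - true_mean) <= 2 * true_se:
            hits += 1
    return hits / num_samples


# Hypothetical, right-skewed population (assumption: exponential, mean about 4).
pop_rng = random.Random(1)
population = [pop_rng.expovariate(1 / 4.0) for _ in range(20000)]

frac = fraction_within_two_se(population, n=100, num_samples=2000)
print(f"Share of sample means within 2 SE of the truth: {frac:.3f}")
```

In a real study we only get one of these samples, which is exactly the problem the rest of the section takes up.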

We're only going to get one chance to estimate this unknown truth, and we're only going to take one sample.

But for most of the samples we could take, not all, but most, about 95% of them, our estimate will fall within two standard errors of this truth, either above or below. Now, we're going to take one sample. We could just by chance get an estimate that falls way outside the pack, something close to the middle, even closer to the middle, something within two standard errors but relatively far away from the truth, etc. We're never going to know where our estimate lies under this theoretical curve, because we can't view or really draw this theoretical curve perfectly, because we don't know the truth. But for 95% of the samples we could take, the interval we get by starting with our estimate and adding and subtracting two standard errors will include our unknown truth.

So we can come up with a range of possibilities for the unknown truth, starting with our best guess, our sample-based estimate, and factoring in the uncertainty in our estimate, brought about by the fact that we only have an imperfect subsample of the population we want to quantify. Now the rub with this, which we saw in the last section, was that we don't know the true standard error. However, we saw in detail that this can be estimated based on the results of a single sample. So we're pretty much ready to go ahead and build these intervals based on the results of single samples, the kind of things we've been looking at in the previous lectures.
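The flip side of that logic can be sketched in code as well: build an interval from each sample using only that sample's own estimated standard error, and count how often the interval includes the truth. This is an illustrative simulation under stated assumptions; the population (roughly normal, mean near 125, SD near 13) and the helper name `coverage_of_two_se_intervals` are invented for the example.

```python
import random
import statistics


def coverage_of_two_se_intervals(population, n, num_samples, seed=0):
    """Fraction of (estimate +/- 2 estimated SE) intervals containing the true mean."""
    rng = random.Random(seed)
    true_mean = statistics.fmean(population)
    covered = 0
    for _ in range(num_samples):
        sample = rng.choices(population, k=n)
        estimate = statistics.fmean(sample)
        # Estimated standard error: sample SD / sqrt(n), from this one sample only.
        est_se = statistics.stdev(sample) / n ** 0.5
        if estimate - 2 * est_se <= true_mean <= estimate + 2 * est_se:
            covered += 1
    return covered / num_samples


# Hypothetical population of blood-pressure-like values (assumed, not course data).
pop_rng = random.Random(2)
population = [pop_rng.gauss(125, 13) for _ in range(20000)]

cov = coverage_of_two_se_intervals(population, n=113, num_samples=2000)
print(f"Share of intervals that include the true mean: {cov:.3f}")
```

The simulation can check coverage only because it knows the true mean; with a single real sample we never find out whether our interval is one of the roughly 95% that cover it.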

So here's one of our favorite data sets that we'll work with again: systolic blood pressure measurements from a random sample of 113 adult men taken from a clinical population. The sample mean, our best estimate for the true population mean blood pressure for the population from which the 113 men were sampled, is 123.6. And that's our best working guess, or estimate, for the truth. But we know it's not necessarily a perfect estimate, so we want to bring in the uncertainty. Well, let's create this idea of a 95% confidence interval here. We take our sample mean, plus or minus two estimated standard errors.

So we know, from previous lectures, we can estimate the standard error of a sample mean as a function of the sample standard deviation. That's 12.9, which estimates the true variability in blood pressure across all such men based on the 113 we have in our sample, and we divide it by the square root of 113, our sample size.

So, given the results from this sample we can estimate a 95% confidence interval for the true mean blood pressure for all men by taking our sample mean, adding and subtracting two estimated standard errors.

So here n is 113. And I'll let you verify the math on this, but if you take the mean and add and subtract two estimated standard errors, you get an interval that goes from 121.18 millimeters of mercury (and we could easily, and properly, round that to 121.2; there'd be no reason not to) to 126.02 millimeters of mercury.
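For anyone who wants to verify the math, here is a minimal sketch of the calculation using the numbers quoted in the lecture (mean 123.6, standard deviation 12.9, n = 113). The function name `two_se_interval` is just an illustrative choice. Computed without intermediate rounding, the endpoints can differ in the second decimal place from the 121.18 and 126.02 quoted above, which is presumably down to rounding the standard error first.

```python
import math


def two_se_interval(mean, sd, n, multiplier=2.0):
    """Return (lower, upper) for mean +/- multiplier * estimated standard error."""
    est_se = sd / math.sqrt(n)  # estimated standard error of the sample mean
    return mean - multiplier * est_se, mean + multiplier * est_se


# Blood pressure example from the lecture.
low, high = two_se_interval(mean=123.6, sd=12.9, n=113)
print(f"Approximate 95% CI: {low:.1f} to {high:.1f} mmHg")
```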

So, we'll delve into the interpretation of this in more detail throughout the rest of this set of lectures here, in lecture seven. But, this interval gives a range of possibilities for the unknown true mean blood pressure, for all men in this population.

Is the true mean necessarily in this interval? Not necessarily. For 95% of the samples we could get, this interval should include the truth, but we'll never know whether we got one of the 95% of samples that were within that two standard error range or not. And we'll discuss this in detail in the subsequent section. But for now this quantifies an estimated range of possibilities for the true mean that we can't directly observe.

Let's look at another example that we've dealt with extensively so far in the course: length of stay data from the Heritage Health claims. These are the length of stay data for patients with at least one day in 2011. This is each patient's cumulative length of stay for the entire year, for as many times as they were admitted to the hospital. Okay, and we remember that this was heavily right-skewed data where the mean was 4.3 days, and the standard deviation, the variation in the individual sample values, was 4.9 days. So again, we'll use the same approach to get a 95% confidence interval. We take our sample mean estimate and add and subtract two estimated standard errors of the sample mean.

So that's 4.3 plus or minus 2 estimated standard errors, and our estimated standard error will be that standard deviation of 4.9, the individual variation estimated from the 12,298 claims in our sample, divided by the square root of that sample size. And again I'll let you do the math and verify what I've said, but this confidence interval goes from 4.21 days to 4.39 days. It's rather tight, and narrower than our previous interval, because the sample size here was so much larger.

So this interval gives us a range of possibilities for the true mean length of stay of between 4.21 days and 4.39 days.
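To see why the larger sample gives a tighter interval, the sketch below holds the standard deviation at the lecture's 4.9 days and varies only the sample size. The sample sizes 100 and 1,000 are hypothetical, added purely for comparison; 12,298 is the actual number of claims quoted above.

```python
import math


def two_se_interval(mean, sd, n):
    """Return (lower, upper) for mean +/- 2 estimated standard errors."""
    est_se = sd / math.sqrt(n)
    return mean - 2 * est_se, mean + 2 * est_se


mean_los, sd_los = 4.3, 4.9  # length-of-stay summary values from the lecture

# 100 and 1000 are made-up sample sizes for comparison; 12298 is from the lecture.
for n in (100, 1000, 12298):
    low, high = two_se_interval(mean_los, sd_los, n)
    print(f"n={n:>6}: {low:.2f} to {high:.2f} days (width {high - low:.2f})")
```

Because the standard error divides by the square root of n, cutting the width of the interval in half takes roughly four times as many observations.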

Let's look at another example. You might think, well, what is the utility of a single confidence interval for a single parameter? To start, it helps us quantify something about an unknown quantity we want to get a hold on. So it gives us an interval to understand what's going on. But this becomes especially interesting, and perhaps useful, when we start comparing populations using some summary statistic as our quantification of the distribution of values in each population. So let me give you an example. Here's a study from the New England Journal of Medicine intended to look at the claims that low carbohydrate diets were associated with greater weight loss. That would have been [INAUDIBLE] anecdotally and through books, and people were jumping on things like the Atkins Diet, etc., because of anecdotal evidence, and this is one of the first studies to take this head-on. The study is entitled A Low-Carbohydrate as Compared with a Low-Fat Diet in Severe Obesity. What the researchers did is they took patients that were clinically obese, severely obese, and randomized them to one of two diet groups, either a low fat or a low carb diet.

The subjects were followed for a six month period, and for each subject in each group, they looked at the change in weight: the weight after the six months on the diet minus the weight before, when they started the diet. So for each person in each group they computed the change in weight. Some people ostensibly lost weight. Maybe some people gained weight. And then what they did at the end of the study is they quantified the average weight change in the two diet groups. The 64 people who were randomized to the low-carb group lost on average 5.7 kilograms, but there was a fair amount of variation in these individual weight changes, and so the standard deviation of the 64 individual changes in weight was 8.6 kilograms. They did the same thing for the low fat group.

There were 68 patients, or subjects, randomized to the low-fat group, and the average weight change was a decrease of 1.8 kilograms. So on average, the patients lost weight in this group as well. There was a little less variability in these individual weight changes than there was in the low-carb group.

So we might be thinking, what can we conclude about the efficacy, if you will, of low carb versus low fat diets with regards to weight change? Well, our sample results suggest that those on the low carb diet lost more on average, on the order of almost four kilograms more on average. But these estimates are based on 64 and 68 subjects, respectively, and there is, especially in the low carb group, a fair amount of individual variation in the weight change metrics. So in order to better examine this and make our conclusions, we first want to bring in the uncertainty in our estimated mean weight changes in the two groups, before making a conclusion about whether one group had better weight loss than the other.

So to start, what we could do is perhaps make confidence intervals for the average weight change in each of the two groups. If we do that for the low carb group, following our formula for the 95% confidence interval, we take the observed mean of negative 5.7 kilograms and add and subtract two estimated standard errors, which again is just our estimated standard deviation of the individual weight changes divided by the square root of the sample size. And if you do out the math, we see that the 95% confidence interval for the average weight change for those in the low carb group was between negative 7.8 kilograms and negative 3.5 kilograms.

Well, let's think about this for a minute. Our best guess or estimate for the weight change is negative 5.7 kilograms on average. But there's some potential uncertainty, and so the 95% confidence interval incorporates that uncertainty to give us a range for the true mean weight change were we to give all severely obese patients this diet: anywhere from negative 7.8 kilograms on average to negative 3.5. Notice that all values in this interval are negative, suggesting that even after accounting for the uncertainty in our estimate, there's some evidence of a real weight loss on average, since all possibilities for the true change are negative.

Let's do the same thing for the low fat group. We create a confidence interval for the low fat group by following the same formula approach.
We take the negative 1.8 plus or minus 2 estimated standard errors, which again, if you go back to that table, is the standard deviation of the 68 individual weight change measurements in the low fat group, divided by the square root of the 68 subjects in the low fat group. And when all the dust settles, the confidence interval goes from negative 2.7 kilograms to negative 0.9 kilograms. So, as with the low carb group, there’s some evidence that the weight change on average was negative even after accounting for the uncertainty in our estimates. All the possibilities in this confidence interval for the true mean change are negative.

So, it looks like on the whole, there’s evidence that both groups lost weight on average, and this weight loss is real when we account for the uncertainty in our estimates. However, if you look carefully at this, and we're going to get into comparing populations head on in the next set of lectures, but this is the start, you'll notice that if you were to plot the confidence intervals for these two groups on a number line, they do not overlap.

So what do I mean by that? Here's zero. If we put in the end points for the low carb group, it would be something like, and this is not drawn to scale, negative 3.5 and negative 7.8. So this is the confidence interval for the low carb group. Do the same thing for the low fat group. We get something that goes from about negative 2.7 to negative 0.9. So for both groups there's evidence of a mean weight loss overall, but the amount of loss, on average, is greater in the low carb group, even after we've accounted for the uncertainty in these estimates. What we'll get to in the next section is that we'll call this difference statistically significant, meaning that even after we've accounted for the role of chance in our estimates, there's a clear distinction between the confidence intervals for the two groups.
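The number-line picture can be sketched as a simple check of whether two intervals overlap. The low carb interval is recomputed from the lecture's mean, standard deviation and sample size; for the low fat group, the standard deviation is not stated in the transcript, so the sketch uses the interval endpoints as reported. The helper names are illustrative. Unrounded, the low carb endpoints come out at about -7.85 and -3.55, slightly different from the quoted -7.8 and -3.5.

```python
import math


def two_se_interval(mean, sd, n):
    """Return (lower, upper) for mean +/- 2 estimated standard errors."""
    est_se = sd / math.sqrt(n)
    return mean - 2 * est_se, mean + 2 * est_se


def intervals_overlap(a, b):
    """True if intervals a=(low, high) and b=(low, high) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Low carb group: mean change -5.7 kg, SD 8.6 kg, n = 64 (from the lecture).
low_carb = two_se_interval(mean=-5.7, sd=8.6, n=64)

# Low fat group: interval as quoted in the lecture (its SD is not given here).
low_fat = (-2.7, -0.9)

print(f"Low carb: {low_carb[0]:.2f} to {low_carb[1]:.2f} kg")
print(f"Low fat:  {low_fat[0]:.2f} to {low_fat[1]:.2f} kg")
print("Intervals overlap:", intervals_overlap(low_carb, low_fat))
```

This is only the eyeball comparison described above; quantifying the difference between the groups directly is the subject of the next set of lectures.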

Let's look at another example from the literature: Effect of Lower Targets for Blood Pressure and LDL Cholesterol on Atherosclerosis in Diabetes: The SANDS Randomized Trial. This is from the Journal of the American Medical Association in 2008. The objective of this study was to compare the progression of subclinical atherosclerosis in adults with type two diabetes treated to reach one of two sets of targets. One group was given aggressive targets: a low density lipoprotein cholesterol level of 70 milligrams per deciliter or lower and a systolic blood pressure of 115 millimeters of mercury or lower. The other group was randomized to the standard targets: low density lipoprotein cholesterol levels of 100 milligrams per deciliter or lower and systolic blood pressure of 130 millimeters of mercury or lower. And so the researchers were interested to see if giving these more aggressive targets would actually end up resulting in better outcomes on these measures for this group with type two diabetes.

So, this was a randomized, open-label, blinded-to-endpoint, three year trial from April 2003 to July 2007, at four clinical centers in Oklahoma, Arizona, and South Dakota. The participants were 499 American Indian men and women, age 40 years or older, with type two diabetes and no prior cardiovascular events. So the population under study here is American Indian men and women, age 40 years or older, with type two diabetes. And what we have is a sample of 499 to work with. The intervention is that participants were randomized to these aggressive targets or standard targets, with stepped treatment algorithms defined for both. So what were the results of this? I'll just pull a section from the results section here with some quotes, so it's pulled directly from the paper.

This is the paper referenced before. The mean target LDL cholesterol levels and systolic blood pressures for both groups were reached and maintained, so both of the groups met their targets. But let's look at what they found. The mean, with 95% confidence interval, levels for LDL cholesterol in the last 12 months were 72.

So, this is for LDL. And then for the blood pressures, for the aggressive group, the mean blood pressure at the end of the study was 117 with a confidence interval of 115 to 118, and for the standard group it was 129 millimeters of mercury, with a confidence interval of 120 to 130. So what they're showing here is that while both groups met their targets on average, these averages were lower for the aggressive group on both outcomes, LDL and systolic blood pressure. And even after accounting for the uncertainty, there's a difference in these averages, because the confidence intervals do not overlap. And again, we'll get to quantifying the difference between two populations and really focusing on that measure. But to start, this shows that the aggressive targeting resulted in lower average LDL and SBP measures compared to the standard group, and this difference, we saw, couldn't be explained by random sampling error alone. Even after accounting for uncertainty, there were clear distinctions between the mean results for the two groups.

And if you actually look at this paper you can see that they present a lot of 95% confidence intervals for a lot of different outcome measures, at different times between these two groups. And they ultimately quantify these in terms of differences in means between the two groups at the end of the study. And that's where we're going, and that's what we'll get to in the next set of lectures.

So one thing to think about. I keep talking about 95% confidence intervals, and these are what we might call the industry standard in research. This is what's generally presented in journal articles and what's expected from researchers. It is certainly possible, however, to estimate intervals with different levels of confidence. And it's the same logic that we discussed before, just changing how far from the truth we allow the estimates to fall. If we only wanted a 90% confidence interval for a population mean, we could start with our estimate, but only add and subtract 1.65 estimated standard errors. If we wanted a 99% confidence interval, we'd actually need to go an additional 0.58 standard errors in either direction, above and beyond what we'd need for the 95%. So you can see there is sort of a law of diminishing returns here because of the bell shape of the normal curve. In order to get four percent more confidence, if you will, to increase by four percent our chances that any single sample yields an interval that includes the truth, we need to increase the width of the interval by 0.58 standard errors on either side. So, a hefty price to pay in some sense for that extra four percent confidence.

So what we've seen in this section thus far is we first reiterated the logic of the Central Limit Theorem and how it helps us when applied to the results of a single sample. We found, based on the logic from the Central Limit Theorem, that most of the sample estimates we could get for some unknown measure that quantifies something about a population, be it a mean, proportion or incidence rate, will fall within two standard errors of the unknown truth. So conversely, for most of the samples we could take, if we create an interval where we take our estimate and add and subtract two standard errors of our estimate, then 95% of the time this interval will include the unknown truth. And we could change the level of confidence if we wanted by adding more or less than two standard errors.
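The multipliers for different confidence levels come from the standard normal curve, and they can be looked up with Python's standard library. This is a small sketch; `multiplier_for` is a hypothetical helper name. The exact values are about 1.645 for 90%, 1.96 for 95% and 2.576 for 99%, so the lecture's "two standard errors" is the usual rounding of 1.96, and the extra 0.58 quoted for 99% is measured from that rounded value of two.

```python
from statistics import NormalDist


def multiplier_for(confidence):
    """Number of standard errors to add and subtract for a given confidence level."""
    # For a central interval, put (1 - confidence) / 2 in each tail.
    return NormalDist().inv_cdf(0.5 + confidence / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: +/- {multiplier_for(level):.3f} standard errors")

# Extra standard errors needed to go from the rounded 95% multiplier (2) to 99%.
extra = multiplier_for(0.99) - 2
print(f"Extra for 99% beyond two standard errors: {extra:.2f}")
```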

The other thing is, well, this is all well and good, but we need to know the standard error. The true standard error is something we can't have. It's a theoretical, population-level quantity, or rather it's based on population-level characteristics. But the Central Limit Theorem also gave us a way to estimate it from a single sample.

So for a sample mean, for example, we can take our estimate as the sample mean. We can add and subtract two estimated standard errors of the sample mean.
