A conceptual and interpretive public health approach to some of the most commonly used methods from basic statistics.


From the course by Johns Hopkins University

Statistical Reasoning for Public Health 1: Estimation, Inference, & Interpretation

135 ratings



From the lesson

Module 3B: Sampling Variability and Confidence Intervals

The concepts from the previous module (3A) will be extended to create 95% CIs for group comparison measures (mean differences, risk differences, etc.) based on the results from a single study.

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

So, welcome back. This is part two of our look at confidence intervals for measures of association for binary comparisons between two populations, based on sample results. Here we are going to estimate confidence intervals for ratios of proportions, often called relative risks, and for odds ratios. These are the two ratio-based summaries we discussed when comparing binary outcomes between two groups.

So, upon completion of this lecture section, you will be able to estimate 95% confidence intervals (and intervals at other levels) for relative risks and odds ratios by hand.

Explain the relationship between the null value of 0 for confidence intervals on the natural log of ratios and the null value of 1 for the ratios themselves.

Explain why the confidence intervals for the difference in proportions, the relative risk, and the odds ratio, estimated from the same data sample, should agree in terms of including or not including their respective null values: zero for the difference in proportions and one for the ratios.

So again, we'll come back to a data set we're quite familiar with: response to antiretroviral therapy, stratified by CD4 count at the start of therapy, in a random sample of 1,000 patients from a citywide population of HIV patients. We've seen this several times now. Among the 503 patients in our sample with CD4 counts of less than 250, about 25% responded, compared to about 16% of the 497 patients with CD4 counts greater than or equal to 250 at the start of therapy.

So again, we've shown how to summarize this with the difference in proportions, the risk difference. And in the previous section we showed how to express our uncertainty in this estimate by estimating a standard error and creating a confidence interval for the true underlying population risk difference, which in this example ranged from 4% to 14%. The second summary measure we've looked at uses the same two numbers but compares them as a ratio rather than by subtraction: the relative risk. The relative risk, or risk ratio, compares the 25% who responded in the lower baseline CD4 count group to the 16% who responded in the group with counts greater than or equal to 250. That gives a relative risk of 1.56, or a 56% greater chance of responding among those with lower CD4 counts at the start of therapy. But of course, like our other measures, this is just an estimate based on sample data. As we noted in the first part of this lecture set, to create a confidence interval for the relative risk we need to do our computations on the log scale. So to start, we estimate the natural log of our observed relative risk of 1.56, which is 0.44. So 0.44 is the estimated log relative risk of response, comparing those with lower CD4 counts to those with higher CD4 counts.
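As a rough sketch (not part of the original lecture), the point estimates above can be reproduced in Python from the counts quoted later in this section: 127 of 503 responders with CD4 below 250 and 79 of 497 with CD4 of 250 or more. The variable names are illustrative only. Using the unrounded proportions gives a relative risk nearer 1.59 (log about 0.46); the 1.56 and 0.44 on the slide come from rounding the proportions to 25% and 16% first.

```python
import math

# Counts for the CD4 example (group 1: CD4 < 250, group 2: CD4 >= 250)
responders_1, n_1 = 127, 503
responders_2, n_2 = 79, 497

p_1 = responders_1 / n_1  # proportion responding, group 1
p_2 = responders_2 / n_2  # proportion responding, group 2

risk_difference = p_1 - p_2
relative_risk = p_1 / p_2
log_relative_risk = math.log(relative_risk)  # natural log

print(f"p1 = {p_1:.3f}, p2 = {p_2:.3f}, RD = {risk_difference:.3f}")
print(f"RR = {relative_risk:.2f}, ln(RR) = {log_relative_risk:.2f}")

# The slide rounds the proportions to 0.25 and 0.16, and the RR to 1.56, first
slide_rr = 1.56
print(f"slide RR = {slide_rr}, ln(slide RR) = {math.log(slide_rr):.2f}")
```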

So, how are we going to create a confidence interval for this? It turns out we can estimate the standard error from our sample data, just as we have for every other measure.

The estimated standard error of the log relative risk looks like this. It's easiest to demonstrate if we set our data up in a two by two table, with a and b the numbers responding in the first and second groups and c and d the numbers not responding, so that a + c and b + d are the group totals:

SE(ln RR) = sqrt(1/a - 1/(a + c) + 1/b - 1/(b + d))

That is, 1 over the number responding in the first group, minus 1 over the total number in the first group, plus 1 over the number responding in the second group, minus 1 over the total number in the second group. You can see that these two pieces have to do with the proportion responding in each group: a over a plus c is the proportion responding in group 1, and b over b plus d is the proportion responding in group 2. Each piecewise difference is greater than 0, because 1 over a is larger than 1 over a plus c (as long as c is at least 1), and likewise for b. So we are always taking the square root of something positive. Let's estimate the standard error of the log relative risk for our data.
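The formula above can be sketched as a small Python function. The name `se_log_relative_risk` is hypothetical, and the arguments follow the lecture's layout, assuming `n1 = a + c` and `n2 = b + d` are the group totals. The guard makes explicit that each piece is non-negative and that an empty outcome cell (a or b equal to 0) breaks the formula.

```python
import math


def se_log_relative_risk(a: int, n1: int, b: int, n2: int) -> float:
    """Estimated standard error of ln(RR) from a 2x2 table.

    a  = number with the outcome in group 1, n1 = a + c = group 1 total
    b  = number with the outcome in group 2, n2 = b + d = group 2 total
    """
    if not (0 < a <= n1 and 0 < b <= n2):
        raise ValueError("need at least one outcome per group, and counts <= totals")
    piece_1 = 1 / a - 1 / n1  # non-negative because a <= n1
    piece_2 = 1 / b - 1 / n2  # non-negative because b <= n2
    return math.sqrt(piece_1 + piece_2)


se = se_log_relative_risk(127, 503, 79, 497)
print(f"SE(ln RR) = {se:.2f}")
```

For the CD4 data this gives about 0.13, the value used in the lecture.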

So we apply that formula to get the standard error of this log relative risk of 0.44. We take 1 over the number who responded in the group with starting CD4 counts of less than 250, 1 over 127, and subtract 1 over the total number in this group, 1 over 503. Then we do the same computations for the group starting with CD4 counts greater than or equal to 250: 1 over the 79 who responded, minus 1 over the 497 total in that group. If you do the math on a calculator, you get a standard error for the log relative risk of about 0.13. Now, to get the 95% confidence interval for the log relative risk, it's business as usual: take our estimate, here on the log scale, and add and subtract 2 estimated standard errors. So we get 0.44 plus or minus 2 times 0.13, which gives an interval from 0.18 to 0.70. That's all well and good, but this is a 95% confidence interval for the log relative risk, and we wouldn't want to report that, because people don't think on the log relative risk scale. So to get a 95% confidence interval for the relative risk, we exponentiate, or anti-log, those endpoints. If we do that, we get a confidence interval for the relative risk that goes from 1.20 to 2.01. So there's a fair amount of potential variation in the true relative risk of response in the population. But notice that all the values are greater than one.

This indicates that even after accounting for sampling variability, there's evidence of a real positive association between lower CD4 count at the start of therapy and response to therapy.
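The log-scale recipe can be sketched in Python as below. The helper name `ratio_ci_from_log` is hypothetical. By default it uses the exact standard normal quantile for the requested level (about 1.96 for 95%) via `statistics.NormalDist`; passing `z=2` reproduces the lecture's "plus or minus 2 standard errors" rule and the interval of 1.20 to 2.01. The loop shows how other confidence levels, mentioned in the learning objectives, only change the multiplier.

```python
import math
from statistics import NormalDist
from typing import Optional, Tuple


def ratio_ci_from_log(log_estimate: float, se: float,
                      level: float = 0.95, z: Optional[float] = None) -> Tuple[float, float]:
    """Confidence interval for a ratio (RR or OR) built on the natural-log scale.

    The interval is log_estimate +/- z * se; both endpoints are then exponentiated.
    If z is not given, it is the standard normal quantile for `level`.
    """
    if z is None:
        z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    lower_log = log_estimate - z * se
    upper_log = log_estimate + z * se
    return math.exp(lower_log), math.exp(upper_log)


# Values from the lecture: ln(RR) = 0.44, SE(ln RR) = 0.13
lo, hi = ratio_ci_from_log(0.44, 0.13, z=2)
print(f"95% CI (z = 2): {lo:.2f} to {hi:.2f}")

for level in (0.90, 0.95, 0.99):
    lo_l, hi_l = ratio_ci_from_log(0.44, 0.13, level=level)
    print(f"{level:.0%} CI (exact z): {lo_l:.2f} to {hi_l:.2f}")
```

Note that the resulting interval is not symmetric around 1.56 on the ratio scale, even though it is symmetric around 0.44 on the log scale.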

So how can we interpret these results substantively? We could say something like this: based on the results of this study, HIV-positive individuals with CD4 counts of less than 250 at the time of starting therapy have a 56% greater risk, or probability, of responding to the therapy when compared to HIV-positive individuals with CD4 counts of 250 or more at the start of therapy. It may sound strange, semantically, to speak of a greater risk of a good outcome like response, but it's technically correct in the language of statistics. Additionally, these results estimate that this increase in response probability could be as small as 20% and as large as 101%. So there's a fair amount of uncertainty, but all possibilities point to increased response at the population level for those with lower CD4 counts when starting therapy. Okay, let's look at the odds ratio. This is pretty straightforward as well, and very similar to what we do with the relative risk. In fact, the formula for the standard error is even easier to work with, or to memorize, though I don't recommend memorizing these. The odds ratio in these data was slightly larger in magnitude than the relative risk: 1.75. On the log scale, that's equal to 0.56. Now we're going to estimate the standard error of the log odds ratio. Again, we're doing our computations on the log scale.

So now we apply this to the same data set, using essentially the same information from the two by two table that we used for the standard error of the log relative risk. This is one of the easiest formulas to remember, not that I suggest you memorize it. There are just four cells in the two by two table, and we take the square root of 1 over the first cell count, plus 1 over the second cell count, plus 1 over the third cell count, plus 1 over the fourth cell count:

SE(ln OR) = sqrt(1/a + 1/b + 1/c + 1/d)

Here we get a standard error of 0.16.

So to get the confidence interval for the log odds ratio, we take our estimated log odds ratio of 0.56 and add and subtract two estimated standard errors. That gives an interval that goes from 0.24 to 0.88. Notice (and this is something I should have pointed out with the relative risk as well) that all possibilities for the log odds ratio are positive. There's no 0 in here, and 0 on the log odds ratio scale would indicate no difference in the odds between the two groups. So when we exponentiate, or anti-log, these endpoints, because there was no 0 in the range of possibilities for the true log odds ratio, there's no 1 in the range of possibilities for the odds ratio itself. So we get an odds ratio estimate of 1.75, with a confidence interval of 1.27 to 2.41. There's a fair amount of uncertainty associated with this measure of association as well.
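Here is a minimal Python sketch of the odds ratio calculation from the four cells, assuming the lecture's layout (a and c are responders and non-responders in group 1; b and d in group 2). The function name `odds_ratio_ci` is hypothetical, and `z = 2.0` mirrors the lecture's rule of thumb. With the unrounded counts it gives an odds ratio of about 1.79, SE about 0.16, and a CI of roughly 1.30 to 2.46, slightly different from the slide's 1.75 (1.27 to 2.41), which started from rounded proportions. The last lines check the point made above: because exp() is strictly increasing and exp(0) = 1, the log-scale interval excludes 0 exactly when the ratio-scale interval excludes 1.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 2.0):
    """Odds ratio, SE(ln OR), log-scale CI and ratio-scale CI from a 2x2 table.

    a, c = outcome yes / no in group 1; b, d = outcome yes / no in group 2.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a / c) / (b / d)
    log_or = math.log(odds_ratio)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_ci = (log_or - z * se, log_or + z * se)
    ratio_ci = (math.exp(log_ci[0]), math.exp(log_ci[1]))
    return odds_ratio, se, log_ci, ratio_ci


# CD4 data: 127 of 503 and 79 of 497 responded, so 376 and 418 did not
or_hat, se, log_ci, ci = odds_ratio_ci(127, 79, 503 - 127, 497 - 79)
print(f"OR = {or_hat:.2f}, SE(ln OR) = {se:.2f}")
print(f"log-scale CI: {log_ci[0]:.2f} to {log_ci[1]:.2f}")
print(f"OR CI: {ci[0]:.2f} to {ci[1]:.2f}")

# Excluding 0 on the log scale <=> excluding 1 on the ratio scale
excludes_0 = not (log_ci[0] <= 0 <= log_ci[1])
excludes_1 = not (ci[0] <= 1 <= ci[1])
print(f"log CI excludes 0: {excludes_0}; OR CI excludes 1: {excludes_1}")
```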

The interpretation sounds similar to what we did with the relative risk, but we have to talk in terms of a comparison of odds, which is not a direct comparison of risks. So we'd say: based on the results of this study, HIV-positive individuals with CD4 counts of less than 250 at the time of starting therapy have 75% greater odds of responding to therapy when compared to HIV-positive individuals with CD4 counts of 250 or greater at the start of therapy. Additionally, these results estimate that this increase in response odds could be as small as 27% and as large as 141%. So it's the same message, just on a slightly different scale than we got with the relative risk. You might say: if we have the risk difference and the relative risk and their respective confidence intervals, why would we even bother with the odds ratio? That's a good question. I just wanted to bring it in to show you how this is done and how it corroborates the other results.

I also wanted to show how similar the standard error computation is, and that the idea of doing things on the log scale is the same as with the relative risk. In many situations where we can estimate the risk difference and the relative risk, the odds ratio won't necessarily be reported, since it's a less intuitive comparison measure. However, as we discussed, there are some types of study, such as case-control studies, where the odds ratio is the only valid estimate of association we can compute. Also, in logistic regression, results come out on the log odds ratio scale at first pass, and some of the computations for confidence intervals are similar.

So let's compare and contrast all three estimates and their CIs. For the difference in proportions, the risk difference, the 95% confidence interval in these data went from 0.04 to 0.14. So all possibilities show a greater proportion of responders in the group with lower CD4 counts; remember, this compares the group with lower CD4 counts to the group with higher counts. For the relative risk, the confidence interval goes from 1.20 to 2.01, and for the odds ratio it goes from 1.27 to 2.41. We have already seen that all three estimates agree in terms of the direction of association, and this holds up after accounting for the uncertainty in our estimates: all possibilities on all three scales show only positive associations. Notice that the confidence interval for the difference in proportions does not include the null value of zero, and the intervals for the relative risk and odds ratio do not include the null value of 1. So even though these are all on different scales and have different interpretations, they agree in terms of the overall direction of association, as we already established, and their confidence intervals agree with regard to their respective null values: there is no zero in the interval for the difference in proportions, and there is no one in the intervals for the ratios.
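To see the comparison in one place, here is a rough Python sketch that computes all three intervals from the same 2x2 counts and checks each against its own null value. The function names are hypothetical; `z = 2.0` follows the lecture, and the risk difference uses the usual large-sample standard error sqrt(p1(1 - p1)/n1 + p2(1 - p2)/n2), which we assume matches the earlier section. Because it works from unrounded counts, the ratio intervals come out close to, but not identical to, the slide values.

```python
import math


def three_intervals(a: int, n1: int, b: int, n2: int, z: float = 2.0):
    """CIs (estimate +/- z*SE) for RD, RR and OR from the same 2x2 data."""
    p1, p2 = a / n1, b / n2
    c, d = n1 - a, n2 - b

    # Risk difference: computed directly on the difference scale
    rd = p1 - p2
    se_rd = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    rd_ci = (rd - z * se_rd, rd + z * se_rd)

    # Relative risk: computed on the log scale, then exponentiated
    log_rr = math.log(p1 / p2)
    se_log_rr = math.sqrt(1 / a - 1 / n1 + 1 / b - 1 / n2)
    rr_ci = (math.exp(log_rr - z * se_log_rr), math.exp(log_rr + z * se_log_rr))

    # Odds ratio: also computed on the log scale
    log_or = math.log((a / c) / (b / d))
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    or_ci = (math.exp(log_or - z * se_log_or), math.exp(log_or + z * se_log_or))

    # Each interval is paired with the null value for its own scale
    return {"RD": (rd_ci, 0.0), "RR": (rr_ci, 1.0), "OR": (or_ci, 1.0)}


def excludes(interval, null_value) -> bool:
    lower, upper = interval
    return not (lower <= null_value <= upper)


results = three_intervals(127, 503, 79, 497)
for name, (ci, null) in results.items():
    print(f"{name}: {ci[0]:.2f} to {ci[1]:.2f}, excludes {null:g}: {excludes(ci, null)}")
```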

Let's look at our AZT maternal/infant transmission data and do some inference and confidence interval creation with that. These are data we're very familiar with by now.

We've seen that the estimated difference in the proportion of infants who contracted HIV within 18 months of birth, comparing infants born to mothers given AZT versus placebo, was negative 0.15, or negative 15%: an absolute difference of 15%. The confidence interval on this went from negative 22% to negative 8%. So again, we've ruled out no association as a possibility. Even after accounting for the uncertainty, there's a clear association between whether the mother was treated with AZT during pregnancy and fewer transmissions. Okay, so let's look at the ratio of proportions here. It was 0.32; this is the relative risk, 7% divided by 22%. We said this means the risk of mother/child HIV transmission for mothers given AZT is 0.32 times the risk of mother/child transmission for mothers given placebo. To make the reduction clear, we said this indicates a 68% lower risk of mother/child HIV transmission for mothers given AZT.

So let's put a confidence interval on this. Again, we have to go to the log scale. The natural log of 0.32 is negative 1.14; because the relative risk is less than 1, its logarithm is negative. We use the same formula to get the standard error of the log relative risk with these data, and if you plug in the values, the estimated standard error of the log relative risk is 0.30.

So we just do our standard thing: take our log-scale estimate and add and subtract two standard errors. We get a confidence interval for the log relative risk that goes from negative 1.74 to negative 0.54. It's hard to interpret this confidence interval directly because it's for a log relative risk, but notice that all possibilities are negative. There's no 0 in this interval, so when we exponentiate our results to get the confidence interval for the relative risk, we will not get 1 in the interval. So all possibilities in the confidence interval for the relative risk show a reduced proportion of outcomes for children whose mothers were given AZT. Remember, this relative risk compares the risk of transmission among children whose mothers were given AZT during pregnancy to the risk of transmission among children whose mothers were given the placebo.

So how do we interpret the entire thing, incorporating the uncertainty statement from that interval? We could say that an HIV-positive pregnant woman could reduce her personal risk of giving birth to an HIV-positive child by nearly 70% (a 68% reduction) if she takes AZT during her pregnancy. The study results suggest that this reduction in risk could be, in air quotes, as small as 42% and as large as 82%. So even in the worst-case scenario, there could still be a pretty sizable reduction in a woman's personal risk of giving birth to a child who contracts HIV.

If we do the odds ratio, it's more business as usual. Mechanically, we've got the odds ratio estimate; we take its log, negative 1.31, and go through our machinations to crank out the estimated standard error, as if we were human computers.

And the estimated standard error, if you do this, using those four cell counts is about 0.34. This is the estimated standard error for the log odds ratio.

So, putting this all together, we take our estimated log odds ratio of negative 1.31 and add and subtract two estimated standard errors, giving a confidence interval for the log odds ratio that goes from negative 1.99 to negative 0.63. Notice that this interval does not include zero: all possibilities for the association on the log odds ratio scale are negative. When we exponentiate the endpoints, we get a confidence interval for the odds ratio that goes from 0.14 to 0.53. So again, even after accounting for the uncertainty in our estimates, all possibilities favor the group whose mothers were given AZT, this time in terms of reduced odds of contracting HIV within 18 months of birth.

So, AZT is associated with an estimated 72% reduction in the odds of giving birth to an HIV-infected child among HIV-infected pregnant women. The study results suggest that this reduction in odds could be as small, air quotes again, as 47% and as large as 86%. That adds in the sampling variability and explains the confidence interval in a substantive context. Now let's line up all three. The estimated difference in proportions was negative 0.15 for the AZT group compared to the placebo group, with a confidence interval from negative 0.22 to negative 0.08. The relative risk estimate was 0.32, with a confidence interval from 0.18 to 0.58.

And the odds ratio was 0.28, with a confidence interval from 0.14 to 0.53. So again, we get agreement across all three estimates and confidence intervals in terms of the direction of association and whether or not the null value is a possibility: there's no zero in the confidence interval for the risk difference, and no one in the confidence intervals for the ratio-based estimates.
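The AZT interpretations can be sketched in Python from the log-scale summaries quoted above (log RR = ln 0.32 with SE 0.30; log OR = -1.31 with SE 0.34). The helper names are hypothetical. Because these inputs are the lecture's rounded values, a computation on the raw counts could differ in the second decimal place. For an interval lying entirely below 1, the upper ratio endpoint gives the smallest percent reduction and the lower endpoint gives the largest, which is how "as small as 42% and as large as 82%" is read off the interval 0.18 to 0.58.

```python
import math
from typing import Tuple


def ratio_ci(log_estimate: float, se: float, z: float = 2.0) -> Tuple[float, float]:
    """Exponentiate (log estimate -/+ z*SE) to get a CI on the ratio scale."""
    return math.exp(log_estimate - z * se), math.exp(log_estimate + z * se)


def percent_reduction_range(ci: Tuple[float, float]) -> Tuple[float, float]:
    """Express a CI lying entirely below 1 as (smallest, largest) percent reduction."""
    lower, upper = ci
    if upper >= 1:
        raise ValueError("interval includes 1; not every value is a reduction")
    return 100 * (1 - upper), 100 * (1 - lower)


# Log-scale summaries quoted in the lecture for AZT vs placebo
rr_ci = ratio_ci(math.log(0.32), 0.30)
or_ci = ratio_ci(-1.31, 0.34)

for label, ci in [("relative risk", rr_ci), ("odds ratio", or_ci)]:
    small, large = percent_reduction_range(ci)
    print(f"{label}: CI {ci[0]:.2f} to {ci[1]:.2f}; reduction {small:.0f}% to {large:.0f}%")
```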

Let's look at our aspirin and CVD study in women in this context and compute the resulting confidence intervals for these ratio measures of association.

I'm just going to show you what happens when we compute all of these, now that we've had some examples of how to do it by hand. Again, doing it by hand is not the most important thing in real life; you'll be getting these from computers. But I want to remind you that these are done on the log scale, and that the uncertainty is a function of the counts in the two by two table we create to represent the sample data.

The association was small on the risk difference scale. Again, that's because the proportion of women developing CVD was small in both of the groups being compared, aspirin and placebo. And if you look at the confidence interval for the risk difference, it includes 0. This is what we'd call a non-statistically significant result, which we'll formally define in the next set of lectures.

After accounting for sampling variability, we can't solidly conclude that the difference is in one consistent direction.

If we do the confidence interval for the relative risk, the estimate was 0.92, which showed a smaller proportion of outcomes in the aspirin group in the study. But when we account for the uncertainty in our estimates and compute the confidence interval, notice that this confidence interval includes 1. That's as it should be, given that the confidence interval for the risk difference, our difference in proportions, includes 0. Both are based on the same information, just represented in different ways. So we should get the same direction of association in our estimates and the same overall conclusion about whether, after accounting for sampling variability, there's a clear reduction or increase in risk for aspirin compared to placebo. And we don't have that clear distinction with either measure: we have 0 in the risk difference interval and 1 in the relative risk interval. Following on the heels of that, we also get a confidence interval for the odds ratio, and it includes one as well. Because the outcome, cardiovascular disease, was relatively rare in both groups being compared (those who received aspirin and those who didn't), the relative risk and odds ratio estimates and their confidence intervals are essentially identical.
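Why a rare outcome makes the two ratios nearly identical can be seen from the identity OR = RR x (1 - p2) / (1 - p1): when both proportions are small, the correction factor is close to 1. The Python sketch below is illustrative only; the helper name is hypothetical, the "common outcome" pair uses the rounded CD4 proportions from earlier in this section, and the "rare outcome" pair (0.023 vs 0.025) is HYPOTHETICAL, chosen to give a relative risk of 0.92, and is not the aspirin study's actual data.

```python
import math
from typing import Tuple


def rr_and_or(p1: float, p2: float) -> Tuple[float, float]:
    """Relative risk and odds ratio comparing group 1 to group 2."""
    rr = p1 / p2
    odds_ratio = (p1 / (1 - p1)) / (p2 / (1 - p2))
    # Algebraically OR = RR * (1 - p2) / (1 - p1); the factor is near 1
    # when the outcome is rare in both groups.
    assert math.isclose(odds_ratio, rr * (1 - p2) / (1 - p1))
    return rr, odds_ratio


# Common outcome: rounded CD4 response proportions (RR and OR diverge)
print("common outcome: RR = %.2f, OR = %.2f" % rr_and_or(0.25, 0.16))
# Rare outcome: HYPOTHETICAL proportions for illustration only
print("rare outcome:   RR = %.3f, OR = %.3f" % rr_and_or(0.023, 0.025))
```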

You can see that the researchers reported this confidence interval for the relative risk in their results section. Their relative risk estimate was slightly different from ours; we got 0.92, but I rounded a little more crudely than they did. Their 95% confidence interval goes from 0.80 to 1.03, which is essentially the same as what I showed on a previous slide. So they are reporting this uncertainty on the relative risk scale in their results section. Furthermore, the study allows us to compute the risk difference as well, so we get an understanding of the general magnitude of cardiovascular events in this population, which was randomized to receive aspirin or placebo.

Finally, consider the Hormone Replacement Therapy and Risk of CHD study that we've been looking at. Here we also saw a relatively small proportion of outcomes in both of the groups being compared. The outcome was developing coronary heart disease within the follow-up period, for those who received hormone replacement therapy and those who got a placebo.

The association on the risk difference scale is small, but that's primarily because the baseline risk in the placebo group, which wasn't treated, was small to begin with. Even after accounting for the uncertainty in the estimate, both endpoints of the risk difference interval showed a positive association with hormone replacement therapy: 0 is not in this interval.

So the researchers concluded that there was an increased risk of coronary heart disease for women who were given hormone replacement therapy, and that's one of the reasons the trial was stopped early. On the relative risk scale, the estimated relative risk of coronary heart disease comparing the hormone replacement therapy group to the placebo group was 1.27. The 95% confidence interval includes only values that show a positive association at the population level between hormone replacement therapy and coronary heart disease, but the range goes from almost no increase, only 1%, up to a 60% increase. So there's a fair amount of uncertainty in this estimate. But the researchers felt that the estimate itself, a 27% increase, coupled with the fact that the result was statistically significant, in the sense that all possibilities showed a true positive association between hormone replacement therapy and risk of coronary heart disease, was enough to call it risky. On the odds ratio scale, the results are almost identical to what we got for the relative risk, because again the outcome of coronary heart disease was relatively rare in both groups being compared. So what are we saying here? Confidence intervals for ratio-based measures of association with binary outcomes, both the relative risk and the odds ratio, need to be computed on the natural log scale.

The results are then exponentiated, or anti-logged, back to the ratio scale. However, the computations on the log scale are business as usual: we take our best estimate based on our sample data and add and subtract two estimated standard errors, also estimated from our sample data. The resulting estimates of the difference in proportions, relative risk, and odds ratio, based on the same data, will agree, as we have seen before, in terms of the direction of association. And now we've shown that the resulting confidence intervals will also agree with regard to inclusion or exclusion of the null value: the value that means no difference, or no association, for each measure.
