A conceptual and interpretive public health approach to some of the most commonly used methods from basic statistics.

From the course by Johns Hopkins University

Statistical Reasoning for Public Health 1: Estimation, Inference, & Interpretation

From the lesson

Module 4B: Making Group Comparisons: The Hypothesis Testing Approach

Module 4B extends the hypothesis tests for two-population comparisons to "omnibus" tests for comparing means, proportions, or incidence rates between more than two populations with one test.

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

Greetings and welcome back.

In this section, we'll look at computing the sample

size necessary to achieve a desired level of precision,

a desired margin of error for single population quantities

like a population mean or a population proportion, for example.

So upon completion of this lecture section, you will be able

to create a table relating sample size to precision for an estimate

of a single population quantity, and solve for the necessary sample

size to get a desired level of precision, i.e.,

a desired margin of

error. The idea is that,

in order to justify a funding request for a larger study, a researcher needs

to both demonstrate that the study allows for the

estimation of outcomes with a good margin of error,

and that the study can be performed given the

requested budget, that is, that the sample size request is reasonable.

Designing a study such that the results have a certain margin of error requires some

speculation in advance about what the study

results will be before the study is done.

So this is

the sticky part of doing such computations: you have to have an

educated guess, going into the process, of what your study results will be before

you've done the study. So, where can this information come from?

Well, sometimes from pilot studies, like some examples we gave in part

A, which are low- to no-budget studies done on

a restricted number of participants to get some data on the table in order to

design a larger study. It can also come from other research:

researchers who studied a similar population for different purposes but

may have estimates of some of the quantities that you would need.

Or, in the case where nobody's done any

research related to what you're looking at, from educated guesswork.

And that's hard to do, but sometimes it's the best that can be done.

So let me give you an example using the

results from a pilot study to design a bigger study.

So recall the length of stay study with 30 subjects.

Suppose we were actually

recruiting the subjects

and following them up, and length of stay was the main outcome of interest,

but we needed to do more than solicit

their patient records, so this study could be costly.

Well, recall when we actually looked at the

pilot results, the researcher had studied 30 persons.

And the average length of stay in the sample of 30 was 6.3 days.

There was a fair amount of

variability amongst the 30 length-of-stay measurements.

And so the margin of error for that study was 2 times the quantity

7.5 days, the standard deviation, over the square root of the sample size, 30.

So this is just the standard error in parentheses here.

That turned out to be 2 times 1.4 days, or 2.8 days.

But suppose we use this pilot study as a starting point

and say: well, in order

to estimate the margin of error for a given sample size, we need

an estimate of the standard deviation of

individual values in the population we're studying.

The working one we have,

and it may not be a great one, but we'll talk about that

in a minute, is the 7.5 days from this study on 30 people.

And so using that we can write the margin

of error for studies dealing with length of stay in

the population we're sampling from as a function of sample size like this.

For a given sample size our estimated margin of error would be 2 times the

estimated standard deviation of 7.5 over the square root of the sample size.

So, for example, suppose we wanted, based on these results,

an estimated margin of error for a study with n equals 100.

The estimated margin of error would

be 2 times 7.5

over the square root of

100, which equals 2 times 0.75.

So we would be able to estimate the mean length of

stay within a margin of error of plus or minus 1.5 days.

If we thought that was a little wide and wanted to be more precise in our estimate,

we could see what would happen if we looked at a study with 250 people.

What would happen to our margin of error?

Instead of 100, we put 250 in the denominator.

We get a margin of error, when all the dust settles, and you can check my math,

of plus or minus 0.95 days, so almost one day.

We would get our confidence interval by

taking our mean estimate and adding and

subtracting 0.95 days.
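
The arithmetic above is easy to check with a few lines of code. This is only a minimal sketch and not part of the lecture: the function name `margin_of_error_mean` is made up for illustration, and the multiplier of 2 is the rough value used throughout this section.

```python
import math

def margin_of_error_mean(sd, n, multiplier=2.0):
    """Estimated margin of error for a sample mean: multiplier * sd / sqrt(n)."""
    return multiplier * sd / math.sqrt(n)

PILOT_SD = 7.5  # days, the standard deviation from the 30-person pilot study

# Pilot size plus the two candidate sample sizes discussed above.
for n in (30, 100, 250):
    moe = margin_of_error_mean(PILOT_SD, n)
    print(f"n = {n:>3}: margin of error = +/- {moe:.2f} days")
```

For n equals 30 this gives about 2.74 days rather than 2.8, because the 2.8 above comes from rounding the standard error to 1.4 before doubling it.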

Following that type of logic, and you can use

a spreadsheet program or something like that, you can easily

make a table like this, where you actually look

at the expected margin of error for different sample sizes.

But then, given that our estimate of the standard deviation is

based on a small study to begin with, there's some uncertainty in that.

So it's usual practice to actually look at some other

possibilities, both above and below the estimate, just to get a sense of

what the possibilities for margins of error are with combinations of sample

size, allowing for the uncertainty in our estimate of the standard deviation.

So we might

produce a table like this, and then what we could say to our funding agency is,

suppose we wanted to get a margin of error of one day or less.

We could say, well, if you give us the funding to recruit 300

patients, we're pretty much good under all anticipated standard deviation scenarios.

This is a little above one day, but it's

very close. So 300 will cover all the bases.

But if you're not willing to pay for 300 subjects, if

you pay for 200, or somewhere between 200 and 300, then for

at least two of the standard deviation scenarios, we'll be okay.

However, if you cut our budget such that we can only sample 100 people, we're going

to be way off the mark.
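
A table of this kind can be produced with a spreadsheet or a short script. The sketch below is illustrative only: the standard deviation scenarios (6.5, 7.5, and 8.5 days) and the sample sizes are assumptions chosen around the pilot estimate, not the values from the table shown in the lecture, and `margin_table` is a made-up helper name.

```python
import math

def margin_table(sds, sample_sizes, multiplier=2.0):
    """Return {n: [margin of error for each sd]} using multiplier * sd / sqrt(n)."""
    return {
        n: [multiplier * sd / math.sqrt(n) for sd in sds]
        for n in sample_sizes
    }

# Assumed scenarios: the pilot estimate of 7.5 days, plus one value below and one above.
sds = [6.5, 7.5, 8.5]
sample_sizes = [100, 200, 300, 400]

table = margin_table(sds, sample_sizes)

# One row per sample size, one column per standard deviation scenario.
print("n".rjust(5) + "".join(f"sd={sd}".rjust(10) for sd in sds))
for n, row in table.items():
    print(str(n).rjust(5) + "".join(f"{moe:.2f}".rjust(10) for moe in row))
```

Reading across a row shows how sensitive the margin of error is to the guessed standard deviation; reading down a column shows what extra subjects buy you.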

You could also, if you were designing a

study and really wanted a single point estimate of

the sample size to get a desired margin of

error, solve for it relatively easily, algebraically.

If you recall, our estimated margin

of error, as a function of our sample

size, is two times our sample standard deviation

over the square root of the sample size.

And we could solve for n that would give us a margin

of error of 0.5 days. So for our data, it'd be

2 times the estimated 7.5 days standard deviation,

divided by the square root of n, equal to 0.5.

Do a little algebra to solve that,

with a cross product. So then we get 2

times 7.5 equals 0.5 times the square root of n.

Divide both sides by 0.5,

and we get the square root of n, just rewriting it in the opposite order here,

equals 2 times 7.5 over 0.5.

When you do the math, that's 30.

So the square root of n,

the square root of our necessary sample size, is 30.

We square both sides, and we get n equals 30 squared, or 900 people.

So we need

an estimated 900 people to get

an estimated margin of error of 0.5 days.
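
The same algebra can be wrapped in a small function. This is a sketch under the assumptions used above (a multiplier of 2 and the pilot standard deviation of 7.5 days); the name `sample_size_for_mean` is hypothetical.

```python
import math

def sample_size_for_mean(sd, target_margin, multiplier=2.0):
    """Solve multiplier * sd / sqrt(n) = target_margin for n, rounded up."""
    n_exact = (multiplier * sd / target_margin) ** 2
    # Subtract a tiny tolerance so floating-point noise cannot push an
    # exact answer (such as 900.0) up to the next whole number.
    return math.ceil(n_exact - 1e-9)

n_half_day = sample_size_for_mean(sd=7.5, target_margin=0.5)
n_one_day = sample_size_for_mean(sd=7.5, target_margin=1.0)
print(n_half_day)  # 900
print(n_one_day)   # 225
```

Because n depends on the square of the ratio, cutting the margin of error in half, from one day to half a day, takes four times as many people.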

Let's look at another example.

Recall the pilot study example from section A on

30 participants given a drug and followed to see who

experienced a minor reaction. Nine subjects had the

reaction, so in that study our estimated proportion was 30%.

Our margin of error was 2 times the

estimated standard error based on 30, which is the square root of the

proportion who had the reaction times the proportion who didn't,

over the sample size. I should have stuck a 30 in there, sorry:

the square root of 0.3 times 0.7 over 30.

But, this actually is fortuitous, because this

is what I was going to write down here.

If we assumed our starting guess for the proportion, or

expected proportion of people who have the reaction in

the population as a whole is 30%, then a

margin of error for other studies from the same

population for different sample sizes could look like this.

So, for example, if we wanted to estimate, based on these pilot results, the margin

of error for a study with 150 patients, it would be two times the

square root of 0.3 times 0.7, over 150 which equals 0.075.

So the margin

of error here is plus or minus 7.5%.

If we increase the sample size to 300, it's two times the

square root of 0.3 times 0.7 over 300. If

we do the math on that, that gives

us a margin of error of about 5%.

So our confidence interval would be created by taking the resulting

estimate from the study based on 300 and then adding and subtracting 5%.
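
These proportion calculations can be checked the same way. This is a minimal sketch, assuming the multiplier of 2 used above and the pilot estimate of 30% as the working proportion; `margin_of_error_proportion` is a name made up for illustration.

```python
import math

def margin_of_error_proportion(p, n, multiplier=2.0):
    """Estimated margin of error for a proportion: multiplier * sqrt(p * (1 - p) / n)."""
    return multiplier * math.sqrt(p * (1 - p) / n)

PILOT_P = 0.3  # 9 of the 30 pilot participants had the reaction

# Pilot size plus the two candidate sample sizes discussed above.
for n in (30, 150, 300):
    moe = margin_of_error_proportion(PILOT_P, n)
    print(f"n = {n:>3}: margin of error = +/- {moe:.1%}")
```

For n equals 300 the unrounded value is about 5.3%, which the lecture rounds to 5%.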

So just like we did with the previous example, we could

easily make a table that explored the margin of error,

both as a function of sample size, and the expected

proportion with the outcome.

And again, since that 30% was only based on 30 persons, there's clearly

a lot of uncertainty; as we stated before, the confidence interval was very wide.

We wouldn't necessarily use all values in the confidence interval, but we would

allow for a little bit of uncertainty, at least, in doing these computations.

And we could then look at the trade-offs

between margin of error and the expected proportion

in such a table.
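
Here is one way such a table could be built. It is a sketch only: the expected proportions (20%, 30%, and 40%) and the sample sizes are assumed values bracketing the pilot estimate, not the entries from the lecture's table, and `proportion_margin_rows` is a hypothetical helper.

```python
import math

def proportion_margin_rows(proportions, sample_sizes, multiplier=2.0):
    """Return (n, [margin for each p]) rows using multiplier * sqrt(p * (1 - p) / n)."""
    return [
        (n, [multiplier * math.sqrt(p * (1 - p) / n) for p in proportions])
        for n in sample_sizes
    ]

# Assumed scenarios: the pilot estimate of 30%, plus one value below and one above.
proportions = [0.2, 0.3, 0.4]
sample_sizes = [150, 300, 600, 1200]

rows = proportion_margin_rows(proportions, sample_sizes)

# One row per sample size, one column per expected proportion.
print("n".rjust(6) + "".join(f"p={p}".rjust(9) for p in proportions))
for n, margins in rows:
    print(str(n).rjust(6) + "".join(f"{m:.1%}".rjust(9) for m in margins))
```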

We could also easily solve again for the

sample size to get a desired margin of error.

So, for example, suppose we want to be able to estimate

the proportion (not the mean, sorry; I'm a victim of my own cutting and pasting),

the proportion with the reaction, within

plus or minus 2.5%.

Well, we could set up an equation just like we did before.

Our margin of error, as a function of sample size using the estimated

proportion from the sample of size 30 looks like this, and we want this

to equal 0.025. So the first thing we might do

is divide both sides by two. Then we get square

root of 0.3 times 0.7 over n equals

0.025 over 2 which is 0.0125.

Square both sides to get rid of that square root,

so we get 0.3 times 0.7 over n equals 0.0125 squared.

Do some cross multiplication, and we get

0.0125 squared times

n equals 0.3 times 0.7,

and then n equals 0.3 times 0.7

over 0.0125 squared.

If you do this and round up, you get a necessary

sample size of over 1,300 to get a margin of error of plus or minus 2.5%.

That's a fair number of subjects.
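
The algebra above can be sketched in code as well. The assumptions are the same as in the example: a multiplier of 2 and the pilot proportion of 30% as the working guess. The name `sample_size_for_proportion` is made up for illustration.

```python
import math

def sample_size_for_proportion(p, target_margin, multiplier=2.0):
    """Solve multiplier * sqrt(p * (1 - p) / n) = target_margin for n, rounded up."""
    n_exact = p * (1 - p) / (target_margin / multiplier) ** 2
    # Subtract a tiny tolerance so floating-point noise cannot push an
    # exact answer up to the next whole number.
    return math.ceil(n_exact - 1e-9)

n_needed = sample_size_for_proportion(p=0.3, target_margin=0.025)
print(n_needed)  # 1344
```

That result, 1,344, is the "over 1,300" figure from the calculation above.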

You've probably heard reference to margin of error frequently in the

media when they talk about the results of a poll, and

they say something like, this poll was conducted with a margin

of error of plus or minus 3% or plus or minus 2.5%.

And they designed it using the exact same approach that we just did here.

They figured out how many people they needed to poll to

get the margin of error within that 3% or 2.5% range.

So, in summary, in order to compute the margin of error for a

given sample size, you will need an estimate

of the standard deviation for a continuous measure,

or of the proportion for a binary outcome.

We didn't actually do an example with incidence rates.

Thematically the computations are the same, but they're a

little trickier because we also have to estimate the follow-up time.

What you need in order to compute a

margin of error is an estimate of the incidence rate itself

for the time-to-event outcome, and then some estimate

of the follow-up time in the study you propose.

But the principle is exactly the same,

the bigger the sample size, the smaller the margin of error.

So the estimates of these

aforementioned quantities come from

either a small pilot study, secondary results

from other research, or educated guesswork.

But for a single population-level quantity, it's pretty straightforward, if

you have the appropriate estimate, to see what the margin of error

looks like as a function of your sample size.

In the next section, we'll extend this reasoning to looking at some measures

that compare two populations, like a

mean difference or a difference of proportions.
