0:14

This is the difference between covered and non-covered people,

or units. It is easier to think about when we look at this graph.

Let's say this blue square is our target population.

Ideally you have the frame population be exactly the same size, and

everything is fine: no coverage error.

But often the frame sits slightly off,

and the covered population is really just a part of it.

What can this be?

Let's say we are interested in all people in the United States above

0:48

age 18 in households.

It used to be the case that we could use lists of phone numbers, at least for

a little while, when phone books were good and had complete coverage of

a target population, at least all households with a phone.

But increasingly that is a problem:

the telephone frame does not necessarily cover the entire population.

There are ineligible units:

phone numbers that belong to different establishments, or

are empty, or there may even be multiple entries for the same case.

And then there are many cases that are not covered

by that particular telephone frame.

Just think of cell phones.

They often are not part of listed telephone numbers.

But other settings would have similar issues here.

So the total survey population can be divided into those covered and

not covered by the frame.

Again, if we look at this in equation notation,

we now have subscripts C and U for covered and undercovered.

Those are the two pieces, and each of them is a proportion. So

the covered divided by the total N is the fraction of covered people, and

the undercovered divided by the total N is the fraction of undercovered people.

We multiply each fraction by the average value

that is of interest to us, for the covered and the undercovered respectively.

The sum of these two gives us the average value of Y for all of our N cases.

You can rewrite that, and you will see that there is an error coming in here:

a difference between Y bar subscript C and

Y bar subscript N. The error that you see here is the difference

between the covered and the uncovered on that particular Y variable,

2:43

multiplied by the fraction of undercovered over total N.

So it is the proportion of undercovered people times that difference.

We can think of this as an undercoverage rate

and the difference between the means for the covered and the uncovered cases.
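Written out, the spoken description corresponds to the following relations. This is a reconstruction from the wording of the lecture, not a copy of the slide: the symbols C and U for the counts of covered and undercovered units, with C + U = N, are assumed notation.

```latex
\bar{Y}_N = \frac{C}{N}\,\bar{Y}_C + \frac{U}{N}\,\bar{Y}_U
\qquad\Longrightarrow\qquad
\bar{Y}_C - \bar{Y}_N
  = \Bigl(1 - \frac{C}{N}\Bigr)\bar{Y}_C - \frac{U}{N}\,\bar{Y}_U
  = \frac{U}{N}\,\bigl(\bar{Y}_C - \bar{Y}_U\bigr)
```

The last expression is the undercoverage rate times the difference between the covered and the uncovered means.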

3:02

So, how big that bias is depends on two elements,

or can be influenced by two elements: the rate and that difference.

If the difference is zero, then the rate doesn't matter.

If the difference is large, then even a small rate will cause a problem.
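To see how the rate and the difference interact, here is a minimal Python sketch. The function name `coverage_bias` and all the numbers are hypothetical, chosen only for illustration; they do not come from the lecture.

```python
def coverage_bias(n_covered, n_uncovered, mean_covered, mean_uncovered):
    """Return (overall_mean, bias) for a mean estimated from covered units only.

    Hypothetical helper: bias = undercoverage rate * (covered mean - uncovered mean).
    """
    n_total = n_covered + n_uncovered
    undercoverage_rate = n_uncovered / n_total
    overall_mean = ((n_covered / n_total) * mean_covered
                    + undercoverage_rate * mean_uncovered)
    bias = undercoverage_rate * (mean_covered - mean_uncovered)
    return overall_mean, bias


# Made-up example: 20% undercoverage, means differ by 10.
overall, bias = coverage_bias(8000, 2000, 50.0, 40.0)

# Same 20% undercoverage, but no difference between the groups: no bias.
_, bias_no_diff = coverage_bias(8000, 2000, 50.0, 50.0)

# Only 2% undercoverage, but a very large difference: bias is still noticeable.
_, bias_small_rate = coverage_bias(9800, 200, 50.0, 150.0)

print(overall, bias, bias_no_diff, bias_small_rate)
```

In the first case the covered mean minus the overall mean equals the computed bias, which is the identity described in the lecture.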

3:24

It is interesting to think of sampling bias versus sampling variance; they are two different things.

Sampling variance is just the pure variation from one

realization of a sample to the next, due to slightly different cases

being in each of the samples that are drawn from the population.

This is what is most commonly measured in statistics.

In surveys, confidence intervals and

standard errors give us a quantification of that source.

Sampling bias, however, appears when I have a consistent failure

to estimate a portion of the population.

Right?

So this would be a portion of the population, like the military.

4:30

have in my survey and on average,

the mean of the means of each of these sampling distributions

will give me the correct mean value that I have in the population.

So there's variation by age, and you've probably

seen this when you covered the central limit theorem in your Stats 101 course.

We have a little segment just showing animations for this particular

piece, but if you already know all of that, then of course there is no need to look at it.

5:20

Sampling bias, however, involves people who are on the frame but for

some reason have zero probability of being selected.

That is a very important distinction here.

And that, of course, would create a sampling bias.
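The distinction between sampling variance and sampling bias can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the frame, the two groups, and the helper `mean_of_sample_means` are all hypothetical. Group B stands in for units that are on the frame but have zero probability of selection.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical frame: 9,000 units centered at 50 and 1,000 units centered at 70.
group_a = [random.gauss(50, 10) for _ in range(9000)]
group_b = [random.gauss(70, 10) for _ in range(1000)]
frame = group_a + group_b
true_mean = statistics.fmean(frame)


def mean_of_sample_means(units, n=200, reps=2000):
    """Draw `reps` simple random samples of size `n` without replacement.

    Returns the mean of the sample means and their standard deviation.
    """
    means = [statistics.fmean(random.sample(units, n)) for _ in range(reps)]
    return statistics.fmean(means), statistics.stdev(means)


# Every frame unit can be selected: sample means vary (sampling variance),
# but they center on the true mean.
fair_center, fair_spread = mean_of_sample_means(frame)

# Group B has zero selection probability: the sample means still vary,
# but they center on the wrong value (sampling bias).
biased_center, biased_spread = mean_of_sample_means(group_a)

print(f"true mean          {true_mean:.2f}")
print(f"all selectable     center {fair_center:.2f}, spread {fair_spread:.2f}")
print(f"group B excluded   center {biased_center:.2f}, spread {biased_spread:.2f}")
```

The spread of the sample means is the part that standard errors quantify; the shift of the center in the second case is not reduced by drawing more samples.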

5:38

Non-response error then arises in the step between sample and respondents.

The value of the statistic can only be computed from the respondent data,

and that can differ from the value for the entire sample if we have missing data.

Missing data can come in two forms.

We can miss entire units,

the non-respondents. So think of this

piece of the picture here as our entire set of sampled cases, where

in each row you have the values for one individual on a particular item.

There are some values on the frame data available for everybody, and

then we have interview data.

But those are only available for the respondents.

For the non-respondents, entire units are missing.

The interview data can have missing values as well.

Those we call item missing data.

And that can be a measurement problem.

So this arrow actually spans both of these graphs.

6:34

Just like we saw for coverage error, for non-response error

the total sample can be thought of as being divided into respondents and non-respondents.

And for each of them,

you have an average value of the variable that is of interest to you.

You can form a ratio of respondents over everybody,

and likewise for the non-respondents, denoted here with m

6:56

with subscript s, which is the sample that we're dealing with.

And so the non-response rate, together with the difference between the mean

of the respondents and the mean of the non-respondents,

gives us a sense of non-response error.
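The same logic can be checked on row-level data. In this minimal sketch the sample is hypothetical, and it pretends that the value of y is known for the non-respondents, which is not the case in a real survey; it only serves to show that the non-response rate times the difference in means reproduces the gap between the respondent mean and the full-sample mean.

```python
from statistics import fmean

# Hypothetical sample of five cases: (responded, y).
# In practice y would be unobserved for the non-respondents.
sample = [(True, 4.0), (True, 6.0), (True, 5.0), (False, 9.0), (False, 7.0)]

respondent_values = [y for responded, y in sample if responded]
nonrespondent_values = [y for responded, y in sample if not responded]

respondent_mean = fmean(respondent_values)
nonrespondent_mean = fmean(nonrespondent_values)
full_sample_mean = fmean([y for _, y in sample])

# Non-response rate: non-respondents over the full sample.
nonresponse_rate = len(nonrespondent_values) / len(sample)

# Non-response bias of the respondent mean.
nonresponse_bias = nonresponse_rate * (respondent_mean - nonrespondent_mean)

print(respondent_mean, full_sample_mean, nonresponse_rate, nonresponse_bias)
```

Here the respondent mean understates the full-sample mean by exactly the computed bias.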

And finally there's adjustment error.

Post-stratification or

other forms of adjustment are supposed to correct any problem.

But this correction can be done erroneously and therefore create error itself.
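As a rough illustration of how an adjustment can help or hurt, here is a minimal post-stratification sketch. Everything in it is hypothetical: the two strata, the respondent counts and means, and both sets of population shares. It also assumes that the respondent mean within each stratum equals the true stratum mean, which real data need not satisfy.

```python
# Hypothetical respondent data per stratum: (number of respondents, mean of y).
respondents = {"young": (30, 10.0), "old": (70, 20.0)}


def unweighted_mean(strata):
    """Plain respondent mean, ignoring the strata."""
    total_n = sum(n for n, _ in strata.values())
    return sum(n * mean for n, mean in strata.values()) / total_n


def poststratified_mean(strata, population_shares):
    """Weight each stratum mean by its assumed population share."""
    return sum(population_shares[name] * mean
               for name, (_, mean) in strata.items())


# Assumed true population shares, and an erroneous set of shares.
true_shares = {"young": 0.5, "old": 0.5}
wrong_shares = {"young": 0.2, "old": 0.8}

unadjusted = unweighted_mean(respondents)
adjusted_ok = poststratified_mean(respondents, true_shares)
adjusted_bad = poststratified_mean(respondents, wrong_shares)

print(unadjusted, adjusted_ok, adjusted_bad)
```

With the assumed true shares the adjustment moves the estimate to the target value; with the erroneous shares the adjusted estimate ends up further from it than the unadjusted mean.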

7:28

Key notions from this segment that you should be learning: variable errors and

systematic errors.

These two concepts should be clear in your mind, and we hope that the quizzes that we

have, and other material in the readings, will help you fully understand that issue.

7:53

Mind you, there are no good or bad surveys.

There's only a good or

bad survey statistic, so the errors are a property of a statistic:

of a single variable or a model, a particular estimate,

a mean, a proportion, or a regression coefficient, for example.

That means that you can have, for the same survey,

some statistics with a large error and others with a very small error.
