0:06

In fact, we're going to talk about probability sampling for three modules.

In this module, we'll talk about simple random sampling.

Then we move to clustered sampling in the next module, and then stratified sampling.

They're all related.

And we'll be clarifying the differences between them in the coming modules.

0:26

What is probability sampling?

From the early 20th century, probability-based or

random sampling has been advocated as an alternative to purposive sampling,

quota sampling and some of the other techniques that we talked about that were

used in the late 19th century or early 20th century.

1:17

The population, we'll be talking about that a lot.

By population,

we mean the entity about which we are going to generalize from our sample.

So a population could consist of people.

It could be registered voters, it could be the residents of a particular country, or

it could be a population of firms or other organizations.

It's basically the larger set of units from which we intend to draw our sample,

and about which we want to make a statement.

2:19

A sampling frame is a complete

list of the members of the population from which the sample will be drawn.

So whereas the population is an abstraction, registered voters,

the citizens of a particular country, the residents of a particular country,

a sampling frame is an actual list that will be the basis of our sample.

It's an actual concrete or real thing that we work with.

3:03

There's households.

So that might be obtained, as a sampling frame, as a list of residential

addresses obtained from the post office or some other government agency.

Or perhaps a utility company that provides service to

all of the households in a particular area.

Telephone users.

A sampling frame might be a list of active phone numbers obtained

from a phone company.

3:42

Firms in an industry.

For our sampling frame, we might make use

of a list of members of a trade association.

Or a list of companies that have registered with the government in

connection with doing business in a particular area.

Now, I want to talk about simple random sampling.

This refers to the case where every unit in the population is equally likely to be

selected for the sample.

4:10

Measurements in the sample provide direct estimates of population parameters.

So if we have a sample of registered voters,

and it's been drawn from the population of registered voters, we have a good sampling

frame consisting of a list of registered voters, the proportion that we measure

in the sample should be an estimate of the same proportion in the larger population.

4:48

And then one issue is that if we are going to carry out a survey

with a sample based on simple random sampling,

it's normally necessary that contacting respondents be straightforward.

That's, of course, easy with a mail survey or a telephone survey.

It can actually get more difficult if we're thinking about a household survey

that includes in-person visits.

5:13

So one easy example, something that we can do with random sampling,

would be a household survey via mail,

where a survey is mailed to a sample of residential addresses.

We get a sampling frame consisting of a list of all valid residential addresses in

a particular city, and we pick a certain number of them at random,

and we mail out a household survey.

That's straightforward.

What's the procedure?

Well, first we have to obtain a sampling frame.

Now, that can actually be one of the most difficult parts of conducting a survey.

It's fairly straightforward for certain things, like household surveys,

where we can get lists of valid residential addresses, or

surveys of voters, where we have lists of registered voters.

But it can be much more difficult for more specialized populations.

Professors, people working in a particular profession.

Or people that are actually trying to hide themselves, or

perhaps engaging in a behavior that they haven't made public, and

where there may not be a comprehensive list.

We'll talk about some of those issues in a later lecture.

Once we have our sampling frame,

we randomly select units from the frame to make up our sample.

This may be done with software.

So we can program a computer to generate random numbers, and use that to

pick the units within the sampling frame that will be part of our sample.
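
As a rough sketch of what that software step might look like, here is a short Python example. The frame of addresses and the function name simple_random_sample are made up for illustration. They are assumptions, not anything from a real survey. The standard library's random.sample draws units without replacement, so every unit in the frame is equally likely to be picked.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Return n units drawn without replacement, each equally likely to be chosen."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(frame, n)

# Hypothetical sampling frame: a list of 10,000 residential addresses.
frame = [f"{number} Main Street" for number in range(1, 10_001)]

sample = simple_random_sample(frame, 100, seed=42)
print(len(sample), "addresses selected, for example:", sample[:3])
```

Fixing the seed is just a convenience so that the same sample can be drawn again later. In practice you would record whatever seed you used.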

It may also be done by going down a list, if we

can come up with a comprehensive list of every element within our sampling frame,

for example, a complete list of all residential addresses.

And then we can just select units at intervals defined by the ratio of

the population size to the intended sample size.

So, for example, if a list has 100,000 addresses,

and our intended sample size is 1,000, that is, 1 out of 100, then

we could go through our list of addresses and simply select every 100th address.
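
Here is a minimal Python sketch of that interval idea, using the numbers from the example. The function name systematic_sample is hypothetical, and the frame is just a stand-in list of integers. The random starting point is an assumption added for the sketch, since the passage only specifies the interval.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select every k-th unit, where k = len(frame) // n, from a random start."""
    k = len(frame) // n          # sampling interval: population size / sample size
    rng = random.Random(seed)
    start = rng.randrange(k)     # assumed: random starting position in 0..k-1
    return frame[start::k][:n]

# Stand-in for a list of 100,000 addresses (positions 0..99,999).
frame = list(range(100_000))

sample = systematic_sample(frame, 1_000, seed=1)
print("interval:", len(frame) // 1_000, "sample size:", len(sample))
```

With 100,000 addresses and an intended sample of 1,000, the interval works out to 100, so the sketch picks every 100th address.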

7:16

Let's work through a simple example.

So imagine that we have a complete list of addresses for a city.

And here we have an extract which consists of 21 addresses on Main Street,

including some apartment buildings, so apartment 1, 2, 3, 4.

Now, if we wanted to construct a sample that consisted

of one out of every four households in the city,

7:47

we could actually number the addresses,

1,2,3,4, 1,2,3,4, and so forth, as a first step to drawing the sample.

7:57

So here we've done that numbering, 1,2,3,4, 1,2,3,4, etc.

So once we have that in place, we can simply go ahead and

select every fourth address like this.

We started with an offset of two, and then picked every fourth address after that.

And it turns out that that would produce a random sample

of addresses that consisted of one-fourth,

or one out of four, of the addresses in the city.
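
The same selection can be sketched in a few lines of Python. The addresses here are placeholders, since the actual extract isn't reproduced in the transcript, and "an offset of two" is assumed to mean starting with the second address on the list.

```python
# Hypothetical stand-in for the 21-address extract.
addresses = [f"Address {i}" for i in range(1, 22)]

interval = 4   # one out of every four households
offset = 2     # assumed: start with the 2nd address (counting from 1)

# Positions (counting from 1) of the selected addresses.
positions = list(range(offset, len(addresses) + 1, interval))
selected = addresses[offset - 1::interval]

print(positions)   # [2, 6, 10, 14, 18]
print(selected)
```

Out of 21 addresses, this picks 5, which is roughly one-fourth.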

8:59

More statistical power reduces the chances of failing to observe a relationship or

a difference that actually does exist in the population.

We refer to that sort of mistake or error as a Type II error.

It depends mainly on the strength of the relationship

that's actually in the population, the sample size, and then the criterion for

statistical significance that we're going to set.

So if we have a stringent criterion for statistical significance then,

in that case, we're probably going to need

a large sample to get the statistical power that we need.
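
To make those three ingredients concrete, here is a rough Python sketch of a power calculation. It assumes a two-sided z-test comparing the means of two equal-sized groups, with the strength of the relationship expressed as a standardized effect size. That test, the function name approx_power, and the example numbers are all assumptions chosen for illustration, and the normal approximation is only a simplification.

```python
from statistics import NormalDist

def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test (normal approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)                 # significance criterion
    shift = effect_size * (n_per_group / 2) ** 0.5    # grows with effect size and n
    # Probability of landing beyond either critical value when the effect is real.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

# Same effect size, different sample sizes and significance criteria.
for n in (100, 400, 1_000):
    print(n, round(approx_power(0.2, n), 2), round(approx_power(0.2, n, alpha=0.01), 2))
```

For any given sample size, the power under the stricter 0.01 criterion comes out lower than under 0.05, which is why a stringent criterion calls for a larger sample.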

9:40

Sample size as a share or percentage of the population is

relatively unimportant.

So that's why typically, even for very large countries, like say, China,

typical surveys may just have a sample of 5,000 or 10,000 people.

Surveys are not much larger than they would be for the United States, or

even a much smaller country, because you don't get much bang for the buck by

shooting for a particular percentage of the population making up your sample.

Statistical power is driven by the size, the absolute size of the sample,

the number of cases.
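
One way to see this is to compute the margin of error for a proportion from a simple random sample, including the standard finite population correction. The sketch below is only illustrative. The population figures are round numbers picked for the example, not official counts, and the function name margin_of_error is hypothetical.

```python
import math

def margin_of_error(n, population, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from a simple random sample."""
    se = math.sqrt(p * (1 - p) / n)                        # depends on n only
    fpc = math.sqrt((population - n) / (population - 1))   # finite population correction
    return z * se * fpc

# Illustrative round-number populations: small, large, and very large countries.
for population in (5_000_000, 330_000_000, 1_400_000_000):
    print(f"{population:>13,}  n=5,000  margin={margin_of_error(5_000, population):.4f}")

# Changing the absolute sample size is what moves the margin of error.
print(f"n=1,000 in the largest population: {margin_of_error(1_000, 1_400_000_000):.4f}")
```

When the sample is a tiny fraction of the population, the correction is almost exactly 1, so the margin of error is essentially the same whether the population is five million or over a billion.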

10:15

So I'm going to review and

talk a little bit about the advantages of probability sampling.

It's representative on all characteristics.

So when we have a genuinely random sample from a population,

whatever we measure in our sample will generalize to the larger population.

10:38

There's no discretion on the part of the interviewer in terms of picking who they

want to interview, at least not if they've been trained properly.

Now, there are issues with response rates, and

so forth, that we'll talk about in a later module.

But if everything goes as planned, the interviewer doesn't get to pick and

choose the way they would with a quota or purposive sampling approach.

11:07

Now, probability sampling can include some challenges.

Sometimes it's hard to find sampling frames, especially if we're looking for

a more specialized population,

a subset, consisting of people in a particular profession,

people with particular interests, or engaged in a particular hobby.

Non-response can be an issue.

We'll come back to this in a later module.

And, then, it can be logistically difficult and

expensive to have a simple random sample over a large geographic area.

And we'll talk about that in the next module when we talk about multi-stage

cluster sampling, which is one remedy.

11:45

Now, I want to come back to the issue of sample size versus representativeness to

highlight it.

And I want to emphasize that large sample sizes do not compensate for

problems with representativeness, that basically a small

representative sample is always preferred over a large, unrepresentative one.

So that's why, when you look at typical surveys done for research,

they rarely have more than a few thousand respondents.

As long as the sampling is done properly, a few thousand respondents will give you

a good insight into the population that you're trying to study.

Whereas, perhaps online surveys, mail surveys that are done in an ad

hoc fashion that may have hundreds of thousands of responses,

are rarely used in serious research because it's not clear that they

are a sample that's actually representative of the larger population.
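
A small simulation can illustrate the point. Everything in this Python sketch is made up. It assumes a hypothetical population of one million people in which 30% hold some opinion, and it assumes that in the ad hoc survey people who hold the opinion are three times as likely to respond as people who don't.

```python
import random

rng = random.Random(0)

# Hypothetical population: 1,000,000 people, 30% hold the opinion (coded 1).
population = [1] * 300_000 + [0] * 700_000
true_share = sum(population) / len(population)

# Small but representative: a simple random sample of 2,000 people.
srs = rng.sample(population, 2_000)
srs_estimate = sum(srs) / len(srs)

# Large but unrepresentative: self-selected responses, with an assumed
# response rate of 30% for opinion holders and 10% for everyone else.
responders = [y for y in population if rng.random() < (0.30 if y else 0.10)]
biased_estimate = sum(responders) / len(responders)

print(f"true share:          {true_share:.3f}")
print(f"random sample (n={len(srs):,}):      {srs_estimate:.3f}")
print(f"self-selected (n={len(responders):,}): {biased_estimate:.3f}")
```

Under these assumptions, the self-selected survey collects well over a hundred thousand responses and still lands far from the true share, while the sample of 2,000 lands close to it. Adding more self-selected responses would not fix that.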