In this video on Basic Estimation, we'll continue with Stata and

look at a design that's more complex than in the previous video.

So what we'll do is look at the same sample as we used in the R example.

So I'm looking at nhis.large, which,

you may remember, is from the R PracTools package.

So I wrote that out from R in DTA format, which is the Stata format.

Then I use it, clearing the memory, and, to do the same sort

of thing that I did in R, I'm giving labels to the age groups.

So in Stata, you do that with a label define statement,

which defines a named value label, and here I'm calling it age_lab.

So I just label the five age categories, and then I say label values for

the age field, age_grp,

and my labels are what I just created here, age_lab.

So that'll associate the labels with the values and give us a nicer-looking table.
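
As a rough sketch, the loading and labeling steps just described might look like the following in a do-file. The file name nhis_large.dta and the wording of the five category labels are assumptions made for illustration; only the label name age_lab and the variable age_grp come from the lecture.

```stata
* Assumption: nhis.large was written out from R to a file called
* nhis_large.dta (hypothetical name) in the current working directory.
use "nhis_large.dta", clear

* Define a value label named age_lab for the five age categories.
* The category wording below is an assumption, not taken from the lecture.
label define age_lab 1 "< 18" 2 "18-24" 3 "25-44" 4 "45-64" 5 "65+"

* Attach the value label to the age-group variable so tables show the text.
label values age_grp age_lab
```

After this, a plain tabulate age_grp should show the label text rather than the codes 1 through 5.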

So in svyset, here's what I do: I specify the PSU variable here,

which is called psu; the survey weight is svywt;

and the stratum variable is stratum.

And then I use tabulate, and I preface that with svy: to

let Stata know that it should use the survey procedure to do the tabulating.

And I'm getting a table of age_grp by

whether people delayed medical care or not.

The row option says, give me row proportions.

So in the two-way table,

it's going to make the rows sum up to one, which we'll see on the next page.
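
A minimal sketch of the design declaration and the tabulation just described is given below. The file name nhis_large.dta and the variable name delay_med for the delayed-medical-care indicator are assumptions; psu, svywt, stratum, and age_grp are the names mentioned in the lecture.

```stata
* Assumption: hypothetical file name for the data exported from R.
use "nhis_large.dta", clear

* Declare the design: PSU variable, sampling weight, and stratum variable.
* With no vce() option, svyset uses its default, linearized variance estimation.
svyset psu [pweight = svywt], strata(stratum)

* Two-way survey table of age group by delayed medical care.
* delay_med is an assumed variable name; row requests row proportions.
svy: tabulate age_grp delay_med, row
```

Running svyset with no arguments afterwards redisplays the design settings, which is a quick way to confirm what was declared.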

Now, because I didn't tell it more specific information, it's going to use

the ultimate cluster estimator, which is a with-replacement variance estimator,

exactly the same as the R survey package did, and

that is going to be a standard procedure in survey packages.

SAS will do the same thing. So here's the output from that.

And we see, as before, we've got 75 strata,

two PSUs per stratum, so we've got a total of 150 PSUs.

The total number of observations in the dataset is 21,464;

now that's omitting the ones with missing delayed medical care.

And the population size that Stata reports

is just the sum of the weights, so about 66 million.

And the design degrees of freedom are the following;

this is using that rule of thumb, the number of PSUs minus the number of strata, so we pick up one degree of freedom per stratum,

since we've got a two-PSU-per-stratum design. So

we get 75 total, and here are the proportions.
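
The degrees-of-freedom rule of thumb mentioned above, number of PSUs minus number of strata, can be checked with a few lines of arithmetic. This is only an illustrative sketch; the scalar names are made up for the example, and the counts are the ones quoted in the lecture.

```stata
* Rule of thumb: design df = (number of PSUs) - (number of strata).
* Scalar names are hypothetical; the counts come from the lecture.
scalar n_strata = 75
scalar psu_per_stratum = 2
scalar n_psu = n_strata * psu_per_stratum
scalar design_df = n_psu - n_strata

display "PSUs: " n_psu "   strata: " n_strata "   design df: " design_df
```

With two PSUs in every stratum, each stratum contributes 2 - 1 = 1 degree of freedom, which is why the total equals the number of strata.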

So you can see that going across a row,

these things add up to one, because that's what I asked for.

So these proportions of people who did delay medical care,

the value of 1 here, are exactly the same

as what we estimated from R.

So here's the column and a nice thing about Stata is,

it does give you the marginal proportion.

You don't have to ask for that separately, so that's 0.0719.

And another thing that Stata does by default is calculate

the adjusted Pearson chi-square, which is referred to an F distribution.

And you'll see here, the degrees of freedom, numerator and denominator, for

the F are exactly as computed in the R survey package.

The F statistic is the same, and the p-value is, to four decimal places, 0.

So once again, we see that delaying medical care and

how old you are are not independent variables in this data set.
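
For anyone who wants the test statistic as numbers rather than reading it off the table, here is a hedged sketch of pulling it from Stata's stored results. The file name and delay_med are assumptions, as before. The e() names are the ones I recall from the svy: tabulate twoway documentation; running ereturn list after the table will show exactly what your Stata version stores.

```stata
* Assumptions: hypothetical file name and delay_med variable name.
use "nhis_large.dta", clear
svyset psu [pweight = svywt], strata(stratum)

* pearson is the default test; naming it just makes the request explicit.
svy: tabulate age_grp delay_med, row pearson

* Design-based F for the adjusted Pearson statistic, with its df and p-value.
display "F = " e(F_Pear)
display "numerator df = " e(df1_Pear) ", denominator df = " e(df2_Pear)
display "p-value = " %6.4f e(p_Pear)
```

A p-value that prints as 0.0000 at four decimal places is the result described above, leading to the conclusion that age group and delaying medical care are not independent.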