
Now I want to introduce multiple regression.

Multiple regression is normally the subject of a semester-long course.

And you might at some point, if you pursue your graduate training,

take such a course.

But I can try to give you some flavor of what it is right now.

So multiple regression involves relating a Y variable, an outcome variable, to multiple X variables rather than just one.

In the example I just gave you, we were looking at one variable as a function of a single other variable.

But imagine a scenario where we're actually interested in controlling for, or accounting for, the influence of multiple variables on some outcome of interest.

A very good example would be looking at income.

We've talked a lot about the relationship of income and education.

But we know that many other factors influence income in addition to education: age, years of work experience, sex, ethnicity, college major, and all sorts of other things. You can think of all of these as potential X variables, things that we think probably influence the Y variable, income.

Multiple regression is a tool for looking at the influence of all

of these variables simultaneously on some outcome variable.

So suppose in our example we have income as a function of years of education, age, and years of work experience. If we decide that X1 is years of education, then the coefficient that we get for X1 is going to be the average change in income associated with a one-unit change in our measure of education, that is, increasing the total number of years of education by one.

Importantly, this assumes that none of the other variables are changing.

That, in this example, years of work experience and age are locked down.

So in the multiple regression context, the coefficient that we get for the variable X1 measures an association: an average change in the outcome among people whose education differs by one year but whose age and years of experience remain unchanged.

So you could think of it as a comparison between different people who are of the same age and have been in the workforce for the same number of years, but who differ in terms of education.

Then the coefficient for another variable, say age, which we'll call X2, will reflect the effect of a one-year change in age on the outcome variable Y.

Again, this holds constant, or holds equal, the values of the other variables. So the coefficient we get for X2 represents the average change in income when we compare people whose ages differ, but who have identical education and identical years of work experience.
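To make this concrete, here is a minimal sketch of that income model in Python on simulated data. Everything here is an illustrative assumption, not a real estimate: the variable names, the coefficient values, and the data-generating process are all invented so that we know the true effects and can see the fitted coefficients recover them.

```python
import numpy as np

# A minimal sketch with simulated data: income as a function of
# education, age, and work experience. All numbers are made up.
rng = np.random.default_rng(0)
n = 5_000

education = rng.integers(8, 21, size=n).astype(float)  # years of schooling
age = rng.uniform(22, 65, size=n)                      # years
experience = rng.uniform(0, 30, size=n)                # years in workforce

# Simulate income with known "true" effects plus noise.
income = (10_000 + 2_000 * education + 300 * age
          + 500 * experience + rng.normal(0, 5_000, size=n))

# Fit income = b0 + b1*education + b2*age + b3*experience
# by ordinary least squares.
X = np.column_stack([np.ones(n), education, age, experience])
b0, b1, b2, b3 = np.linalg.lstsq(X, income, rcond=None)[0]

# b1 estimates the average change in income for one extra year of
# education, holding age and experience fixed.
```

Because the data were simulated with known effects, b1 should land near the true value of 2,000: the average income change for one more year of education among people with the same age and experience.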

So multiple regression, because of its ability to handle multiple variables at the same time and measure their partial effects on an outcome, is a common tool for trying to account for problems with confounding or omitted variables like the ones we talked about in previous lectures.

If we're trying to isolate the effect of education on income, we might try to single out all of the possible things that on the one hand might be associated with income and which may also be associated with education. We would then introduce them as additional X variables to control for them, so that the effect we observe, or at least the association we observe, between education and income is among people with varying levels of education who are identical on the other variables in the model.

So for decades, this was the mainstay of a lot of quantitative analysis of

social data, to try to identify causal associations by controlling for

all of the possible confounding, or omitted variables.
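Here is a hypothetical sketch of why that controlling matters. In this simulation an "ability" factor, invented purely for illustration, drives both education and income. The simple regression coefficient on education is inflated because it absorbs ability's effect, while the multiple regression that includes ability as a control recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# "ability" is a simulated confounder: it raises both education
# and income. All names and numbers are invented for illustration.
ability = rng.normal(0.0, 1.0, size=n)
education = 12 + 2.0 * ability + rng.normal(0.0, 1.0, size=n)
income = (30_000 + 1_000 * education + 5_000 * ability
          + rng.normal(0.0, 2_000, size=n))

def ols(y, *xs):
    """Least-squares coefficients; intercept first."""
    X = np.column_stack([np.ones(len(y)), *xs])
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Simple regression: education's coefficient also picks up
# ability's effect, so it comes out far above the true 1,000.
naive = ols(income, education)[1]

# Multiple regression: with ability included as a control,
# the education coefficient is close to the true 1,000.
controlled = ols(income, education, ability)[1]
```

Of course, in real data the whole problem is that a variable like this may be unobserved; the simulation only works because we generated the confounder ourselves.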

Now, of course, for this to be successful, you actually need to be able to measure all of those omitted variables, and that's not always possible.

We can always speculate about other variables that might be out there that no survey has yet measured, or which are impossible to measure.

And so if you go on to advanced studies, you'll learn about more advanced forms of regression that account for unobserved variables: variables that we can't observe at all, but that people in our study might share. These are fixed- or random-effects models.

You'll also learn about other extensions to regression that handle particular cases, like the situation where, instead of trying to predict the behavior of an outcome variable that is continuous, like income, we want to predict which category some outcome falls into, going beyond the tabulations that we talked about in the previous module.

There are in fact regression based approaches for modeling,

which category people may end up in, as a function of both categorical and

continuous variables on the right hand side.
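One such regression-based approach for a categorical outcome is logistic regression, which models the probability of ending up in a category. Here is a bare-bones sketch on simulated data, fit by plain gradient ascent rather than a statistics library; the predictors and coefficient values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2_000

# One continuous and one categorical (0/1) predictor, invented data.
x1 = rng.normal(0.0, 1.0, size=n)
x2 = rng.integers(0, 2, size=n).astype(float)
true_logit = -0.5 + 1.5 * x1 + 1.0 * x2
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))  # 0/1 outcome

# Fit P(y = 1) = 1 / (1 + exp(-(b0 + b1*x1 + b2*x2)))
# by gradient ascent on the average log-likelihood.
X = np.column_stack([np.ones(n), x1, x2])
beta = np.zeros(3)
for _ in range(10_000):
    p = 1.0 / (1.0 + np.exp(-X @ beta))  # predicted probabilities
    beta += 0.5 / n * (X.T @ (y - p))    # log-likelihood gradient step

# beta should land near the simulated values (-0.5, 1.5, 1.0).
```

Each coefficient here has the same "holding the other variables constant" reading as before, except that it shifts the log-odds of the outcome category rather than the outcome itself.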

So there's a lot to learn if you want to make use of regression.

I hope this has given you at least a little bit of a taste of what a regression coefficient and a correlation coefficient actually mean, enough to help you a bit as you read papers and so forth where people talk about regression coefficients.

But you'll have to take additional courses if you really want to learn how to use

these techniques properly.