0:14

And what we ultimately want to do is group those variables together: the survey items that are highly correlated with each other, the ones that tend to move together. That movement may be in the same direction, or it may be in opposite directions. But the assumption we're going to make is that when items tend to move together, there's some underlying construct behind them: some higher-order belief that consumers hold, or some set of preferences they have, that causes all of those survey items to move together. If we can identify those underlying beliefs, those constructs, those are what we're going to put into our regression analysis, as well as into any subsequent analyses we might conduct.

Now, while we're doing that, we want to make sure that we retain as much information as possible. Let's say our survey has 30 items that we're looking at. We want to cut that down to a more manageable number by identifying what's really driving those responses; maybe it's ultimately five constructs that drive those 30 responses. Those five constructs are a lot fewer than the 30 survey items we began with, and any time we engage in dimension reduction we are going to be throwing away some information. Our goal, when we're conducting our analysis, is to retain as much of that information as possible, all right?
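
To make "throwing away information" concrete, here is a minimal sketch in Python with NumPy. It assumes a principal-components view of the problem and uses made-up data (500 simulated respondents, 30 items driven by 5 constructs plus noise), so the printed share is illustrative only. The share of total variance kept by the first k components is the sum of the k largest eigenvalues of the item correlation matrix divided by the number of items.

```python
import numpy as np

# Hypothetical data: 500 respondents, 30 items driven by 5 latent constructs plus noise.
rng = np.random.default_rng(0)
n_respondents, n_items, n_constructs = 500, 30, 5
constructs = rng.normal(size=(n_respondents, n_constructs))
true_loadings = rng.normal(scale=0.8, size=(n_constructs, n_items))
items = constructs @ true_loadings + rng.normal(scale=0.6, size=(n_respondents, n_items))

# Eigenvalues of the item correlation matrix, largest first; they sum to n_items.
corr = np.corrcoef(items, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]

def retained_share(eigs: np.ndarray, k: int) -> float:
    """Share of total (standardized) variance kept by the first k components."""
    return float(eigs[:k].sum() / eigs.sum())

share_5 = retained_share(eigenvalues, 5)
print(f"5 components out of 30 keep {share_5:.1%} of the variance")
```

Keeping all 30 components retains all of the variance by construction; any smaller k retains less, which is the trade-off being described.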

Here's a visual illustration of what we might find. Suppose that there are K underlying constructs, and those K constructs ultimately drive all of the responses we're collecting on the survey, all of the survey items. What we might see is that survey item 1 is related to constructs 1 and 2, survey item 2 is related to construct 1, and survey item 4 is related to construct 2. So we're going to ask factor analysis to do two things for us. First, reveal how many constructs are appropriate: what is the appropriate number K? Second, reveal which constructs and which survey items are related to each other.
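
For the first question, one common heuristic (the Kaiser criterion) keeps as many factors as there are eigenvalues of the item correlation matrix greater than 1. The sketch below simulates five items from K = 2 constructs with a loading pattern loosely echoing the illustration; the pattern, noise level and sample size are all assumptions, and in practice analysts also look at scree plots and interpretability rather than relying on this rule alone.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Hypothetical loading pattern: item 1 is driven by constructs 1 and 2,
# items 2-3 by construct 1 only, items 4-5 by construct 2 only.
true_loadings = np.array([
    [0.7, 0.7],
    [0.9, 0.0],
    [0.9, 0.0],
    [0.0, 0.9],
    [0.0, 0.9],
])
constructs = rng.normal(size=(n, 2))
items = constructs @ true_loadings.T + rng.normal(scale=0.4, size=(n, 5))

# Eigenvalues of the correlation matrix, largest first.
corr = np.corrcoef(items, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]

# Kaiser criterion: count eigenvalues above 1.
k = int((eigenvalues > 1).sum())
print(np.round(eigenvalues, 2), "-> K =", k)
```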

2:25

As I mentioned, one of the ways factor analysis is commonly used in analyzing survey data is to group similar items together, and by similar I mean items that tend to move together. Besides facilitating subsequent analysis by grouping similar items together, and so reducing the number of predictors that ultimately go into that analysis, factor analysis can also be used in the course of designing your survey.

Your initial survey might have 100 or 150 individual items on it, and we'd like to pare that down so that respondents find the survey a bit more manageable. So maybe I can go from 150 survey items down to 50 survey items after the first pass. Factor analysis will help us identify which items tend to move together and, as such, which ones are potentially redundant. I can eliminate those redundancies, administer my survey in a second wave, and continue to refine it until I have a number of survey items that I'm comfortable with.
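
Factor analysis detects redundancy through items sharing large loadings on the same factor. A cruder pre-screen that captures the same intuition is simply to list item pairs whose correlations are very strong in either direction. The sketch below does that; the 0.8 cutoff, the item names `q1` to `q4`, and the simulated data are hypothetical, and this is a simplification rather than factor analysis itself.

```python
import numpy as np

def redundant_pairs(items: np.ndarray, names: list[str], threshold: float = 0.8):
    """Return (name_i, name_j, r) for item pairs whose |correlation| >= threshold."""
    corr = np.corrcoef(items, rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                pairs.append((names[i], names[j], round(float(corr[i, j]), 2)))
    return pairs

# Hypothetical responses from 300 people.
rng = np.random.default_rng(2)
base = rng.normal(size=(300, 1))
items = np.hstack([
    base + rng.normal(scale=0.2, size=(300, 1)),   # q1
    base + rng.normal(scale=0.2, size=(300, 1)),   # q2: essentially a reworded q1
    -base + rng.normal(scale=0.2, size=(300, 1)),  # q3: a reverse-worded version
    rng.normal(size=(300, 1)),                     # q4: an unrelated item
])
pairs = redundant_pairs(items, ["q1", "q2", "q3", "q4"])
print(pairs)
```

Note that the reverse-worded item is flagged too, because items moving in opposite directions are just as redundant as items moving together.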

The other way factor analysis gets used is to produce measures that are uncorrelated with each other. That matters because multicollinearity is a big problem when it comes to regression analysis.

3:41

The factors we get from factor analysis are, by design, uncorrelated with each other (at least with an orthogonal solution, such as one based on principal components). So if we first conduct factor analysis and then use its output as the inputs to our regression analysis, we don't have to worry about multicollinearity among those inputs. That's one of the reasons factor analysis is such a popular statistical technique.
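
A small sketch of why this works, assuming principal-component scores stand in for the factors: projecting centered data onto the eigenvectors of its covariance matrix yields scores whose covariance matrix is diagonal, so their pairwise correlations are zero up to floating-point error. The three highly correlated items are simulated for illustration; factor scores from other estimation methods or oblique rotations need not be exactly uncorrelated.

```python
import numpy as np

# Hypothetical data: three items that all track one driver, so they are highly
# inter-correlated (a recipe for multicollinearity in a regression).
rng = np.random.default_rng(3)
n = 400
driver = rng.normal(size=n)
items = np.column_stack([driver + rng.normal(scale=0.3, size=n) for _ in range(3)])

# Principal-component scores: project the centered items onto the eigenvectors
# of their covariance matrix. The score covariance is diagonal by construction.
centered = items - items.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
scores = centered @ eigvecs

item_corr = np.corrcoef(items, rowvar=False)
score_corr = np.corrcoef(scores, rowvar=False)
print(np.round(item_corr, 2))   # large off-diagonal correlations
print(np.round(score_corr, 6))  # essentially the identity matrix
```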

4:01

All right, so before we get into the mechanics, which I'll illustrate for you using a particular software package, what's the basic idea? Let's take our original survey data in this automotive example. We have 30 different variables, or 30 different survey items, and we want to see which of those items get grouped together to form these super variables. The first factor that we identify is going to contain as much information as possible; that's by design. The second factor that we construct is going to contain as much as possible of the information that remains after taking F1 into account. And we're going to keep adding factors, a third, a fourth, a fifth and so forth, until it's no longer worthwhile for us to keep adding more.
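
The idea that each new factor captures as much as possible of what the earlier ones left behind can be sketched with power iteration plus deflation. This is the principal-components flavor of extraction, not necessarily what any particular software package does; the 5-item data are simulated, and the "variance of at least 1" stopping rule is an assumed stand-in for "no longer worthwhile".

```python
import numpy as np

def leading_component(matrix: np.ndarray, rng: np.random.Generator, iters: int = 1000):
    """Power iteration: the unit vector v maximizing v' M v, and that maximum."""
    v = rng.normal(size=matrix.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = matrix @ v
        v /= np.linalg.norm(v)
    return v, float(v @ matrix @ v)

def sequential_components(corr: np.ndarray, min_variance: float = 1.0, seed: int = 0):
    """Extract components one at a time. Each captures as much variance as possible
    from what the earlier ones left behind; stop once a new one would explain less
    than `min_variance` (an assumed stopping rule)."""
    rng = np.random.default_rng(seed)
    remaining = corr.copy()
    variances = []
    for _ in range(corr.shape[0]):
        v, var = leading_component(remaining, rng)
        if var < min_variance:
            break
        variances.append(var)
        # Deflation: subtract what this component explains before finding the next.
        remaining = remaining - var * np.outer(v, v)
    return variances

# Hypothetical 5-item data driven by 2 constructs.
rng = np.random.default_rng(1)
true_loadings = np.array([[0.7, 0.7], [0.9, 0.0], [0.9, 0.0], [0.0, 0.9], [0.0, 0.9]])
items = rng.normal(size=(1000, 2)) @ true_loadings.T + rng.normal(scale=0.4, size=(1000, 5))
corr = np.corrcoef(items, rowvar=False)

variances = sequential_components(corr)
print([round(v, 3) for v in variances])
```

Each extracted variance matches one of the largest eigenvalues of the correlation matrix, in decreasing order, which is why the first factor is guaranteed to carry the most information.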

5:14

We're interested in modeling each of the Xs as a function of the underlying factors. Notice that the factors F1 and F2 are common across all of the survey items, right? So we're going to use the same factors, F1 and F2, across all five survey items.
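
Written out for the five-item, two-factor case, the model takes roughly the following form. The symbols are assumptions rather than the slide's own notation: λ_ij is the loading of item i on factor j (the "coefficient"), and ε_i is an item-specific error term that the spoken description leaves implicit.

```latex
\begin{aligned}
X_1 &= \lambda_{11} F_1 + \lambda_{12} F_2 + \varepsilon_1 \\
X_2 &= \lambda_{21} F_1 + \lambda_{22} F_2 + \varepsilon_2 \\
    &\;\;\vdots \\
X_5 &= \lambda_{51} F_1 + \lambda_{52} F_2 + \varepsilon_5
\end{aligned}
```

The same F_1 and F_2 appear in every row; only the loadings change from item to item.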

5:33

Now, what we need to estimate are the factor loadings, which in this case are effectively the coefficients. This looks an awful lot like linear regression, but there's an important difference. In linear regression, the predictor variables are known and the outcome variables are known; all we're trying to estimate are the coefficients, the betas.

Here, we have a set of outcomes we're interested in modeling (in this case, the Xs), but we don't know what our inputs are. We don't know the independent variables F1 and F2; we actually have to infer them through our analysis. We also don't know the factor loadings, the coefficients. So we're trying to estimate both the factor loadings and the common set of predictors. But the idea is in the same spirit as regression: take a smaller number of factors and use them to model the individual survey items.
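
The sketch below shows how both unknowns can come out of the item data alone, using the principal-component method (one of several estimation approaches; principal-axis and maximum-likelihood factoring are others). It assumes standardized items and simulated data; the function name `principal_component_factors` is hypothetical.

```python
import numpy as np

def principal_component_factors(z: np.ndarray, k: int):
    """Estimate loadings AND factor scores from standardized items z alone,
    using the k largest eigenpairs of the correlation matrix."""
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1][:k]      # indices of the k largest eigenvalues
    vals, vecs = eigvals[order], eigvecs[:, order]
    loadings = vecs * np.sqrt(vals)            # items x k: the "coefficients"
    scores = z @ vecs / np.sqrt(vals)          # respondents x k: the inferred F1, F2
    return loadings, scores

# Hypothetical 5-item data driven by 2 constructs.
rng = np.random.default_rng(1)
true_loadings = np.array([[0.7, 0.7], [0.9, 0.0], [0.9, 0.0], [0.0, 0.9], [0.0, 0.9]])
items = rng.normal(size=(1000, 2)) @ true_loadings.T + rng.normal(scale=0.4, size=(1000, 5))
z = (items - items.mean(axis=0)) / items.std(axis=0)

loadings, scores = principal_component_factors(z, k=2)
approx = scores @ loadings.T                   # each item modeled from the 2 factors
captured = 1 - np.mean((z - approx) ** 2)
print("share of item variance captured:", round(float(captured), 3))
```

As in regression, each item is modeled as a weighted sum of predictors; the difference is that the predictors (`scores`) were themselves estimated from the same data.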

6:34

All right, let's return to that retail example, where we saw two blocks of items. Remember, items one and two were correlated with each other, and items three, four and five were correlated with each other. Factor analysis would handle that by saying that items one and two load onto one construct, and items three, four and five load onto another construct.
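
A minimal sketch of how the two blocks show up in the loadings, assuming simulated retail items (1-2 driven by one construct, 3-5 by an unrelated one) and principal-component loadings. Each item is assigned to the factor it loads on most strongly in absolute value; the data and loading values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 800
construct_a, construct_b = rng.normal(size=(2, n))
noise = rng.normal(scale=0.45, size=(n, 5))

# Hypothetical retail items: items 1-2 driven by construct A, items 3-5 by construct B.
items = np.column_stack([
    0.9 * construct_a, 0.9 * construct_a,
    0.9 * construct_b, 0.9 * construct_b, 0.9 * construct_b,
]) + noise

corr = np.corrcoef(items, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
top = np.argsort(eigvals)[::-1][:2]
loadings = eigvecs[:, top] * np.sqrt(eigvals[top])   # 5 items x 2 factors

# Assign each item to the factor it loads on most strongly (in absolute value).
assignment = np.argmax(np.abs(loadings), axis=1)
groups = [[int(i) + 1 for i in np.flatnonzero(assignment == f)] for f in range(2)]
print(np.round(loadings, 2))
print(groups)  # item numbers grouped by factor
```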

All right, so let's return to the automotive example, where we had looked at our survey. Our objective in analyzing this survey is to take the original survey responses as input, eliminate any redundancies, and produce a new set of predictors that can be used in our subsequent analysis. The first output we're going to get is the factor loadings. Those tell us which of the original survey items tend to move together: when someone says that they're optimistic about their income being higher, what else do they tend to believe?

7:35

So that's what the factor loadings give us. The other piece we're going to be interested in is a set of factor scores. Think of these as a new set of predictors: they summarize the information that was contained in the original survey, and we'll be able to use this smaller set, these new X variables, in our subsequent analysis.
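
A minimal sketch of using factor scores as the new predictors, assuming principal-component scores and a hypothetical outcome (think purchase intent) that depends on two simulated constructs. The regression uses an intercept plus 2 uncorrelated scores instead of the 5 raw, inter-correlated items; all data and coefficients are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1000
true_loadings = np.array([[0.7, 0.7], [0.9, 0.0], [0.9, 0.0], [0.0, 0.9], [0.0, 0.9]])
constructs = rng.normal(size=(n, 2))
items = constructs @ true_loadings.T + rng.normal(scale=0.4, size=(n, 5))
# Hypothetical outcome driven by the two constructs.
y = 1.0 * constructs[:, 0] - 0.5 * constructs[:, 1] + rng.normal(scale=0.5, size=n)

# Factor scores via the principal-component method (an assumed choice).
z = (items - items.mean(axis=0)) / items.std(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(z, rowvar=False))
top = np.argsort(eigvals)[::-1][:2]
scores = z @ eigvecs[:, top] / np.sqrt(eigvals[top])

# Ordinary least squares: y on an intercept plus the 2 factor scores.
design = np.column_stack([np.ones(n), scores])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
residuals = y - design @ coef
r_squared = 1 - residuals.var() / y.var()
print("coefficients:", np.round(coef, 3), "R^2:", round(float(r_squared), 3))
```

The signs of the score coefficients depend on the arbitrary sign of each eigenvector, so they should be read alongside the loadings rather than on their own.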

7:57

All right, here's an outline of the steps we're going to be conducting; the software we use will help us with most of these decisions. First, we decide how many factors are necessary. Second, we conduct the analysis and derive the solution. Third, an optional step that can aid with interpretation is rotating that solution; we'll look at what exactly that means in a little bit.
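
As a preview of what rotation does, here is a sketch of varimax, one common orthogonal rotation (the lecture does not say which rotation its software uses). It follows the standard SVD-based varimax iteration in its raw form, without Kaiser normalization. The "unrotated" loadings are a hypothetical clean pattern spun by 30 degrees so that every item appears to load on both factors; varimax spins it back toward a simpler structure without changing how much of each item is explained.

```python
import numpy as np

def varimax(loadings: np.ndarray, max_iter: int = 500, tol: float = 1e-10):
    """Raw varimax (no Kaiser normalization) via the standard SVD-based iteration.
    Returns the rotated loadings and the orthogonal rotation matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Target matrix for the varimax criterion (gamma = 1).
        target = rotated ** 3 - rotated * (rotated ** 2).sum(axis=0) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = s.sum()
        if criterion > 0 and new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation, rotation

# Hypothetical "unrotated" solution: a clean loading pattern spun by 30 degrees.
clean = np.array([[0.7, 0.7], [0.9, 0.0], [0.9, 0.0], [0.0, 0.9], [0.0, 0.9]])
theta = np.deg2rad(30)
spin = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
unrotated = clean @ spin

rotated, rotation = varimax(unrotated)
print(np.round(unrotated, 2))
print(np.round(rotated, 2))  # items 2-5 again load on a single factor
```

Because the rotation is orthogonal, each item's sum of squared loadings is unchanged; rotation only redistributes the loadings to make the factors easier to name.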

The part the computer can't help us with is step 4: interpreting, or naming, the factors. This is where a person needs to be involved, all right? Next, we look at how good a job we're doing at capturing the original survey data, and then we consider what our next steps would be: after I run this analysis, what can I do with the responses? So really, we're going to be thinking about factor analysis as an input for, and as setting the stage for, subsequent analysis.
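
One standard way to judge how well the factors capture the original survey data is through communalities. For standardized items and orthogonal factors, an item's communality is the sum of its squared loadings, the share of that item's variance explained by the common factors. The loading matrix below is hypothetical, and exactly which of these quantities a given software package reports, and under what names, varies.

```python
import numpy as np

# Hypothetical loadings: 5 standardized items x 2 orthogonal factors.
loadings = np.array([[0.7, 0.7], [0.9, 0.0], [0.9, 0.0], [0.0, 0.9], [0.0, 0.9]])

communalities = (loadings ** 2).sum(axis=1)       # variance of each item the factors capture
uniquenesses = 1.0 - communalities                # variance left to item-specific noise
variance_by_factor = (loadings ** 2).sum(axis=0)  # variance each factor accounts for
share_captured = communalities.sum() / loadings.shape[0]

print("communalities:", np.round(communalities, 2))
print("variance by factor:", np.round(variance_by_factor, 2))
print("overall share of variance captured:", round(float(share_captured), 3))
```

Items with low communalities are the ones the factor solution represents poorly, which is worth checking before carrying the factor scores into later analysis.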
