1:01

The real question that arose was, what can we do to improve matrix factorization

techniques to bring back, what I'm loosely going to call content attributes or

you could even call meta data.

We know that content preferences are real and might help improve performance.

We also know that one of the weaknesses of matrix factorization is that you end up

with really hard to understand profiles, that the dimensions of

the factorized rating matrix are rarely understandable.

And so you were telling me about some ideas for, how we can think about this

problem a little bit more generally, why don't you lead us through those ideas?

>> Sure, yea.

So when you think of the basic problem typically mostly

people think about the user versus movie.

There's a ratings matrix and that's how your data looks like.

But if you really look at the data more carefully, you'll realize that you'll

probably have some demographic information or other information.

For the individual users you'll have information related to the movies,

like the cast, the reviews, and other things, right?

And if you just focus on the user versus movie ratings matrix,

that one data matrix, you are missing out on these other attributes,

the site information and the content that is there.

So a more general way to sort of thing about is simultaneously take

into account all these multiple dimensions as you are doing the factorization.

And of course, the data looks much more complicated.

So as you can see sort of in this schematic diagram.

The yellow thing is your usual user versus movies ratings matrix.

But the users are sort of at the bottom and

they have demographic information jetting out in blue.

And then the movies are along the columns, and then so

there's reviews associated with the movies and there's movie cast and so on.

So this is how your data really looks like, so

the goal really is instead of just focusing on that yellow matrix.

Can we take into account all of these as we are trying to do the matrix

factorization?

And that's sort of this multiple dimensions, and

doing a collective matrix factorization.

>> And so then, we're going to start by modeling

the way these various dimensions come together.

Why don't you walk us through this in a graph representation?

>> Right, so one way to think about this kind of a data is on the left hand side

we have some things like viewers, actors, movies and review words.

So this is sort of what will be used for the reviews.

So these are the entities in some ways.

And the relationship between sort of a viewer and

a movie is a ratings matrix right.

And now if you pay attention that if a viewer is actually rating

individual actors in a movie, that will be a viewer actor movie ratings.

It will be multiple matrices, where every ratings matrix corresponds to one actor.

Now, if you look at the costing matrix,

that's a relationship between the actors and the movies.

So, that’s a matrix connecting movies with the actors participating in it.

Then movies and review words are connected by their reviews,

viewers are connected to demographics.

So, this relation graph essentially with entities on the left-hand side and

relationships on the right-hand side.

Sort of, and you can draw it based on what kind of data you have.

So that it's fully laid out, you know what kinds of things you have.

The righthand side things are usually matrices, the lefthand side thing

are indices of those matrices of the sides of the matrices.

>> So just as an example, if we were going to add in genres.

We could have those as entities and we might have a relationship that movies have

genres but we also might have that viewers have expressed a preference for genres.

>> Exactly, that's a perfect example.

So we'll have one entity on the left hand side which will be genres and

the movie and genre will connect to that matrix with just.

And similarly for viewers, we can have that as well.

>> Right, and as you pointed out,

this can allow some fairly sophisticated relationships.

So, one of the things that our typical systems don't allow is saying g I

liked Sandra Bullock in a particular movie and

of course I'm in the blind side but I didn't like her in a different movie.

>> Sure.

>> And that type of deep rating relationship of

course requires some challenging interfaces to get people to express it.

But if you're mining reviews, you may very well get that data as a byproduct.

>> Yes, and that'll be sort of a very rich data.

So, in this case, it'll be the viewers are actually rating the individual

actors in movies and that's difficult to do, but

they often express those thoughts in their reviews as well and we can extract that.

>> Wonderful.

So before we actually go through this graph,

one of the things we're going to talk about is that you've actually done this.

And probably be useful to take a few minutes to just talk through,

what does it mean to take multiple matrices like these and

factor them together so as to come back with a better model?

>> Right, so what we usually do is that is let's say you have an entity like a movie.

And then when you're looking at just the movie versus user matrix,

the ratings matrix, you get a latent factor or representation for each movie.

Now when you have multiple such entities and relationships,

the movies will still have this latent representation

like as a vector as you said those dimensions are hard to interpret.

But every movie will have sort of a maybe five dimensional vector representation.

7:51

That's nice because it's optimal but

it only counts two dimensions worth of information.

If we now come out and say well what I have is users to movies and

then cast to movies and we co-factor these.

We're going to come up with different set of dimensions that allows

both matrices to share the latent dimensions so

that a movie in one has the same vector approximately.

>> Yeah.

>> Has a movie in the other.

>> Yeah.

>> It sounds like part of what we're going to end up with is

a dimensionalization that's not as perfect for either matrix alone but that

jointly is better and gives us the ability to bring those two matrices together.

>> Exactly that's perfectly correct.

And we simultaneously approximating both these matrices and that representation for

every movie will be different for had we only looked at one of those matrices.

>> So one of the questions obviously is, so does it work?

So why don't you tell us how it works?

>> Yeah. So I think in the literature now,

there are many different models trying to do this.

What we have shown in the plot over here.

So our comparisons between PMF.

Which stands for probabilistic matrix factorization.

9:08

This is the paper that became popular during the Netflix competition,

the PMF paper.

And GPMF is a variant that we cooked up, which is a generalized version of

PMF which can take into account these additional dimensions and factors.

So in this particular comparison, as we are increasing the rank.

We are showing how the performance improves where you have,

just like Joe said, that you have the ratings matrix.

And you also have the movie versus cast matrix and

you're trying to sort of capture that.

So as you give more rank,

as you increase the rank you can see that the performance of this more fancy model.

The GPMF actually keeps improving because it can take advantage of this additional

information that's available in the cast and it can align that and

can understand what types of movies the user likes.

What kinds of people work in similar types of movies and so on.

So there are been sort of a parallel development in machine learning,

data mining, and other areas.

So if you look for things like collective matrix factorization,

there's a whole bunch of models that go by that name.

Which try to accomplish similar things.

And you should be able to find, you know, enough ideas on how to this can be done.

But this is just one such idea that we worked on.

And it definitely showed improvement.

>> Well it's neat, because it shows both things on the same graph.

It shows one that this is always better.

Even though we're not doing the single perfect metrics factorization,

having more data gives us a better result.

>> Right. >> But what we're also showing is that

the total amount of information in the system Is increasing because

we're able to take advantage of more dimensions.

>> Right. >> And

obviously what PMF is showing is it gets worse is that it’s over fitting.

>> Yes, exactly.

>> Because it's used all the information it could possibly use.

>> Exactly, its sees the [INAUDIBLE].

>> So the one other thing that comes out nicely from this is a little bit better

hope of having some forms of content understanding and bring up both of these.

From your work you had talked about how you could find cast clusters by

identifying clusters of cast members.

That had nearby vector representations, >> Right.

>> Of this representation.

>> Right.

>> Can you tell us a little bit about that?

>> Sure, yeah, so

in the example that we are talking about you have a user by movie matrix.

Which is like usual ratings matrix and the movie by cast matrix,

which is sort of the new thing we threw in and

then we are sort of looking at the joined factorization.

So you are going to get this latent factor representation for every actor or actress.

And then you can cluster them based on these five dimensional or

ten dimensional representation.

You can do just run around something like [INAUDIBLE] or

do something more fancy to find clusters among the actors.

So we looked at some of the results and some of it made sense so

we found one cluster where mostly the science fiction actors got together

a lot of Star Trek, you know Apollo 13.

Ed Harris and Nimoy and so on.

All of them grouped together.

We found another cluster which is largely actors in the 40s through 60s.

You know, Cary Grant, Humphrey Bogart and people like them.

So so some of these actually,

if you carefully start poring over the results, they make more sense and

there's a better interpretability as to what may be going on over here.

>> Well, part of the thing that makes this kind of analysis neat is that

it can sometimes help you either back or refute intuitions.

And so- >> Yeah.

>> You look at the bottom there and

Paul Newman actually did most of his acting after many of those

people had stopped. >> Right.

>> But there's this sense that he was a throwback to the type of actor

of an earlier era.

And what we can see is that if you look at the data of the movies

that are liked by people and the actors that are in them.

When we bring all of this together sure enough.

He sits there along side Cary Grant and

Humphrey Bogart as opposed to some of the actors who might have been later and

part of a different generation stylistically.

>> Yeah, that's true.

>> And so none of this necessarily means that you have

an easy time again describing what the dimensions are.

>> That is true.

>> That's always going to be a challenge with matrix factorization techniques.

But you might have an easier time explaining

some attributes of a movie that somebody would like because

you can express it in terms of some of these content spaces.

>> That is true and in some ways these clusters are post processed versions of

those latent factors.

So we did a clustering on those factors and then this is somewhat interpretable,

but you're right that the dimensions of those latent factors are still difficult

to interpret.

>> So one last question, where is this type of technology in

terms of the continuum from an idea in the research lab,

out to everybody uses it in all of their systems today?

>> So my sense is, and this is more coming from academia is that

this has been explored quite extensively and there's lots of ideas out there.

I think one challenge in adopting this fully Is that,

first of all, as you throw in more types of entities.

The data sparsity problem is exacerbated.

You have just many more of these large sparse matrices.

And then the scale of the problem increases.

So you have to sort of come up with tractable algorithms which scale to

the right things.

>> So I think that's where things are,

people are navigating their way through these, is my understanding.

>> So it sounds like it's a technology to keep an eye on.

>> Yeah.

>> And it wouldn't be surprising if we start seeing some of the industry

leaders jumping into these ideas,

to try to ratchet their recommenders up a little notch.

>> Yeah, I think so.

>> Well, wonderful.

Thank you so much for joining us.

>> Thanks for having me.

>> And we'll see you again soon.