0:01

So now we need to actually quantify the similarity, right?

Because we need to be able to use it somehow.

And so we're going to look a little bit at how we do that.

This is going to be the most math that you will see in the course. This slide

and the next slide have the most math you'll see.

And we'll never show this much math again so just bear with us.

Okay, so now suppose you took each of these movies, movie one and movie two,

and we plotted them. Right, so we plotted the ratings for the

0:40

user A, coming along this way, and we have another user, B, coming along this way.

Right, so, if user A liked movie one, we would give it a positive rating, which

means that we're going to extend positive this way. If he didn't like it, we would give a

negative rating, 'kay. Similarly, if user B liked movie one, we

would go up positive, and if he didn't like it, we'd go negative.

And then we define these different points, one for each movie, one and two, so we

could have movie three, four, five and so on, and then each of the dimensions in the

graph is a different user, so that's the idea, okay.
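
To make that picture concrete, here's a minimal sketch in Python of how each movie becomes a point with one coordinate per user. The ratings, the `ratings` dictionary, and the `movie_vector` helper are all made up for illustration; they are not values from the slide.

```python
# Hypothetical ratings: positive means the user liked the movie,
# negative means the user didn't like it.
ratings = {
    "movie_1": {"A": 1.0, "B": 1.5},
    "movie_2": {"A": 1.0, "B": 2.0},
}

# One dimension of the graph per user, always in the same order.
users = ["A", "B"]


def movie_vector(movie):
    """Return a movie's ratings as a list with one coordinate per user."""
    return [ratings[movie][user] for user in users]


print(movie_vector("movie_1"))
print(movie_vector("movie_2"))
```

Adding a third user would just add a third coordinate to every movie, and adding a movie would add another point.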

So now these values here are some indication of the rating that the user

gave the movies, okay. And we'll talk about exactly what it is in a

second. But the idea here is, we want to compare

these two lines and see how close they are to one another.

Okay? Because if they're really close, like in

this case right here, that we've drawn, they are really close.

Right? So, user A liked movie one so he rated it

positive. User A also liked movie two, so he rated

that positive. User B liked movie one, so he rated that

positive. And user B also liked movie two, so he

rated that, granted, more positive. But it is the same.

The vectors for movie one and movie two are both positive, and they're

both close to one another. So when we look at the angle between the

vectors, this angle right here is small because they're pointing in the same

direction. What that means is that the users tend to

respond the same to those movies because they're pointing in the same direction.

Similarly, if we had the angles like this, it would be the same idea.

What all this is saying is that one user responded negatively to both movies, but

still the same idea applies: if a user responds negatively to one movie,

he's going to respond negatively to the other.

And so on. It's just saying the movies are similar

in taste. Now, here's another example.

Okay we're just moving now. We're keeping movie one over here and

we're moving movie two over here. 'Kay.

So now User A had a positive response to movie one, negative response to movie

two. User B had a positive response to movie

one, and also a positive response to movie two.

Now this angle's getting a little larger here, as you can see.

So now there is not the same directionality here, because the

users responded differently, okay. So, user A liked this movie, didn't

like this movie; user B likes this movie and likes this movie.

So, for user B there seems to have been a positive correlation in his

taste, but for user A there seems to have been a negative correlation.

So, right here there is really no correlation at all.

3:29

Here we have the same thing; it's just that now, rather than moving user A's taste,

we're going to move user B's taste.

User B liked this movie, didn't like this movie.

User A liked both movies. Okay, so you see the idea again is that

the users are each responding differently.

So we can't find a correlation among movie tastes, okay.

We can't say that if one user likes this movie he's not going to like this movie

or if one user likes this movie he's going to like this movie because they

don't go in a similar direction. Now here's the other extreme.

Okay, in each case now, user A likes this movie, doesn't like this movie.

User B likes this movie, doesn't like this movie, because they're pointing in opposite

directions. Now this is a very dissimilar situation:

when one user likes one movie, he tends not to like the other movie. This angle

is larger and it's getting closer to 180 degrees.

Okay. So this is a very dissimilar situation.

And this is a very similar situation. This is positive correlation.

Strong positive correlation, this is strong negative correlation.

This is really no correlation, or none, we'll write.

So we want the angle to either be close to zero degrees, indicating there's

strong positive correlation, meaning that when one user tends to like one movie he

will like the other one. Or we want it close to 180 degrees, which means

there's a strong negative correlation, which means that when one user likes one

movie, he will not like the other. Or if he doesn't like one movie then

he'll tend to like the other. Now the way that we quantify this, okay,

is by taking the cosine of this angle in here.

And so, we don't have to explain geometrically how you get the cosine,

we'll just illustrate it intuitively here.

The cosine of that angle, okay, is going to be

close to plus 1 if the angle is close to 0 degrees.

5:34

It's going to be close to zero if the angle is around 90 degrees, like in

these situations right here, where there's essentially zero correlation.

And it's going to be close to minus one if the angle's getting close to 180

degrees. Okay.
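
If you want to check those three landmarks yourself, here's a quick sketch using Python's standard `math` module. The `cosine_of_degrees` helper is just a name picked for this example.

```python
import math


def cosine_of_degrees(angle_in_degrees):
    """Cosine of an angle given in degrees (math.cos expects radians)."""
    return math.cos(math.radians(angle_in_degrees))


for angle in (0, 45, 90, 135, 180):
    # Rounded for display: floating point gives a tiny nonzero value at 90.
    print(angle, round(cosine_of_degrees(angle), 3))
```

The value slides from plus 1 at 0 degrees, through roughly zero at 90 degrees, down to minus 1 at 180 degrees.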

Like in this situation right here. This would be close to minus 1, this is

close to plus 1. So now, the way that we calculate the

cosine similarity, okay, is by basically multiplying a user's preferences for each

of the movies together and adding those up.

Okay. So basically what we would take is we

would take A1 times A2. Okay.

Add B1 times B2. And then we actually divide by the

length. So we divide by the length of each of

these segments, okay. And you don't really have to

know that part as much. It's not as important, but we need to

normalize the value to be between minus 1 and plus 1, like it is right

here. So we want it within this range.

And so we divide by the length of the lines basically to get that to be the

case. So we divide by the square root of A1

squared plus B1 squared. Right, 'cause remember the movies form

the lines. And then, sorry, times the square

root of A2 squared plus B2 squared. And if we had more users, you

would just add more terms, right.

A1 times A2 plus B1 times B2, plus C1 times C2, and so on.

That would just be A1, B1, C1 here, and A2, B2, C2 over here.

So it's just a simple extension, getting more users.
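
Here's a rough sketch of that calculation in Python, assuming each movie is given as a list with one rating per user, in the same user order. The function name `cosine_similarity` and the sample ratings are made up for illustration; this is just one straightforward way to write it, not the course's own code.

```python
import math


def cosine_similarity(movie_1, movie_2):
    """Cosine of the angle between two movies' rating vectors.

    movie_1 and movie_2 hold one rating per user (A, B, C, ...),
    in the same user order.
    """
    if len(movie_1) != len(movie_2):
        raise ValueError("both movies need one rating per user")
    # Numerator: A1*A2 + B1*B2 + C1*C2 + ...
    dot = sum(r1 * r2 for r1, r2 in zip(movie_1, movie_2))
    # Denominator: the length of each movie's line.
    length_1 = math.sqrt(sum(r * r for r in movie_1))
    length_2 = math.sqrt(sum(r * r for r in movie_2))
    if length_1 == 0 or length_2 == 0:
        raise ValueError("a movie with all-zero ratings has no direction")
    return dot / (length_1 * length_2)


# Made-up ratings, two users each.
print(round(cosine_similarity([1, 2], [1, 3]), 2))    # same direction
print(round(cosine_similarity([1, 2], [-1, -2]), 2))  # opposite directions
print(round(cosine_similarity([1, 1], [-1, 1]), 2))   # users disagree
```

Because the function loops over however many ratings it is given, a third user is just a third number in each list.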

7:31

But now the intuition is as follows. Okay, suppose these are pointing in opposite

directions, so this one is positive and this one is

negative. Then this product is going to be

negative. And same thing here, if this one's

positive, and this one's negative, then this product is going to be negative.

Right, so now this is going to be a very negative sum, because we're adding two

negative numbers. So it's going to get closer to this minus

1 down here. If these are both positive, on the other

hand, then it's going to be very positive, so we're going to go up here.

If one of them is positive and one of them is negative then we're going to be

getting closer to zero because there is not as much correlation.

Right, so if all four of these, for instance, are

positive, then we're good. Even if all four of them are negative,

then we'd be good, because negative times negative makes a positive.

Negative times negative over here also makes a positive, and so on.

We just don't want one term to be positive and one term to be negative.

'Cuz then in that case, there's no correlation.

Right? Like up here, we had, we would have this

one term being positive and the other term being negative.
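
To see the sign argument with actual numbers, here's a small worked example. The ratings of plus or minus 1 are made up for illustration, with A1 and B1 standing for the two users' ratings of movie one and A2 and B2 for their ratings of movie two.

```latex
\[
\cos\theta = \frac{A_1 A_2 + B_1 B_2}{\sqrt{A_1^2 + B_1^2}\,\sqrt{A_2^2 + B_2^2}}
\]

\begin{align*}
\text{all positive: } & (A_1, B_1) = (1, 1),\ (A_2, B_2) = (1, 1)
  & \cos\theta &= \frac{(1)(1) + (1)(1)}{\sqrt{2}\,\sqrt{2}} = \frac{2}{2} = 1 \\
\text{all negative: } & (A_1, B_1) = (-1, -1),\ (A_2, B_2) = (-1, -1)
  & \cos\theta &= \frac{(-1)(-1) + (-1)(-1)}{\sqrt{2}\,\sqrt{2}} = \frac{2}{2} = 1 \\
\text{opposite: } & (A_1, B_1) = (1, 1),\ (A_2, B_2) = (-1, -1)
  & \cos\theta &= \frac{(1)(-1) + (1)(-1)}{\sqrt{2}\,\sqrt{2}} = \frac{-2}{2} = -1 \\
\text{mixed: } & (A_1, B_1) = (1, 1),\ (A_2, B_2) = (-1, 1)
  & \cos\theta &= \frac{(1)(-1) + (1)(1)}{\sqrt{2}\,\sqrt{2}} = \frac{0}{2} = 0
\end{align*}
```

In the mixed case the two products cancel each other out, which is why the result lands at zero.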

8:49

So now correlation, as we said, it could be near plus one which means it's strong

and positive, near zero which means there's no correlation or near minus one

which means it's strong and it's negative.

So the key idea that you should really take away here is how to find

this cosine value, and what it means, right.

If it's close to plus one, it means they're strongly positively correlated.

If it's close to minus one, it means they're strongly negatively correlated.

And if it's close to zero, it means there's really no correlation.