So now we actually use the similarity values. But before we do that, we have to calculate them, and when we compute the similarities, we do them on the errors that we obtained from the baseline prediction. We basically subtract the baseline predictions out to figure out how far off they were, just for the training set again, because we can only use the training set. Then we augment the baseline predictor with the similarity values that we obtained on those errors. There are a couple of reasons why we do that. The first is that we really need to center these values at zero, and the reason they end up at zero is that any bias has already been subtracted out by the fact that we're working with an error.
So the errors are going to include zero and, in a sense, they'll be centered around zero. In order to do a correlation, you need things to be centered about zero, so that you can have both positive and negative values. If we used the movie ratings themselves, which run from 1 to 5, we could never have negative values, only positives, and that doesn't give us the sign differential that we need. So we subtract down to values that sit around 0, with some positives and some negatives.
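To make that concrete, here's a small sketch with hypothetical rating vectors (the baseline of 3 is just a stand-in for the baseline prediction): on raw 1-to-5 ratings the cosine can never come out negative, but on the centered errors it can.

```python
import math

def cosine(x, y):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

# Hypothetical raw ratings (1-to-5 scale) for two movies from the same two users.
# The users disagree completely, yet the cosine is still positive (about 0.38),
# because every entry is positive.
movie_a = [5, 1]
movie_b = [1, 5]
print(cosine(movie_a, movie_b))

# Subtract a stand-in baseline of 3 from each rating and the disagreement
# shows up as a negative correlation.
print(cosine([r - 3 for r in movie_a], [r - 3 for r in movie_b]))  # -1.0
```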
The second reason is that we're really trying to correct for the errors here. We're augmenting the baseline predictor with this similarity (this is really called the neighborhood method), and since the errors are what we want to correct, what we want to go away, it makes sense to work on the errors themselves. Now, we're not just going to add the errors back and get zero error on everything; we're going to do it in a way that makes sense and isn't reverse engineering. We'll see how we do that in a minute, but first let's try to calculate some of the values.
So here I'm showing the table of error values. I got these by subtracting the predictions from the actual values; that is, I took the actual values and subtracted the predictions. So if a prediction was higher than the actual value, the error is negative, which means we should have made the prediction lower than we had it. And if a prediction was less than the actual value, the error is positive, which means the prediction should have been higher. So we can use the positive and negative values accordingly.
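As a rough sketch of how a table like this could be built (the users, movies, ratings, and baseline predictions below are hypothetical placeholders, not the values on the slide):

```python
# Hypothetical training ratings and baseline predictions, keyed by (user, movie).
# Test-set entries and cells the user never rated simply don't appear here.
actual = {("B", 1): 4.0, ("B", 2): 2.0, ("F", 1): 3.0, ("F", 2): 5.0}
baseline = {("B", 1): 4.5, ("B", 2): 1.8, ("F", 1): 2.6, ("F", 2): 4.9}

# Error = actual - prediction: negative means the prediction was too high,
# positive means it should have been higher.
errors = {key: actual[key] - baseline[key] for key in actual}
print(errors)
```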
But now, for movies one and two, let's just apply that cosine similarity. To do that, we have to figure out which users have rated both movies, because in the equation we can't have users that have rated only one, and clearly we can't use those that have rated neither. For instance, we can't use A here, because A has not rated movie two. We can use B, because B has rated both movies. We can't use C, because that's part of the test set again, and we can't use the test set. We can't use D, because that's also part of the test set. We can't use E, because we don't know its value. But we can use F, because we know both of these values.
So we only use B and F for this. For movies one and two, then, we just apply the equation we had before. We multiply B's error on movie one by B's error on movie two, so that's -0.30 times -0.05, and then, remember, we add the products of the terms. So we add F's error on movie one times F's error on movie two, which is 0.17 times -0.58. Then, remember, we have to divide by the lengths: the square root of the sum of the squared errors on movie one, times the square root of the sum of the squared errors on movie two. So that's the square root of (-0.30) squared plus 0.17 squared, times the square root of (-0.05) squared plus (-0.58) squared. If we work this out, we get -0.0220 over 0.3041 times 0.6044; the numerator comes to the first value and the two square roots come to the other two. That equals -0.11. So the cosine similarity between movies one and two is -0.11.
Remember, we said we have to see whether it's closer to -1, 0, or +1. If it's closer to -1 or +1, then it's useful, and otherwise it's not. And you can see this is really pretty close to 0, so it's not very useful at all. These movies, we would say, are really not very correlated.
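Here's a minimal sketch of that procedure as a function, assuming each movie's errors come in as a dict keyed by user that contains only training-set entries; users who didn't rate both movies just aren't in both dicts. The example errors at the bottom are hypothetical, not the values from the slide.

```python
import math

def movie_similarity(errors_i, errors_j):
    """Cosine similarity between two movies' baseline errors.

    errors_i, errors_j: dicts mapping user -> error (actual - baseline),
    containing training-set entries only. Users without a training rating
    for both movies simply don't appear in both dicts.
    """
    common = set(errors_i) & set(errors_j)
    if not common:
        return None  # no shared raters, similarity is undefined
    dot = sum(errors_i[u] * errors_j[u] for u in common)
    norm_i = math.sqrt(sum(errors_i[u] ** 2 for u in common))
    norm_j = math.sqrt(sum(errors_j[u] ** 2 for u in common))
    return dot / (norm_i * norm_j)

# Hypothetical example where only users B and F rated both movies;
# user A only rated the first, so A is ignored.
print(movie_similarity({"B": -0.2, "F": 0.5, "A": 0.3},
                       {"B": 0.4, "F": 0.3}))
```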
Now let's try movies three and five as another example. We'll go through it again for three and five. We can use A, because A has rated both movie three and movie five. We can use B, again because B has rated movie three and movie five. We can't use C; sorry, it's not part of the test set, it's a value that we just don't know, so we can't use it. We can use user D, because we know both of those values. We cannot use E, because that's part of the test set. And we cannot use F, because that's also part of the test set. So we have three values now, one, two, three, on each side, and we have to do a little more.
There are a few more terms here. We do this the same as last time, except we have three terms that we're summing instead of just two. For each of these users, we take the error on movie three times the error on movie five and add them up. So we have -1 times -0.43, plus 0.25 times -0.10, plus 0.25, again, times -0.10; those last two turn out to be the same. Then we divide by the square root of the sum of the squared errors on movie three, times the square root of the sum of the squared errors on movie five. So that's the square root of 1 squared plus 0.25 squared plus 0.25 squared, times the square root of 0.43 squared plus 0.10 squared plus 0.10 squared. Notice that I'm dropping the negative signs here, because when you square something it becomes positive again, so I don't need to write them. If you do this whole thing out, and I'll leave it to you to run through the calculation, you get 0.79.
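If you want to check that number, here's the same arithmetic spelled out; the three errors per movie are the values read off above, paired in the order the terms were multiplied (which value belongs to which of A, B, and D is assumed from that order).

```python
import math

# Errors of the three usable users on movies three and five, taken from the
# worked example above (pairing per user assumed from the order of the terms).
movie3 = [-1.00, 0.25, 0.25]
movie5 = [-0.43, -0.10, -0.10]

dot = sum(a * b for a, b in zip(movie3, movie5))   # 0.38
norm3 = math.sqrt(sum(a * a for a in movie3))      # about 1.061
norm5 = math.sqrt(sum(b * b for b in movie5))      # about 0.453
print(round(dot / (norm3 * norm5), 2))             # 0.79
```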
And so that's closer to +1: it's a positive correlation, and since it's reasonably close to +1, we would say these movies are positively correlated. So now we can do this for every pair and come up with a full table of similarity values, and here I'm tabulating that. You can see the similarity between movies one and two, the 0.79 between three and five that we just found, and so on.
Now, a couple of things to note. First, this table is symmetric. We've said before that things sometimes aren't symmetric, but here they are: the similarity from one to two is the same as the similarity from two to one. That's why the (1, 2) and (2, 1) entries are the same, just as, for instance, (2, 4) and (4, 2) are the same. So if you flip the table over its diagonal, the mirror image is identical.
Now, in the next segment, we're going to choose one neighbor for each movie. The neighbors here are other movies, and for each movie we choose one neighbor. We could choose more, say two, three, or four, but we've already made the math complicated enough, and that would make it even more complicated. So we'll just stick to choosing one neighbor per movie, which will simplify things a lot when we actually go to do it out. Going down the columns, for movie one we want to find the neighbor with the highest similarity, and that would be movie three here. The entries with green backgrounds are the ones with the highest similarities. So movie one would choose movie three as its neighbor.
And again, we're looking at magnitude, so we want the magnitude to be the highest. Even though this value is negative, very negative, it's still strongly correlated, just negatively, and that's a useful thing. Now, for movie two, we choose among magnitudes of 0.11, 0.74, 1, and 0.88, and therefore we're going to choose movie four; even though that value is negative, it's still the highest in magnitude. Movie three would choose movie one, because -0.82 is higher in magnitude than any of its other values. Movie four would choose movie two, just like movie two chose movie four; those two are actually perfectly negatively correlated with one another. The reason they come out perfectly negatively correlated is that there's only one value being multiplied there, and that's not really that great. Normally you'd want more data before saying two things are perfectly negatively correlated, but here that's just the way it turns out.
Now, movie five will actually choose movie two as its neighbor, because that 0.88 is higher in magnitude than its other values. And notice that movie two did not choose movie five; it chose movie four, but movie five chose movie two. So they don't have to choose each other: even though the table is symmetric, movies don't necessarily pick each other as neighbors.
instance, we could say, all right, well I'm not going to use similarity at all
unless the similarity value is higher, than like, 0.9.
For instance. And then in that case, we would only use
one neighbor for one pair here, we wouldn't use similarity at all.
Some people do taht because sometimes it makes sense to say well, unless
similarity is high enough, I'm not going to use it.
but here we're just going to choose the most similar and just use that to do our
calculation.
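As a sketch of that neighbor-selection rule (the similarity table below is hypothetical, and `min_magnitude` stands in for the optional cutoff idea just mentioned; it defaults to 0 so it normally has no effect):

```python
def pick_neighbors(sim, min_magnitude=0.0):
    """For each movie, pick the other movie with the largest |similarity|.

    sim: dict mapping frozenset({i, j}) -> similarity. The table is
    symmetric, so one entry per unordered pair is enough.
    min_magnitude: optional cutoff; pairs below it are ignored, so a
    movie can end up with no neighbor at all (None).
    """
    movies = set()
    for pair in sim:
        movies |= pair
    neighbors = {}
    for m in sorted(movies):
        candidates = [
            (abs(s), next(iter(pair - {m})))
            for pair, s in sim.items()
            if m in pair and abs(s) >= min_magnitude
        ]
        neighbors[m] = max(candidates)[1] if candidates else None
    return neighbors

# Hypothetical similarity values for three movies.
sim = {
    frozenset({1, 2}): 0.20,
    frozenset({1, 3}): -0.90,
    frozenset({2, 3}): 0.50,
}
print(pick_neighbors(sim))                      # {1: 3, 2: 3, 3: 1}
print(pick_neighbors(sim, min_magnitude=0.9))   # {1: 3, 2: None, 3: 1}
```

Note that, just as in the example above, the choices aren't mutual: with these hypothetical values, movie 2 picks movie 3 but movie 3 picks movie 1, and with the 0.9 cutoff movie 2 ends up with no neighbor at all.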