Those same matrices will minimize the root mean squared error, and the sum of squared errors is a much easier formula to work with.
So we're going to minimize the sum of the squared error.
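To see why we can get away with that swap: over a fixed set of n ratings, RMSE is a monotonic function of SSE, so any P and Q that minimize one minimize the other. Here is a quick sketch of that relationship; the epsilon notation matches the error we define below.

```latex
% SSE over the observed ratings, and RMSE as a monotonic
% transformation of it; minimizing one minimizes the other.
\mathrm{SSE} = \sum_{(u,i)} \varepsilon_{ui}^{2},
\qquad
\mathrm{RMSE} = \sqrt{\tfrac{1}{n}\,\mathrm{SSE}}
```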
Now recall our prediction rule.
The score of item i for a particular user u,
which I'm going to abbreviate here as r tilde sub ui
to denote the predicted rating,
is our baseline value plus the dot product of the user and
item feature vectors for that user and that item.
So we compute the error by subtracting the prediction from
the rating: epsilon sub ui is r sub ui minus b sub ui minus this dot product.
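As a concrete illustration, here's a minimal sketch of that prediction and error in Python. The array names (baseline, ratings, P, Q) are my own for illustration, with P holding user feature vectors in its rows and Q holding item feature vectors in its rows.

```python
import numpy as np

def predict(baseline, P, Q, u, i):
    """Predicted rating r-tilde_ui: the baseline value b_ui plus the
    dot product of user u's and item i's feature vectors."""
    return baseline[u, i] + P[u, :] @ Q[i, :]

def error(ratings, baseline, P, Q, u, i):
    """Prediction error: epsilon_ui = r_ui - b_ui - p_u . q_i."""
    return ratings[u, i] - predict(baseline, P, Q, u, i)
```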
But then the updates. Gradient descent is based on this rule.
It's common to call the parameters of one of these models theta,
so theta here is P and Q together, the two matrices.
Theta at step n equals theta at step n minus 1,
minus a step size times the gradient of our error
with respect to theta.
And the gradient is just this big matrix of partial derivatives.
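Written out, with eta standing in for the step size (a symbol I'm adding for clarity; the lecture just says "the gradient"), the rule is:

```latex
% One gradient descent step: move the parameters against the
% gradient of the error, scaled by the learning rate eta.
\theta^{(n)} = \theta^{(n-1)} - \eta \, \nabla_{\theta}\, \varepsilon
```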
What we care about, though, is that we're training
individual user and item feature values, one at a time.
So we've got a particular rating, r sub ui.
It has a particular user and a particular item.
We're also training one feature at a time.
So we train the first feature.
Then we train the second feature.
So say we're training feature f.
We're trying to update P sub uf and Q sub if,
the user and item feature values for that particular feature.
So we really only have two values we care about at any step.
We're using something called stochastic gradient descent in FunkSVD,
which means we're updating for every rating.
Rather than going over all the ratings and computing a big update matrix,
we just immediately update with every rating.
So at any given point, any given step through the algorithm,
we care about these two values.
We're trying to update these two values.
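To make the stochastic part concrete, here's a rough sketch of that loop in Python. The learning rate lr, the epochs schedule, and the exact update expressions are my assumptions about one reasonable arrangement, not the lecture's code; the update values themselves fall out of the derivative we take next.

```python
def train_feature(ratings, baseline, P, Q, f, lr=0.001, epochs=50):
    """Stochastic gradient descent for one feature f, FunkSVD-style:
    for each observed rating we immediately nudge just the two values
    we care about, P[u, f] and Q[i, f], rather than accumulating a
    big update matrix over all the ratings first.

    `ratings` is an iterable of (u, i, r) triples of observed ratings;
    baseline, P, and Q are NumPy arrays."""
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - (baseline[u, i] + P[u, :] @ Q[i, :])
            # Cache the old values so both updates use the same state.
            p_uf, q_if = P[u, f], Q[i, f]
            P[u, f] += lr * err * q_if
            Q[i, f] += lr * err * p_uf
```

Real FunkSVD implementations typically also fold a regularization term into these updates; I've left it out to keep the sketch minimal.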
So how do we do that?
We want to take the derivative.
The derivative with respect to P sub uf of the squared error,
or of epsilon sub ui squared.
So if you've taken calculus and
remember your derivative rules for
dealing with powers, that's going to
be equal to 2 times epsilon sub ui,
times the derivative with respect
to P sub uf of epsilon sub ui.
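Carrying that chain rule one step further (the lecture pauses here, but this completion follows directly from the definition of epsilon sub ui above): since epsilon sub ui is r sub ui minus b sub ui minus the dot product, its derivative with respect to P sub uf is just minus Q sub if. Folding the constant 2 into the learning rate eta then gives the core FunkSVD updates, regularization aside:

```latex
% Chain rule, using eps_ui = r_ui - b_ui - sum_f p_uf * q_if:
\frac{\partial}{\partial p_{uf}} \varepsilon_{ui}^{2}
  = 2\,\varepsilon_{ui} \cdot \frac{\partial \varepsilon_{ui}}{\partial p_{uf}}
  = -2\,\varepsilon_{ui}\, q_{if}
% Stepping against that gradient (with the 2 folded into eta):
p_{uf} \leftarrow p_{uf} + \eta\,\varepsilon_{ui}\, q_{if},
\qquad
q_{if} \leftarrow q_{if} + \eta\,\varepsilon_{ui}\, p_{uf}
```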