[MUSIC] So this leads us to consider something called Weighted K-nearest Neighbors. And what weighted k-nearest neighbors does, is it down weights the neighbors that are further from the specific query point, or target point. And the effect that has is as we're shifting from target point to target point, when a neighbor jumps in or out of our set of nearest neighbors, the effect of that isn't as significant because when I'm computing what my prediction is at a given value. I've already down weighted that neighbor so when it drops out, it was not that signficant in forming my predicted value to start with. So, in particular, when we talk about our predicted value, we're gonna weight each one of our nearest neighbor observations by sum amount that I'll call C subscript Q, for looking at query point Q, and then I also have a nearest neighbor j, indexing which nearest neighbor that weight is associated with and the question is how do we wanna define these weights? Well, if the distance between my query point and its nearest neighbor is really large and we want this constant that we're multiplying by to be very small because we wanna down weight that observation. But in contrast, if the distance is very small, meaning this house is very similar to my house, I want that constant to be larger. So one simple choice is simply to take our weights to be equal to 1 over the distance between these points. So Let's say that C, Q, nearest neighbor, J, is gonna be equal to one over the distance between XJ and XQ. So this will have the effect that when the distance is larger the weight is smaller. When the distance is smaller the weight is larger. More generally we can think of defining these weights using something that's called a kernel where here in this light we're gonna focus on the simplest case of what are called isotropic kernels. And these kernels are just the function of the distance between any point and the target point, the query X Q. And in this figure we're showing a couple examples of commonly used kernels. What we see is this kernel is defining how the weights are gonna decay, if at all, as a function of the distance between a given point and our query point. So as an example here, one commonly used kernel is something called The Gaussian kernel. And what we see is that this kernel takes the distance between XI and XQ, looks at the square of that distance. And then divides that by a parameter of lambda and that lambda is going to define how quickly the function decays and importantly, this whole thing that I described is exponentiated so that all these weights are positive. But I wanna emphasize that this parameter lambda that all of these kernels have, that's why I've written kernel sub lambda, define how quickly we're gonna have the weights decay as a function of this distance. And another thing that's worth noting is the fact that this Gaussian kernel that I've mentioned here Has the weights never go exactly to zero, they just become very, very, very, very small as the distance increases, but for the other kernels I've shown on this plot here. There is once you get beyond this minus lambda to lambda region, the weights go exactly to zero. So what we just described was assuming that we were just looking at a single input like square feet. And thus our distance we just our standard Euclidean distance of looking at the absolute value between the two inputs. But more generally you can use the same kernels and higher dimensional spaces, where, again you just compute the distance, however you've defined it in your multi varied space, and plug that in to the kernel to define your weight. [MUSIC]