[MUSIC] Okay, so we've talked about three different measures of error. And now in this part, we're gonna talk about three different sources of error. And this is gonna lead us into a conversation about the bias-variance trade-off. Okay, so when we're forming our predictions, there are three different sources of error: noise, bias, and variance. And in this part, we're gonna walk through these three components at a very high level, at a more intuitive level. And then following this, there are gonna be two optional sections that go into much more formalism and detail about this. Those are optional because we're not requiring that you know this to get through the course. But for those that are interested, we will be providing the formalism behind these notions that I'm presenting now.
Let's look at this first term, this noise term. As we've mentioned many times in this specialization, data are inherently noisy. So the way the world works is that there's some true relationship between square feet and the value of a house, or generically, between x and y. And we're representing that arbitrary relationship defined by the world by f sub w true, which is the notation we're using for that functional relationship. But of course that's not a perfect description of the relationship between x and y, the number of square feet and the house value. There are a lot of other contributing factors, including other attributes of the house that aren't captured by square feet alone, or how a person feels when they go in and make a purchase of a house, or a personal relationship they might have with the owners, or lots and lots of other things that we can't ever perfectly capture with just some function between square feet and value. And so that is the noise that's inherent in this process, represented by this epsilon term. So in particular, any observation y sub i is the sum of this relationship between the square feet and the value, plus this noise term epsilon sub i specific to that i-th house.
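Written out, the relationship just described looks roughly like the following. The symbols follow the spoken names (f sub w true, epsilon sub i); the exact typesetting is an assumption, since the slide itself isn't part of this transcript.

```latex
y_i = f_{w(\mathrm{true})}(x_i) + \epsilon_i
```

Here x sub i is the square footage of the i-th house, y sub i is its observed value, and epsilon sub i is the noise specific to that house.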
And we've talked before about our assumption that this noise has zero mean, because if it didn't, that could be shoved into the f function instead. But what we haven't talked about is the spread of that noise. So at any given square footage, what kind of variation in house price are we likely to see based on this type of noise that's inherent in our observations? This is referred to as the variance of this noise term epsilon. And this is something that's just a property of the data. We don't have control over this. This has nothing to do with our model or our estimation procedure; it's just something that we have to deal with. And so this is called irreducible error, because it's not something we can reduce by choosing a better model or a better estimation procedure.
Okay, so the things that we can control are bias and variance, so we're gonna focus quite heavily on those two terms. Let's start by talking about bias. This is basically just an assessment of how well my model can fit the true relationship between x and y. To think about this, let's think about how we get the data in our data set. So here, these points that we observed are just a random snapshot of N houses that were sold and recorded, and that we tabulated in our data set. Well, based on that data set, we fit some function. And thinking about bias, it's intuitive to start with a very, very simple model of just a constant function, so that's what I'm gonna show here. But in general we fit whatever model we're specifying. But what if another set of N houses had been sold? Then we would have had a different data set that we were using. And when we went to fit our model, we would have gotten a different line. Okay. And to make this point pretty explicit, I wanna go back and look a little bit at these points that I drew here.
In the first data set, I tended to draw points that were below the true relationship, so the houses in our data set happened to have values less than what the world kind of specifies as typical. And on the right-hand side, I drew points that tended to lie above the line. So these are pretty extremely different data sets, but what you see is that the fits are pretty similar. This is gonna come up later, and I wanted to point it out now.
Okay, let's get back to this notion of bias. So what we're asking is, over all possible data sets of size N of house sales that we might have been presented with, what do we expect our fit to look like? So for one data set of size N, we get this fit. Here's another data set, and here's another data set, along with the fits associated with those data sets. And of course there's a continuum of possible fits we might have gotten. And over all those possible fits, this dashed green line represents our average fit, averaged over all those fits weighted by how likely they were to have appeared.
Okay, so now we can start talking about bias. Bias is the difference between this average fit and the true function, f true. That's what this equation shows here, and we're seeing this with this gray shaded region: that's the difference between the true function and our average fit. And so intuitively, what bias is asking is: is our model flexible enough to, on average, capture the true relationship between square feet and house value? And what we see is that this very simple constant model, this low-complexity model, has high bias. It's not flexible enough to give a good approximation to the true relationship. And because of these differences, because of this bias, we get errors in our prediction. [MUSIC]
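To make the idea of an average fit concrete, here is a minimal sketch in Python of the thought experiment above: draw many data sets of size N, fit a constant to each, average the fits, and compare that average to the true function. Everything specific here is an assumption made for illustration, not something from the lecture: the form of `f_true`, the range of x, the noise level, and the number of simulated data sets are all made up, and the sign convention (true function minus average fit) is just one common choice. Averaging over randomly drawn data sets only approximates the "weighted by how likely they were to have appeared" average.

```python
import numpy as np


def f_true(x):
    # Hypothetical "true" relationship between x and y, chosen only for
    # illustration; the lecture never specifies the form of f true.
    return 2.0 + x ** 2


def sample_dataset(rng, n, noise_sd):
    """Draw one random data set of n (x, y) pairs with y = f_true(x) + noise."""
    x = rng.uniform(0.0, 3.0, size=n)            # assumed range of x
    epsilon = rng.normal(0.0, noise_sd, size=n)  # zero-mean noise
    return x, f_true(x) + epsilon


def fit_constant(y):
    """Least-squares fit of a constant model, which is the mean of the observed y."""
    return float(np.mean(y))


rng = np.random.default_rng(0)
num_datasets, n, noise_sd = 2000, 30, 1.0  # assumed values for the simulation

# One constant fit per simulated data set of size n.
fits = np.array([
    fit_constant(sample_dataset(rng, n, noise_sd)[1])
    for _ in range(num_datasets)
])

# Monte Carlo stand-in for the average fit over all possible data sets.
average_fit = fits.mean()

# Bias at a few x values: true function minus the average fit.
x_grid = np.linspace(0.0, 3.0, 7)
bias = f_true(x_grid) - average_fit

for x, b in zip(x_grid, bias):
    print(f"x = {x:.1f}  bias = {b:+.2f}")
```

In this sketch the individual constant fits land close to one another, but the average fit is still far from the curved `f_true` at both ends of the x range, which is the high-bias behavior described for the constant model.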