0:00

Hi. In this lecture we're going to talk about fitting lines. Remember, in the last lecture we talked about simple linear models. Well, when I was drawing those lines through the data, the question is: how do you do it? How do you draw the best possible line through the data? That's the focus of this lecture, so let's step back for a second. Remember, with the categorical models we had that notion of R-squared, which was the percentage of variation that you explained. There's a lot of variation in your data, you start to model it, and you ask: what percent did I explain? So, for example, if I have a bunch of data like I have here, these dots, and I just took the mean right here and then asked how much variation there is, I'd have to take the distance from all these points to the mean, and there'd be a lot of variation. When I draw the line through here, I explain a lot of it. And in fact, here it says I explain 87.2%. So what you want to ask is: how do you draw a line through this data to explain as much variation as possible? Remember, in our earlier example, just as a reminder of how this worked: the total variation was 53,000, and you only had 5,200 left over. So 5,200 over 53,000 is about 9.8 percent left over, which means we explained 90.2 percent of the variation.
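That R-squared recap can be sketched in a couple of lines of Python, using the 53,000 and 5,200 figures from the lecture:

```python
# R-squared = 1 - (unexplained variation / total variation)
total_variation = 53_000   # total squared deviation from the mean
unexplained = 5_200        # squared deviation left over after modeling

r_squared = 1 - unexplained / total_variation
print(round(r_squared, 3))  # 0.902, i.e. about 90.2% explained
```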

We want to show how you can do that same sort of calculation with lines, and then show how you draw the best possible line. So let's suppose I've got a bunch of data here. To figure out how much variation there is, I draw the mean. Then I can figure out the distance from each point to the mean: say the mean is six, and this point has a value of four. I'd take 4 minus 6 and square it, which is four. And if this point has a value of eight, I'd take 8 minus 6 and square it, which is also four. I add up all those squared distances, and that gives me the total variation. Now what I'm going to do is fit a line to the data and ask: what's the distance from the line, and how much of the variation did I explain by drawing the line through? So here's a very simple example. There are three kids; they're in different grades of school and they wear different shoe sizes. All I'm going to do is predict shoe size as a function of the grade they're in. So I've got a first grader with a size 1 shoe, a second grader with a size 5 shoe, and a fourth grader with a size 9 shoe, and I'm going to fit a linear model to this. The first thing to ask is: what's the variation? This axis is the grade, and this is the shoe size. Now, I don't care about the variation in the grades; I care about the variation in the thing I'm trying to explain, which is shoe size. So it's just the 1, 5, and 9. If I add those up I get fifteen; divide by three, I get five, so the mean is five. To get the variation, I take 1 minus 5 and square it, which is sixteen; 5 minus 5 squared, which is zero; and 9 minus 5 squared, which is also sixteen. So the total variation is 32, and what I want to do is write down a linear model that explains as much of that variation as possible.
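The total-variation calculation for the shoe-size data can be checked directly:

```python
shoe_sizes = [1, 5, 9]  # shoe sizes for the kids in grades 1, 2, and 4

mean = sum(shoe_sizes) / len(shoe_sizes)            # (1 + 5 + 9) / 3 = 5
total_variation = sum((y - mean) ** 2 for y in shoe_sizes)
print(mean, total_variation)  # 5.0 32.0  ->  16 + 0 + 16 = 32
```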

Let's start with a really simple linear model, where we just assume y = 2x. If I take the line y = 2x, what I'm saying is that all three of these points should lie on that line, and the unexplained variation is just how far off the line they lie. So how do we do it? Well, I've got x and y. When x = 1, 2x is two. When x = 2, 2x is four. And when x = 4, 2x is eight. So those are my predictions, in a way: two, four, and eight. And I can ask: how far does the data lie from those predictions? Here I predicted two and the actual value is one, so I get two minus one squared, which is one. Here I predicted four and the actual value is five, so I get four minus five squared, which is one. And here I predicted eight and the actual value is nine, so eight minus nine squared is also one. So the total is three. And I think, wow, that's great: I started out with a total variation of 32, and now I've only got three left. So if I want to figure out my R-squared, that's just one minus three over thirty-two, and once again I'm over 90 percent. So that's really good; I've explained a lot of the variation. But the thing is, this y = 2x was just made up. Someone just drew this line.
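Plugging the guessed line y = 2x into the same R-squared calculation:

```python
grades = [1, 2, 4]
shoe_sizes = [1, 5, 9]

mean = sum(shoe_sizes) / len(shoe_sizes)
total_variation = sum((y - mean) ** 2 for y in shoe_sizes)       # 32

# squared distance from each point to the guessed line y = 2x
unexplained = sum((y - 2 * x) ** 2 for x, y in zip(grades, shoe_sizes))

r_squared = 1 - unexplained / total_variation
print(unexplained, round(r_squared, 3))  # 3 0.906
```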

So how would I draw the best line? Well, suppose I drew an arbitrary line, y = mx + b, and then asked how far off that line is from the data. When x = 1, my prediction is m + b and the actual value is one, so my squared error is (m + b - 1) squared. When x = 2, my model says the value is 2m + b and the actual value is five, so my squared error is (2m + b - 5) squared. And when x = 4, my prediction is 4m + b and the actual value is nine, so my squared error is (4m + b - 9) squared. If I want the total error, I just have to multiply all these things out. So (m + b - 1) squared is m squared + 2mb + b squared - 2m - 2b + 1, which is a really complicated thing, and I can do that for each of the other two terms as well, so I get these long equations. Adding them up, here's my total error: if I choose the line y = mx + b, this is my error. What you can do, and this is what's great about calculus, is do the math and solve for the b and the m that make this the smallest possible number. Taking the derivative with respect to b and setting it to zero gives 7m + 3b = 15; doing the same with respect to m gives 21m + 7b = 47; and solving those two equations gives b = -1 and m = 18/7. So this is how you draw those lines: you go back and say, let's take any line y = mx + b, figure out its distance to the data, add up the total squared distance, and then choose the m and the b that make that as small as possible. And it turns out the way to do that here is to choose b = -1 and m = 18/7. Now when we do that, our model says y = (18/7)x - 1. When x = 1, we get 18/7 times one, minus one, which is 11/7; the actual value is one, so the difference between our prediction and the actual value is 4/7, and its contribution to the leftover variation is 4/7 squared. When x = 2, the real value is five; our model, if you plug it in, gives 36/7 - 1 = 29/7, which is off by 6/7. And when x = 4, our model gives 72/7 - 1 = 65/7; the actual value is nine, which is 63/7, so that's off by 2/7. So the total variation left over is 4/7 squared plus 6/7 squared plus 2/7 squared, which is 16/49 + 36/49 + 4/49 = 56/49 = 8/7. So now, if I want to know my R-squared, I erase all this stuff for a second and ask how much of the variation I explained: it's one minus 8/7 over 32, so I've explained over 96 percent of the variation. So by figuring out the optimal coefficients, I can do even better than I could by eyeballing a line like y = 2x. And if I draw that actual line, it goes like this, and you see it comes incredibly close to the data.
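As a check, the least-squares slope and intercept, and the resulting R-squared, can be computed with NumPy (assuming NumPy is available; `polyfit` with degree 1 is exactly the least-squares line):

```python
import numpy as np

grades = np.array([1.0, 2.0, 4.0])
shoe_sizes = np.array([1.0, 5.0, 9.0])

# degree-1 polynomial fit = the least-squares line y = m*x + b
m, b = np.polyfit(grades, shoe_sizes, 1)
print(m, b)  # about 2.571 (= 18/7) and -1.0

predictions = m * grades + b
unexplained = np.sum((shoe_sizes - predictions) ** 2)  # about 8/7
total = np.sum((shoe_sizes - shoe_sizes.mean()) ** 2)  # 32.0
print(1 - unexplained / total)  # about 0.964
```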

Let's move on and think about how we do this with multiple variables. Suppose that instead of having one independent variable, I've got a bunch. So now I can write y = ax1 + bx2 + c: instead of just one independent variable, I've got two. When you look at these coefficients, the sign tells you whether y increases or decreases in x. The other thing regressions will tell you is the magnitude: how much does y change as a function of x?
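A multiple regression of this form can be sketched as an ordinary least-squares solve. The numbers below are made-up illustration data, not figures from the lecture:

```python
import numpy as np

# hypothetical data: two independent variables and one outcome
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = np.array([5.1, 4.9, 9.8, 9.2, 13.1])

# columns: x1, x2, and a constant column for the intercept c
X = np.column_stack([x1, x2, np.ones_like(x1)])

# least-squares fit of y = a*x1 + b*x2 + c
(a, b, c), *_ = np.linalg.lstsq(X, y, rcond=None)
print(a, b, c)  # sign gives each effect's direction, size its magnitude
```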

So let me talk about why this is so important. Again, we often just reason by the seat of our pants. So let's suppose you care about, and I'm going to talk about this a lot because it's an easy and important example, school quality. So I've got a bunch of test scores from kids; these are achievement-test scores. I've also got IQ test scores, which on some level tell me their innate ability, and I've got teacher quality and class size. What I can do is run a regression that says performance on the test equals some intercept a, plus a coefficient on IQ, plus a coefficient on teacher quality, plus a coefficient on class size. What you'd expect is the coefficient on class size to be negative, the coefficient on teacher quality to be positive, and the coefficient on IQ to be positive. Now, without running the model, we don't know which of these things are big; we don't even know if our intuition is right. Well, let's look at class size. There have recently been 78 studies of class size: four show a positive coefficient, thirteen show a negative coefficient, and 61 show no effect. This is the result of a summary investigation of 78 regression studies on whether class size matters, and what you find is that only thirteen times does it have the expected negative effect, 61 times it has no effect, and four times it actually goes in the wrong direction. So even though we think class size should matter, that smaller classes should lead to better performance, it doesn't always work out that way. What about teacher quality? Well, there's a recent study by a group of economists that basically shows a good kindergarten teacher is worth $320,000: if you have twenty students, each of those students can expect to make $16,000 more in lifetime earnings from having a good kindergarten teacher rather than a bad one. So again, by plugging in all this data: we all expect that class size should matter, that lower class sizes should be good, and that teacher quality should matter, that better teachers should be good. But what you find when you run the data is that class size doesn't seem to matter that much, at least in the ranges we're playing in, while teacher quality matters a lot. So, what do we learn from all this? We learn that there's a lot of data out there, and one thing you can do is fit that data with linear models. What linear models will do is explain some percentage of the variation, maybe a lot, maybe a little. They'll also tell us the sign and magnitude of the coefficients: whether a variable has a positive effect or a negative effect, and roughly how big that effect is. And that allows us to make policy choices, like investing in things like teacher quality as opposed to class size, because it has a larger effect. This is what I call big coefficient thinking. Thank you.
