So now that we have all of the data, we need to run the regressions from Hakes and Sauer's Table 1. So let's start doing that. To run a regression, we need a package that will allow us to do it. We can use statsmodels, which allows us to write down a formula for a regression. This formula here says: create an output, which we're going to call win_obp_lm ("lm" stands for linear model). This is a linear regression, and in particular the form of linear regression called ordinary least squares, which we've encountered before; it's the workhorse regression model in statistics. The syntax for writing a regression down involves writing formula = and then the formula itself. We start with the y variable, the thing we're trying to explain, which here is win percentage; then you have a tilde; and then you have the list of your x variables, written as a sum of variable names. So obp_for and obp_against here will enable us to reproduce the first column of Table 1. When we write .summary(), that will produce a summary output of our results. And if we run that, we now get our regression output; we've seen these before in the previous MOOC. Here we want to focus on the coefficients from our ordinary least squares regression and compare these to column 1 of Hakes and Sauer. If you recall, in Hakes and Sauer the coefficient on obp_for was 3.300. If we go back up to the top and look at what our regression coefficient is, it's not exactly the same: it's actually 3.294, but that is very, very close indeed. Likewise, you can see the second value, on-base percentage against, in column 1 is -3.317. If we look at our regression down below, we can see that the coefficient is -3.315, which again is extremely close, though not quite identical. You'll see that the pattern across all of these regressions is that the numbers are almost identical.
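The step above can be sketched as follows. This is a minimal example, not the course notebook itself: the column names (obp_for, obp_against, w_pct) and the synthetic data are assumptions standing in for the real team-season data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the team-season data; the column names are
# assumptions, not necessarily the names used in the course notebook.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "obp_for": rng.normal(0.330, 0.02, n),
    "obp_against": rng.normal(0.330, 0.02, n),
})
# Winning percentage simulated with coefficients near those in the paper.
df["w_pct"] = (0.5
               + 3.3 * (df["obp_for"] - 0.330)
               - 3.3 * (df["obp_against"] - 0.330)
               + rng.normal(0, 0.01, n))

# Formula syntax: the y variable, a tilde, then the x variables joined by +.
win_obp_lm = smf.ols(formula="w_pct ~ obp_for + obp_against", data=df).fit()
print(win_obp_lm.summary())
```

On the simulated data, the fitted coefficients come back close to the +3.3 and -3.3 used to generate it, just as the notebook's regression recovers coefficients close to Hakes and Sauer's.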
If we look at the second column, we can generate the regression for that column with slugging percentage for and slugging percentage against. Again you can go up and compare the regression coefficients and see that they look very similar; we'll do that more fully in a moment. Then we've got on-base percentage and slugging combined in the same regression; this is column 3 of the Hakes and Sauer model. And then finally we have all four variables included, but now with the restriction that on-base percentage for and on-base percentage against should have equal and opposite signs, and likewise that slugging for and slugging against have equal and opposite signs. That's accomplished in Python, using this package, by writing I( and then obp_for minus obp_against inside the parentheses; that tells Python to restrict the coefficients on these two variables to be equal and opposite. So that restriction is easily imposed, and we can run that regression here. Now you could run back and forth between Table 1 at the top of this notebook and each of these regressions to see whether they look the same, but that's slightly inconvenient. What we'd really like to do is put the coefficients from each of these four regressions into a single table, looking like Table 1, and we can in fact do that: statsmodels contains a command called summary_col, which enables you to combine the regression coefficients from several different regressions into a single table. The way this works here is that we define something called table1 as the summary_col of these four regressions; these are the names of the four regressions that we've just created. A nice thing about this is that we can also specify the order of the regression coefficients.
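The equal-and-opposite restriction described above can be sketched like this, again on synthetic data with assumed column names, not the course's actual dataset:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data again; column names are illustrative assumptions.
rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    "obp_for": rng.normal(0.330, 0.02, n),
    "obp_against": rng.normal(0.330, 0.02, n),
    "slg_for": rng.normal(0.420, 0.03, n),
    "slg_against": rng.normal(0.420, 0.03, n),
})
df["w_pct"] = (0.5
               + 2.0 * (df["obp_for"] - df["obp_against"])
               + 0.9 * (df["slg_for"] - df["slg_against"])
               + rng.normal(0, 0.01, n))

# Wrapping a difference in I() makes the formula treat it as a single
# regressor, which imposes equal-and-opposite coefficients on the pair.
restricted_lm = smf.ols(
    "w_pct ~ I(obp_for - obp_against) + I(slg_for - slg_against)",
    data=df,
).fit()
print(restricted_lm.params)
```

Note that the restricted model estimates two slope coefficients instead of four: one shared (with opposite sign) by each for/against pair.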
We want to do that here so that it looks just like the Hakes and Sauer table, with the coefficients in the same order as they are in that table, and we can also add a header. Just for tidiness, the header in Hakes and Sauer's Table 1 is simply 1, 2, 3, 4, so we add the header we defined above as 1, 2, 3, 4. And so when we run this, we generate this table, and now you can go and compare it with Table 1 of Hakes and Sauer directly. You can see that those coefficients are almost identical. So, to conclude, what we've done here is to reproduce Table 1 using the same kind of data that Hakes and Sauer used, and shown that we can reproduce their results almost exactly. Now you might be asking why these are not exactly the same, why there are slight discrepancies, and there are two possible explanations. One explanation is that when you run regressions in different packages, and Hakes and Sauer were almost certainly using a different package when they ran their regressions back in 2006, the numbers often have some very small discrepancies, small enough to be unimportant. But the differences here are slightly bigger than that, so that's probably not the explanation. The most likely explanation is that the data Hakes and Sauer were running on was taken from something like Retrosheet back in 2006, and that data has been updated since then. There are often errors in recording, and baseball statisticians like to be accurate, so when mistakes come to their attention they correct the data. The data that is available today has therefore been corrected and is somewhat more accurate than the data that was available in 2006, and that's likely to explain the small discrepancies we see. But the big message to take away from this unit is that these results are pretty much identical.
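The combined table described above can be sketched with summary_col. This is a simplified illustration, assuming hypothetical column names and using three models rather than the notebook's four; model_names supplies the numeric column header and regressor_order fixes the row order.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.iolib.summary2 import summary_col

# Synthetic stand-in data; names are illustrative assumptions.
rng = np.random.default_rng(2)
n = 200
df = pd.DataFrame({
    "obp_for": rng.normal(0.330, 0.02, n),
    "slg_for": rng.normal(0.420, 0.03, n),
})
df["w_pct"] = (0.5
               + 2.0 * (df["obp_for"] - 0.330)
               + 0.8 * (df["slg_for"] - 0.420)
               + rng.normal(0, 0.01, n))

m1 = smf.ols("w_pct ~ obp_for", data=df).fit()
m2 = smf.ols("w_pct ~ slg_for", data=df).fit()
m3 = smf.ols("w_pct ~ obp_for + slg_for", data=df).fit()

# model_names gives the 1, 2, 3 header; regressor_order fixes the
# row order so the layout matches the paper's table.
table1 = summary_col(
    [m1, m2, m3],
    model_names=["1", "2", "3"],
    regressor_order=["obp_for", "slg_for"],
)
print(table1.as_text())
```

Each column of the printed table holds one regression's coefficients, which is exactly the side-by-side layout of a journal-style Table 1.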
We have been able to reproduce the Hakes and Sauer result, and that result tells us, as they claim, that on-base percentage plays a significantly larger role in determining the winning percentage of teams than does slugging percentage, for the seasons prior to the publication of Moneyball. So when it comes to thinking about the remuneration of individual players: since on-base percentage is more important to winning than slugging percentage, the logic is that on-base percentage should also be more significant in determining the salaries of players than slugging percentage. What we're going to do next is look at the determination of salaries of individual players, and see whether that relationship really does hold true.