Welcome to the Moneyball MOOC. In this MOOC we're going to look at Moneyball, perhaps the most famous story in the whole of sports analytics. We're going to start off by reproducing some of the results which underpin the Moneyball story, and then we're going to extend the analysis, and extend our data skills, in order to look at some of the more up-to-date ways of thinking about baseball, sports analytics and performance.
Let's start with the Moneyball story. Many of you may have read the book or seen the film, but I just want to recap briefly what the Moneyball story is before we get into analyzing some of the data. The Moneyball story revolves around the Oakland A's and Billy Beane. Billy Beane was a baseball player who became general manager of the Oakland A's in 1997 and rapidly achieved significant success with the team. What was notable about the Oakland A's is that they were a cash-strapped team. They didn't have much money to spend on players, and baseball is a sport in which teams have traditionally required a lot of money in order to be successful. Famously, the New York Yankees are the team that has always paid the highest salaries, and they have been by far and away the most successful team in baseball history. The Oakland A's didn't have much money, but Billy Beane seemed to discover a way to be successful without a lot of money, and that relied heavily on data analysis. Certainly, in the early part of his career that success showed through very quickly. Between 2000 and 2003, the Oakland A's made it through to the playoffs in four seasons in a row. In 2002, they achieved a remarkable feat: a streak of 20 consecutive wins, a performance which the American League hadn't seen in more than a century. Clearly, something was up and people started to get interested.
In particular, Michael Lewis, the writer and investigative journalist, got very interested in what explained Billy Beane's capacity to be successful, and that's what led to him writing the Moneyball book, which was published in 2003. Perhaps the most famous story in the book is the story of on-base percentage, which is what we're going to look at in this MOOC. The traditional way in which teams, scouts and general managers picked out batters to hire was to look at their hitting ability, measured by a statistic which basically captures the power of your hitting. What that statistic doesn't include is the capacity to draw a walk: to not actually hit the ball, but to induce the pitcher to throw four balls which you don't swing at, so that you automatically get to walk to first base. Now, getting to first base is a valuable skill even when it doesn't involve a hit, and on-base percentage includes that capacity in the statistic. What Billy Beane and his statistical team at the Oakland A's thought was that this capacity to draw a walk, which is represented in on-base percentage, is a significant one. He started investing in players who had this skill, with the idea that these players were undervalued. Other scouts didn't value this skill very highly, so these players were relatively inexpensive to hire. Thus, with a limited budget, Billy Beane and the Oakland A's were able to hire players who were better than average, and this was how they could be successful. The subtitle of the book was The Art of Winning an Unfair Game. Michael Lewis suggested in the book that one way in which Billy Beane and the Oakland A's did that was to hire players who were undervalued by other teams, but who were actually capable of significantly increasing your chances of winning, and that's the basic story that we're going to investigate in this MOOC.
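To make the difference between a hits-only statistic and on-base percentage concrete, here is a minimal sketch in Python. It uses the standard definition of on-base percentage, (hits + walks + hit by pitch) divided by (at-bats + walks + hit by pitch + sacrifice flies). Batting average is used only as an illustrative hits-only measure, which is an assumption and not necessarily the statistic meant above. The function names and the two batters' numbers are hypothetical, made up purely for illustration.

```python
def batting_average(hits, at_bats):
    """Hits per at-bat; walks do not count as at-bats, so they are ignored."""
    return hits / at_bats


def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sacrifice_flies):
    """Standard OBP: times on base divided by the opportunities counted for OBP."""
    times_on_base = hits + walks + hit_by_pitch
    opportunities = at_bats + walks + hit_by_pitch + sacrifice_flies
    return times_on_base / opportunities


# Two hypothetical batters (illustrative numbers, not real players):
# identical hits and at-bats, but batter B draws three times as many walks.
batters = {
    "A": {"hits": 150, "walks": 30, "hit_by_pitch": 0, "at_bats": 500, "sacrifice_flies": 0},
    "B": {"hits": 150, "walks": 90, "hit_by_pitch": 0, "at_bats": 500, "sacrifice_flies": 0},
}

for name, s in batters.items():
    avg = batting_average(s["hits"], s["at_bats"])
    obp = on_base_percentage(
        s["hits"], s["walks"], s["hit_by_pitch"], s["at_bats"], s["sacrifice_flies"]
    )
    print(f"Batter {name}: AVG={avg:.3f} OBP={obp:.3f}")

# Batter A: AVG=0.300 OBP=0.340
# Batter B: AVG=0.300 OBP=0.407
```

On the hits-only measure the two batters look identical, while on-base percentage separates them, which is the kind of gap the Moneyball story says the market was overlooking.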
In particular, it's not just the story but the statistical underpinning for that story that we want to look at. In 2006, a couple of years after the publication of the book, a paper was published by Jahn Hakes and Skip Sauer, two economists who were interested in this idea that there was a skill that was undervalued prior to the publication of Moneyball, which enabled Billy Beane and the Oakland A's to be successful. If that was so, then once Moneyball was published and became a big bestseller, everybody should have recognized that the skill was valuable, and that should have led to an increase in its valuation in the marketplace. They wanted to test whether that claim had a statistical foundation, whether they could show statistically that this was true. In the paper that they wrote they showed that this was indeed the case. They showed, firstly, that on-base percentage was important in determining winning, and secondly, that it appeared to have been undervalued prior to the publication of Moneyball and that its valuation changed on the publication of Moneyball, once people recognized that this skill was valuable. Those two claims are represented by two tables in the paper that they wrote, Table 1 and Table 3. What we're going to do is use data and programming to reproduce those tables, to show the Moneyball story as presented by Hakes and Sauer. This week we're going to focus on Table 1.