Welcome to week 4 of Moneyball and Beyond. This week, we're moving away from the Moneyball metrics we looked at in the first three weeks, and we're going to look at a new concept called Run Expectancy, which is the basis for a lot of the new statistical work that has gone on in baseball. First, let's fix ideas by talking about why Run Expectancy matters and how it contributes to our understanding of the game. The statistics we've looked at already, such as On-base Percentage and Slugging Percentage, capture the performance of batters in the game, but they don't really tell us anything about the context in which the events took place. Take an extreme example: a hit which gets you to first base is treated the same whether there was no one on base, so that the only change in the game is that there's now a runner on first, or whether the bases were loaded, so that you've now scored a run, advanced a runner from second to third, and advanced another runner from first to second. Clearly, that second situation is more valuable in some sense to the team than the first, and the concept of Run Expectancy is meant to capture that idea. Let's begin by looking at an example to show how this works. If we look at the diagram below, we've got two cases, case a and case b. Case a is the upper panel and case b is the lower panel. What we compare here is the state of the game before the event, that is, the batter coming up to bat, and the state after the batting event is finished. In case a, before the event, we suppose that there's a runner on second base already and the team has one out, and in the second case, case b, there's already a runner on third base at the beginning of the event. Now, let's compare the outcome when, in either case, a batter gets a hit and makes it to first base.
We can see in both the upper panel and the lower panel that at the end of the event, we've got a runner on first base; that's the red dot that you can see on first base there. But then there's a difference between case a and case b. In case a, the runner on second is advanced to third base, so there is now a runner on third base, but no runs have been scored. In case b, there was a runner on third base, but that runner has now made it home. Although there are now no runners on base apart from the runner on first base, a run has been scored. Now, we want to measure the impact that the batter had, taking into account the change in the configuration of the game. You might think that case b is clearly more valuable because a run was scored, and that indeed is a valuable outcome in baseball. But we also have to take into account that there is now a runner on third base in case a, but no runner on third in case b, and having a runner on third base is pretty close to scoring a run: it has a high expected value. When you have a runner on third base with one out, that often turns into a run in a game. Once we take into account the likely consequences of the base state at the end of the event, case a can actually come to seem more valuable than you might have thought. Likewise, if we want the overall impact of the event, that is, the batter coming up to bat and getting a single, we also have to take into account that in case a the starting situation was less favorable for scoring runs than in case b, because in case b the runner was on third, whereas in case a the runner was only on second. Once we take into account the context of the game before the event and the context of the game after the event, we can make an adjustment for the value contributed by the actual event itself, and that might turn out to look rather different from simply saying, "Well, a run was scored in case b and wasn't scored in case a."
Now, let's look a little bit deeper at what possible states of the game there are. I've mentioned states of the game, but what are the states of the game that can exist? Well, at any point in time in an inning, there are 24 possible states that a team can be in. Those 24 states combine the runners on base and the number of outs that are against the team at that particular point in the inning. What you can see in this diagram is that along the vertical axis are the base states and along the horizontal axis are the number of outs. You can see the eight base states listed, and we have a code for these base states. The code says whether there's a runner on a particular base, and given there are three bases, there are eight combinations of possible base states. The first one in that list, 0, 0, 0, says there's no one on base at all. The second state, 1, 0, 0, means that there's a runner on first base, but no runner on second and no runner on third. The first number relates to first base, the second number relates to second base, and the third to third base. Then if we look at the third state there, we can see 0, 1, 0 means no runner on first, a runner on second, and no runner on third, and so on through. You can see ultimately at the bottom there the eighth case, bases loaded: there's a runner on first, a runner on second, and a runner on third. In each of these possible base states, there could be no outs, one out, or two outs, and those are the columns listed in the matrix. So we have a matrix which defines the possible states of the game. In fact, we can go even further than this and write down a single code for each of the 24 possible states. This is conventional in the literature. You can see the first three digits there relate to the base states, first base, second base, third base, and then the fourth digit relates to the number of outs. We can use these codes to look at the different states in the game.
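To make the coding concrete, here is a minimal sketch of how the 24 four-digit state codes could be generated. The names `state_code` and `ALL_STATES` are made up for this illustration, and the ordering of the list is just one convenient choice, not necessarily the order used in the diagram.

```python
from itertools import product


def state_code(first, second, third, outs):
    """Build a four-character code: first, second, third base flags, then outs."""
    if outs not in (0, 1, 2):
        raise ValueError("outs must be 0, 1 or 2")
    # Each base flag is 1 if a runner is on that base, otherwise 0.
    return f"{int(bool(first))}{int(bool(second))}{int(bool(third))}{outs}"


# 8 base states x 3 out counts = 24 states.
# Iterating (third, second, first) makes first base vary fastest,
# so the base states start 000, 100, 010, ...
ALL_STATES = [
    state_code(first, second, third, outs)
    for third, second, first in product((0, 1), repeat=3)
    for outs in (0, 1, 2)
]

print(len(ALL_STATES))              # number of distinct game states
print(state_code(0, 1, 0, 1))       # runner on second, one out
print(state_code(1, 1, 1, 2))       # bases loaded, two outs
```

Keeping the code as a short string rather than a tuple is only a convenience: it can be used directly as a dictionary key or a grouping column when we come to average runs by state.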
One thing we want to do is to measure the value of being in any given base state and then relate that to the effect of any given event. At the beginning of an event, there is a value of the base state, and at the end of the event, there's a number of runs scored and a new value of the base state. The difference between those two, the value at the end minus the value at the beginning, plus the runs scored, tells you the value of that event, the contribution that the event has made to the team that the batter is playing for. We can measure that. How do we measure the value of those states? Well, that's the concept of run expectancy. The way we're going to do that is to look at the average values created in an actual season. We're going to take all the events of a particular season, of which there are roughly 200,000 in a Major League Baseball season. We take those 200,000 events and ask, for each base state at the beginning of each event, how many runs were scored between the beginning of that event and the end of the inning? Averaged over all the events that begin in a given state, that measures the value of that base state. That average number of runs from the beginning of the event to the end of the inning is what we define as run expectancy. What we can then do is look at the change in run expectancy associated with any particular event. We can give a run value to a particular event, measured by the runs scored plus the change in run expectancy from the beginning of the event to the end of the event. When we do that, we have a great way to measure the contribution of batters in a game, and we have a great way to measure many different aspects of the game. We can start to ask, for example, questions about different ballparks and how much they affect the outcomes of games. We can ask questions about pitchers as well and how many runs they concede.
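As a rough sketch of the calculation just described, the following code averages the runs scored to the end of the inning by starting state, and then values an event as runs scored plus the change in run expectancy. Everything here is an assumption made for illustration: the function names, the layout of the event records, the choice to treat the end of an inning as worth zero further runs, and above all the handful of toy events, which are invented numbers rather than real season data, so the resulting values carry no meaning beyond showing the arithmetic.

```python
from collections import defaultdict


def run_expectancy(events):
    """Average runs from the start of an event to the end of the inning, by start state.

    `events` is an iterable of (start_state, runs_to_end_of_inning) pairs.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for start_state, runs_to_end in events:
        totals[start_state] += runs_to_end
        counts[start_state] += 1
    return {state: totals[state] / counts[state] for state in totals}


def run_value(re_table, start_state, end_state, runs_scored):
    """Runs scored on the event plus the change in run expectancy."""
    # Assumption: pass end_state=None when the event ends the inning,
    # since no further runs can then be scored in that inning.
    re_end = 0.0 if end_state is None else re_table[end_state]
    return runs_scored + re_end - re_table[start_state]


# Toy, made-up events (state code, runs scored from then to end of inning).
toy_events = [
    ("0101", 1), ("0101", 0),   # runner on second, one out
    ("0011", 1), ("0011", 1),   # runner on third, one out
    ("1011", 2), ("1011", 1),   # runners on first and third, one out
    ("1001", 1), ("1001", 0),   # runner on first, one out
]
re_table = run_expectancy(toy_events)

# Case a: single moves the runner from second to third, no run scores.
value_a = run_value(re_table, "0101", "1011", runs_scored=0)
# Case b: single brings the runner home from third, one run scores.
value_b = run_value(re_table, "0011", "1001", runs_scored=1)
print(value_a, value_b)
```

With a real season of roughly 200,000 events in place of `toy_events`, the same two functions would give a run expectancy for each of the 24 states and a run value for every event, which could then be summed by batter, pitcher, or ballpark.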
This concept of run expectancy becomes a very powerful tool for measuring the contributions made in the game and for trying to analyze in more detail just how games are won in baseball.