Learn how probability, math, and statistics can be used to help baseball, football, and basketball teams improve player and lineup selection as well as in-game strategy.

From the course by University of Houston System

Math behind Moneyball

40 ratings

In this lesson

Module 9

You will learn how to rate NASCAR drivers and get an introduction to sports betting concepts such as the money line, prop bets, and the evaluation of betting systems.

- Professor Wayne Winston, Visiting Professor

Bauer College of Business

Okay, how can we rate teams based on wins and losses?

Now, why is this important for rating teams,

rating chess players, rating tennis players?

Well, for one thing, the BCS required that ratings be based on just wins and losses,

because they didn't want to reward teams for running up the score.

Okay, so let's take a look at the 2013 NFL season, and

the Seahawks won the Super Bowl, as you know.

And we have the data through the playoffs, before the Super Bowl.

And we want to, let's say, predict the chance that Seattle or

Denver would win the Super Bowl.

Okay. So what are the steps we go through?

We're going to use the method of maximum likelihood,

which we discussed when we talked about Dwight Howard shooting free throws.

[SOUND] So I've outlined the steps here.

Step one, is you have a rating for each team, and you have a home edge hypothesis.

So that's what's in orange.

And then the average rating should be zero,

just like when we talked about our first rating model.

The average of the team ratings had to be zero.

For every team that's good there's gotta be a team that's bad,

sort of to cancel that out.

Okay, next step.

Figure out who won each game.

So you do an IF statement.

Okay, if the home points are greater than the away points, the home team won, so you put in a one,

otherwise zero.

Okay, I know there were no ties this year so I don't worry about ties.

I should probably put in a 0.5:

if the home points equal the away points, you put in a 0.5 for a tie.

And then you just find the ties.

And then for each tie you consider it as one game the home team won,

one game the home team lost.
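The win indicator and the tie handling just described can be sketched in Python. This is only a minimal illustration, not the spreadsheet used in the lecture; the function names `home_result` and `expand_ties` and the tuple layout for a game are assumptions made for the example.

```python
def home_result(home_points, away_points):
    """Return 1.0 if the home team won, 0.0 if it lost, 0.5 for a tie."""
    if home_points > away_points:
        return 1.0
    if home_points < away_points:
        return 0.0
    return 0.5


def expand_ties(games):
    """Replace each tie with one home win and one home loss.

    games: list of (home, away, result) tuples, result in {1.0, 0.0, 0.5}.
    The tuple layout is a hypothetical choice for this sketch.
    """
    expanded = []
    for home, away, result in games:
        if result == 0.5:
            expanded.append((home, away, 1.0))  # counted once as a home win
            expanded.append((home, away, 0.0))  # and once as a home loss
        else:
            expanded.append((home, away, result))
    return expanded
```

After `expand_ties`, every row is a plain win or loss, so the later steps never have to deal with a 0.5.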

Now what you do is compute a score for each home team.

Okay? And the higher the score,

the higher the chance they're going to win the game.

And so the score is simply the home team rating

minus the away team rating plus the home edge.

So right here you take the home edge here, then you look up the home team rating,

in this case for Denver, and look up the away team rating,

in this case for the Ravens, and that's the home score there.
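As a rough sketch of that lookup in Python: the function name `home_score`, the dictionary of ratings, and the numbers below are all hypothetical placeholders, since the ratings have not been fitted yet at this point in the lecture.

```python
def home_score(ratings, home, away, home_edge):
    """Score for the home team: home rating minus away rating plus home edge."""
    return ratings[home] - ratings[away] + home_edge


# Placeholder values for illustration only, not fitted ratings.
ratings = {"Denver": 0.5, "Ravens": -0.2}
score = home_score(ratings, "Denver", "Ravens", home_edge=0.3)
```

Only the difference between the two ratings enters the score, which is why the ratings can be pinned down by requiring their average to be zero.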

Okay.

Okay. So now.

How do you take a home score and make it into a probability?

Okay, this comes from Daniel McFadden, who popularized this and won the Nobel Prize in

economics for his work on what he called discrete choice,

which is a form of logistic regression.

And the marketing analytics book has a chapter on this.

McFadden was looking at market share; his

research was on what fraction of commuters in San Francisco

would take Bay Area Rapid Transit as opposed to a car

as you change the price and the quality of the Bay Area

Rapid Transit system, and he was right on

in predicting the percentage of riders that would take BART.

Okay, again, the marketing analytics book has much more detail on discrete

choice and logistic regression.

But this is the logistic equation, okay?

And so basically, the chance that the home team wins would be E to the score,

divided by one plus E to the score, and that's called a logistic function.

Okay, so what we're going to do is the probability the home team wins:

it's E to the score (we don't know what these numbers should be yet)

divided by one plus E to the score.

And the probability the away team wins is one minus that.
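The logistic function as stated can be written in a couple of lines of Python. The function names here are assumptions for the sketch.

```python
import math


def home_win_probability(score):
    """Logistic function: e^score / (1 + e^score)."""
    return math.exp(score) / (1.0 + math.exp(score))


def away_win_probability(score):
    """The away team wins with whatever probability is left over."""
    return 1.0 - home_win_probability(score)
```

A score of zero gives each side a 50% chance, and larger scores push the home team's probability toward one.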

Okay then we want to do log likelihood.

So if the home team won, we take the log of the probability the home team won.

If the away team won, we take the log of the probability the away team won.

Okay?

And so then we add up these log likelihoods and maximize that.

So the log likelihood here.

I used an IFERROR.

Okay, if the home team won, I take the log of the probability the home team wins.

Otherwise, I take the log of the probability the away team wins,

and there are some rows here that have these stupid NA's, and so

I put a one in there, so the log of one is zero and it's not going to affect anything.

So I add up all these log probabilities, and that's what I want to maximize.
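Here is a minimal Python sketch of that sum of log probabilities. It assumes each game is a `(home, away, home_won)` tuple with `home_won` equal to 1 or 0; that layout and the name `log_likelihood` are choices made for the example, not something taken from the spreadsheet.

```python
import math


def log_likelihood(games, ratings, home_edge):
    """Sum over games of the log probability of the result that occurred."""
    total = 0.0
    for home, away, home_won in games:
        score = ratings[home] - ratings[away] + home_edge
        p_home = math.exp(score) / (1.0 + math.exp(score))
        # Log of the home probability if the home team won,
        # otherwise log of the away probability.
        total += math.log(p_home if home_won else 1.0 - p_home)
    return total
```

Because every probability is at most one, each term is zero or negative, so maximizing the total means getting it as close to zero as the results allow.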

Okay so we're going to change the orange to maximize the yellow.

Okay so let's change these.

Let's suppose there's no home edge which is wrong, and

I can suppose that every team has a rating of one.

Okay.

So now all you've got to do, okay,

we'll hit Reset All here and put this in.

You want to maximize the log likelihood.

You want to change the ratings and the home edge.

See, I screwed up there; I did it in advance. There's the home edge.

Put that there, and add that the average rating is zero.

Now we need the GRG Nonlinear.

We don't check the non-negative box because we want to allow negative ratings.

And it should work out okay.
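The same optimization can be sketched outside the spreadsheet. The Python below is not the GRG Nonlinear solver; it swaps in plain gradient ascent, and the schedule of six games among teams A, B, and C is made up purely for illustration. The function name `fit_ratings`, the step size, and the number of steps are likewise assumptions.

```python
import math


def fit_ratings(games, teams, steps=2000, learning_rate=0.1):
    """Maximize the win-loss log likelihood by gradient ascent.

    games: list of (home, away, home_won) with home_won equal to 1 or 0.
    Ratings may go negative, and their average is kept at zero.
    """
    ratings = {team: 0.0 for team in teams}
    home_edge = 0.0
    for _ in range(steps):
        rating_grad = {team: 0.0 for team in teams}
        edge_grad = 0.0
        for home, away, home_won in games:
            score = ratings[home] - ratings[away] + home_edge
            # Same value as e^score / (1 + e^score).
            p_home = 1.0 / (1.0 + math.exp(-score))
            residual = home_won - p_home  # derivative of the log likelihood w.r.t. score
            rating_grad[home] += residual
            rating_grad[away] -= residual
            edge_grad += residual
        for team in teams:
            ratings[team] += learning_rate * rating_grad[team]
        home_edge += learning_rate * edge_grad
        # Enforce the average-zero constraint; scores depend only on
        # rating differences, so recentring changes no probability.
        mean = sum(ratings.values()) / len(teams)
        for team in teams:
            ratings[team] -= mean
    return ratings, home_edge


# Hypothetical toy schedule: every team has at least one win and one loss.
toy_games = [
    ("A", "B", 1), ("B", "A", 0), ("A", "C", 1),
    ("C", "A", 1), ("B", "C", 1), ("C", "B", 0),
]
toy_ratings, toy_edge = fit_ratings(toy_games, ["A", "B", "C"])
```

In this toy data A goes 3-1 and C goes 1-3, so A ends up rated above C. The fit only settles on finite ratings because no team is undefeated or winless.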

Whoops, okay.

It says the objective cell values don't converge.

Let's take a look at that.

Okay, we want to maximize the log, oops, wrong cell.

I want to maximize the log likelihood, which is in N6.

Okay.

Now, I screwed up there again, because I don't think I checked, okay,

let's put a small number here.

I think it was when I started with that at one there.

Okay, now it sort of messed up starting with that horrible home edge.

It could lessen to [INAUDIBLE].

Okay it looks like it got it right.

Denver has got a rating of 1.81, Seattle of 2.51.

Okay, and the home edge doesn't really matter too much there.

Let's figure out the chance Seattle would win the Super Bowl.

Okay, we find Seattle's score;

there's no home edge in the Super Bowl in this case.

I would take Seattle's rating,

minus Denver's rating.

And then, again, E to the score,

divided by one plus E to the score.

We need a parenthesis there, okay.

So then I get a 67% chance that Seattle would win the Super Bowl,

[INAUDIBLE] with the win-loss system.
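The arithmetic behind that 67% can be checked in a few lines of Python, using the two ratings quoted in the lecture. The variable names are just for this sketch.

```python
import math

seattle_rating = 2.51  # from the fitted ratings quoted above
denver_rating = 1.81

# Neutral site, so no home edge is added to the score.
score = seattle_rating - denver_rating
seattle_win_probability = math.exp(score) / (1.0 + math.exp(score))
print(f"{seattle_win_probability:.0%}")  # prints 67%
```

The score is about 0.70, and E to the 0.70 is roughly 2.01, so the probability is about 2.01 divided by 3.01.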

Okay, again what do you do with a tie game?

Okay, count it as one win for the home team, one win for the away team.

What about an undefeated team?

Like, you could have Harvard in college football go undefeated in the Ivy League;

no offense, but they're probably not the best team.

So you would add a fictional team.

Every team goes one and one against them.

Okay, and then run the model based on that.
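One way to sketch the fictional-team trick in Python is shown below. The lecture does not say who is at home in these extra games, so listing the real team as the home side in both is an assumption of this sketch, as are the function name `add_fictional_team` and the `(home, away, home_won)` tuple layout.

```python
def add_fictional_team(games, teams, fictional="Fictional"):
    """Return the games plus one win and one loss for every team
    against a made-up opponent, so no team is undefeated or winless."""
    padded = list(games)
    for team in teams:
        padded.append((team, fictional, 1))  # the real team beats the fictional team
        padded.append((team, fictional, 0))  # and also loses to it once
    return padded


# Hypothetical usage: team "A" is undefeated in the real results.
real_games = [("A", "B", 1), ("B", "A", 0)]
padded_games = add_fictional_team(real_games, ["A", "B"])
```

Without the extra games, the likelihood keeps improving as the undefeated team's rating grows, so the maximization never settles on a finite rating for it.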

Okay, so that's how you can do ratings based on wins and losses.

In the next video, we'll talk about, if you've got power ratings for

the teams based on our original point spread model, how you would estimate, going

into the NFL playoffs, the chance of each team winning the NFL playoffs.

It's a little tricky because the teams get re-seeded after each round.

So if a high seed loses, a lower seed can move up and

maybe get a home game they didn't expect.
