0:10

So in binary classification, you're usually predicting one of two categories.

So, again, you might be predicting whether

someone's alive or dead, or sick or healthy.

You might also be predicting whether they will click

on an ad or they won't click on that ad.

But your predictions often come out to be quantitative.

In other words, almost all the modeling algorithms that we have won't

just assign you to one class or the other; they might give you,

say, a probability of being alive, or a prediction on a scale of

1 to 10 about whether you will click on the ad or not.

The cutoff you choose gives different results.

In other words, if we predict that the probability

that you're going to be alive is 0.7 and we

say all the people above 0.5 will be

assigned to be alive, then that's one prediction algorithm.

Alternatively, if you say only people with a probability

above 0.8 will be predicted to be alive,

then that will give you a different prediction for that same person.

And the different cutoffs will have different properties: they will be better

at finding people that are alive or better at finding people that are dead.
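
To make the effect of the cutoff concrete, here is a minimal sketch in Python. The function name `classify` is hypothetical, and the stricter cutoff of 0.8 is an assumed value chosen for illustration; the 0.7 probability and the 0.5 cutoff come from the example above.

```python
def classify(probability, cutoff):
    """Label someone 'alive' when the predicted probability exceeds the cutoff."""
    return "alive" if probability > cutoff else "dead"

p_alive = 0.7  # predicted probability of being alive, as in the example

label_low = classify(p_alive, 0.5)   # lenient cutoff
label_high = classify(p_alive, 0.8)  # stricter cutoff (assumed value)

print(label_low, label_high)
```

The same person gets a different label under the two cutoffs, which is why each cutoff counts as its own prediction algorithm.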

So people use what's called an ROC curve.

And so the idea is on the x axis what

they usually plot is one minus the specificity, in other words

the probability of being a false positive, and on the y

axis they plot the sensitivity, or the probability of a true positive.

And so what they're trying to do then is make a curve

where every single point along this curve corresponds to exactly one cut off.

So for a particular cutoff, you get a certain

probability of being a false positive or a certain

specificity, and that's this point, one minus the specificity

is this point right here on this x axis.

At the same time, you get a certain

sensitivity, and that's this point here, on the Y-axis.

And so you can use these ROC curves to

determine whether an algorithm is good or bad by plotting

a different point for every single cutoff that you

might choose, and then plotting a curve through those points.
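
The procedure just described, one point per cutoff, can be sketched in plain Python. This is only an illustration: the function name `roc_points` and the scores and labels are made up, and it assumes a label of 1 means positive and that higher scores mean "more likely positive".

```python
def roc_points(scores, labels):
    """Return one (1 - specificity, sensitivity) pair per cutoff."""
    positives = sum(labels)
    negatives = len(labels) - positives
    # One cutoff above every score, then each distinct score from high to low.
    cutoffs = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for cutoff in cutoffs:
        predicted = [s >= cutoff for s in scores]
        true_pos = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
        false_pos = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
        # x = false positive rate (1 - specificity), y = sensitivity
        points.append((false_pos / negatives, true_pos / positives))
    return points

# Hypothetical predicted scores and true classes (1 = positive, 0 = negative).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
for x, y in points:
    print(f"1 - specificity = {x:.2f}, sensitivity = {y:.2f}")
```

The strictest cutoff gives the point (0, 0), the most lenient gives (1, 1), and the ROC curve is the line drawn through the points in between.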

So this is an example of a couple of algorithms that are used to predict.

So these are examples of what real ROC curves look like.

So, for example, if you want to look at, say, the NetChop algorithm.

And you look at, say, where 1 minus the specificity is equal

to 0.2, in other words the specificity is quite high, it's 0.8.

You can trace that number up and you get to about here on the curve,

the red curve in this case, and that

corresponds to a sensitivity of only about 0.4.

So it's very highly specific but not very sensitive.

And similarly, if you move out here to the right on the x axis,

you'll get less and less specificity, and more and more sensitivity.

And so this curve tells you a little bit about the tradeoffs.

Now although the curve tells you a little bit about the tradeoffs,

you may actually want to know which of these prediction algorithms is better.

And one way that people quantify one curve versus

the other is to calculate the area under the curve.

In other words, they follow, say, the red curve

here, and they trace the entire area underneath that curve,

so that'd be this area down here, and so

that area quantifies how good that particular prediction algorithm is.
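
One common way to compute that area from a set of ROC points is the trapezoidal rule. The sketch below is illustrative only: the function name `area_under_curve` is hypothetical and the curve is made-up data, not the red curve from the slide.

```python
def area_under_curve(points):
    """Trapezoidal area under (1 - specificity, sensitivity) points."""
    pts = sorted(points)  # order the points from left to right
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        # Each neighbouring pair of points bounds one trapezoid.
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical ROC points, including the end points (0, 0) and (1, 1).
curve = [(0.0, 0.0), (0.2, 0.4), (0.5, 0.8), (1.0, 1.0)]

auc = area_under_curve(curve)
print(f"AUC = {auc:.2f}")
```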

So, the higher the area, the better the predictor is, and

there are some standard values that make sense to pay attention to.

So, if the area under the curve or AUC is

equal to 0.5 that's equivalent to a prediction algorithm that's just

on the 45 degree line and that's equivalent to basically randomly

guessing whether you're going to be a true positive or false positive.

So, 0.5 is actually quite bad, anything less

than 0.5 is actually worse than random guessing.

So, it's worse than flipping a coin.

An AUC of 1 is a perfect classifier; in other words, you get perfect sensitivity and

specificity for a certain value of the

prediction algorithm, and so that's the perfect classifier.
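
These reference values can be checked numerically. The sketch below, with the hypothetical helper `trapezoid_area`, takes three idealized curves written as (1 - specificity, sensitivity) points and computes the area under each; the curves are constructed for illustration rather than taken from any real predictor.

```python
def trapezoid_area(points):
    """Area under a curve given as (x, y) points, by the trapezoidal rule."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

# Random guessing: the 45 degree line.
auc_random = trapezoid_area([(0.0, 0.0), (1.0, 1.0)])

# Perfect classifier: straight up to the top left corner, then straight over.
auc_perfect = trapezoid_area([(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)])

# Worst case: along the bottom to the bottom right corner, then up.
auc_worst = trapezoid_area([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)])

print(auc_random, auc_perfect, auc_worst)
```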

In general, it depends on the field, and it depends on the

problem you are looking at, but people often think of an AUC above 0.8

as something that's sort

of a good AUC for a particular prediction algorithm.

In general, something to pay attention to is how you just look at

an ROC curve and decide whether it's a good ROC curve or a bad ROC curve.

So remember the 45 degree line corresponds to just guessing.

In other words, the sensitivity and one minus the specificity

match each other on this line.

4:11

So, a perfect classifier starts at one minus the specificity equal to zero.

That means perfect specificity, and

it jumps straight up to perfect sensitivity and then straight over.

So that curve represents a perfect classifier

when it goes straight up to this corner.

So the further you are towards the upper left

hand corner of the plot the better the ROC is.

And the farther you are down to this bottom right

hand corner of the plot the worse the ROC is.

And so curves along the

forty-five degree line or below are considered pretty bad.

And you want your curve to go as straight up towards this

top left hand corner and then over as much as you possibly can.

So that's how you interpret an ROC curve, which will be used later when

we're picking out particular values for predictors

of binary outputs that output quantitative numbers.