I can put a line through the data.

There's nothing that stops me running a linear regression.

But what I'm saying now is that it's not totally appropriate here.

And the thing to realize is that the outcome that I'm modeling,

the probability of compromise, has to lie between zero and one.

Probabilities must lie between zero and one.

Proportions must lie between zero and one.

So, if I put a line through data that looks like this,

you can see that something odd is going to happen,

especially if I extrapolate the line.

So first of all, the line doesn't fit the data too well.

But absolutely, if I extrapolate,

if I took an extrapolation out to 12 plugins, for example,

it's going to predict a proportion compromised, or

a probability of compromise, greater than one.

It's nonsensical. So the underlying issue here

is that my outcome has to lie within the range 0 to 1 and unfortunately,

my line doesn't respect that range.
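
To make the extrapolation problem concrete, here is a minimal sketch in Python. The plugin counts and proportions are made-up numbers chosen only to resemble an s-shaped pattern; they are not the data from the slide, and `fit_line` is a hypothetical helper name.

```python
# Hypothetical data (NOT the lecture's data): number of plugins and the
# observed proportion of compromised sites at each plugin count.
plugins = [0, 1, 2, 3, 4, 5, 6, 7, 8]
proportion = [0.02, 0.05, 0.10, 0.20, 0.40, 0.60, 0.80, 0.90, 0.95]


def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(plugins, proportion)

# Extrapolate the straight line out to 12 plugins.
prediction_at_12 = intercept + slope * 12
print(f"Line predicts a proportion of {prediction_at_12:.2f} at 12 plugins")
print("Is that a valid probability?", 0 <= prediction_at_12 <= 1)
```

With these illustrative numbers, the straight line predicts a value above one at 12 plugins, which is exactly the kind of nonsensical answer described here.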

So what am I going to do?

So, this slide shows you that the linear regression isn't necessarily

a smart thing to do.

There are alternatives, and the alternative that I would typically use

would be a logistic regression.

Let me show you what a logistic regression looks like.

So, a logistic regression actually fits on a transformed scale.

And what I'm showing you here

is the fit back-transformed to the original scale of the data.
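
The lecture does not go into the details of the transformation, but as a rough sketch: logistic regression is usually described as fitting a straight line on the log-odds (logit) scale, and the back-transformation is the inverse logit. The function names and the example values below are assumptions for illustration only.

```python
import math


def logit(p):
    """Transform a probability in (0, 1) to the log-odds scale."""
    return math.log(p / (1 - p))


def inverse_logit(z):
    """Back-transform a log-odds value to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))


# On the transformed scale the value can be any real number...
for z in [-10, -2, 0, 2, 10]:
    p = inverse_logit(z)
    # ...but after back-transforming it always lands strictly between 0 and 1.
    print(f"log-odds {z:>4} -> probability {p:.5f}")

# The two functions undo each other (up to floating-point rounding).
round_trip = inverse_logit(logit(0.3))
```

This is why the curve shown on the original scale is s-shaped: a straight line on the transformed scale bends as it is mapped back into the 0 to 1 range.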

And so I'm not really going to dig into the details here.

The most important point that I'm making for you

is that, if you're looking at dichotomous outcome data like live and die, buy and

don't buy,

you might find a logistic regression model much, much more appropriate.

If we were to fit a logistic model for this data, which I've done here,

it provides a different sort of fit.

This sort of fit, I would often term an s-shaped curve.

In fact, it's a logistic curve to be more precise, and hence,

we call it the logistic regression.

But it has some very good features associated with it.

And the main feature is,

it's never going above one and it can never go beneath zero.

So, it provides a more suitable model when you're trying to predict outcomes

that are things like probabilities and

proportions, that should live between zero and one.

So, here's the fit of the logistic regression model, and

once you have got that fit,

you can see how you can use it for prediction.

If I have, for example, four plugins and

I want to predict the probability of site compromise,

I just take four, and I go up to the curve and I read off the value.

And that's what this regression methodology will give me.

It's a prediction methodology that's more suitable for these dichotomous outcomes.
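
As a sketch of what "reading off the curve" could look like in code, the example below fits a logistic regression to made-up grouped data and predicts at four plugins. Everything here is an assumption for illustration: the counts are hypothetical rather than the slide's data, `fit_logistic` is a hypothetical helper, and the fitting method (a few Newton-Raphson steps on the binomial log-likelihood) is just one common way to do it, not necessarily what the software behind the slide uses.

```python
import math

# Hypothetical grouped data (NOT the lecture's data): for each plugin count,
# how many sites were observed and how many of them were compromised.
plugins = [0, 1, 2, 3, 4, 5, 6, 7, 8]
sites = [20] * 9
compromised = [0, 1, 2, 4, 8, 12, 16, 18, 19]


def inverse_logit(z):
    """Back-transform from the log-odds scale to a probability."""
    return 1 / (1 + math.exp(-z))


def fit_logistic(x, n, k, iterations=25):
    """Fit log-odds = b0 + b1 * x by Newton-Raphson; returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0              # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0      # entries of the information matrix
        for xi, ni, ki in zip(x, n, k):
            p = inverse_logit(b0 + b1 * xi)
            resid = ki - ni * p    # observed minus expected compromised
            w = ni * p * (1 - p)   # binomial variance weight
            g0 += resid
            g1 += resid * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * step = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


b0, b1 = fit_logistic(plugins, sites, compromised)

# "Take four, go up to the curve, and read off the value."
p_at_4 = inverse_logit(b0 + b1 * 4)
# Extrapolating to 12 plugins still gives a valid probability.
p_at_12 = inverse_logit(b0 + b1 * 12)
print(f"Predicted probability of compromise at 4 plugins:  {p_at_4:.2f}")
print(f"Predicted probability of compromise at 12 plugins: {p_at_12:.4f}")
```

Unlike the straight line, the prediction at 12 plugins stays below one, because the fitted curve is always back-transformed into the 0 to 1 range.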