0:00

In this lesson,

we will illustrate why certain estimates minimize certain loss functions.

You work at a car dealership.

Your boss wants to know how many cars the dealership will sell per month.

An analyst who has worked with past data from your company provided you with

a distribution that shows the probability of the number of cars the dealership

will sell per month.

In Bayesian lingo, this is called the posterior distribution.

Here's a dot plot of that posterior.

Also marked on the plot are the mean, median and the mode of the distribution.

Your boss doesn't know any Bayesian statistics though, so he wants you to

report a single number for the number of cars the dealership will sell per month.

Suppose your single guess is 30.

We'll call this g in the following calculations.

If your loss function is L0, that is, a 0-1 loss, then you lose a point for

each value in your posterior that differs from your guess and

don't lose any points for values that exactly equal your guess.

Let's calculate what the total loss would be if your guess is 30.

Here are the values in the posterior distribution, sorted in ascending order.

The first value is 4, which is not equal to your guess of 30, so

the loss for that value is 1.

The second value is 19, also not equal to your guess of 30, and

the loss for that value is also 1.

The third value is 20, also not equal to your guess of 30, and

the loss for this value is also 1.

There's only one 30 in your posterior, and the loss for

this value is 0 since it's equal to your guess.

The remaining values in the posterior are all different from 30; hence,

the loss for each of them is 1 as well.

To find the total loss, we simply sum over these individual losses. The posterior

distribution has 51 observations, where only one of them equals our guess and

the remainder are different.

Hence, the total loss is 50.
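
The calculation just described can be sketched in a few lines of Python. The full list of 51 posterior values is not given in this lesson, so the `posterior` list below is a small hypothetical sample (an assumption made purely for illustration), and the function name `zero_one_loss` is likewise made up for this sketch. Its total will therefore not match the 50 from the lesson.

```python
# Hypothetical sample standing in for the posterior values.
# This is NOT the lesson's 51-observation posterior, only an illustration.
posterior = [4, 19, 20, 20, 20, 24, 26, 28, 30, 35, 49]


def zero_one_loss(guess, values):
    """Total 0-1 loss: one point for every value that differs from the guess."""
    return sum(1 for v in values if v != guess)


total = zero_one_loss(30, posterior)
print(total)  # 10: eleven values, exactly one of which equals 30
```

With a sample that has only one value equal to the guess, the total is always the number of observations minus one, which is the same pattern as 51 - 1 = 50 in the lesson.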

Here's a visualization of the posterior distribution along with the 0-1 loss

calculated for

a series of possible guesses within the range of the posterior distribution.

To create this visualization of the loss function, we repeated

the process we described earlier for a guess of 30 for

every guess considered, and we recorded the total loss each time.

We can see that the loss function has the lowest value when X, our guess,

is equal to the most frequent observation in the posterior.

Hence, L0 is minimized at the mode of the posterior, which means that the best

point estimate if using the 0-1 loss is the mode of the posterior.
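
As a rough sketch of how such a loss curve could be produced, the snippet below sweeps every integer guess in the range of the sample and records the total 0-1 loss for each. The `posterior` list is again a hypothetical sample, not the lesson's data, and the integer grid of guesses is an assumption.

```python
from collections import Counter

# Hypothetical sample, not the lesson's posterior.
posterior = [4, 19, 20, 20, 20, 24, 26, 28, 30, 35, 49]


def zero_one_loss(guess, values):
    """Total 0-1 loss: count of values that differ from the guess."""
    return sum(1 for v in values if v != guess)


# Evaluate the total loss for every integer guess within the sample's range.
guesses = range(min(posterior), max(posterior) + 1)
losses = {g: zero_one_loss(g, posterior) for g in guesses}

# The guess with the smallest total loss, and the most frequent value.
best_guess = min(losses, key=losses.get)
mode = Counter(posterior).most_common(1)[0][0]

print(best_guess, mode)  # 20 20
```

In this sample the value 20 appears three times, more than any other value, so guessing 20 leaves the fewest mismatches.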

2:44

Let's consider another loss function.

If your loss function is L1, that is, a linear loss, then the total loss for

a guess is the sum of the absolute values of the difference between that guess and

each value in the posterior.

We can once again calculate the total loss under L1 if your guess is 30.

Here are the values in the posterior distribution again sorted in

ascending order.

The first value is 4, and

the absolute value of the difference between 4 and 30 is 26.

The second value is 19, and

the absolute value of the difference between 19 and 30 is 11.

The third value is 20 and the absolute value of the difference between 20 and

30 is 10.

There's only one 30 in your posterior and the loss for

this value is 0 since it's equal to your guess.

The remaining values in the posterior are all different from 30; hence, their

losses are different from 0.

To find the total loss we again simply sum over these individual losses,

and the total comes out to 346.
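
The same bookkeeping for the linear loss might look like the sketch below. It reuses a hypothetical sample in place of the lesson's posterior (whose full values are not listed here), so the result differs from 346, and `linear_loss` is a name chosen just for this example.

```python
# Hypothetical sample, not the lesson's posterior.
posterior = [4, 19, 20, 20, 20, 24, 26, 28, 30, 35, 49]


def linear_loss(guess, values):
    """Total L1 loss: sum of absolute differences between guess and values."""
    return sum(abs(v - guess) for v in values)


# Individual losses for a guess of 30: |4 - 30| = 26, |19 - 30| = 11, ...
individual = [abs(v - 30) for v in posterior]
print(individual)  # [26, 11, 10, 10, 10, 6, 4, 2, 0, 5, 19]

total = linear_loss(30, posterior)
print(total)  # 103
```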

Here's again a visualization of the posterior distribution

along with a linear loss function calculated for

a series of possible guesses within the range of the posterior distribution.

To create this visualization of the loss function again we went

through the same process we described earlier for all of the guesses considered.

This time, the function has the lowest value when X is equal to

the median of the posterior.

Hence, L1 is minimized at the median of the posterior. Let's consider one other loss function.

If your loss function is L2, that is a squared loss, then the total loss for

a guess is the sum of the squared differences between that guess and

each value in the posterior.

We can once again calculate the total loss under L2 if your guess is 30.

We have the posterior distribution again, sorted in ascending order.

The first value is 4, and the squared difference between 4 and 30 is 676.

The second value is 19, and the squared difference between 19 and 30 is 121.

The third value is 20, and

the squared difference between 20 and 30 is 100.

There's only one 30 in your posterior, and the loss for

this value is 0 since it's equal to your guess.

The remaining values in the posterior are again all different than 30,

hence their losses are all different than 0.

To find the total loss, we simply sum over these individual losses again and

the total loss comes out to 3,732.
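
For the squared loss, a minimal sketch along the same lines is shown below. The sample is hypothetical and stands in for the lesson's posterior, so the total will not equal 3,732; `squared_loss` is an illustrative name.

```python
# Hypothetical sample, not the lesson's posterior.
posterior = [4, 19, 20, 20, 20, 24, 26, 28, 30, 35, 49]


def squared_loss(guess, values):
    """Total L2 loss: sum of squared differences between guess and values."""
    return sum((v - guess) ** 2 for v in values)


# Individual losses for a guess of 30: (4 - 30)^2 = 676, (19 - 30)^2 = 121, ...
individual = [(v - 30) ** 2 for v in posterior]
print(individual)  # [676, 121, 100, 100, 100, 36, 16, 4, 0, 25, 361]

total = squared_loss(30, posterior)
print(total)  # 1539
```

Because the differences are squared, values far from the guess, such as 4 and 49 here, dominate the total much more than they do under the linear loss.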

We have the visualization of the posterior distribution

again, this time along with the squared loss function calculated for

a series of possible guesses within the range of the posterior distribution.

Creating this visualization involved the same steps:

go through the same process described earlier for a guess of 30,

for all guesses considered, and record the total loss.

This time,

the function has the lowest value when X is equal to the mean of the posterior.

Hence, L2 is minimized at the mean of the posterior distribution.
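
The plot shows this result visually, and a short calculus argument gives the same answer. In the sketch below, the notation is an assumption added for illustration: the posterior values are written as x_1, ..., x_n, and g is the guess as before.

```latex
\begin{align*}
L_2(g) &= \sum_{i=1}^{n} (x_i - g)^2 \\
\frac{dL_2}{dg} &= -2 \sum_{i=1}^{n} (x_i - g) = 0
  \;\Longrightarrow\; g = \frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x} \\
\frac{d^2 L_2}{dg^2} &= 2n > 0
\end{align*}
```

Setting the first derivative to zero gives the mean, and the positive second derivative confirms that this point is a minimum rather than a maximum.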

In summary, in this lesson we illustrated that the 0-1 loss,

L0, is minimized at the mode of the posterior distribution.

The linear loss L1 is minimized at the median of the posterior distribution and

the squared loss L2 is minimized at the mean of the posterior distribution.
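
The three results can be checked side by side with a small sketch. Everything specific here is an assumption for illustration: the `posterior` list is a hypothetical sample rather than the lesson's data, the guesses are restricted to an integer grid, and the sample was chosen so that its mean is a whole number and lands on that grid.

```python
import statistics

# Hypothetical sample, not the lesson's posterior.
posterior = [4, 19, 20, 20, 20, 24, 26, 28, 30, 35, 49]


def zero_one_loss(guess, values):
    return sum(1 for v in values if v != guess)


def linear_loss(guess, values):
    return sum(abs(v - guess) for v in values)


def squared_loss(guess, values):
    return sum((v - guess) ** 2 for v in values)


loss_functions = {
    "0-1": zero_one_loss,
    "linear": linear_loss,
    "squared": squared_loss,
}

# For each loss, find the integer guess with the smallest total loss.
guesses = range(min(posterior), max(posterior) + 1)
best = {
    name: min(guesses, key=lambda g, f=f: f(g, posterior))
    for name, f in loss_functions.items()
}
print(best)  # {'0-1': 20, 'linear': 24, 'squared': 25}

# Compare against the summary statistics of the same sample.
summary = {
    "mode": statistics.mode(posterior),
    "median": statistics.median(posterior),
    "mean": statistics.mean(posterior),
}
print(summary)
```

For a sample whose mean is not a whole number, an integer grid would only find the integer nearest the true minimizer of the squared loss, so a finer grid or the closed-form mean would be needed.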

Going back to the original question,

the point estimate to report to your boss about the number of cars the dealership

will sell per month depends on your loss function.

In any case, you would choose to report the estimate that minimizes the loss.