0:00

[MUSIC]

American Cancer Society estimates that about 1.7% of women have breast cancer.

Susan G Komen for the Cure Foundation states that mammography

correctly identifies about 78% of women who truly have breast cancer.

An article published in 2003 suggests that up

to 10% of all mammograms are false positive.

These probabilities are of course estimates,

as they're very difficult to calculate pro, precisely.

But we're going to take these as givens for this example.

As usual, let's first parse through the percentages we're given.

The probability of having breast cancer is estimated to be.017.

The probability of testing positive given that somebody

has breast cancer is .78 and the probability

of testing positive even though somebody does not have breast cancer is .10.

1:05

Prior to any testing, and any information exchange between the patient and the

doctor, what probability should a doctor assign

to a female patient having breast cancer?

Since we don't know anything about this patient's medical history, our best

bet is to treat them like a randomly chosen person from the population.

Hence, we would set this probability at 0.017.

This is the prior probability we assigned to a

patient having breast cancer before we collect data on them.

In other words, before we test them.

1:39

When a patient goes through breast cancer screening, there are 2 competing claims.

Patient has cancer or, and patient doesn't have cancer.

If a mammogram yields a positive

result what is the probability that patient has cancer?

If we think about this in probability notation, we're being asked to find

what is the probability of breast cancer given that the patient tested positive.

And earlier on, we were provided the, the

reverse of this probability Positive, given breast cancer.

And when we have the conditions reversed, we know that a probability

tree might be useful in our calculations. So, let's start building that.

There are two competing claims.

Patient has breast cancer, or patient does not have breast cancer.

The probability of having breast cancer a prior is 0.017 So the probability of not

having breast cancer is going to be the compliment of that, 1 minus 0.017, 0.983.

If a patient has

breast cancer, there are 2 possible outcomes; they

might test positive, or they might test negative.

The probability of testing positive when the patient has breast cancer is 0.78.

The probability then of testing negative when the patient has

breast cancer is going to be the complement of 0.78, 0.22.

Similarly, when a patient does not have breast cancer,

there are two possible outcomes, testing positive or negative.

Probability of testing positive even though the patient does not have breast

cancer is the .10 we were given earlier so the accuracy r, rate of the test

when the patient does not have breast cancer or in other words the probability

of testing negative given no breast cancer is going to be the compliment of .10 0.90.

We know that we're given that the patient has tested positive, so really,

we're only interested in the first and the third branch here, so we don't

even have to worry about the other branches, because we know the patient

we're interested in doesn't come from the

sect of the population that tested negative.

3:49

First, we're going to want to find the joint probabilities.

The probability of breast cancer and positive is the product of 0.017 and

0.78, that's 0.01326.

And the probability of no breast cancer and, and positive is a

product of the probabilities in the two branches leading up to that.

0.0983.

We are asked for the probability of breast cancer given positive.

Using base theorem, this is going to

be the probability of breast cancer and positive,

divided by the probability of testing positive.

The numerator is simply coming from our top branch.

And the denominator.

The probability of testing positive is going to be the sum

of breast cancer and positive or no breast cancer and positive.

Since these are two disjoint outcomes.

We add the two probabilities when we're saying or.

This gives

us about a 12% rate.

This, remember, is what we called our posterior probability.

Initially, we had given a 1.7% chance to this patient having breast

cancer because we knew nothing about them. Then, we tested them.

And they tested positive.

So, now we have this additional information about the patient.

Now, after data

collection, the probability that we're assigning to

this patient having breast cancer is slightly higher.

It's at 12%.

5:39

Once again we run through our probability tree:

two competing claims, breast cancer, and no breast cancer.

However what has changed now is that this

patient is no longer a nobody from the population.

We've tested them once and they tested positive.

So we have some additional information about them, and

we should update our prior with this additional information.

In other words we plug in the posterior from the previous iteration,

the previous test. To be our new prior.

And therefore, the probability of not having breast cancer is updated

to be the complement of this, 88%. Next, we can run through our tree again.

Remember, nothing about the test has changed.

So the probability of testing positive given

a patient has breast cancer is still 78%.

And the probability of testing

negative given the patient has breast cancer is still 22%.

Similarly with the lower branch, nothing about the

test has changed so the conditional probabilities of testing

positive or negative given the patient has, does not

have breast cancer is still 10% and 90% respectively.

Once again, we're only interested interested in the branches where

the patient has tested positive Because we're saying that the second

mammogram also yielded a positive result, we can

multiply through the branches to find our joint probabilities.

And these are going to change a little bit, because

our starting probabilities, the probabilities

in the first branch has changed.

And this time, our probability of having breast cancer and testing positive Is

higher at 0.0936. And our probability of no breast

cancer in positive is 0.088.

In this example, we have reviewed

a Bayesian approach to statistical inference.

Which involves setting a prior, collecting data, obtaining a posterior,

and updating the prior with the posterior from the previous step.

In addition we got some practice working with conditional

probabilities, probability trees, and the base theorem in general.

[MUSIC]