[MUSIC] This module deals with ways in which we evaluate how good new tests are. What I mean by that is we might have a test, say a test for breast cancer or a test for heart failure, that is the gold standard, what we would call the reference test. And then we have a new test. The new test might be the same test as the old test but adding genetic information, or it could be an entirely new blood test. We would want to know: does the new test outperform the old test? Take a chest x-ray for the diagnosis of heart failure, which is the standard: is there added value in measuring serum BNP? So we would want to evaluate a test like that. If we wanted to predict who in the population was going to have a heart attack in the next ten years, there are clinical predictors of that. This is a pretty standard problem in cardiology, and a standard problem in other areas of medicine: using clinical factors to predict the development of disease. How about if you add genetics to those clinical predictors? Do you do better? This is the way in which we evaluate those kinds of added-value tests in clinical medicine and in personalized medicine. So the reference test and the new test will have both true positives and true negatives. True positives are cases where the new test is positive and the reference test, the gold standard, is also positive. True negatives are cases where both are negative. The problem is that a new test may not agree with the reference test, because the new test might have a false positive rate. We call that a Type I error, and in statistical terms we generally don't want it to be more than 0.05. A new test will have a false positive rate determined by this two-by-two table. The new test will also have a false negative rate; that is, it will fail to detect cases that the gold standard, or reference test, detects. So we have a number of ways in which we would evaluate a new test compared to an old test.
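To make the two-by-two table concrete, here is a minimal sketch (with made-up patient calls, not data from the lecture) of tallying true positives, false positives, false negatives, and true negatives from paired reference-test and new-test results:

```python
# Toy sketch: build the two-by-two table comparing a new test
# against the gold-standard reference test. Data are invented.

def two_by_two(reference, new_test):
    """Count TP, FP, FN, TN from paired positive(1)/negative(0) calls."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for ref, new in zip(reference, new_test):
        if new and ref:
            counts["TP"] += 1   # both tests positive
        elif new and not ref:
            counts["FP"] += 1   # new test's false positive (Type I error)
        elif not new and ref:
            counts["FN"] += 1   # new test misses a gold-standard case
        else:
            counts["TN"] += 1   # both tests negative
    return counts

reference = [1, 1, 1, 0, 0, 1, 0, 1]  # gold-standard calls (hypothetical)
new_test  = [1, 0, 1, 1, 0, 1, 0, 1]  # new test's calls, same patients
print(two_by_two(reference, new_test))
```

Every metric in the rest of this module is computed from these four cells.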
And these are pretty standard definitions that I'll just put up on this and subsequent slides. One is sensitivity: the true positives divided by the true positives plus the false negatives. In other words, adding up the true positives and the false negatives below them in the column gives the total, the true positives are the numerator, and that ratio is the sensitivity. The specificity is the same thing but with the other column: the true negatives divided by the true negatives plus the false positives. Now, positive predictive value looks across the table instead of down the columns. The true positives divided by the true positives plus the false positives is the positive predictive value. And the negative predictive value is the true negatives divided by the true negatives plus the false negatives. Again, across the rows rather than down the columns in this particular table. So here is an example. We have a test that has been evaluated in 750 patients. The gold standard makes a positive call in 700 of those 750 and a negative call in 50. So we already know something about the population in which this test is being applied: it has a lot of this disease, whatever it is we're studying, because the disease is very prevalent, positive in 700 out of 750. The new test gives a positive result 640 times and a negative result 110 times. So the new test has a sensitivity of 600 out of 700, or about 86%; it does a pretty good job of finding cases. But it has a specificity of 10 out of 50, or 20%. So the new test is not very specific: when the disease is absent, the new test will often still call it positive. The positive predictive value of the new test is very good, 600 out of 640, or about 94%, and the negative predictive value is not so good, 10 out of 110, or 9%. Now, there's a relationship between the probability that a test will find true positives and the probability of finding false positives, depending on the underlying disease frequency and the underlying population structure.
So if we have a new test and we plot the relationship between the probability of finding false positives and the probability of finding true positives, that's what we call a receiver operating characteristic (ROC) curve. A test that is no better than a coin flip is shown by the diagonal line on this plot. So if you have a test where the relationship between one minus specificity and sensitivity is a straight diagonal line like that, the test is no better than flipping a coin. The area under the curve is the way to measure that. The area of the entire plot is 1.0, so the area under this diagonal line is 0.5. Here is a great test: a test where the area under the curve is almost 100%. We almost never see this, but a really, really great test would have an AUC, or area under the curve, of 0.95 in this kind of ROC analysis. Here's a more likely result. Suppose we have evaluated, for example, the impact of adding genetic variation to clinical variables, and we now have a test whose ROC curve has an area under the curve of 0.7. So it's better than flipping a coin, but how do you put that into context? Well, one way to think about it is that if you want to find 80% of the true positives, the really great test will do that at a very, very low price: you won't incur very many false positives. With the not-so-good test you'll get to 80% sensitivity, but the price you pay is a much higher chance of false positives, one minus specificity. It's pretty common in this kind of testing to have AUCs of 0.7. An AUC of 0.7 is pretty good, 0.8 and 0.9 are very good, and 0.95 is so good that either you have a remarkable new test or you have a mistake in the experiment, one of those two things. So I'm going to close this particular module by giving you an example of an old test, a reference test, and a new test.
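The ROC construction described above can be sketched directly: sweep a threshold over the test's scores from highest to lowest, record (1 - specificity, sensitivity) at each step, and integrate with the trapezoidal rule. The scores and labels below are invented toy data, not from the lecture:

```python
# Minimal ROC/AUC sketch on toy data (scores and labels are invented).

def roc_auc(scores, labels):
    """Sweep thresholds over scores; return ROC points and the AUC."""
    pos = sum(labels)
    neg = len(labels) - pos
    # Sort cases by score, highest first, so each step lowers the threshold.
    ranked = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]  # (1 - specificity, sensitivity)
    tp = fp = 0
    for _, label in ranked:
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    # Trapezoidal area under the curve; 0.5 = coin flip, 1.0 = perfect.
    auc = sum((x2 - x1) * (y1 + y2) / 2
              for (x1, y1), (x2, y2) in zip(points, points[1:]))
    return points, auc

# Higher score means "more likely diseased"; label 1 = diseased.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1,   1,   0,   1,   0,    1,   0,   0]
_, auc = roc_auc(scores, labels)
print(round(auc, 2))
# → 0.81
```

In practice a library routine would be used, but the sketch shows why the diagonal gives exactly 0.5: a coin-flip test adds true and false positives at the same rate as the threshold drops.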
And I'm asking you to calculate the sensitivity, the specificity, the positive predictive value, and the negative predictive value for these 800 patients who have been evaluated in this particular way. >> [APPLAUSE]