So in this last section of the lesson I want to say a few words about linear regression: a fairly simple form of regression, but one we see quite commonly. Now with the t-test and ANOVA we were comparing the means of two or more groups, and those groups were categorical. The measured variable was numerical: white cell counts, temperature, systolic blood pressure, or the value of some blood test. But the group name, and the examples I listed there for the t-test and ANOVA were drug A versus placebo, or patients in group A versus patients in group B, that's categorical. Here we want to do something else. What if I wanted to compare a numerical value to a numerical value? We have an x value and a y value, both sets of numerical, continuous data, and we want to know, for instance, whether there is a correlation between the two sets of numbers.

Now, these data points, as I have mentioned, come in pairs: an independent and a dependent value. It's sometimes difficult to know which is the dependent and which is the independent. The example used in most statistical textbooks is hours of study and test outcomes. We look at students, what their test results were, and how many hours they studied. The independent variable is the number of hours studied, and depending on that there is the dependent variable, the test score. So when it comes to the medical literature you really have to look at which variable depends on which other one: which is the dependent and which is the independent variable.

As I mentioned, we have these pairs of values and we want to know whether there is a correlation between them. Now, it turns out we can test the strength of that correlation, how one depends on the other, and we can also test the direction of that dependence. Have a look at this.
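To make the hours-of-study example concrete, here is a minimal sketch in Python. The data are entirely hypothetical, invented for illustration, and the least-squares line is fitted by hand from the means rather than with a statistics library:

```python
# Hypothetical paired data: hours studied (independent, x) and test score (dependent, y).
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 58, 61, 67, 74, 79]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n

# Least-squares slope and intercept for the fitted line: score = a + b * hours
b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))
     / sum((x - mean_x) ** 2 for x in hours))
a = mean_y - b * mean_x

print(f"score ≈ {a:.1f} + {b:.1f} * hours")  # → score ≈ 46.3 + 5.4 * hours
```

With this made-up data, each extra hour of study is associated with roughly 5.4 more points on the test, which is exactly the kind of dependence of y on x the lesson is describing.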
First of all, we're going to look at a positive correlation. Look at this set of values: as the independent variable on the x-axis increases, the dependent variable on the y-axis changes accordingly; as the x values get larger, so do the y values. You can well imagine the single line you could draw through those points. That line is the actual correlation, the mathematics that happens behind the statistics, and it shows you that as the independent variable on the x-axis increases, so does the dependent variable on the y-axis.

We can also have a negative correlation. There we see the opposite kind of pattern: as the independent variable increases, the dependent variable decreases. And then, obviously, we can get no correlation whatsoever: as the one increases, the other jumps around unpredictably. There is no pattern to that type of data.

Now, for both the strength and the direction, the mathematics, your computer, whatever you use, calculates what is called the correlation coefficient, usually written as a lowercase r. That r takes a value from negative one all the way to positive one, in decimal values. Negative one is a perfect negative correlation: as the independent variable increases step by step, the dependent variable decreases step for step, in a straight line. Positive one, obviously, is a perfect positive correlation: as the independent variable increases step by step, so does the dependent variable. And if you have an r value of zero, that's the last picture we saw: there is no correlation whatsoever. So that is linear regression.
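The correlation coefficient described above can be sketched directly from its definition. This is a minimal illustration with invented data, not a substitute for a statistics package; the helper name `pearson_r` and the example values are assumptions made for the demo:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r, always between -1 and +1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Covariance-style sum in the numerator, spread of each variable below.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5]
print(round(pearson_r(x, [2, 4, 6, 8, 10]), 3))   # y rises step for step with x → 1.0
print(round(pearson_r(x, [10, 8, 6, 4, 2]), 3))   # y falls step for step with x → -1.0
```

The two printed values are the "perfect, perfect, perfect" extremes from the lesson; real paired data lands somewhere between them, and a value near zero corresponds to the no-correlation picture.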