Sometimes, it is very important to know not only a measure of central tendency, like an average, but also some measure of the deviation of our values from this average. Let us consider an example. Assume that you have two studies, and you have the ages of the participants of these studies. In the first study, the age of participants is a variable that takes the following values: 35, 37, 34, 33 and 36. What is the average age of participants in this study? You can find the arithmetic average of these numbers and see that the corresponding average equals 35. Now let us consider a second study. Again, we have the ages of participants, and these ages are the following: 35, 10, 60, 5, 65. You can again find the average of these numbers. What is the answer? You see that the answer is, again, 35. But it is clear that these two studies are in a sense different. The first study is aimed at adult persons around 35 years old, plus or minus several years. The second study is aimed not only at adult persons, but also at very young children, and some elderly persons are included as well. So there is a clear difference between these two data sets, and this difference cannot be encoded in a measure of central tendency, because the average value is the same in both cases.

So we have to use different descriptive tools to distinguish between these two data sets. The first such tool is called sample variance. Let us consider a data set that will be denoted by X, and that contains values x1, x2, and so on up to xn. All of these are just numbers, and we can consider x̄ (x bar), which is just the mean of these numbers. Then the sample variance of X, which we denote by Var X, is the following thing: it is the average of the squared deviations of the values of X from their average value. So I have to find the sum over i from 1 to n of (xi − x̄)², and divide this sum by n. Due to some statistical reasons that we will discuss later, we can write n − 1 here instead of n.
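As a small illustration, the two-studies example and the variance formula just described can be sketched in Python (the function names here are mine, not from the lecture):

```python
# Mean and (biased) sample variance for the two age datasets above:
# same mean, very different spread.

def mean(xs):
    return sum(xs) / len(xs)

def biased_variance(xs):
    """Average of squared deviations from the mean (divisor n)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

study1 = [35, 37, 34, 33, 36]
study2 = [35, 10, 60, 5, 65]

print(mean(study1), mean(study2))   # 35.0 35.0 -- identical averages
print(biased_variance(study1))      # 2.0
print(biased_variance(study2))      # 610.0 -- much larger deviation
```

The averages coincide, but the variance immediately separates the two data sets.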
In a sense, the variant of this formula where n is replaced with n − 1 is better. The variant with n in the denominator is called the biased sample variance, and to make it unbiased, I have to consider another definition. Let me denote it by Var⁺ X: it is the same formula as before, but with n − 1 in the denominator instead of n. This is called the unbiased sample variance. As you see, the formula for sample variance is related to the formula for the variance of random variables that we discussed in the course on probability. There is a relation between sample variance and the variance of random variables that we will probably discuss later, and in the sense of this relation, the formula with n − 1 is better than the one with n, but I am not going into details now.

Let us now consider another way to measure this deviation. Instead of considering the variance, we can take its square root and consider the so-called standard deviation, which is denoted by Std and is just the square root of the variance. The advantage of the standard deviation over the variance is that it uses the same units as X itself. For example, if X is age and is measured in years, then x̄ is also measured in years, but the square of their difference is measured in squared years, which is a rather strange unit. You cannot, for example, add years and squared years; it does not make sense. So sometimes we need a measure of deviation that is measured in the same units as the values themselves, and the standard deviation is a good candidate for this.

Sample variance and sample standard deviation are not the only possible ways to measure this deviation. Let us consider another approach, returning to the one we used when we defined the median. Consider an example: our data set X contains the following numbers: 1, 3, 2, 7, 100, 11, 17. First, I have to sort this series of numbers, which gives the following series: 1, then 2, then 3, then 7, then 11, then 17, then 100.
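A short sketch of the biased versus unbiased variants and the standard deviation, checked against Python's standard `statistics` module, which implements exactly these two conventions:

```python
import math
import statistics

ages = [35, 10, 60, 5, 65]   # the second study from the example
n = len(ages)
m = sum(ages) / n            # x bar = 35.0

# Biased sample variance: divisor n.
var_biased = sum((x - m) ** 2 for x in ages) / n
# Unbiased sample variance: divisor n - 1.
var_unbiased = sum((x - m) ** 2 for x in ages) / (n - 1)

# Standard deviation: square root of the variance, measured in years again.
std = math.sqrt(var_unbiased)

# The standard library uses the same conventions:
# pvariance divides by n, variance divides by n - 1.
assert var_biased == statistics.pvariance(ages)
assert var_unbiased == statistics.variance(ages)
```

Here `var_biased` is 610.0 and `var_unbiased` is 762.5; the standard deviation is about 27.6 years, a number directly comparable to the ages themselves.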
In this sorted series, again, as we did previously, I find the center, which is the median. Now you see that this median splits the data set into two equal parts, and I can find the median of each part. In the lower half it is the number 2, and in the upper half it is the number 17. The first of these values is called the first quartile, Q1, and the second is called the third quartile, Q3. The median can also be called the second quartile. These values, the first, second and third quartiles, divide the whole data set into four approximately equal parts; this is why they are called quartiles. Note that half of our sample is contained between the first and third quartiles. The difference between the third quartile and the first quartile is called the interquartile range, abbreviated IQR.

The interquartile range can be used to estimate how far the values of our data set deviate from some element in the middle, in this case from the median. You see that if the third quartile is much larger than the median, and the first quartile is much lower than the median, then the difference between them is large, and the interquartile range will be large. But if all values are almost equal to the median, a little bit larger in the upper half and a little bit smaller in the lower half, then the difference will be small and the interquartile range will be small.

Quartiles are defined, like the median, not only for numerical data but for any ordinal categorical data as well, because the only thing we need to find them is the ability to sort the values, and to sort them we have to compare them, and ordinal categorical values can be compared too. But to find the interquartile range, we have to compute a difference, which involves arithmetic, and so it requires numeric values, because otherwise we cannot do the subtraction. Anyway, just like the median, the interquartile range enjoys some kind of robustness.
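The median-of-halves procedure above can be sketched as follows. Note that there are several conventions for computing quartiles; this sketch drops the middle element for odd-sized data, which matches the worked example (other conventions, such as the interpolation used by `statistics.quantiles`, may give slightly different values):

```python
def median(xs):
    """Middle element of the sorted data (average of two middles if even)."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2

def quartiles(xs):
    """Q1, Q2, Q3 via medians of the lower and upper halves.
    For odd n, the middle element is excluded from both halves."""
    s = sorted(xs)
    n = len(s)
    lower = s[: n // 2]
    upper = s[(n + 1) // 2 :]
    return median(lower), median(s), median(upper)

data = [1, 3, 2, 7, 100, 11, 17]
q1, q2, q3 = quartiles(data)   # 2, 7, 17 -- as in the example
iqr = q3 - q1                  # 15
```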
If, by some mistake, I replace this 100 with 10,000, it will not affect the quartiles at all. But if I consider the variance, we see that replacing one element with a much, much larger element considerably changes the corresponding variance. So quartiles and the interquartile range are, in a sense, more robust tools that can be useful sometimes, for example, if you deal with data with lots of outliers, elements that lie outside of the expected distribution of the data.

Now, we have discussed measures of central tendency and measures of deviation from the corresponding element in the middle. But sometimes we have to understand our data in more detail, to get a deeper understanding of the data than these values give us. In this case, we have to study the distributions, and this can be done with the corresponding visualization techniques. Let us discuss them.
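Before moving on to visualization, the robustness claim above can be checked numerically. This sketch reuses the median-of-halves convention for quartiles (one of several possible conventions) and compares the IQR with the unbiased variance when 100 is replaced by 10,000:

```python
import statistics

def iqr(xs):
    """Interquartile range via the median-of-halves convention
    (middle element dropped from both halves for odd n)."""
    s = sorted(xs)
    n = len(s)
    def med(ys):
        k = len(ys) // 2
        return ys[k] if len(ys) % 2 else (ys[k - 1] + ys[k]) / 2
    return med(s[(n + 1) // 2 :]) - med(s[: n // 2])

data = [1, 3, 2, 7, 100, 11, 17]
with_outlier = [1, 3, 2, 7, 10_000, 11, 17]   # 100 replaced by 10,000

print(iqr(data), iqr(with_outlier))   # 15 15 -- completely unchanged
# The variance, by contrast, explodes when the outlier grows:
print(statistics.variance(data) < statistics.variance(with_outlier))  # True
```

The outlier moves to the end of the sorted series but never enters the quartile calculation, so the IQR is untouched, while every squared deviation in the variance formula feels its pull.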