In the previous lesson, we learned how to calculate the mean and the median, which represent the middle point of our data. We also mentioned that we often think of these summary statistics as a typical value. For example, suppose you take your car for an oil change and are told that the average time to get this done is 30 minutes. Do you expect to be done in 20 minutes, 30 minutes, or 40 minutes? How would you assess your chances of getting your car in and out within 30 minutes? This is where some measure of how dispersed your data is can be helpful. Measures of dispersion, also known as measures of variation, tell us how spread out or compact the data tends to be. There are three main measures of variation: the range, the variance, and the standard deviation. In this lecture we will cover all three.
The range is simply the largest observation minus the smallest observation. It gives a quick sense of how widely spread your data is. Consider two emergency rooms, A and B. One measure of operational effectiveness is the time it takes to process patients. For these two ERs, we have taken some observations and found that the average waiting time before admission is about the same, 5 minutes. But in which of these two ERs would a patient experience a waiting time closer to this average? To answer this question, we need to study the variability in our data. The data from A shows that patients waited as little as 4 minutes and as much as 6 minutes, so the range was 2 minutes. For B, the range is 6 minutes, that is, 8 minutes minus 2 minutes. So when comparing these two ERs, we can say that while the average is the same for both, the patients using B experience more variability, and for them the average is less typical compared to the experience of patients who go to A. While the range is one measure of dispersion, it's not very accurate, especially when the data set gets large, because it depends only on the two most extreme observations.
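As a minimal sketch of the range calculation just described, the Python snippet below compares two sets of waiting times. The individual values are hypothetical: the lecture only states the minimum, the maximum, and the approximate mean for each ER, so the lists here are made up to match those three facts. The helper name `data_range` is also just an illustrative choice.

```python
from statistics import mean

# HYPOTHETICAL waiting times in minutes. Only the minimum, maximum, and
# mean (about 5) come from the lecture; the other values are invented.
er_a = [4, 5, 5, 5, 6]
er_b = [2, 4, 5, 6, 8]


def data_range(values):
    """Return the largest observation minus the smallest observation."""
    return max(values) - min(values)


for name, waits in (("A", er_a), ("B", er_b)):
    print(f"ER {name}: mean = {mean(waits)} min, range = {data_range(waits)} min")
```

Both lists have the same mean, but the range for B is three times the range for A, which is the point of the comparison.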
The more commonly used and more meaningful measures of how variable a data set is are the variance and the standard deviation. The standard deviation is a measure of the dispersion of the data. A small standard deviation means that the data points tend to be close to the mean. The standard deviation is calculated by taking the square root of the variance. In addition to expressing the variability of the data, the standard deviation is used in more advanced topics of statistics, which we will cover later on in this course.
The variance is computed differently for population data than it is for sample data. For population data, the population mean is subtracted from each observation and the resulting value is squared. Once all values are computed, the squared values are summed and then divided by the population size, uppercase N. The resulting value is represented by the Greek letter sigma with the squared symbol after it. This is called sigma squared. For sample data, the sample mean is subtracted from each observation and the resulting value is squared. Once all these values are computed, they are summed and then divided not by the sample size n, but rather by n - 1. The resulting value is represented by the English letter s, squared. The sample variance, s squared, is a point estimate for the population variance, sigma squared. Regardless of how the variance is computed, the standard deviation is always simply the positive square root of the variance. The population standard deviation is the square root of the population variance and is represented by the Greek letter sigma. The sample standard deviation is the square root of the sample variance and is represented by the English letter s. Recall what I said about notation earlier. Population parameters are written with Greek or capital letters, and sample statistics are written with lowercase English letters.
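Written out as formulas, the verbal definitions above look roughly as follows. The symbols x_i for an individual observation and x bar for the sample mean follow the lecture's spoken notation; the symbol mu for the population mean is the standard convention and is an assumption here, since this part of the lecture does not name it.

```latex
\begin{align*}
\sigma^2 &= \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2, & \sigma &= \sqrt{\sigma^2} && \text{(population)} \\
s^2 &= \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2, & s &= \sqrt{s^2} && \text{(sample)}
\end{align*}
```

The only differences between the two rows are which mean is subtracted and whether the sum is divided by N or by n - 1.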
As I have mentioned, we will not be calculating these values manually; instead, we will use software like Excel, because our focus will be on large data sets and not on small and rather trivial problems. However, I would like to show you with an example how one would calculate them. The goal is to understand what the functions in the software you will use are doing. Then we can focus on their meaning. Recall the example we used for calculating the mean and median; now let's use it for calculating the variance and standard deviation. In this example, we will treat this group of friends as a sample of the population of students who graduated with them from their college. In the first case, there are 10 friends, that's your lowercase n, and the mean salary is $65,000, and that is x bar. So now we can put each individual data point in as x sub i. The first data point is 57,000, the second is 58,000, and so on, with the last one being 80,000. We subtract the mean from each data point and square each difference, so that when we sum the differences, the positives and negatives won't cancel each other out. Then we divide the sum by n minus 1, which is 9 in this case. By taking the square root of the variance, we find the standard deviation, which in this case is approximately $6,912.
What happens to the standard deviation when the next person, with the $8 million salary, joins the group? Now there are 11 friends, that's your n, and the mean salary is now $786,363, and that is your x bar. So now we can put each individual data point in as x sub i and, just like before, calculate the variance. By taking the square root of the variance, we find the standard deviation. In this case, it is more than $2 million, which is much larger than the earlier standard deviation of $6,912. So, if you only reported the average salary for this case, people might think that this value is the typical value for this set of friends.
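The manual steps just described can be sketched in Python as follows. The salary list is hypothetical: the transcript gives only the first two values, the last value, the sample size, and the mean, so the remaining values are assumptions chosen to be consistent with those facts, and they are not the lecture's actual data. The function names `sample_variance` and `sample_std` are illustrative.

```python
import math
import statistics


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    squared_deviations = [(x - x_bar) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


def sample_std(values):
    """Positive square root of the sample variance."""
    return math.sqrt(sample_variance(values))


# HYPOTHETICAL salaries. Only 57,000, 58,000, 80,000, n = 10, and the
# mean of 65,000 come from the lecture; the other values are assumed.
ten_friends = [57_000, 58_000, 60_000, 61_000, 63_000,
               66_000, 66_000, 68_000, 71_000, 80_000]
# The eleventh friend with the $8 million salary joins the group.
eleven_friends = ten_friends + [8_000_000]

for label, salaries in (("10 friends", ten_friends),
                        ("11 friends", eleven_friends)):
    x_bar = sum(salaries) / len(salaries)
    print(f"{label}: mean = {x_bar:,.0f}, std dev = {sample_std(salaries):,.0f}")

# The standard library's statistics.stdev also divides by n - 1,
# so it should agree with the manual calculation.
assert math.isclose(sample_std(ten_friends), statistics.stdev(ten_friends))
```

With one very large salary added, both the mean and the standard deviation jump, and the standard deviation ends up far larger than any of the original ten salaries.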
But if you look at the variability of the salaries, you would know that the reported mean salary is far from a typical value for the group members. Remember to watch the video illustrations where I show you how to use Excel to calculate these statistics.
So now let's practice. Given the two histograms, which would have a mean that is a better representation of what one might observe if a random observation from the sample was selected, and why? The answer is A. Both histograms show central tendencies near a value of 5, and both have observations with values between 0 and 10. However, histogram A has more observations clustered around its center, as compared to histogram B, whose observations are more spread out. This means that the distribution of observations for sample A has a smaller standard deviation than that of sample B.
Using averages to convey information about a data set without expressing the variability within the data set, which is most often expressed by the standard deviation, is very incomplete. Knowing the average as a single summary point for an entire data set has little value if the average is not a very good representative of our data. So always pay attention to both the central tendency of the data and the standard deviation when looking at summarized data.