Let's look at a more basic example of how a histogram might be constructed, and then use that as a springboard for talking about additional descriptive statistics that can be generated for quantitative variables. In this example, we have the exam grades of 15 students. We first need to break the range of values into intervals, also called bins, groups, or classes. In this case, since our data set consists of exam scores, it makes sense to choose intervals that correspond to the typical ranges of letter grades: ten points wide, 40 to 50, 50 to 60, etc. By counting how many of the 15 observations fall in each of the intervals, we get this table. To construct the histogram from this table, we plot the intervals on the X axis and show the number of observations in each interval, or the percentage of observations in each interval, on the Y axis, represented by the height of the bar located above the interval.
>> Once the distribution has been displayed graphically as a histogram, we can describe the overall pattern of the distribution and mention any striking deviations from that pattern. More specifically, we should consider the following features. We will get a sense of the overall pattern of the data from the histogram's center, spread, and shape, while outliers will highlight deviations from that pattern.
>> When describing the shape of a distribution, we should consider the symmetry or skewness of the distribution and its peakedness or modality, that is, the number of peaks or modes that the distribution has. Here, all three distributions would be referred to as symmetric, but they differ in their modality or peakedness. The first distribution is unimodal. It has one mode, roughly at 10, around which the observations are concentrated. The second distribution is bimodal. It has two modes, roughly at 10 and 20, around which the observations are concentrated. The third distribution is kind of flat, or uniform.
This distribution has no modes, that is, no value around which the observations are concentrated. Instead, the observations are roughly uniformly distributed among the different values.
A distribution is called skewed-right if the right tail, the larger values, is much longer than the left tail, the smaller values. Note that in a skewed-right distribution, as you can see here on the right, the bulk of the observations are small to medium, with a few observations that are much larger than the rest. An example of a real-life variable that has a skewed-right distribution is salary. Most people earn in the low to medium range of salaries, with a few exceptions, such as CEOs, professional athletes, etc., that are distributed along a large range, that is, the long tail of higher values. A distribution is called skewed-left if the left tail, the smaller values, is much longer than the right tail, the larger values. Note that in a skewed-left distribution, the bulk of the observations are medium to large, with a few observations that are much smaller than the rest. An example of a real-life variable that has a skewed-left distribution is age at death from natural causes. Most deaths from natural causes happen at older ages, with fewer cases happening at younger ages.
Skewed distributions can also be bimodal. Here's an example. A medium-sized neighborhood 24-hour convenience store collected data from 537 customers on the amount of money they spent in a single visit to the store. The histogram displays the data. You can see that the amount of money spent is concentrated around $20, and then concentrated again around $50. From the Mars crater data set, we also display the latitude of the Mars crater rims. The values are concentrated around 66 to 69 decimal degrees north, and again around 36 decimal degrees north. So the mode or modes of a variable are the values that occur most often, and knowing this can help you make better decisions.
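
As a minimal sketch of the idea that the modes are the values that occur most often, the Python snippet below counts how often each value appears and keeps every value tied for the highest count. The purchase amounts (rounded to the nearest $10) and the helper name `modes` are made up for illustration; they are not the store's actual 537 observations.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest frequency, in sorted order."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, n in counts.items() if n == top)


# Hypothetical purchase amounts in dollars, rounded to the nearest $10.
amounts = [10, 20, 20, 20, 30, 40, 50, 50, 50, 60]

print(modes(amounts))  # two values share the top count, so the data are bimodal
```

With real, unrounded data the most frequent exact value is rarely informative, which is why a histogram groups values into intervals before we look for peaks.
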
The mode, for example, has applications in book publishing. Not surprisingly, it's important for the publisher to print more of the most popular books, because printing different books in equal numbers would cause a shortage of some books and an oversupply of others. Likewise, the mode has applications in manufacturing. For example, it's important to manufacture more of the most popular shoes and shoe sizes. Now, as we've seen, the mode is not always at the center.
>> The center of a distribution is its midpoint, the value that divides the distribution so that approximately half the observations take smaller values and approximately half take larger values.
>> As you can see from the histogram, the center of the grades distribution is roughly 70. We can get only a rough estimate for the center of the distribution: seven students scored below 70, and eight students scored above 70. Estimates like this can often be made by examining a histogram.
So what about spread? The spread of the distribution, also called variability, can be described by the approximate range covered by the data. From looking at the histogram, we can approximate the smallest observation, or minimum, and the largest observation, or maximum, and thus approximate the range. In our exam score example, you can see that the approximate minimum is 45, that is, the middle of the lowest interval of scores. The approximate maximum is 95, the middle of the highest interval of scores. So our approximate range is about 50 points, 95 minus 45.
The overall pattern of the distribution of a quantitative variable is described by its shape, center, and spread. By inspecting the histogram, we can describe the shape of the distribution, but as we saw, we can only get a rough estimate of the center and spread.
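
The steps in the exam-grade example can be sketched in a few lines of Python: bin the scores into ten-point intervals, draw a text histogram, and read the approximate minimum, maximum, and range off the midpoints of the lowest and highest occupied intervals. The 15 scores below are invented for illustration, since the lecture does not list the actual grades; they are chosen only to match what the lecture states (seven scores below 70, eight above, lowest interval 40 to 50, highest 90 to 100). The function name `bin_counts` and the constants are likewise assumptions.

```python
# Hypothetical exam scores for 15 students (invented for illustration).
scores = [44, 52, 55, 58, 61, 64, 67, 71, 73, 75, 78, 82, 85, 88, 93]

BIN_WIDTH = 10
BIN_START = 40   # left edge of the lowest interval
BIN_END = 100    # right edge of the highest interval


def bin_counts(values, start, end, width):
    """Count values per interval [left, left + width), keyed by left edge.

    A value equal to `end` is placed in the last interval.
    """
    counts = {left: 0 for left in range(start, end, width)}
    for v in values:
        if not start <= v <= end:
            raise ValueError(f"value {v} lies outside [{start}, {end}]")
        left = start + ((v - start) // width) * width
        left = min(left, end - width)
        counts[left] += 1
    return counts


counts = bin_counts(scores, BIN_START, BIN_END, BIN_WIDTH)

# Text histogram: one row per interval, bar length = number of observations.
for left, n in counts.items():
    print(f"{left}-{left + BIN_WIDTH}: {'#' * n} ({n})")

# Approximate min and max as the midpoints of the lowest and highest
# intervals that contain at least one observation.
occupied = [left for left, n in counts.items() if n > 0]
approx_min = occupied[0] + BIN_WIDTH / 2
approx_max = occupied[-1] + BIN_WIDTH / 2
approx_range = approx_max - approx_min

# Rough check of the center: how many scores fall on each side of 70.
below_70 = sum(1 for s in scores if s < 70)
above_70 = sum(1 for s in scores if s > 70)

print(approx_min, approx_max, approx_range)
print(below_70, above_70)
```

Because the histogram only shows counts per interval, the midpoint-based minimum and maximum are estimates; the exact values can be recovered only from the raw scores.
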