Hello. In this lecture, we're going to take a look at another graphical representation of quantitative data called the boxplot. So, what is a boxplot? Well, we're going to begin with our five-number summary. If you remember, the five-number summary gives us the median as a nice resistant measure of center. Then come the two quartiles, Q1 and Q3. These are the 25th and 75th percentiles of our data, so the distance between them tells us how spread out the middle 50 percent of our data is. The final two numbers are the min and the max, which of course give us the overall range for all of our data. So, a boxplot is basically going to take this five-number summary and give us a visual picture of it. A boxplot is started by taking our quartiles Q1 and Q3 to form the edges of the box. So, visually the length of the box is our interquartile range, a nice measure of spread. Then, the median is marked inside the box at the right location, and the min and max are drawn by taking lines and extending them out from the box. These lines are sometimes called whiskers, so these plots are often referred to as box-and-whisker plots. So, let's take a look at a few examples. In our first example, we're looking at the heights of adult males. So, here's our histogram. We might summarize this as being roughly symmetric, bell-shaped, maybe centered around 68 inches, ranging from just shy of 62 inches to just past 74 inches. Now, with the histogram only, we aren't able to tell the exact max and min, so here's our five-number summary, where we can see that indeed the mean and the median are the same at 68.3 inches, and that our tallest adult male was actually 75.1 inches. We'll take our five-number summary and create the boxplot. So, first the quartiles are again used to form the box, and the interquartile range then is the length of that box.
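To make the five-number summary concrete, here is a minimal Python sketch of how it might be computed. The function name `five_number_summary` and the heights in it are made up for illustration; they are not the lecture's data. Software packages also use slightly different quartile conventions, so Q1 and Q3 may differ a little from what another tool reports.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of at least two numbers."""
    data = sorted(values)
    # quantiles(n=4) returns the three cut points Q1, median, Q3.
    # method="inclusive" is one of several quartile conventions (an assumption here).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

# Hypothetical heights in inches, invented for this example.
heights = [62.0, 65.0, 66.5, 68.0, 68.5, 70.0, 71.0, 72.5, 75.0]

low, q1, median, q3, high = five_number_summary(heights)
iqr = q3 - q1  # the interquartile range, i.e. the length of the box

print("min:", low, "Q1:", q1, "median:", median, "Q3:", q3, "max:", high)
print("IQR (box length):", iqr)
```

The box would run from Q1 to Q3, the median line would sit inside it, and the whiskers would reach out to the min and the max.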
The median is shown at 68.3 inches, and then our shortest and tallest adult males, the min and max, are also shown on our boxplot. We mentioned that our histogram looked roughly symmetric, and the resulting boxplot also reflects that symmetry. We are able, however, to pull out the min and max and some exact values from our boxplot, which we were not able to do from the histogram. Example number two. We're looking at systolic blood pressures. Now, this distribution is certainly not symmetric; in fact, it is said to be skewed. Do you recall what direction of skewness this would be? If we take a look at the distribution, it is the tail that's being pulled out to the right. We have a few unusually large values of blood pressure, and so this would be called skewed to the right. Our boxplot of the same data is showing us those few unusually large values too. It's showing us the outliers. Boxplots have a technique for identifying outliers, and those points are plotted separately for us so that we can actually see them. That draws attention to them, and they might be the most interesting part of our dataset. Certainly, we want to take a look at them. Our third example is with a small set of data, just a few quiz scores. Now, it looks like we have some students who did pretty well on the quiz, and some students who did not do so well, and no one in the middle. Now, our boxplot is made from this small dataset, and it still is a box. So, we aren't able to see from our boxplot that there was actually a gap in the middle, and that there are a couple of clusters, or groups, of observations. So, that's one drawback or caution about boxplots: we don't have the ability to see some of those features of the shape of our distribution. For our final example, we have a lot of boxplots. Here we have boxplots that are showing systolic blood pressure, but now we're bringing in some additional information on both age and gender.
So, I want you to take a moment and come up with a couple of statements that you can make regarding what these boxplots are showing you. So, what do we see? Let's take a look at age first. As we go from the young to the old, we're generally seeing that blood pressures are rising. If we also take a look at the length of the boxes from young to old, we're also seeing that older participants generally had more dispersed, or more spread out, blood pressures as compared to the young. Now let's turn to looking at males versus females. In the younger age groups, we're seeing that males generally had higher blood pressures as compared to the females. But that difference is getting smaller and smaller as we go across the age groups, and in fact, it reverses in that final age group, where the males have lower blood pressures than the females. This might be attributed to the fact that males who have hypertension tend to pass away earlier, so we have a healthier population of males being compared in this final age group. So, here we were able to get a pretty comprehensive look at this data, because boxplots are very useful for making comparisons of sets of observations. To summarize, boxplots provide us with a very nice graphical picture of our five-number summary. A boxplot shows us the center through the median. It shows a couple of ways of measuring spread: the interquartile range, which is the length of the box, and the overall range. Then, it even has an algorithm or rule for identifying potential outliers and plots them separately. Boxplots can hide some aspects of shape; histograms do a much better job at showing us the shape. But boxplots displayed side by side are really useful for making comparisons when we have two or more sets of observations.
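The lecture does not say which rule is used to identify potential outliers. As an assumption, the sketch below uses one common convention: a point is flagged if it lies more than 1.5 times the interquartile range below Q1 or above Q3, and the whiskers stop at the most extreme values that are not flagged. The function name `boxplot_stats`, the multiplier `k`, and the blood pressure readings are all hypothetical, chosen only to illustrate the idea.

```python
import statistics

def boxplot_stats(values, k=1.5):
    """Quartiles, whisker ends, and flagged points under a k * IQR rule.

    k = 1.5 is a common convention and an assumption here, not something
    stated in the lecture.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Points outside the fences are plotted separately as potential outliers.
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    # Whiskers extend to the most extreme points that are still inside the fences.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }

# Hypothetical right-skewed systolic blood pressures (mmHg), invented for this example.
systolic = [102, 108, 110, 114, 118, 120, 122, 126, 130, 168, 182]

stats = boxplot_stats(systolic)
print(stats)
```

With these made-up readings, the two largest values fall above the upper fence, so they would be drawn as separate points and the upper whisker would stop at the largest remaining value.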