Hey guys, in the last video we went over how to make line graphs using dates as continuous measures. Treating dates as continuous variables or measures allowed us to run regression analyses and get best fit lines that had p values associated with them. These p values gave us an idea statistically of how reliable those best fit lines were and therefore how much we should trust the trends we saw. In this video, I'd like to show you an advantage of treating dates as discrete variables or dimensions. When you treat dates as discrete variables or dimensions, you can make a statistical type of visualization called a box plot, or a box and whisker plot. These box and whisker plots allow you to see many more details about your data than best fit lines [INAUDIBLE] can show you, even best fit lines that have confidence intervals. So, what are these box and whisker plots? In a box plot, you will see a box, a line inside the box, some whiskers, and sometimes some dots outside the box or the whiskers. The line in the box represents the median value of the group. The edges of the box mark the first and third quartiles, which sit roughly halfway through the ranked data between the median and the minimum, and between the median and the maximum. If you were to take all the values in your data set and rank them so that the minimum value goes first and the maximum value goes last, you could break the ranked data into four equal groups. The three points that divide the data set into four equal groups are called quartiles. The smallest of those three is called the first quartile, the middle one is the median, and the largest of those three is called the third quartile. The interquartile range, then, is the distance between the first and the third quartiles. The whiskers on a box plot usually extend up to 1.5 times the interquartile range beyond the edges of the box. Any data points outside of these whiskers are generally considered outliers. Some people change the whiskers so that they represent the maximum and minimum values of the entire data set.
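If you'd like to see the arithmetic behind those definitions, here is a minimal Python sketch of the numbers a box plot draws. The salaries are made up for illustration, the function name `box_plot_summary` is hypothetical, and the quartile method (`method="inclusive"`) is an assumption; Tableau or other tools may compute quartiles slightly differently. The whiskers here stop at the most extreme data point that still falls within 1.5 times the interquartile range of the box.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Return the numbers a box plot draws for one group of values."""
    data = sorted(values)
    # The three cut points that split the ranked data into four groups.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range: distance from Q1 to Q3
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme points that are still inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "lower_whisker": min(inside),
        "upper_whisker": max(inside),
        # Anything beyond the fences is drawn as a separate dot.
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


# Hypothetical salaries, invented for this example only.
salaries = [45_000, 60_000, 65_000, 70_000, 75_000, 80_000, 90_000, 250_000]
summary = box_plot_summary(salaries)
for name, value in summary.items():
    print(name, value)
```

Notice that the one very large salary ends up in the outliers list rather than stretching the whisker, which is the behavior described above for whiskers based on 1.5 times the interquartile range.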
Setting the whiskers to the minimum and maximum, though, tends to make plots more dense and crowded than they already are. So any box plots we use in this course will have whiskers that represent 1.5 times the interquartile range. As you can see, box plots do not make pretty, simple pictures to show other people, but they are very useful for summarizing the raw data values within a group for yourself as an analyst, so you, or your group, can get a feeling for the raw data. In the last video, we made one line graph at the top for median paid wage for a given time point. Underneath that, we made a graph that showed us the maximum salary offered for that same time point. And then we made a third graph underneath that, which showed us the minimum salary offered for that time point. An alternative way to look at that data and get similar types of information would have been to treat date as a dimension, rather than a measure, to make a box plot for every single one of those time points, and then to look at different box plots for each subgroup category. Let's try it. Okay, go to your Tableau workbook and make a new worksheet. Now this time I want you to put Paid Wage Per Year in Rows again, and we're gonna put Case Receive Date in Columns. Now go to the Show Me card and click on this icon here that's for the box and whisker plot. Now, when you click it, you'll see that the year went down to the Marks card. The reason why is that, right now, the only variable Tableau has in its workspace is Median Paid Wage Per Year, broken up by year. So Tableau is taking its best guess about what you're trying to do with those data points. In order to make a box plot, you have to have a set of data; you can't just have one data point. So it assumed that you just wanted to put all the years together. And that was actually incorrect in our case. So we're gonna move Year back up to Columns. I want you to make sure that the pill is blue, which means it's a dimension. Now, this doesn't look like a box plot yet.
It just looks like a single data point with a line. Any idea why that would be? The reason why is that, as I alluded to before, Tableau sees this as just one data point per year right now. In order to make it a box plot, it needs lots of data points. It needs to not be aggregating over each year. So we're gonna go up to Analysis, and we're going to disaggregate our measures. So this is what the box plot will look like. Now, we want to look at these box plots broken up according to the different job title subgroups. To do this, I want you to put Job Title Subgroup on Filter. And for right now let's go ahead and include all of them. And let's make a quick filter that we can look at. Let's get rid of the Show Me card, and here is our filter. So you can see that if you click on one of our Job Title Subgroups, you can look at the actual values that are here. If you look at the minimum value here, you see that the lower whisker says 45,150. So if I change this and I go to data analyst, we'll see the whiskers are different. So this is showing you that it is indeed calculating the box plot differently each time for the different Job Title Subgroups you have on the Filter. Now I just want to point out to you, you would not get this result if we did the highlighting trick we did before. So say we put Job Title Subgroup on Color and have every subgroup selected in our filter. If you click on the different Job Title Subgroups over here in the highlighting bar, you'll see that although it's showing you the different points that are going into the calculations, the actual calculations in these boxes aren't changing. So this is different from the trend lines that we were showing before. The trend lines can actually use the Color property in the Marks card to make different trend lines. That's not the case when you're looking at box plots. Box plots actually need the raw data and have to aggregate the raw data themselves. So let's actually get rid of this on Color, cuz it's not that helpful.
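To make the two points above concrete, here is a small Python sketch. It is only an analogy under stated assumptions, not how Tableau works internally: the rows are made up, and the helper names `wages_by_year` and `quartiles_by_year` are hypothetical. It shows that an aggregated measure leaves one number per year, so there is nothing to build a box from, and that filtering to one subgroup changes which rows feed the quartile calculation.

```python
from collections import defaultdict
from statistics import median, quantiles

# Hypothetical rows: (year, job title subgroup, paid wage per year).
rows = [
    (2014, "data scientist", 90_000),
    (2014, "data scientist", 110_000),
    (2014, "data scientist", 130_000),
    (2014, "data analyst", 50_000),
    (2014, "data analyst", 60_000),
    (2014, "data analyst", 70_000),
]


def wages_by_year(rows, subgroup=None):
    """Group row-level wages by year, optionally keeping one subgroup (a filter)."""
    groups = defaultdict(list)
    for year, title, wage in rows:
        if subgroup is None or title == subgroup:
            groups[year].append(wage)
    return dict(groups)


def quartiles_by_year(rows, subgroup=None):
    """Disaggregated view: quartiles computed from every row in each year."""
    return {
        year: quantiles(wages, n=4, method="inclusive")
        for year, wages in wages_by_year(rows, subgroup).items()
    }


# Aggregated view: a single median per year, which is just one data point.
aggregated = {year: median(w) for year, w in wages_by_year(rows).items()}

# Disaggregated view, first with every subgroup, then filtered to one subgroup.
all_subgroups = quartiles_by_year(rows)
scientists_only = quartiles_by_year(rows, subgroup="data scientist")

print(aggregated)
print(all_subgroups)
print(scientists_only)
```

The filtered quartiles differ from the unfiltered ones because the filter removes rows before the calculation runs. Highlighting, by contrast, would be like printing some rows in bold without changing the input to `quantiles` at all.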
And let's go back to actually looking at our data in the different Job Title Subgroups. So first, let's look at data scientist. Now when you look at the data this way, you can see very clearly that there are quite a lot of job salaries that are below the first quartile. Remember, this box shows you where 50% of your data is. 25% of your data is above this line, and 25% of your data is below this line. And so it's clear here that, over time, there are more and more salaries that have lower and lower values. There are also more and more salaries that have higher values. So this is showing us in an even stronger way than before that, although the median doesn't seem to be changing all that much (again, remember, the median is this line here), it seems to be increasingly possible to get both a higher wage and a lower wage than the median. Let's look at another subcategory, like software engineer. Now, at first glance you would think this is very different from data scientist. But remember that trick I told you before about looking at the axis? Before, the top salary that we had for data scientist only went up to something like 250k. Here, the top salary is over a million, so if we actually want to compare these by eye, we want to make sure that the axes are the same. So let's change this axis. Click on the axis and click on Edit Axis, and let's make it a fixed range from 1 to 300,000. And we know right away when we do this that we're gonna be cutting out some data, or at least cutting out some of the outliers. But it will let us compare most of the data more carefully. So now, when you look at this, you see that there are indeed more and more salaries that are below the median or below that first quartile. But it doesn't seem quite as bad as for the data scientists. On the other hand, there seem to be more and more very highly paid software engineers. Let's go back and look at data scientists using the same axis.
And you'll see that indeed, this trend of lower and lower minimum wages seems to be stronger, and you definitely don't see as many outliers for the data scientists. What about for our business analyst? This doesn't seem to be going down quite as much, but you see more and more going up, more and more maximum values. And now let's look at our data analyst. Data analyst looks pretty similar to business analyst. Now overall, because these plots show more data, even though they don't have a statistical model associated with them like our best fit lines did, they show even more convincingly than the line graphs we saw in the last video that there might be an increasing amount of opportunity to be rewarded with very high salaries in data-related jobs as time goes on. But there also might be increased risk that you'll make less than you expected, especially in the data analyst and business analyst jobs, if the new trend from 2015 continues. Overall in this lesson, you've now gained experience with the effects of treating a variable as continuous or discrete. Treating a variable as continuous or discrete will affect the types of statistics you have access to and will also affect the type of information you can get. The results of these statistics and the information you get can actually impact your analysis and its interpretation. At the very least, it will impact your confidence in the interpretation. Now, if you tried to convert a variable from discrete to continuous in Excel, it would take a long time. But one of the beautiful things about a program like Tableau is that it lets you do that almost instantaneously. It is so quick and so easy. So, this is a good example of why visualization and visualization programs can really increase the speed at which you can make it through your data analysis.
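As a rough illustration of that last point, here is a minimal Python sketch that looks at the same made-up observations two ways. The data, the decimal-year encoding, and the function names `trend_slope` and `spread_by_year` are all assumptions for this example. Treating the date as continuous gives you a single fitted slope; treating it as discrete lets you bucket by year and see how the spread within each year changes.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical (decimal year, wage) observations, invented for illustration.
observations = [
    (2013.0, 70_000), (2013.5, 72_000),
    (2014.0, 60_000), (2014.5, 90_000),
    (2015.0, 50_000), (2015.5, 110_000),
]


def trend_slope(points):
    """Continuous view: ordinary least-squares slope of wage against time."""
    x_bar = mean(x for x, _ in points)
    y_bar = mean(y for _, y in points)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in points)
    denominator = sum((x - x_bar) ** 2 for x, _ in points)
    return numerator / denominator


def spread_by_year(points):
    """Discrete view: bucket by calendar year, then summarize each bucket."""
    buckets = defaultdict(list)
    for x, y in points:
        buckets[int(x)].append(y)
    return {
        year: (min(wages), median(wages), max(wages))
        for year, wages in sorted(buckets.items())
    }


slope = trend_slope(observations)      # one number describing the overall trend
spread = spread_by_year(observations)  # (min, median, max) for each year

print(slope)
print(spread)
```

In this toy data the single slope hides the fact that the gap between the lowest and highest wage grows every year, which is the kind of detail the per-year summaries, and box plots, make visible.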