Descriptive statistics that we discussed before, like the average or the median, and the corresponding measures of spread, like the standard deviation, give us some information about our data. But if we want to analyze the data in detail, even for a single variable, we have to understand something about its distribution. The distribution here means that we want to understand which values are more frequent and which values are less frequent. This can be done with visualization techniques like histograms and bar plots. Let us discuss both of them and the difference between them.

We begin with a numeric, and more specifically a real-valued, variable. It means that in our data we have just a column of numbers. These numbers can be represented as points on an axis. So this is the x axis, and each observation corresponds to a number that is denoted by a point on this line. For example, if we have zero somewhere here and one somewhere here, then this point is just one-half. It is possible to plot all the points on this line and get something like this. If you look at this picture, you see that there are a lot of points here and fewer points here. But this picture is meaningful only if the number of points, that is, the number of elements in your dataset, is not too large. If there are too many points, you cannot distinguish regions well: you cannot tell regions where you have a lot of points from regions where you have a small number of points.

So usually people use a different visualization technique, which is called a histogram. We already met histograms in the probability course, so let me briefly recall how a histogram is constructed. First of all, we divide the whole range of our data into segments, something like this. Then we plot a rectangle over each segment in such a way that the height of the rectangle is proportional to the number of points in this segment. We assume that the segments are equal in length. So we have two points here. Here we have three points, so it is a little bit taller.
Here we have one point, so this rectangle is half as tall as the rectangle over two points, and we have four points here, so this rectangle has to be twice as tall as the rectangle over two points. This picture allows us to understand in which regions we have a lot of points, in which regions we have fewer points, and where we don't have points at all. This is called a histogram. So histograms are used to represent distributions of numeric variables.

What about categorical variables? Let us assume that I have a variable like hometown. In this variable, I have the following values: Moscow, New York, London, then again New York, then again London, and then again New York. Now assume that I'm interested in the frequencies of values in this set. It means that I want to ask a question: which values are frequent and which values are rare? To answer this question, I can find the frequency of each value. So I just have to count how many times each value appears in this series. I can draw a table like this: Moscow one time, New York three times, London two times. So this is the value and this is the frequency. This table effectively summarizes this series of values. For example, if we increase the size of this series, if we add a lot of items here, but the set of all possible values is still limited to these three options, Moscow, New York, and London, the size of this table will stay the same. We will just increase the numbers here. So it is sometimes a very good idea to look at this kind of table and understand which values are frequent and which are rare.

But sometimes we want to visualize this kind of table. This can be done using a picture which is called a bar plot. It resembles a histogram, but it is quite different from it. So we have to draw segments that correspond to each of the values here: here Moscow, here New York, and here London.
Then over each of the segments I draw a rectangle whose height is proportional to the corresponding frequency. So for Moscow, I draw this rectangle of height one. For New York, I draw a much taller rectangle of height three, and for London I draw this rectangle of height two. This picture consisting of these rectangles is called a bar plot.

Now, it is important to distinguish between histograms and bar plots. If you try to plot a histogram for this categorical data, you are probably doing something strange. You need a bar plot here. Bar plots can also be used for count variables, not only for categorical ones. In this case they look very similar to histograms, but again the meaning is different: for a histogram you can select the length of the segments as you find appropriate, but for a bar plot every rectangle corresponds to exactly one possible value of your variable. So you have to distinguish between these kinds of plots.

Now let us discuss how random variables and their properties are connected to our data and the corresponding descriptive statistics.
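As a short recap of the two constructions above, here is a minimal sketch in plain Python. The hometown values are the ones from the example. The numeric sample is made up for illustration, chosen so that four equal segments contain two, three, one, and four points, as in the drawing. The function names `histogram_counts`, `frequency_table`, and `text_bars` are hypothetical helpers, not part of any library, and the bars are drawn with `#` characters instead of a real plot.

```python
from collections import Counter


def histogram_counts(values, num_bins):
    """Split [min, max] into num_bins equal segments and count the points in each."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal, so the range cannot be split")
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # The maximum value would land just past the last segment, so clamp it.
        index = min(int((v - lo) / width), num_bins - 1)
        counts[index] += 1
    return counts


def frequency_table(values):
    """Count how many times each distinct value appears."""
    return dict(Counter(values))


def text_bars(labels, counts):
    """One row per bar; the number of '#' characters equals the count."""
    return [f"{label:>12} | {'#' * count}" for label, count in zip(labels, counts)]


# Made-up numeric sample (an assumption, not data from the lecture).
sample = [0.0, 0.5, 1.1, 1.5, 1.9, 2.5, 3.2, 3.4, 3.7, 4.0]

# Histogram: we are free to choose the number of segments.
four_bins = histogram_counts(sample, 4)  # [2, 3, 1, 4]
two_bins = histogram_counts(sample, 2)   # [5, 5]
print("\n".join(text_bars(["bin 1", "bin 2", "bin 3", "bin 4"], four_bins)))

# Bar plot: exactly one bar per possible value, no choice of segments.
hometowns = ["Moscow", "New York", "London", "New York", "London", "New York"]
table = frequency_table(hometowns)  # {'Moscow': 1, 'New York': 3, 'London': 2}
print("\n".join(text_bars(list(table.keys()), list(table.values()))))
```

Notice that the same numeric sample gives a different picture with two segments than with four, while the frequency table for hometowns leaves nothing to choose: one value, one bar.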