Now, let us discuss how to use Python to analyze data. To do so, we will use a well-known library called pandas. Pandas allows us to read a table from a file into a structure called a DataFrame. DataFrame is just the internal name of the object that holds the table. First of all, I have to import pandas, and this is usually done with import pandas as pd. This gives me access to the functions inside the pandas module using the short prefix pd. Then, let us load a dataset and play with it a little bit. I will use a dataset called adult. It is based on the results of a census in the United States, each row contains information about one person, and it allows us to find some useful insights in these data. I downloaded the file adult.csv and placed it in the folder where Jupyter was started. So now I can load it into pandas with the following command: df = pd.read_csv, and here is the name of the file. Let us run it. Here df is just the name of my variable that will contain this DataFrame. df stands for DataFrame, but that is just my choice. The prefix pd says that I will use a function from the pandas module, and read_csv is the name of this function. As the name suggests, it reads a CSV file, that is, a file of comma-separated values, and returns a DataFrame. Now we just look inside this DataFrame, and here you see the table that contains information about different persons: their age, where they work, their education, and so on. Here, all columns are shown in this window, but it is possible that our DataFrame is so large, with so many columns, that this output will not show us all of them. In this case, we can use a special syntax that allows us to find all columns in the dataset. We just put a dot here and use the columns property of this object. This is a kind of list that contains the names of all columns that we have in this DataFrame.
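To make the loading step concrete, here is a minimal sketch. It assumes the file is named adult.csv, and because that file may not be at hand, it reads a tiny made-up CSV text with the same kind of columns instead. The three rows are hypothetical and are not the real census records.

```python
import io

import pandas as pd

# Made-up stand-in for the contents of adult.csv (hypothetical rows).
csv_text = """age,workclass,education
39,State-gov,Bachelors
50,Private,HS-grad
38,Private,HS-grad
"""

# With the real file in the notebook's folder, the call would be
# df = pd.read_csv("adult.csv"); here we read from the text above instead.
df = pd.read_csv(io.StringIO(csv_text))

# Look inside the DataFrame.
print(df)

# The columns property holds the names of all columns, even when the
# table is too wide to be displayed in full.
column_names = list(df.columns)
print(column_names)
```

In a Jupyter cell, writing df on the last line of the cell displays the table without print.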
As we discussed, rows in the table are usually called observations and columns are usually called variables, and different variables can have different types. For example, age is a numeric variable, because it contains numbers, and workclass is a categorical variable, because it contains the category of job for a particular person, for example State-gov, Private, and so on. So we may be interested in getting information about the types of all variables in the dataset. To some extent, this can be done using the property dtypes. dtypes reports the internal Python representation of each variable. For example, age has type int64, which means that it is an integer number that takes 64 bits in memory. So basically, these are just integer numbers that can be rather large. And workclass, for example, is an object. Object is a kind of generic type of data in pandas. It means that in this column there can be non-numeric elements. Usually, these elements are strings, but it is possible to have strings and numbers together in the same column, and all of them are called just object. Usually, though, it is not a very good idea to store strings and numbers in the same column. It is better when each column has only one type. Now, we may be interested in some descriptive statistics for this dataset. We want to look at the data and get a general impression of what our variables look like. This can be done using the method describe. As you see, this method gives us a new table, and this is again a DataFrame, but in this DataFrame the rows are not observations as previously. Instead, each row contains one descriptive statistic for all numeric variables in our dataset. For example, here for age we see that the mean value is 38, the standard deviation is 13, the minimum value is 17, and so on. Here are also the quartiles: this is the median, this is the first quartile, this is the third quartile, and finally this is the maximum value.
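The two steps just described, checking types and summarizing numeric columns, can be sketched as follows. The small DataFrame is built by hand from made-up values, so the numbers it prints are not the ones from the adult dataset.

```python
import pandas as pd

# Hypothetical miniature table: one numeric and one categorical variable.
df = pd.DataFrame({
    "age": [39, 50, 38, 53],
    "workclass": ["State-gov", "Private", "Private", "Private"],
})

# Type of every column. age is int64; a column of strings is reported as
# object in the lecture, though newer pandas versions may show a dedicated
# string type instead.
print(df.dtypes)

# describe() returns a new DataFrame whose rows are statistics
# (count, mean, std, min, quartiles, max) for the numeric columns only.
summary = df.describe()
print(summary)

# Individual statistics can be read out by row label and column name.
print(summary.loc["mean", "age"])
```

Note that workclass does not appear in the summary, because by default describe only covers numeric variables.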
So we have just the usual descriptive statistics for numeric data. If we are interested in non-numeric data, we have to pass a special option, include equal to "O". O here stands for object, and in this case, instead of descriptive statistics for numeric variables, we get descriptive statistics for non-numeric variables, which are the object variables. Here you see, for example, that workclass has eight unique values, so eight categories for this variable. The most common category is Private, and this number is the frequency of this most common category. In the same way, we can find that the most popular education is HS-grad and that we have 16 different levels of education in this dataset. So this allows us to get a general impression of the dataset. Now, if we are interested in the particular levels of, for example, workclass, we can use the method unique on the corresponding column. First of all, we have to select this column, and this can be done using square brackets. When I put the name of the column in the brackets, I get the column itself, and then I can apply different operations to this column. So if we want to get all unique elements in this column, we can use the unique method, and these are all the levels, all the categories, that occur in this variable. You see here that most of these levels are just strings, but there is a special item, nan, which stands for "not a number". This is a special value in pandas, and actually not only in pandas but in Python itself. In pandas, this value denotes a so-called missing value, or NA, so it means that the corresponding value is not known. This value needs special treatment, and we will discuss it later in detail. It can also be interesting for us to count the rows depending on the value of this variable. We probably want to know how many persons in our dataset work in, say, the private workclass, how many work in the federal one, and so on. This can be done with the method value_counts.
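The three operations from this part, describing non-numeric columns, listing unique levels, and counting rows per level, might look like this. The data are again made up, with one missing value included on purpose. Setting dtype=object explicitly is an assumption made here so that the column has the object type mentioned in the lecture.

```python
import pandas as pd

# Hypothetical workclass column with one missing entry (NaN).
df = pd.DataFrame({
    "age": [39, 50, 38, 53, 28, 37],
    "workclass": pd.Series(
        ["Private", "State-gov", "Private", float("nan"), "Federal-gov", "Private"],
        dtype=object,  # assumption: force the object type used in the lecture
    ),
})

# Descriptive statistics for object columns: count of non-missing values,
# number of unique categories, the most common one (top), and its frequency.
cat_summary = df.describe(include="O")
print(cat_summary)

# Select a column with square brackets, then list its distinct values.
# The missing value appears among them as nan.
levels = df["workclass"].unique()
print(levels)

# Number of rows for each category; missing values are left out by default.
counts = df["workclass"].value_counts()
print(counts)
```

Because value_counts skips missing values unless told otherwise, the counts here add up to five rows rather than six.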
The value_counts method tells us how many records we have for each value of the variable. So as you see, in this way we can get basic descriptive statistics of our dataset. Now, let us discuss how to visualize the data.