We're going to begin to visualize our variables with graphs. Although we start by graphing one variable at a time, we'll use this as a springboard for ultimately visualizing multiple variables simultaneously within our graphs. Bar charts are most commonly used to examine the distribution of individual variables. Here we show the distribution for a random sample of 1,200 US college students who were asked, "What is your perception of your own body?" In this bar chart, the X, or horizontal, axis includes the three response categories: underweight, overweight, and about right. In the first bar chart, the height of each bar, measured on the Y, or vertical, axis, is the number, or count, of college students giving that response. The second bar chart shows the same data, but as a percentage of the total sample. A bar chart helps us display the distribution of a categorical variable, for example, the percentage of observations in each category.

>> As you recall, the data-managed variables of interest in our example project are TAB12MDX, representing a diagnosis of nicotine dependence in the past 12 months, and NUMCIGMO_EST, representing an estimate of the average number of cigarettes smoked per month. We're going to run frequency distributions for each of these variables, including both counts and percentages. I'm going to use the groupby function for this, which we also presented when introducing frequency distributions. In addition to frequency distributions, we also want to examine corresponding bar charts for these two variables. The bar chart is one of the most frequently used graphic visualizations. When visualizing data in Python, we'll need to import additional libraries into our program. First, we will import the seaborn package with the syntax import seaborn. We also need to import the matplotlib.pyplot library, because the seaborn package depends on matplotlib to create graphs.
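A minimal sketch of the groupby-based frequency distribution described above, assuming a pandas DataFrame named sub2 (the data frame name used later in this lecture) with a TAB12MDX column. The eight rows are made-up toy values, not the course data. Counts come from groupby with size, and percentages divide those counts by the total number of rows.

```python
import pandas as pd

# Hypothetical toy data: only the column name TAB12MDX comes from the lecture;
# 1 = nicotine dependence in the past 12 months, 0 = no dependence.
sub2 = pd.DataFrame({"TAB12MDX": [0, 1, 1, 0, 1, 1, 0, 1]})

# Frequency distribution as counts: number of rows in each category
counts = sub2.groupby("TAB12MDX").size()

# Frequency distribution as percentages of all rows in the data frame
percents = sub2.groupby("TAB12MDX").size() * 100 / len(sub2)

print(counts)
print(percents)
```

Because groupby skips missing category values by default while len(sub2) counts every row, any missing responses stay in the denominator; drop them first if you want percentages of valid responses only.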
Because the name of this package is so long, we'll give it the nickname plt (import matplotlib.pyplot as plt), which can be used in place of the full package name whenever our code calls this package. We're going to keep it simple. We will use Python code to generate graphs that help us learn more about our data and make decisions about next steps in our research. We're focusing on the function of graphic visualizations, rather than producing polished, presentation-ready graphs at this point.

Categorical variables can be visualized one at a time with univariate graphs, that is, with single-variable bar charts. First off, for the categories to be ordered properly on the horizontal, or X, axis of a univariate graph, you should convert your categorical variables, which are often stored as numeric variables, into a format that Python recognizes as categorical. Here's the code. Here I am using the astype function to convert TAB12MDX to a categorical variable, keeping the original variable name as-is. The basic code for a univariate graph of a categorical variable is the following. With the countplot function, we name the categorical variable for the X axis and define the data frame, here sub2. With the xlabel function, we label the X axis, and with the title function, we give the bar chart a title. Here is the univariate bar chart code inserted in our sample program, and we save and run the program to generate the requested bar chart.

Here is the bar chart we told Python to generate. It shows the number of young adult smokers with nicotine dependence, 896, as indicated by a response code of 1, and those without nicotine dependence, 810, indicated by a 0. Now let's graphically display the frequency distribution for one of our data-managed smoking variables, that is, the estimated number of cigarettes smoked per month, NUMCIGMO_EST.
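The astype conversion and countplot steps described above for TAB12MDX can be sketched as follows. This assumes seaborn, matplotlib, and pandas are installed; the toy data, axis label, title wording, and output file name are illustrative placeholders rather than the lecture's exact program.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn

# Hypothetical toy data; only the column name TAB12MDX comes from the lecture.
sub2 = pd.DataFrame({"TAB12MDX": [0, 1, 1, 0, 1, 1, 0, 1]})

# Convert the numeric codes to a categorical type, keeping the same variable name
sub2["TAB12MDX"] = sub2["TAB12MDX"].astype("category")

# One bar per category; each bar's height is the count of rows in that category
ax = seaborn.countplot(x="TAB12MDX", data=sub2)
plt.xlabel("Nicotine Dependence Past 12 Months")
plt.title("Nicotine Dependence in the Past 12 Months Among Young Adult Smokers")

# Placeholder file name; in an interactive session, drop the Agg line and call plt.show()
plt.savefig("tab12mdx_bar.png")
```

Converting to the category type first lets seaborn order the bars by the category list (0, then 1) rather than treating the codes as numbers.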
Because NUMCIGMO_EST is actually a quantitative variable, the syntax we use in the Python program is slightly different. To visualize a quantitative variable, you'd use the following syntax. With the distribution plot function, distplot, we name the quantitative variable for the X axis and ask Python to drop the missing data, that is, the NaNs. We also include the option kde=False, which turns off the smoothed kernel density estimate curve so that only the histogram bars are drawn. Again, from the matplotlib.pyplot library, which we are calling plt, we use xlabel to label the X axis and title to give the graph a title. When you run this, you'll see that the program generates a graphical distribution of a quantitative variable: a histogram. In a histogram, intervals of values are plotted on the X axis, rather than discrete, separate values. From the bars here, you can see that each bar is centered on the midpoint of its interval.
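To see the intervals and midpoints behind a histogram's bars, here is a sketch that bins a quantitative variable with numpy.histogram instead of seaborn's distplot. This is a deliberate swap: it exposes the bin counts and edges as numbers rather than drawing them (recent seaborn releases also mark distplot as deprecated in favor of histplot). The values below are hypothetical, and the choice of four equal-width bins is arbitrary.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly cigarette estimates; NaN marks missing data.
sub2 = pd.DataFrame(
    {"NUMCIGMO_EST": [30, 60, 90, np.nan, 150, 210, np.nan, 330, 390]}
)

# Drop the missing values (NaNs) before binning
values = sub2["NUMCIGMO_EST"].dropna()

# Count how many values fall in each of 4 equal-width intervals
counts, edges = np.histogram(values, bins=4)

# Midpoint of each interval = average of its left and right edges
midpoints = (edges[:-1] + edges[1:]) / 2

print(counts)
print(midpoints)
```

For these toy values, the seven non-missing entries fall into counts of 3, 1, 1, and 2, and the interval midpoints are 75, 165, 255, and 345.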