During this lecture, we will cover three topics. First, what is Tidyverse? Second, how do you install and load packages in R? Finally, what is the %>% pipe in R, and how do you use it?

What is Tidyverse? Tidyverse is a collection of R packages designed for data science. What is a package? A package is a collection of functions and datasets created by the R community. Currently, there are tens of thousands of packages on CRAN. CRAN, the Comprehensive R Archive Network, is a network of FTP and web servers around the world that store identical, up-to-date versions of code and documentation for R. Tidyverse packages are mostly maintained by the RStudio team. This is the team that provides us with this amazing RStudio integrated development environment.

Let's cover some of the most popular packages in Tidyverse. Dplyr is an extremely popular package for data transformation. We will cover this package to some extent in our course. The second well-known package is tibble. Tibble provides us with a modern take on data frames. Third, stringr. Stringr is a package for working with strings. Tidytext is one of the most important packages for this course. It follows the same tidy principles, although it is installed separately rather than as part of the tidyverse bundle, and it has many functions that make several text mining tasks easy. Readr is a package for reading rectangular data in CSV, TSV, and other formats into R. Tidyr is a package that helps you create tidy data. Your data is tidy when every column is a variable, every row is an observation, and every cell is a single value. Finally, ggplot2 is truly an extraordinary package for creating graphs in R.

Now, let's discuss how to install packages. As you can see, packages can vastly expand the capabilities of R, so it is important to understand exactly how to install them. To install a package, you need to use the install.packages() function. Type in the following code in the script, install.packages("tidyverse"), highlight it, and click on the "Run" button. What does it mean that the package has been installed?
It means that the functions and datasets from the package have been downloaded and unpacked successfully on your computer. You might see this message: "package 'tidyverse' successfully unpacked and MD5 sums checked." It means that the package has been downloaded and installed successfully.

The easiest way to use functions from a package is to load the whole package into the current R session. To do so, you need to use the library() function. Type in library(tidyverse) in the script, highlight it, and click on the "Run" button. The package has been successfully loaded in the current R session.

Now, let's examine what this message in the console pane means. First, Tidyverse is not just one package. It is a collection of packages. That is why library(tidyverse) attaches several packages like ggplot2, tibble, et cetera. Second, attaching the dplyr package creates a conflict with the base R stats package, because both of these packages contain filter() and lag() functions. R resolves this conflict by masking the filter() and lag() functions from the stats package. So if you call filter() or lag(), you will use dplyr's versions by default. To access these two functions from the stats package, you can write stats::filter() or stats::lag().

If you are not comfortable with installing packages through the install.packages() function, you can use an alternative option: the Packages tab in the bottom-right pane of RStudio. This tab has two buttons, Install and Update, that help you install packages or update them. Click on the "Install" button. This button opens a pop-up window where you can write the names of the packages you want to install. After you have written tidyverse in the Packages field, click on the "Install" button. This will reinstall the tidyverse package. Generally, it is a good practice to reinstall packages from time to time to update them, because R packages often evolve actively.
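Here is a minimal sketch of installing, loading, and dealing with the masked functions, assuming tidyverse is already installed from CRAN (the install line is commented out so the snippet does not re-download on every run). To keep it small, it attaches only dplyr, the package responsible for the masking message; the data frame df and its values are hypothetical, made up for illustration.

```r
# install.packages("tidyverse")  # run once to download and install from CRAN
library(dplyr)  # attaching dplyr masks stats::filter() and stats::lag()

# A small illustrative data frame (hypothetical data)
df <- data.frame(x = c(1, 5, 10))

# Unqualified filter() now means dplyr::filter(): keep rows where x > 2
big <- filter(df, x > 2)

# Unqualified lag() now means dplyr::lag(): shift values by one position, NA first
lagged <- lag(df$x)

# The masked stats version is still reachable with the stats:: prefix.
# Here stats::filter() averages each value with the one before it.
ma <- stats::filter(1:5, rep(1 / 2, 2), sides = 1)

print(big)             # rows with x = 5 and x = 10
print(lagged)          # NA 1 5
print(as.numeric(ma))  # NA 1.5 2.5 3.5 4.5
```

The same name, filter, gives a row subset in one case and a time-series moving average in the other, which is why the stats:: prefix matters once dplyr is attached.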
You can also load packages through the Packages pane. All you need to do is click on the empty box next to the package's name. For instance, let's load the dplyr package. Just click on the empty box next to dplyr in the Packages pane. The package is loaded again. You can also detach a package by unticking the box next to its name. We have pretty much covered the basics of installing and loading packages.

Now, let's turn to pipes. The %>% pipe is one of the most powerful features in tidyverse. Pipes allow you to chain several operations in a readable way. Imagine you want to generate ten random double-precision numbers, round them to two decimal places, and then sum the results. You will need three functions for this: rnorm(), round(), and sum(). You can read about these functions in the Help pane. Type in the following code in the script, highlight it, and click on the "Run" button. This is what is known as a nested function call. You need to read it from the inside out. First, rnorm() with the argument n = 10 is evaluated. This command returns 10 random values from a normal distribution with mean equal to zero and standard deviation equal to one. Then these random values are rounded to two decimal places. After that, we receive the sum of these 10 rounded random values.

Pipes let you avoid this style of writing code by structuring sequences of data operations left to right rather than from the inside out, avoiding nested function calls, and minimizing the need for local variables and function definitions. You may read more about it on the magrittr web page. The pipe automatically passes the output of one line as the first argument of the function on the next line. You can think of the %>% pipe as the word "then." You may read this code in the following fashion: generate 10 random values, then round them to two decimal places, then sum the results. And now we get our result.
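As a sketch, here are the nested call and the piped chain side by side. It attaches magrittr, the package that provides %>% (library(tidyverse) makes the pipe available too). The last lines use a small made-up vector to show that the pipe inserts the left-hand value as the first argument of the next function.

```r
library(magrittr)  # provides the %>% pipe

# Nested version: read from the inside out
nested <- sum(round(rnorm(n = 10), 2))

# Piped version: read left to right, "then ... then ..."
piped <- rnorm(n = 10) %>%
  round(2) %>%  # becomes round(<previous result>, 2)
  sum()         # becomes sum(<previous result>)

# The pipe passes the left-hand value as the first argument:
# this is the same as round(c(1.234, 5.678), 2)
example <- c(1.234, 5.678) %>% round(2)

print(nested)
print(piped)
print(example)  # 1.23 5.68
```

Because each expression calls rnorm() on its own, nested and piped will usually print different numbers.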
You might ask why we get different results from these two expressions. It happens because the first step of each expression, rnorm(), draws new random values every time it is called. To solve this problem, you can use the set.seed() function. This function fixes the state of R's random number generator. Type in the following commands, highlight them, and click "Run." You need to call set.seed() before each call whose random values you want to reproduce. Now, we have identical results. Try different seeds for these expressions to see how the numbers change.
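A minimal sketch of the fix: reset the seed before each expression so both draw the same ten numbers. The seed values 42 and 123 are arbitrary choices for illustration; any integer works.

```r
library(magrittr)  # provides the %>% pipe

# Fix the random number generator state, then run the nested version
set.seed(42)
nested <- sum(round(rnorm(n = 10), 2))

# Reset to the same state, then run the piped version
set.seed(42)
piped <- rnorm(n = 10) %>% round(2) %>% sum()

print(identical(nested, piped))  # TRUE: same seed, same random draws

# A different seed produces different random draws
set.seed(123)
other <- rnorm(n = 10) %>% round(2) %>% sum()
print(other)
```

Calling set.seed(42) only once, before the first expression, would not be enough: the second rnorm() call would continue from where the first left off and draw different values.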