Welcome to Tools for Machine Learning. After watching this video, you will be able to: Describe why data is important for machine learning models. List common languages for machine learning. Describe tools for data processing and analytics, data visualization, machine learning, and deep learning. And finally, define CPU and GPU.

Data is a collection of information, and it is central to every machine learning algorithm. It is the input to every machine learning algorithm and is what the algorithm uses to train the parameters of the model. Even the best machine learning algorithms cannot generalize well without good data. Without data we cannot train our model, and without good-quality data in a large enough quantity, it will be difficult to develop a well-performing machine learning model.

Python is a popular language for machine learning because it offers a large number of machine learning libraries, and its syntax is easy to read, which makes building machine learning models easier. R is a popular language for statistical learning and contains a large number of libraries for data exploration and machine learning. Some other languages for machine learning include: Julia, a language designed for numerical and scientific computing that supports high-speed mathematical computation; Scala, a language often used for processing big data; Java, a language that makes scaling machine learning applications easier for machine learning engineers; and JavaScript, which is typically used for running machine learning models in web browsers.

In machine learning, we store and retrieve our data, and we work with plots, graphs, and dashboards to explore and visually inspect the data as we develop our machine learning model. We will break the tools we use for machine learning into four groups: data processing and analytics tools, for processing, storing, and interacting with the data for your machine learning models; data visualization tools, for understanding and visualizing the structure of your data; machine learning tools, for creating your machine learning model; and deep learning tools, which are frameworks for designing, training, and testing neural networks more simply and efficiently.

Let's go over some data processing and analytics tools. Spark is a data processing framework used for quickly processing big data. Hadoop is an open-source software framework used for efficiently processing and storing very large data sets. MySQL is an open-source relational database management system based on SQL, a language designed for managing data.

Some commonly used data visualization tools are: Matplotlib, a library for static plots and interactive visualizations; Seaborn, a data visualization library built on top of Matplotlib that provides a high-level interface for drawing more attractive and informative statistical graphics; ggplot2, an open-source data visualization package in R that lets you build your graphics by adding elements in layers; and Tableau, a data visualization and business intelligence tool that makes it easier to build interactive data visuals in the form of dashboards.

The following libraries are popular in machine learning: NumPy provides support for easily manipulating large, multi-dimensional arrays. Pandas is a powerful tool for data manipulation and analysis. SciPy is used for scientific computing and has modules for optimization, integration, linear regression, and more.
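To make these libraries concrete, here is a minimal sketch of how NumPy, Pandas, and Seaborn (built on Matplotlib) are often combined to inspect a small data set. The column names and the randomly generated values are purely illustrative assumptions, not anything specific to this video.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# NumPy: create and manipulate a multi-dimensional array
features = np.random.default_rng(seed=0).normal(size=(100, 2))

# Pandas: wrap the array in a DataFrame for manipulation and analysis
# (the column names "height" and "weight" are just placeholders)
df = pd.DataFrame(features, columns=["height", "weight"])
print(df.describe())  # quick summary statistics

# Seaborn: a high-level statistical plot drawn on top of Matplotlib
sns.scatterplot(data=df, x="height", y="weight")
plt.show()
```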
The scikit-learn library contains tools for statistical modeling, including regression, classification, clustering, and more. It is built on NumPy, SciPy, and Matplotlib, and it is generally relatively simple to get started with. In this high-level approach, you define the model, specify the parameter types you would like to use, and then fit the model to your data.

Popular libraries for deep learning are: TensorFlow, an open-source library for numerical computing and large-scale machine learning that is widely used for deep learning. Keras, an easy-to-use deep learning library for implementing neural networks; like scikit-learn, its high-level interface allows you to build standard deep learning models in a quick and simple manner. PyTorch, an open-source library mainly used for deep learning, with applications in computer vision and natural language processing; it is well suited to experimentation, making it simple for researchers to test out ideas. Theano, a library used for efficiently defining, optimizing, and evaluating mathematical expressions involving arrays. And OpenCV, which stands for Open Source Computer Vision Library, a library of programming functions mainly aimed at real-time computer vision; its applications include object detection, image classification, augmented reality, and more, and it also contains machine learning functionality such as tree-based models, k-nearest neighbors, support vector machines, and deep neural networks.

In machine learning, especially when dealing with large data sets and training deep learning models, you might need a processor that can work through the data significantly faster. Now, let's compare the CPU and the GPU. CPU stands for central processing unit; it is responsible for processing and executing instructions on your computer and is designed to run almost any type of calculation. GPU stands for graphics processing unit. GPUs were originally developed to improve a computer's ability to process 3D computer graphics, but they are great at the complex mathematical computations required by many machine learning methods, such as neural networks, making them excellent tools for machine learning. GPUs are used in machine learning to help accelerate the training of a model, since a model can be trained quite slowly using just a CPU. In general, any machine learning enthusiast will experience a lot of lag when training machine learning models on larger data sets, and even simple deep learning models can sometimes take days to run using only a CPU.

In this video, you learned that: Data is the heart of every machine learning algorithm. Some common languages for machine learning are Python, R, and Julia. Popular libraries for deep learning are TensorFlow, Keras, PyTorch, Theano, and OpenCV. And GPUs are used in machine learning to help accelerate the training of a model, since a model can be trained quite slowly using just a CPU.
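As a closing illustration of the high-level scikit-learn workflow described above (define a model, specify its parameters, then fit it to data), here is a minimal sketch. It uses scikit-learn's bundled iris data set purely as an example; the choice of logistic regression and its parameter values are illustrative assumptions, not a recommendation from the video.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Load a small example data set (features X, labels y)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Define the model and specify its parameters, then fit it to the training data
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Evaluate on held-out data
print("Test accuracy:", model.score(X_test, y_test))
```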