Welcome to Regression. After watching this video, you will be able to: define key concepts in regression, list some common regression algorithms, interpret the results of regression, differentiate between classification and regression, and determine whether classification or regression is suitable for your problem type.

To begin, consider the following scatter plot. Assume you have midterm grades on the x-axis and final grades on the y-axis, and you have plotted the points associated with each grade. Now, you want to know the final exam scores of students who scored 35 on their midterm. You can guess that their final score will be very close to 40. What if you wanted to guess a final score for a student whose midterm grade falls outside the range of the plotted points? Say, a student who got 80? Okay, you guess. I will guess around 85. Let's look at the data set. For a student who scored 80 on the midterm, the original data set has a final score of 86.5, close to what I guessed.

The previous example was a regression analysis. Basically, you can fit a line through the data and make an educated guess for values within or outside the range of the data set. This line is called the line of best fit. Simply put, regression is the relationship between a dependent and an independent variable. In regression, the dependent variable is a continuous variable. Examples of regression include predicting the prices of houses in Manitoba, predicting scores on a final exam, predicting the weather, and so on.

Looking at our line again, you can use a formula to predict values that are not accounted for. The formula for this line is y equals m x plus b. "m" is the slope of the line, calculated as rise over run. Let's calculate m. You can choose any two points on this line, since they will all give you the same result for m. Rise, in this case, is 76 minus 55, divided by run, which is 70 minus 50. That gives us 1.05. This means that for every 1-mark increase in a midterm grade, the final exam score increases by 1.05. "b" is the y-intercept, that is, where the line meets the y-axis. Mathematically speaking, it is the value of y when x is 0. If you substitute x equals 0 into the y equals m x plus b equation, y is simply b; plugging one of our points, such as a midterm of 50 and a final of 55, into the equation with m equal to 1.05 gives a y-intercept of 2.5. This means that even if a student gets a 0 on the midterm, they will still get a 2.5 on the final exam. (The first sketch below works through this arithmetic in code.)

The examples that you have seen assume a linear relationship between the dependent and independent variable, which is the simplest form of regression. Sometimes you may have multiple independent variables predicting your dependent variable. For example, in the final exam problem, you can have more variables that predict the outcome of the final exam score. You may have two variables instead, such as midterm score and attendance percentage. In this case, you would be dealing with a 3D graph with two slopes: one for the relationship between midterm score and final score, and one for attendance percentage and final score. Now, imagine you have 10 variables.

Sometimes, you may have a non-linear relationship, and the linear regression model won't do well with a straight line. Polynomial regression handles this case: the model uses a polynomial, of degree two for example, to account for the nonlinear relationship. (The second sketch below covers both the multiple-variable and the polynomial case.) And when you want to avoid relying too heavily on the training data, you can use regularized regression, such as Ridge, Lasso, and ElasticNet. The main idea here is to shrink the coefficients of the independent variables toward zero by adding a penalty, as the third sketch below illustrates.
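As a quick check of the numbers above, here is a minimal sketch in Python. The two points on the line are illustrative values taken from the rise-over-run walkthrough, not real student data.

```python
# Slope-intercept walkthrough for the midterm/final example.
# The two points are illustrative values read off the line of best
# fit described above; they are assumptions, not real student data.
x1, y1 = 50, 55  # (midterm, final) -- first point on the line
x2, y2 = 70, 76  # second point on the line

m = (y2 - y1) / (x2 - x1)  # rise over run -> 1.05
b = y1 - m * x1            # y-intercept   -> 2.5

def predict_final(midterm: float) -> float:
    """Predict a final-exam score with y = m * x + b."""
    return m * midterm + b

print(f"m = {m}, b = {b}")   # m = 1.05, b = 2.5
print(predict_final(35))     # 39.25 -- close to the guessed 40
print(predict_final(80))     # 86.5  -- matches the data set value
```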
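For the multiple-variable and nonlinear cases, here is a hedged sketch using scikit-learn (assumed available); the grade and attendance numbers are synthetic, made up purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

# Multiple linear regression: two independent variables.
# Synthetic stand-ins for midterm score and attendance percentage.
X = np.column_stack([rng.uniform(0, 100, 50),    # midterm scores
                     rng.uniform(50, 100, 50)])  # attendance %
y = 0.8 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(0, 3, 50)  # final scores

multi = LinearRegression().fit(X, y)
print(multi.coef_)  # one slope per independent variable

# Polynomial regression: a degree-2 fit for a nonlinear trend.
x = rng.uniform(0, 10, 50).reshape(-1, 1)
y_curve = 2 + 0.5 * x.ravel() ** 2 + rng.normal(0, 1, 50)

poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly.fit(x, y_curve)
print(poly.predict([[4.0]]))  # prediction on the fitted curve
```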
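And for the regularized family, here is a minimal sketch contrasting the three penalties on the same synthetic data; the descriptions that follow explain the behavior you should see in the printed coefficients.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet

rng = np.random.default_rng(1)

# Synthetic data where only the first two of eight features matter.
X = rng.normal(size=(100, 8))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.5, 100)

for model in (Ridge(alpha=1.0),
              Lasso(alpha=0.1),
              ElasticNet(alpha=0.1, l1_ratio=0.5)):
    model.fit(X, y)
    # Ridge shrinks every coefficient but keeps them all nonzero;
    # Lasso and ElasticNet can push irrelevant ones exactly to zero.
    print(type(model).__name__, np.round(model.coef_, 2))
```

With these (assumed) settings you should typically see Ridge keep all eight coefficients small but nonzero, while Lasso and ElasticNet zero out most of the irrelevant ones.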
Ridge regression shrinks the coefficients by the same factor but doesn't eliminate any of them. Lasso regression shrinks the coefficients toward zero and can set some of them exactly to zero, which normally leads to a sparser model with fewer coefficients. ElasticNet regression combines the Ridge and Lasso penalties, adding both the quadratic (L2) penalty of Ridge and the absolute-value (L1) penalty of Lasso.

The following are popular examples of advanced regression techniques. Random forest: a group of decision trees combined into a single model. Support vector regression, or SVR: SVR fits a line or a hyperplane that captures as many data points as possible within a margin of tolerance. Gradient boosting: gradient boosting makes predictions by combining a group of weak models, such as shallow decision trees, where each new model corrects the errors of the previous ones. And finally, neural networks: inspired by the neurons in the brain, layers of connected nodes learn to make predictions.

Here are some differences between classification and regression. Classification works with classes, such as "Will I pass or fail?" or "Is this email spam or not spam?", while regression is mapped to a continuous variable. In classification, values are not ordered: belonging to one class doesn't make a value greater or smaller than another. In regression, values are ordered, and higher numbers have more value than lower numbers. You use accuracy to measure the performance of a classification algorithm, that is, the proportion of predictions you got right out of the total. In regression, you use an error term, that is, how far your predictions were from the actual values. (The closing sketch below makes this distinction concrete.)

In this video, you learned that: one key concept of regression is that it is the relationship between a dependent and an independent variable; some common algorithms for regression include random forests, SVR, and neural networks; and a difference between classification and regression is that classification works with classes while regression is mapped to a continuous variable.
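Finally, here is the closing sketch of the accuracy-versus-error distinction; all the labels and numbers are made up for illustration.

```python
# Classification: accuracy = how many you predicted correctly
# out of the total. These labels are made up for illustration.
true_labels = ["spam", "not spam", "spam", "spam", "not spam"]
pred_labels = ["spam", "not spam", "not spam", "spam", "not spam"]
accuracy = sum(t == p for t, p in zip(true_labels, pred_labels)) / len(true_labels)
print(accuracy)  # 0.8 -- 4 of 5 correct

# Regression: an error term = how far predictions are from actual values.
actual = [40.0, 86.5, 62.0]
preds  = [39.25, 84.0, 65.0]
mae = sum(abs(a - p) for a, p in zip(actual, preds)) / len(actual)
print(mae)  # mean absolute error, about 2.08
```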