In chapter 2.2.4, Graphical Analysis, we looked at plotting a scatter plot to depict correlation graphically.
Single (simple) linear regression examines the linear correlation between two variables (it is called single because there is only one independent variable) and identifies the linear equation that describes the data. This equation helps answer the question “what will Y be at a certain value of X?”
The steps for calculating the linear regression are:
- Look at the scatter plot. If the data don’t visually follow a straight line, you don’t have a linear correlation;
- Use Ordinary Least Squares (OLS) Single Regression, a method that gives you the linear equation that best fits your data;
- Calculate R squared, the percentage of the variance in the observations that your linear equation explains.
The OLS Single Regression gives an equation in the format Y = a + bX.
- Y is the dependent variable, the one whose value we want to know;
- X is the independent variable;
- a is the intercept, the value of Y when X is 0;
- b is the slope, the change in Y for each one-unit increase in X.
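To make the roles of a and b concrete, here is a minimal sketch that evaluates Y = a + bX. The function name `predict` and the coefficient values are hypothetical, chosen only for illustration.

```python
def predict(a: float, b: float, x: float) -> float:
    """Evaluate the regression line Y = a + bX at a given X."""
    return a + b * x


# Hypothetical coefficients, not taken from any data in this chapter.
a = 2.0  # intercept: the value of Y when X is 0
b = 0.5  # slope: the change in Y for each one-unit increase in X

y_at_0 = predict(a, b, 0)    # equals the intercept a
y_at_10 = predict(a, b, 10)
y_at_11 = predict(a, b, 11)  # one unit further along X, so Y grows by b

print(y_at_0, y_at_10, y_at_11)
```

Moving X from 10 to 11 changes the predicted Y by exactly b, which is what "slope" means here.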
In image1, you can see the formula of the OLS Single Regression that gives you the linear regression equation, although in real-world work you calculate it with software such as Excel.
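For reference, the usual textbook form of the OLS estimates for Y = a + bX is sketched below. The notation is an assumption and may differ from the one used in image1: n is the number of observations, and X̄ and Ȳ are the sample means of X and Y.

```latex
b = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2},
\qquad
a = \bar{Y} - b\,\bar{X}
```

The slope b is computed first; the intercept a then follows from the fact that the fitted line passes through the point (X̄, Ȳ).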
In image2, you can see the formula for R; to get R squared, you square the result of this formula. R squared has a value from 0 (low correlation) to 1 (high correlation). R squared, too, is calculated in Excel.
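As a rough illustration of that calculation outside Excel, the sketch below computes R as the Pearson correlation coefficient and then squares it. It assumes the formula for R is the standard Pearson one; the function name `pearson_r` and the data are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient R between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical observations, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
r_squared = r ** 2
print(round(r, 3), round(r_squared, 3))
```

R itself lies between -1 and 1, with the sign giving the direction of the relationship, so squaring it always yields a value between 0 and 1.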
Now let's work through an example, starting from the X and Y data in table1.
With the data in table1, we can draw the scatter plot in Excel and, through the chart options, display the OLS Single Regression equation and R squared directly on it, as in image2.
The diagram shows that the data are linearly correlated, and it also gives us the linear equation.
It also gives us R squared, which indicates a strong correlation because its value is close to 1.
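The same workflow can be sketched in a few lines of Python. The data below are hypothetical stand-ins, not the values from table1, and the helper names `ols_fit` and `r_squared` are made up for this example. Here R squared is computed as the share of the variance in Y that the line explains, which for a single regression gives the same number as squaring R.

```python
def ols_fit(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def r_squared(xs, ys, a, b):
    """Share of the variance in Y explained by the line: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data standing in for table1.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

a, b = ols_fit(xs, ys)
r2 = r_squared(xs, ys, a, b)

print(f"Y = {a:.3f} + {b:.3f}X, R squared = {r2:.4f}")
print(f"Predicted Y at X = 7: {a + b * 7:.2f}")
```

The last line answers the question from the start of the chapter: once a and b are known, the Y expected at any X follows directly from the equation.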
It’s important to know that:
- Simple linear regression doesn’t work well with outliers. Outliers are single observations with an abnormal value. An outlier distorts the fitted line and tends to make R squared very low;
- You can remove outlier observations before calculating R squared and the equation, but they must truly be outliers. In other words, removing less than 5% of the values is one thing; removing 20% of them is quite another;
- Y must be a parametric variable, whereas X can be parametric or even nominal/categorical.
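The effect of a single outlier can be sketched as follows. The data are hypothetical: 25 observations that lie on a straight line, except for the last one, which is given a deliberately abnormal value (1 observation out of 25, so 4% of the data). The helper `r_squared` is a made-up name and computes R squared as the squared Pearson correlation.

```python
def r_squared(xs, ys):
    """R squared of a single linear regression (squared Pearson correlation)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


# Hypothetical observations following Y = 2X.
xs = list(range(1, 26))
ys = [2 * x for x in xs]
ys[-1] = 0  # replace the last observation with an abnormal value

with_outlier = r_squared(xs, ys)
without_outlier = r_squared(xs[:-1], ys[:-1])  # drop only the outlier

print(f"R squared with the outlier:    {with_outlier:.3f}")
print(f"R squared without the outlier: {without_outlier:.3f}")
```

In this made-up data set one abnormal point is enough to pull R squared well below the value obtained from the remaining 24 observations.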
For the exam, remember that:
- Linear regression gives you the equation of the line that best fits the data;
- With R squared, you can see how far the observations are from the linear regression equation;
- R squared varies from 0 to 1, where 1 is a high correlation;
- Linear regression doesn’t work well with outliers;
- Y must be a parametric variable, whereas X can be parametric or even nominal/categorical.