Linear Regression is one of the simplest, most efficient, and most popular machine learning algorithms. It comes under supervised learning and performs regression tasks to predict results. If you go through some statistics or linear algebra references, you may find it under the name **regression analysis**, or simply regression.

Regression analysis is a statistical approach to understanding the relationship between two or more variables, i.e. how one changes with respect to the other. For instance, consider how our weight changes with height: generally, if height increases weight also increases, so there is a certain relation between height and weight. The basic idea is to find the regression line that best fits the given data. Now that you have an intuition for regression, let's find out how a linear model makes its predictions.

## Let's do some math

A linear model makes predictions based on the linear relationship between a variable **Y** and a given variable **X**, represented by a straight line (can you recall the slope-intercept form you learned in high school?). If you have the **m** and **b** values, you can simply substitute them into the slope-intercept form to predict **y**:

**y(pred) = θ1 + θ2.x** (Hypothesis function)

θ1 = Y-intercept

θ2 = Slope of the line

x = Given X values
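The hypothesis function above can be sketched as a tiny Python helper. The function name `predict` is ours, not from any library; `theta1` and `theta2` are the intercept and slope from the text.

```python
# A minimal sketch of the hypothesis function y(pred) = theta1 + theta2 * x.
def predict(theta1, theta2, x):
    """Return the y-prediction for input x, given intercept theta1 and slope theta2."""
    return theta1 + theta2 * x

# Example: intercept 0.5, slope 2.0 -> prediction for x = 3 is 0.5 + 2*3 = 6.5
print(predict(0.5, 2.0, 3))  # 6.5
```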

The equation for the Y-prediction given above is the same as the slope-intercept form **y = mx + b**. By substituting the values of **θ1** and **θ2**, we obtain the Y-predictions.

Once we have the Y-predicted values, we need to compare them with the actual **Y** values to check the error between the points and the straight line formed. The error can be measured with **MSE (Mean Squared Error)**: the lower the MSE, the greater the prediction accuracy.

MSE is the mean of the squared distances between the actual points and the predicted values. If we didn't square the errors, positive and negative errors would cancel each other out.
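As a sketch, MSE can be computed in a few lines of plain Python (the function name mirrors the metric; it is our own helper, not a library call):

```python
# Mean Squared Error: average of the squared differences
# between actual and predicted values.
def mean_squared_error(y_true, y_pred):
    return sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true)

# Example: errors are 1 and -2, so MSE = (1 + 4) / 2 = 2.5
print(mean_squared_error([2, 3], [1, 5]))  # 2.5
```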

A common technique for finding the best-fit equation is the **Least Squares Method**.

## Least Squares Method

Here are some X and Y values:

| X | Y | (x − x̄)² | (x − x̄)(y − ȳ) |
|---|---|---|---|
| 1 | 2 | 7.84 | 10.08 |
| 2 | 3 | 3.24 | 4.68 |
| 4 | 7 | 0.04 | 0.28 |
| 5 | 5 | 1.44 | −0.72 |
| 7 | 11 | 10.24 | 17.28 |
| **Total: 19 (x̄ = 3.8)** | **28 (ȳ = 5.6)** | **22.8** | **31.6** |

The data given in the table can be applied to the Least Squares Method to find the Y-intercept **θ1 (b)** and the slope **θ2 (m)**. Here **x̄** is the mean of all **X** values and **ȳ** is the mean of all **Y** values. The slope is given by

**m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²**

Substituting the values from the table:

**θ2 (m) = 31.6 / 22.8 = 1.38596...**

Now let's find **b**:

**b = ȳ − m·x̄**

**θ1 (b) = 5.6 − 1.38596 × 3.8 = 0.33333**

Now we know the slope of the line as well as the Y-intercept. Let's look at the Y-predictions.

Y predictions:

X = 1, Y = 0.33333 + 1.38596 × 1 = 1.72

X = 2, Y = 0.33333 + 1.38596 × 2 = 3.11

X = 4, Y = 0.33333 + 1.38596 × 4 = 5.88

X = 5, Y = 0.33333 + 1.38596 × 5 = 7.26

X = 7, Y = 0.33333 + 1.38596 × 7 = 10.04

The Y-predicted values are reasonably close to the actual **Y** values. This is how the Least Squares Method is used mathematically to predict values in Linear Regression.

Another way to find the best-fit equation, one that gives the result directly, is the **Normal Equation**.

## Normal Equation

**θ = (XᵀX)⁻¹ Xᵀ y** (Normal Equation for the coefficients)

If we plug the values of the table above into the Normal Equation, we get a resulting vector with the same values we obtained before:

[ 0.3333, 1.3859 ]

The Normal Equation computes the coefficients in closed form by inverting the matrix **XᵀX**, so no iterative training is needed.

Implementing in Python:

```python
import numpy as np

X = np.array([[1], [2], [4], [5], [7]])
y = np.array([[2], [3], [7], [5], [11]])

X_b = np.c_[np.ones((5, 1)), X]  # add a column of ones for the intercept term

# Normal Equation: theta = (X^T X)^-1 X^T y
theta_best = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)
print("Theta best:", theta_best)  # y-intercept and slope

y_predict = X_b.dot(theta_best)
print("\nY predictions:", y_predict)
```

Output:

```
Theta best: [[0.33333333]
 [1.38596491]]

Y predictions: [[ 1.71929825]
 [ 3.10526316]
 [ 5.87719298]
 [ 7.26315789]
 [10.03508772]]
```

```python
# calculating the mean squared error
from sklearn.metrics import mean_squared_error

mse = mean_squared_error(y, y_predict)
print("MSE", mse)
```

Output:

```
MSE 1.4807017543859655
```

Plotting the predictions against the data shows the regression line formed.
