Linear Regression, Bias Variance Trade-off, Regularized Linear Regression

Priyanshi Singh
Apr 7, 2021 · 4 min read


Linear Regression

Linear regression is a machine learning algorithm based on supervised learning (i.e., a labelled dataset). It performs a regression task. It is a linear model, i.e., a model that assumes a linear relationship between the input variable (x) and the single output variable (y). When the line is fitted by minimizing the sum of squared errors, the method is called Ordinary Least Squares (OLS). The model can be represented in the form of a straight line:

y = β0 + β1x + ε (the simple linear regression equation, where β0 is the intercept, β1 the slope, and ε the error term)

Linear regression consists of finding the best-fitting straight line through the points. The best-fitting line is called a regression line.

[Figure: example of a best-fit line through data points]
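To make this concrete, here is a minimal sketch of fitting such a regression line with scikit-learn on made-up synthetic data (the numbers and variable names are only for illustration):

import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: y depends roughly linearly on x, plus random noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100).reshape(-1, 1)
y = 3.0 * x.ravel() + 2.0 + rng.normal(scale=1.5, size=100)

# Fit the best-fitting (least-squares) line
model = LinearRegression().fit(x, y)
print("slope (β1):", model.coef_[0])        # should come out close to 3.0
print("intercept (β0):", model.intercept_)  # should come out close to 2.0

The fitted slope and intercept recover the coefficients of the straight-line equation above.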

Assumptions of Linear Regression

There are four assumptions associated with a linear regression model:

  • Linearity: The relationship between X and the mean of Y is linear.
  • Homoscedasticity: The variance of the residuals is the same for any value of X.
  • Independence: Observations are independent of each other.
  • Normality: For any fixed value of X, Y is normally distributed.
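These assumptions are usually checked informally by looking at the residuals. Here is a rough sketch of such checks on synthetic data, using scikit-learn plus SciPy (the split-by-median comparison is just one simple way to eyeball homoscedasticity):

import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(200, 1))
y = 2.5 * X.ravel() + 1.0 + rng.normal(scale=1.0, size=200)

model = LinearRegression().fit(X, y)
fitted = model.predict(X)
residuals = y - fitted

# Homoscedasticity (rough check): residual spread should look similar
# for low and high fitted values
low = fitted < np.median(fitted)
print("residual std (low fitted): ", residuals[low].std())
print("residual std (high fitted):", residuals[~low].std())

# Normality: Shapiro-Wilk test on the residuals
stat, p_value = stats.shapiro(residuals)
print("Shapiro-Wilk p-value:", p_value)  # a large p-value gives no evidence against normality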

Bias Variance Trade-off

What is bias?

Bias is the difference between the average prediction of our model and the correct value we are trying to predict. A model with high bias pays very little attention to the training data and oversimplifies the problem, which leads to high error on both training and test data.

What is variance?

Variance is the variability of a model’s predictions for a given data point; it tells us how spread out those predictions are. A model with high variance pays a lot of attention to the training data and does not generalize to data it hasn’t seen before. As a result, such models perform very well on training data but have high error rates on test data.

[Figure: bias-variance trade-off illustrated with a bulls-eye diagram]

Underfitting happens when a model is unable to capture the underlying pattern of the data. Such models usually have high bias and low variance.

Overfitting happens when our model captures the noise along with the underlying pattern in the data. It happens when we train the model too much on a noisy dataset. These models have low bias and high variance.

If our model is too simple and has very few parameters, it may have high bias and low variance. On the other hand, if our model has a large number of parameters, it is likely to have high variance and low bias. So we need to find the right balance, without overfitting or underfitting the data. This trade-off in complexity is why there is a trade-off between bias and variance: an algorithm can’t be more complex and less complex at the same time.

To build a good model, we need to find a balance between bias and variance that minimizes the total error:

Total Error = Bias^2 + Variance + Irreducible Error

[Figure: error vs. model complexity]
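To see this trade-off in numbers, here is a minimal sketch (scikit-learn, synthetic noisy data) comparing training and test error as model complexity grows, with polynomial degree as the complexity knob; exact values will vary with the data, but low degrees tend to underfit and very high degrees tend to overfit:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=40)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))
    test_err = mean_squared_error(y_test, model.predict(X_test))
    # Typically: degree 1 underfits (both errors high), degree 15 overfits
    # (tiny training error, larger test error), degree 4 sits near the sweet spot
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")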

Regularized Linear Regression

In the bias-variance trade-off we learned that there is a sweet spot where the total error is minimum, giving the optimum model complexity. If our model complexity exceeds this sweet spot, we are in effect overfitting the model; if it falls short of the sweet spot, we are underfitting it. With that in mind, regularization is simply a useful technique when we think our model is too complex (low bias, but high variance). It is a method for “constraining” or “regularizing” the size of the coefficients (“shrinking” them towards zero).

Two common regularization techniques are Ridge Regression and Lasso Regression.

Ridge Regression

Ridge regression is also called L2 regularization. It adds a penalty term proportional to the sum of the squared coefficients (the L2 norm).
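As a quick illustration, here is a minimal sketch with scikit-learn’s Ridge on synthetic data (its alpha argument plays the role of λ); the point is simply that the penalized coefficients shrink toward zero but typically stay nonzero:

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
y = X @ np.array([4.0, 0.0, -2.0, 0.0, 1.0]) + rng.normal(scale=0.5, size=100)

# Compare plain least squares with the L2-penalized fit
print("OLS coefficients:  ", LinearRegression().fit(X, y).coef_.round(2))
print("Ridge coefficients:", Ridge(alpha=10.0).fit(X, y).coef_.round(2))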

Lasso Regression

Lasso regression performs L1 regularization, which adds a penalty proportional to the sum of the absolute values of the coefficients (the L1 norm).

Lasso regression can shrink coefficients all the way to zero, thus removing them from the model.

Ridge regression shrinks coefficients toward zero, but they rarely reach zero.
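A matching sketch with scikit-learn’s Lasso (same synthetic setup as above, alpha again standing in for λ) shows the difference: with a large enough penalty, some coefficients come out exactly zero:

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
y = X @ np.array([4.0, 0.0, -2.0, 0.0, 1.0]) + rng.normal(scale=0.5, size=100)

# The L1 penalty drives the weakest coefficients exactly to zero
lasso = Lasso(alpha=0.5).fit(X, y)
print("Lasso coefficients:", lasso.coef_.round(2))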

A tuning parameter, λ, controls the strength of the L1 and L2 penalties; λ is essentially the amount of shrinkage (a sweep over λ is sketched below):

  • When λ = 0, no parameters are eliminated. The estimate is equal to the one found with linear regression.
  • As λ increases, coefficients shrink further; with lasso, more and more of them are set exactly to zero and eliminated (theoretically, when λ = ∞, all coefficients are eliminated).
  • As λ increases, bias increases.
  • As λ decreases, variance increases.

If an intercept is included in the model, it is usually left unpenalized.
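Here is a small sketch of that behaviour with scikit-learn’s Lasso on synthetic data (alpha standing in for λ); the exact counts depend on the data, but the number of surviving coefficients shrinks as the penalty grows:

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 8))
true_coef = np.array([5.0, -3.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0])
y = X @ true_coef + rng.normal(scale=0.5, size=100)

# As the penalty grows, more lasso coefficients are driven exactly to zero
for lam in (0.01, 0.1, 1.0, 5.0):
    coef = Lasso(alpha=lam).fit(X, y).coef_
    print(f"lambda={lam:>4}: nonzero coefficients = {np.count_nonzero(coef)}")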
