Linear Regression

Sruti Samatkar
Published in Analytics Vidhya
4 min read · Dec 9, 2020


Regression analysis is one of the most important fields in statistics and machine learning. There are several regression methods available, and linear regression is one of them. Regression searches for relationships among variables. In statistical modeling and in machine learning, that relationship is used to forecast the outcome of future events.

Fig 1 — Regression

Linear Regression

Linear regression is probably one of the most important and widely used regression techniques. It’s among the simplest regression methods. One of its main advantages is the ease of interpreting results.

Linear regression models the relationship between two variables by fitting a linear equation to observed data. One variable is considered to be an explanatory (independent) variable, and the other is considered to be a dependent variable.

Fig 2 — Linear Regression relation between x and y

Simple Linear Regression: Simple linear regression is the simplest case of linear regression with a single independent variable, 𝐱 = 𝑥.

Multiple Linear Regression: Multiple linear regression is a case of linear regression with more than one independent variable.
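As a quick sketch of what this looks like in practice (using made-up data, not values from this article), multiple linear regression fits one coefficient per independent variable:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data: two independent variables (say, size and age)
# and one dependent variable; here y = 3 * size exactly.
X = np.array([[50, 10], [60, 8], [80, 5], [100, 3], [120, 1]])
y = np.array([150, 180, 240, 300, 360])

model = LinearRegression().fit(X, y)
print(model.coef_)       # one coefficient per independent variable
print(model.intercept_)
```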

Polynomial Regression: Polynomial regression is a generalized case of linear regression. One assumes a polynomial dependence between the output and the inputs and, consequently, estimates a polynomial regression function.
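A minimal sketch of the idea, with made-up data: the fitted function is a polynomial in x but still linear in its coefficients, which is why polynomial regression counts as a generalized case of linear regression. NumPy's polyfit is one simple way to fit it:

```python
import numpy as np

# Made-up data that follows y = x**2 + 1 exactly.
x = np.array([0, 1, 2, 3, 4, 5])
y = np.array([1, 2, 5, 10, 17, 26])

# Fit a degree-2 polynomial; the estimated regression function
# is quadratic in x but linear in the fitted coefficients.
coefs = np.polyfit(x, y, 2)
predict = np.poly1d(coefs)
print(predict(6))  # close to 37, since 6**2 + 1 = 37
```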

Implementing Linear Regression in Python

Fig 3 — Linear Regression in Python

Python Packages for Linear Regression:

The package NumPy is a fundamental Python scientific package that allows many high-performance operations on single- and multi-dimensional arrays. It also offers many mathematical routines. It is open source.

The package scikit-learn is a widely used Python library for machine learning, built on top of NumPy and some other packages. It provides the means for preprocessing data, reducing dimensionality, implementing regression, classification, clustering, and more. Like NumPy, scikit-learn is also open source.

Simple Linear Regression with scikit-learn: Let’s start with the simplest case, which is simple linear regression.

There are five basic steps when you’re implementing linear regression:

  1. Import the packages and classes that are needed.
  2. Provide the data to work with and then apply appropriate transformations.
  3. Create a regression model and fit it with existing data.
  4. Check the results of model fitting to know whether the model is satisfactory or not.
  5. Apply the model for predictions.
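The five steps above can be sketched with scikit-learn as follows; the data values here are illustrative, not taken from the original figures:

```python
# Step 1: import the packages and classes needed.
import numpy as np
from sklearn.linear_model import LinearRegression

# Step 2: provide the data (illustrative values; x is reshaped
# into a 2-D column, as scikit-learn expects).
x = np.array([5, 15, 25, 35, 45, 55]).reshape(-1, 1)
y = np.array([5, 20, 14, 32, 22, 38])

# Step 3: create a regression model and fit it with the data.
model = LinearRegression().fit(x, y)

# Step 4: check the results of model fitting.
r_sq = model.score(x, y)  # coefficient of determination (R^2)
print(r_sq, model.intercept_, model.coef_)

# Step 5: apply the model for predictions.
y_pred = model.predict(x)
print(y_pred)
```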

Let’s see an example where we predict the speed of a 10-year-old car.

  • Import the modules needed.
Fig 4 — Importing the needed modules
  • Create the arrays that represent the values of the x and y axis:
Fig 5 — values of x and y
  • Execute a method that returns some important key values of Linear Regression:
Fig 6 — A method to return key values
  • Create a function that uses the slope and intercept values to return a new value. This new value represents where on the y-axis the corresponding x value will be placed:
Fig 7 — Define a Function
  • Run each value of the x array through the function. This will result in a new array with new values for the y-axis:
Fig 8 — Run x through the function
  • Draw the original scatter plot:
Fig 9 — Scatter Plot
  • Draw the line of linear regression:
Fig 10 — Line of linear regression
  • Display the diagram: plt.show()
Fig 11 — Screenshot of the code with output.
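Since the code in Figs 4–11 is shown only as screenshots, here is a sketch of the walkthrough above. It assumes the common scipy.stats.linregress approach and uses illustrative x and y values; the actual figures may differ:

```python
# Import the needed modules (Fig 4).
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt
from scipy import stats

# Arrays for the x axis (car age in years) and the y axis (speed)
# (Fig 5); these values are illustrative.
x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
y = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

# A method that returns the key values of linear regression (Fig 6).
slope, intercept, r, p, std_err = stats.linregress(x, y)

# A function that maps an x value to its position on the y axis (Fig 7).
def myfunc(x):
    return slope * x + intercept

# Run each x value through the function to get the fitted y values (Fig 8).
mymodel = list(map(myfunc, x))

# Predict the speed of a 10-year-old car.
print(myfunc(10))

# Draw the original scatter plot and the regression line (Figs 9-10).
plt.scatter(x, y)
plt.plot(x, mymodel)

# Display the diagram (Fig 11); here we save it instead of calling plt.show().
plt.savefig("regression.png")
```

With this data the correlation coefficient r comes out negative, meaning older cars tend to have lower speeds, and myfunc(10) gives the predicted speed of a 10-year-old car.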

Conclusion:

Linear regression is easy to implement, and its output coefficients are easy to interpret. When you know that the relationship between the independent and dependent variables is linear, this algorithm is the best to use because it is less complex than other algorithms. Linear regression is a great tool for analyzing the relationships among variables, but it isn’t recommended for most practical applications because it over-simplifies real-world problems by assuming a linear relationship among the variables.
