Importance of Linear Regression:

Regression analysis is used to predict the value of one variable (the **dependent variable**) on the basis of other variables (the *independent variables*).

In correlation, the two variables are treated as equals. In regression, one variable is considered the independent (predictor) variable (X) and the other the dependent (outcome) variable (Y).

The dependent variable is denoted $Y$.

The independent variables are denoted $X_1, X_2, \ldots, X_k$.

$$y = \beta_0 + \beta_1 x + \varepsilon$$

The model above is referred to as simple linear regression. We are interested in estimating $\beta_0$ and $\beta_1$ from the data we collect.

**Simple Linear Regression Analysis:**

If you know something about X, this knowledge helps you predict something about Y.

$$y = \beta_0 + \beta_1 x + \varepsilon$$

- Variables:
  - X = independent variable
  - Y = dependent variable
- Parameters:
  - $\beta_0$ = Y-intercept
  - $\beta_1$ = slope
  - $\varepsilon$ ~ normal random variable ($\mu_\varepsilon = 0$, $\sigma_\varepsilon$ = ???) [noise]
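As a minimal sketch of how the intercept and slope might be estimated from collected data, the following uses the standard least-squares formulas for a single predictor. The function name `fit_simple_linear_regression` and the simulated data (true intercept 2, true slope 3, noise standard deviation 1) are assumptions chosen purely for illustration; they do not come from the text above.

```python
import numpy as np

def fit_simple_linear_regression(x, y):
    """Return (b0, b1), the least-squares estimates of intercept and slope."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point of means.
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Hypothetical data simulated from y = 2 + 3x + noise (illustrative values only).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, size=200)

b0_hat, b1_hat = fit_simple_linear_regression(x, y)

# The noise standard deviation is unknown in practice; it can be estimated
# from the residuals using n - 2 degrees of freedom.
residuals = y - (b0_hat + b1_hat * x)
sigma_hat = np.sqrt(np.sum(residuals ** 2) / (len(x) - 2))

print(f"b0 = {b0_hat:.3f}, b1 = {b1_hat:.3f}, sigma = {sigma_hat:.3f}")
```

Once the two estimates are available, a prediction for a new value of X is simply `b0_hat + b1_hat * x_new`.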

**Correlation:**

- It is a measure of association (linear association only).
- The correlation coefficient is defined by the following formula:

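*r is the ratio of the covariance of X and Y to the product of their standard deviations:*

$$r = \frac{\operatorname{cov}(X, Y)}{\operatorname{sd}(X)\,\operatorname{sd}(Y)}$$

$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$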

Here *n* is the number of data pairs, *x* is the independent variable, and *y* is the dependent variable.
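The computational formula for *r* can be checked numerically. The sketch below implements it directly from the sums and compares the result with NumPy's built-in `np.corrcoef`. The function name `pearson_r` and the small data set (hours studied and test scores) are hypothetical and serve only as an illustration.

```python
import numpy as np

def pearson_r(x, y):
    """Correlation coefficient from the sums-based computational formula."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    numerator = n * np.sum(x * y) - np.sum(x) * np.sum(y)
    denominator = np.sqrt(
        (n * np.sum(x ** 2) - np.sum(x) ** 2)
        * (n * np.sum(y ** 2) - np.sum(y) ** 2)
    )
    return numerator / denominator

# Hypothetical data pairs, invented for illustration.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 70, 72]

r = pearson_r(hours, scores)
# Cross-check against NumPy's correlation matrix (off-diagonal entry).
r_numpy = np.corrcoef(hours, scores)[0, 1]

print(r, r_numpy)
```

Both values should agree up to floating-point rounding, since the sums-based formula is an algebraic rearrangement of covariance divided by the product of the standard deviations.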

- There are different kinds of correlation, including the Pearson product-moment correlation, the Spearman rank-order correlation, and the phi correlation.
- Correlation is a good indicator of how strongly two measured variables are related. Its value can be anything between -1 and 1. If the correlation is 1, the two variables are perfectly positively correlated; if it is -1, they are perfectly negatively correlated.
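To illustrate how two of these methods can differ, the sketch below computes the Spearman coefficient as a Pearson correlation of ranks. It assumes there are no tied values, which keeps the ranking helper short; the helper names `ranks`, `pearson` and `spearman` and the cubic example data are assumptions made for this illustration.

```python
import numpy as np

def ranks(values):
    """Rank values from 1 to n (assumes no ties, to keep the sketch short)."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values)
    result = np.empty(len(values), dtype=float)
    result[order] = np.arange(1, len(values) + 1)
    return result

def pearson(x, y):
    """Pearson product-moment correlation via NumPy."""
    return np.corrcoef(x, y)[0, 1]

def spearman(x, y):
    """Spearman rank-order correlation: Pearson applied to the ranks."""
    return pearson(ranks(x), ranks(y))

x = np.arange(1, 11)
y_cubic = x ** 3          # increasing, but not linear in x
y_line = -2 * x + 5       # exactly linear with negative slope

pearson_cubic = pearson(x, y_cubic)
spearman_cubic = spearman(x, y_cubic)
pearson_line = pearson(x, y_line)

print(pearson_cubic, spearman_cubic, pearson_line)
```

Because the cubic relationship is increasing but not linear, the Pearson value falls below 1 while the rank-based value reaches 1. The exactly linear, decreasing relationship gives a Pearson value of -1.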

Difficulties faced by a student while solving linear regression and correlation problems:

- When a variable is omitted from a regression equation, the regression coefficients on the included variables will, in general, be unreliable or invalid, since they are biased estimates of the true population regression coefficients. This conclusion comes from statistical theory and is not proven here.
- The most common problem when solving linear regression is multicollinearity between the independent variables.
- The relationship between the dependent and independent variables should be linear. If it is nonlinear, R² decreases, and the accuracy of the model decreases with it.
- All variables should be properly imputed, and outliers should be identified and removed.
- Missing values should also be removed, as should variables with repeated observations.
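The first two difficulties can be made concrete with a small simulation. The sketch below is an illustration under stated assumptions, not a general diagnostic. The data are simulated so that two predictors `x1` and `x2` are correlated, with true coefficients 2 and 3 chosen arbitrarily. The helper `ols` is a hypothetical name wrapping `np.linalg.lstsq`.

```python
import numpy as np

# Simulated data (all numbers are illustrative assumptions).
rng = np.random.default_rng(42)
n = 5000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)   # x2 is correlated with x1
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

def ols(columns, y):
    """Least-squares coefficients, intercept first."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

full = ols([x1, x2], y)    # both predictors included
short = ols([x1], y)       # x2 omitted

# With exactly two predictors, the variance inflation factor is 1 / (1 - r^2),
# where r is the correlation between the two predictors.
r12 = np.corrcoef(x1, x2)[0, 1]
vif = 1.0 / (1.0 - r12 ** 2)

print("slope on x1, full model :", full[1])
print("slope on x1, x2 omitted :", short[1])
print("corr(x1, x2) =", r12, " VIF =", vif)
```

In this setup the slope on `x1` from the full model stays close to the true value of 2, while the model that omits `x2` absorbs part of its effect and reports a slope near 2 + 3 × 0.8 = 4.4. A variance inflation factor well above 1 signals that the predictors overlap in the information they carry.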

