# Regration Analysis Used for

## Regration analysis is statistical tool used for desion making and business forecasting.

According to YA-Lum-Chou ,"Regression Analysis  attempts to establish the nature of relationship between variables that is to study the functional relationship between the variables and  their by provide  a mechanism for prediction and forecasting".
Regressipn analysis  studies the nature of relationship between two variables so that we can estimate one variable against the variable.
Regression Example:
In a chemical process,suppose that the yield of product  is related to the process operating temperature.Regression analysis can be used to build  that expresses yield as a function of temperature.Regression model can be used to predict yield at a given temperature level.It could also be used for process optimization or process control purposes.
There are some basic concept  of regression to understand before apply regression line.
1. Nature of Relationship:
Regression means the act ofreturn.It is mathematical measure which shows the average relationship between two variables.
2., Forecasting:
Regression  analysis enables to make prediction .For example by using regression equation x and y we can estimate the most probable value of x on the basis the given value of  y.
3. Cause and Effect:
Regression analysis clearly express the cause and effect relationship between two variables.It establishes
the functional relationship.Where one variable is treated as dependent variable and other one is independent variables.
4  Absolute and Relative Measure:
The regression coefficient is an absolute measure .If we know the value  of independent variable.we can estimate of value of dependent variable.
5. Effect of change in origin and scale:
Regression coefficient are independent of change in origin but not of change in scale.
6. Symmetry:
In regression analysis it is required to be clearly identified which variable is dependent variable and which one is independent. If b(xy) is not equal b(yx) ,so regression coefficient are not symmetric.

## Regression Line

Regression line shows the average relationship betweentwo variables.There are two regression lines,namely regression lines of x on y and regression line of y on x.On the basis of regression line ,we can predict the value of dependent variable on the basis of given value of the independent variable. It shows the average relationship betweentwo variables is called Line of Best Fit.Here we can estimate the value of x on the basis of given value of y,otherwise  the value of y on the basis of given value of x.

If there is a perfect correlation between two variables x and y so that r = +1, r = -1 then there will be only one regression line.If correlation is positive then the direction of regression line will be upward from left to right.In the case of negative correlation the line will slope down from left to right it is shown in figure

The direction of regression lines express the nature of correlation also. If regression line from left to right upward the correlation is positive. On the other side if these lines move downward then correlation is negetive.
The importnat fact to be noted here is that either both lines will move from left to right or right to left. It is not possible that one will be left to right and other one is right to left.
If two regression lines intersect eachother at 90 degree then it means that there is no correlation  between two variables so r= 0 it is clear from the diagram given below.
Both the regression lines intersect eachother at the point of average of x and y. If we draw a perpendicular from the point of intersection on x axis we will get mean value of x and if we draw perpendicular from the point of intersection from y- axis we will get mean value of y.
The nearer the regression lines are to eachother greater will be the degree of correlation. On the contrary, the greater the distance between two lines, the lesser will be the degree of correlation. It is clear from the following diagram.

## Regression Analysis Used

Regression analysis is used in economic ,business research and all the scientific disciplines.In economics, it is used to estimate the relation between variables like price and demand,income and saving etc.If we know income ,we can estimate the probable saving. In business we know that the quantity of scale is affected by the expenditure on advertisement like social media etc.So a problem is based on cause and effect relationship,the regression analysis is very useful.
Regression analysis is used to find out the standard error of estimate for regressions which is a measure of unexplained variation in dependent variable. It is helpful to determine the reliability of the prediction made by using regression regression.
Regression analysis is used to estimate the value of dependent variable against the given value of independent variable. It describes the average relationship existing between two variables x and y.For example a businessman can estimate the probable falls in demand if he decides to increase the price of commodity.
Regression analysis is used to make prediction in policy making.On the basis of regression analysis we can make prediction which provide sound basis for policy formulation in socio-economic field.We will see some example to use software python and R.
Regression analysis  is used machine learning artificial inteliagence,data science ,cloud computing and security etc.Interconnectivity and data explosition are realities that open a world of new oppertunities for every business that can read and interpret data in real time.Coming from a long and glorious past in the field of statistics ,econometrics,linear regression and its derived methods can provide  you with a simple reliable and effective tools R,python to learn  from data and act on regression analysis.If carefully trained with the right data,linear regression analysis can complete well against the most complex and artificial inteligence technologies,offering you unbetable ease of implemention and scalability for increasingly large problems.
Regression analysis is used in e-commerce, a batch application filtering customers to make more relevant commercial offers an online app recommending products to buy on the basis of emphemeral data such as nevigation records.
Regression analysis is also used in credit and insurance business, an application  selecting whether to proceed with online inquiries from users,basing its judgement on their credit rating and past relationship with the company.
Regression analysis is used in python for data science .It can work better than other platforms with in memory data because of its minimal memory foot print and excellent memory management.The memory garbage collector will often save the day when you load,transform,dice,slice,save or discard data using the various reiterations of data wrangling.

## Regression Line Equation

Regression Line Equation are the algebraic expression of regression line..As there are two regression lines, so there are two regression line equation:
1. Regression line equation of x on y. This equation is used to estimate the most probable value of 'x' against the given value of variable 'y'.Here:
'x' is dependent variable and
'y' is independent variable
2.  Regression line equation of y on x.This regression equation is
used to estimate the most probable value of 'y' against the given
value of variable 'x'.Here:
'y' is dependent variable and
'x' is independent variable.

## Simple linear Regression

The relationship between a single regressor variable x and
a response variable y.The regressor variable x is assumed
to be a continuous mathematical variable,controllable by
the experimenter..Suppose that the true relationship between
y and x is a straight line,and that the observation y at each
level of x is a random variable.Now the expected value
y for each value of x is
E(y|x)=B0+B1x
Where the intercept B0 and the slope B1 are unknown constant.

### Least Squares Lines

Experimental data produce points (x1,y1),......(xn,yn) that when graphed,seem to lie close to a line.We want to determine the parameters B0 and B1 that make the line as"close'' to the points as possible.
Suppose B0 and B1 are fixed,and consider the line y=B0+B1x.Corresponding to each data point (xj,yj) there is a point(xj,B0+B1xj) on the line with the same x-coordinate.We call Yj the observed value of y and B0+B1xj the predicted y-value.The  difference between an observed y-value and a predicted value is called a residual .
There are several ways to measure how "close" the line is to the data.The ptimarily because the mathematical calculations are simple to add the squares of the residuals.The least squares line is the line y=B0+B1X that minimizes the sum of the squares of the residuals.This line is also called a line of regression of y on x,because any error in the data are assumed to beonly in the y-cordinates.The coefficients B0,B1 of the line are called regression coefficients.
If the data points were on the line,the parameters B0 and B1 would satisfy the equations
Predicted y-value          Observed y- value
B0+B1x1                    =          y1
B0+B1x2                    =          y2
.                                               .
.                                                .
B0+B1xn                    =          yn
We can write this system as
XB=y, where X=   [ 1      x1
1      x2
.        .
.        .
1        xn ],
B=[B0
B1],
y=[y1
y2
.
.
yn]
If the data points don't lie on a line, then there are no parameter B0,B1 for which the predicted y-values in XB equal the observed y-values in y, and XB=y hasno solution. This is a least squares problem,AX=b    , with different notation!
The square of the distance between the vectors XB and y is precisely the sum the squares of the residuals.The B that minimizes this sum also minimizes the distance between XB and y. Computing the least -squares solution of XB=y is equivalent to finding the B that determines the least-squares line.
Question
Find the equation y=B0+B1x of the least- squares line that best fits the data points(2,1),(5,2),(7,3),(8,3).
We assume that each observation,y can be described by the model
y=B0+B1x+e
Where e is random error with mean zero and variance.The {e}
are also assumed to be uncorrelated random variables.The regression
model of ginen equation involving only a single regressor variable
x is often called the simple linear regression model.
Analysis of residuals is frequently helpful in checking the assumption that the errors are NID(0,var) and in determining wether additional terms in the model would be useful.The experimenter can construct a frequency histogram of the residuals or plot them on normal probability paper.If the error are NID(0,var),then approximately 95% of standardized residuals should fall in the interval(-2,2).Errors far outside this interval may indicate the presence of an outlier, that is an observationthat is a typical of the rest of the data.Various rules have been proposed for discarding outliers.

## Multiple Regression

Suppose an experiment involves two independent variables-say, x1 and x2 and one dependent varoable y. A simple equation to predict y from x1 and x2 has the form
y=B0+B1x1+B2x2..........(1)
Amore general prediction equation might have the form
y=B0+B1x1+B2X2+B3x1^2+B4 x1 x2+B5x2^2.........(2)
This equation is used in geology,for instance,to model erosion surface,glacial cirques,soil PH, and other quantities.In such cases,the least squares fit is called a trend surface.Both equation (1) and (2) lead a linear model because they are linear in the unknown parameters.In general a linear model will aries whenever y is to be predicted by an equation of the form
y=B0f0(x1x2)+B1f1(x1x2)+.............+Bkfk(x1x2)
with f0.....fk any sort of known functions and B0.....Bk unknown weight.
Multiple regression model invloves more than one regressor variable
is called a multiple regression model.As an example , suppose that
the effective life of a cutting tool depends on the cutting speed and
the tool angle.Amultiple regression model that might describe this
relationship is:
y=B0+B1x1+B2x2+e
Where y represents the tool life,x1 represents the cutting speed and x2
represents the tool angle.This is multiple linear regression model with
two regressors.Alinear function of the unknown parametersB0,B1,and
B2.The model describes plane in the two dimensional x1,x2 space.The
parameter B0 defines the intercept of the plane.We sometimes called
and B2 partial regression coefficients,because B1 measures the expected
change in y per unit change in x1 when x2 is held constant and B2
measures the expected change in y per unit change in x2 when x1 is
held constant.

### Multiple Regression Analysis

Multiple regression analysis is a logical extension of simple regression analysis.In this method,two or more independent variables are used to estimate the values of dependent variable.Here average relationship among three or more variables is computed and it is used to forecast the value of dependent variable on the basis of given values of independent variables.For example, we can estimate the value of X1 where the value of X2 and X3 are given.There are three main objective  of multiple correlation and regression analysis:
a) To drive an equation which provides estimates of the dependent variable from values of two or more independent variables.
b) To measure the error of estimate invilved in using this regression equation as basis of estimation.
c) To measure the proportion of variable in the dependent variable which is explained by dependent variables.

### Regression Analysis Python

import pandas as pd
iris = pd.DataFrame
(data.data, columns=data.feature_names)

sepal length (cm)

sepal width (cm)

petal length (cm)

petal width (cm)

0

5.1

3.5

1.4

0.2

1

4.9

3.0

1.4

0.2

2

4.7

3.2

1.3

0.2

3

4.6

3.1

1.5

0.2

4

5.0

3.6

1.4

0.2

iris['sepal length (cm)']
0 5.1 1 4.9 2 4.7 3 4.6 4 5.0 ... 145 6.7 146 6.3 147 6.5 148 6.2 149 5.9 Name: sepal length (cm), Length: 150, dtype: float64
iris['sepal width (cm)']>1
0 True 1 True 2 True 3 True 4 True ... 145 True 146 True 147 True 148 True 149 True Name: sepal width (cm), Length: 150,
dtype: bool
iris['petal width (cm)']>3
0 False 1 False 2 False 3 False 4 False ... 145 False 146 False 147 False 148 False 149 False Name: petal width (cm), Length: 150,
dtype: bool
from matplotlib import pyplot as plt
import seaborn as sns
sns.scatterplot(x='sepal length (cm)',
y='petal length (cm)',data=iris) y=iris[['sepal length (cm)']]
x=iris[['sepal width (cm)']]
from sklearn.model_selection
import train_test_split
x_train,x_test,y_train,y_test=
train_test_split
(x,y,test_size=0.3)
sepal width (cm)
173.5
1123.0
263.4
313.4
802.4
sepal width (cm)
1342.6
513.2
1203.2
193.8
703.2
from sklearn.linear_model import
LinearRegression
lr=LinearRegression()
lr.fit(x_train,y_train)
LinearRegression(copy_X=True,
fit_intercept=True, n_jobs=None,
normalize=False)
y_pred=lr.predict(x_test)
sepal length (cm)
175.1
1126.8
265.0
315.4
805.5
y_pred[0:5]
array([[5.68319154], [5.84032627], [5.71461848], [5.71461848], [6.02888794]])
from sklearn.metrics
import mean_squared_error
mean_squared_error(y_test,y_pred)
0.9060450498564653
y= iris[['sepal length (cm)']]
x=iris[['sepal width (cm)','petal length
(cm)','petal width (cm)']]
x_train,x_test,y_train,y_test=
train_test_split
(x,y,test_size=0.3)
sepal width (cm)petal length (cm)petal width (cm)
992.84.11.3
822.73.91.2
692.53.91.1
1483.45.42.3
1353.06.12.3
lr2=LinearRegression()
lr2.fit(x_train,y_train)
LinearRegression(copy_X=True,
fit_intercept=True, n_jobs=None,
normalize=False)
y_pred=lr2.predict(x_test)
mean_squared_error(y_test,y_pred)
0.0813243348427029
# modle 2 is better than model 1
because less mean square error is good