Logistic Regression Example


Logistic Regression

Logistic regression is the preferred method for analyzing this type of data: binary outcomes observed at several exposure levels. The method is different from those we have seen so far, but at the same time you will recognize a lot of familiar features.
Let's begin with an example of the type of data that lends itself to this analysis. Consider the experimental data summarized in the Table. There were six large jars, each containing a number of beetles and a carefully measured small amount of insecticide. After a specified amount of time, the experimenters counted the number of beetles that were still alive. We can calculate the empirical death rate for each jar's level of exposure to the insecticide. These rates are given in the last row of the Table, computed as

Mortality rate = Number died / Number exposed
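Since the Table itself is not reproduced here, a minimal sketch of this calculation, with hypothetical jar counts standing in for the actual data:

import numpy as np

# Hypothetical counts standing in for the Table (six jars)
exposed = np.array([30, 30, 30, 30, 30, 30])   # beetles per jar
died    = np.array([ 4,  9,  8, 17, 22, 27])   # beetles that died

mortality_rate = died / exposed   # empirical death rate per jar
print(mortality_rate)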

How can we develop statistical models to describe this data? 

We should immediately recognize that within each jar the outcomes are binary-valued: alive or dead. We can probably assume that these events are independent of each other. The counts of alive or dead should then follow the binomial distribution. (See for a quick review of this important statistical model.) There are six separate and independent binomial experiments in this example. The n parameters for the binomial models are the numbers of insects in each jar. Similarly, the p parameters represent the mortality probabilities in each jar. The aim is to model the p parameters of these six binomial experiments. Any models we develop for these data will need to incorporate the various insecticide dose levels to describe the mortality probability in each jar.

The alert reader will notice that the empirical mortality rates given in the last row of the Table are not monotonically increasing with increasing exposure levels of the insecticide. Despite this, there is no reason for us to fit a nonmonotonic model to these data.
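As a quick illustration of this setup, each jar can be treated as its own binomial experiment; a minimal sketch using scipy.stats, with the same hypothetical counts as above:

from scipy.stats import binom

# Each jar is an independent Binomial(n, p) experiment; the empirical
# rate serves as a plug-in estimate of p for illustration
for n, p in zip(exposed, mortality_rate):
    # The variance n*p*(1-p) changes with p, which is why the
    # constant-variance assumption of ordinary regression fails here
    print(n, p, binom.var(n, p))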
The aim of this chapter is to explain models for the different values of the p parameters using the values of the doses of the insecticide in each jar. The first idea that comes to mind is to treat the outcomes as normally distributed. All of the n's are large, so a normal approximation to the binomial distributions should work. We also already know a lot about linear regression, so we are tempted to fit the linear regression model

p = β0 + β1 × Dose

What is wrong with this approach? 

It seems simple enough, but remember that p must always be between 0 and 1. There is no guarantee that, at extreme values of the dose, this straight-line model would not produce estimates of p that are less than 0 or greater than 1. We would expect a good model for these data to always give an estimate of p between 0 and 1. A second, less striking problem is that the variance of the binomial distribution is not the same for different values of the p parameter, so the constant-variance assumption of the usual linear regression is not valid.
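The standard fix is to model the log-odds of p as a linear function of dose, log(p/(1-p)) = β0 + β1 × Dose, which keeps every fitted probability strictly between 0 and 1. A minimal sketch with statsmodels, again using hypothetical dose levels and counts in place of the Table:

import numpy as np
import statsmodels.api as sm

dose    = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # hypothetical dose levels
died    = np.array([ 4,  9,  8, 17, 22, 27])
exposed = np.array([30, 30, 30, 30, 30, 30])

# Binomial GLM with the (default) logit link; the response is the
# two-column array of (deaths, survivors) per jar
endog = np.column_stack([died, exposed - died])
exog  = sm.add_constant(dose)
fit   = sm.GLM(endog, exog, family=sm.families.Binomial()).fit()
print(fit.params)   # beta0, beta1

# Even at extreme doses, the fitted probabilities stay inside (0, 1)
print(fit.predict(sm.add_constant(np.array([0.0, 20.0]))))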

Logistic Regression Analysis with Python

import pandas as pd

# Load the customer churn dataset
data = pd.read_csv('Churn_Modelling.csv')
data
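The code below assumes a feature matrix X and a target vector Y, which the snippet above never defines. A minimal sketch, assuming the standard Kaggle Churn_Modelling.csv layout, where 'Exited' is the churn label, the first three columns are identifiers, and 'Geography' and 'Gender' are categorical:

# Drop identifier columns and separate the target ('Exited')
X = data.drop(columns=['RowNumber', 'CustomerId', 'Surname', 'Exited'])
X = pd.get_dummies(X, columns=['Geography', 'Gender'], drop_first=True)  # one-hot encode categoricals
Y = data['Exited']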

Logistic regression step-by-step Python code


from sklearn.model_selection import train_test_split

# Hold out 20% of the data for testing; stratify on Y so both splits
# keep the same churn/no-churn proportions
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.20, random_state=42, stratify=Y)
from sklearn.preprocessing import StandardScaler

# Feature scaling: fit the scaler on the training set only, then apply
# the same transformation to the test set (transform, not fit_transform,
# so no information leaks from the test set)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
X_train
array([[ 1.058568  ,  1.71508648,  0.68472287, ..., -0.57831252, -0.57773517,  0.90750738],
       [ 0.91362605, -0.65993547, -0.6962018 , ...,  1.72916886, -0.57773517,  0.90750738],
       [ 1.07927399, -0.18493108, -1.73189531, ...,  1.72916886, -0.57773517, -1.10191942],
       ...,
       [ 0.16821031, -0.18493108,  1.3751852 , ..., -0.57831252, -0.57773517, -1.10191942],
       [ 0.37527024, -0.37493284,  1.02995403, ..., -0.57831252,  1.73089688,  0.90750738],
       [ 1.56586482,  1.14508121,  0.68472287, ..., -0.57831252,  1.73089688,  0.90750738]])
X_test
array([[-0.66718803, -0.27324727,  0.69686459, ..., -0.58042949, -0.55809982,  0.93228691],
       [-1.28654133, -0.56367837, -0.34712731, ...,  1.72286214, -0.55809982,  0.93228691],
       [-0.95621957,  0.11399421, -0.34712731, ..., -0.58042949,  1.79179416, -1.07263117],
       ...,
       [-1.37944433,  0.79166679,  1.39285919, ...,  1.72286214, -0.55809982, -1.07263117],
       [ 0.4063577 ,  0.01718384,  0.69686459, ..., -0.58042949, -0.55809982,  0.93228691],
       [ 1.03603357, -0.56367837, -1.0431219 , ..., -0.58042949, -0.55809982,  0.93228691]])
from sklearn.linear_model import LogisticRegression

# Fit the classifier on the scaled training data
log = LogisticRegression()
log.fit(X_train, Y_train)
LogisticRegression()
# Predict churn labels for the held-out test set
Y_pred1 = log.predict(X_test)
from sklearn.metrics import accuracy_score
accuracy_score(Y_test, Y_pred1)
0.809
from sklearn.metrics import recall_score, precision_score, f1_score
precision_score(Y_test, Y_pred1)
0.5939849624060151
recall_score(Y_test, Y_pred1)
0.1941031941031941
f1_score(Y_test, Y_pred1)
0.29259259259259257
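The accuracy of about 0.81 looks respectable, but a recall near 0.19 means the model misses most of the customers who actually churn, a typical symptom of an imbalanced positive class. A minimal diagnostic sketch (the 0.3 threshold is an arbitrary illustration, not a recommendation):

from sklearn.metrics import confusion_matrix, classification_report

# The confusion matrix makes the class imbalance visible
print(confusion_matrix(Y_test, Y_pred1))
print(classification_report(Y_test, Y_pred1))

# Lowering the decision threshold below the default 0.5 trades
# precision for recall
proba = log.predict_proba(X_test)[:, 1]
Y_pred_low = (proba >= 0.3).astype(int)
print(recall_score(Y_test, Y_pred_low), precision_score(Y_test, Y_pred_low))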
