Difference between revisions of "Logistic regression"

Jump to navigation Jump to search
 
(7 intermediate revisions by the same user not shown)
Line 1: Line 1:
'''Logistic regression''' is a type of curve fitting. It used for discrete outcome variables, e.g. pass or fail.
'''Logistic regression''', abbreviated '''LR''', is a type of curve fitting. It used for categorical dependent (outcome) variables. Examples of categorical dependent variables are: (1) pass/fail, (2) dead/alive.


==Special types of LR==
==Types of logistic regression==
===Multinomial LR===
===Multinomial LR===
If the dependent variable is categorical and the categories are mutually exclusive (e.g. ''benign'', ''suspicious'', ''malignant'', ''insufficient''),  ''multinomial logistic regression'' is used.
If the dependent variable is categorical and the categories are mutually exclusive (e.g. ''benign'', ''suspicious'', ''malignant'', ''insufficient''),  ''multinomial logistic regression'' is used.


===Ordered LR===
===Ordered LR===
If the dependent variable is categorical and ordered (e.g. ''Grade 1'', ''Grade 2'', ''Grade 3''), ''ordered logistic regrssion'' is used.<ref>Ordinal Logistic Regression | R Data Analysis Examples. UCLA. URL: [https://stats.idre.ucla.edu/r/dae/ordinal-logistic-regression/ https://stats.idre.ucla.edu/r/dae/ordinal-logistic-regression/]. Accessed on: March 24, 2018</ref>
If the dependent variable is categorical and can be ordered in a meaningful way (e.g. ''Grade 1'', ''Grade 2'', ''Grade 3''), ''ordered logistic regression'' is used.<ref>Ordinal Logistic Regression | R Data Analysis Examples. UCLA. URL: [https://stats.idre.ucla.edu/r/dae/ordinal-logistic-regression/ https://stats.idre.ucla.edu/r/dae/ordinal-logistic-regression/]. Accessed on: March 24, 2018</ref>
 
==Convergence==
If an individual predictor variable is ''not'' predictive (e.g. one pathologist never calls a diagnosis) the data has ''complete separation'' and will not converge.


==GNU/Octave example==
==GNU/Octave example==
Line 74: Line 77:
xlabel('Study time (hours)');
xlabel('Study time (hours)');
ylabel('Probability of passing the exam (-)');
ylabel('Probability of passing the exam (-)');
</pre>
==R example==
<pre>
noquote ("---------------------------------------------------------------------------------------------------")
noquote ("A script in R to learn about logistic regression")
noquote ("---------------------------------------------------------------------------------------------------")
# Load library
library(rms)
# Data from example on 'Logistic regression' page of Wikipedia - https://en.wikipedia.org/wiki/Logistic_regression
y=c( 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1 )
x=c( 0.50, 0.75, 1.00, 1.25, 1.50, 1.75, 1.75, 2.00, 2.25, 2.50, 2.75, 3.00, 3.25, 3.50, 4.00, 4.25, 4.50, 4.75, 5.00, 5.50 )
# Make dataframe
mydata=data.frame(x,y)
# Create logistic regression model
mylrm=lrm(y~x, mydata)
z=predict(mylrm)
# Definite the logistic function
logit <- function(t) {
        sigma=1/(1+exp(-t))
        return(sigma)
}
# Get p values from the z values
logit(z)
ypred=logit(z)
# Get the coefficients from the LR model
options(digits=20)
coef(mylrm)
# Calculating the p values using the equation
beta0=-4.0777133660750397581
beta1=1.5046454013690906404
ycalc=1/(1+exp(-(beta0+beta1*x)))
# Check the calculation
noquote("The check...")
ycalc-ypred
# Add ycalc to dataframe
mydata=data.frame(x,y,ypred)
# Plot result
plot(x,ypred, main="Probability of Passing versus Time Studied", ylab="Probability of Passing", xlab="Time Studied in Hours")
# Show results
options(digits=7)
noquote("The model...")
mylrm
noquote("The input values and predicted values...")
mydata
noquote("Done calculation!")
</pre>
===Output===
<pre>
[1] ---------------------------------------------------------------------------------------------------
[1] A script in R to learn about logistic regression
[1] ---------------------------------------------------------------------------------------------------
[1] The check...
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[1] The model...
Logistic Regression Model
lrm(formula = y ~ x, data = mydata)
                      Model Likelihood    Discrimination    Rank Discrim.   
                            Ratio Test          Indexes          Indexes   
Obs            20    LR chi2    11.67    R2      0.589    C      0.895   
0            10    d.f.            1    R2(1,20) 0.413    Dxy    0.790   
1            10    Pr(> chi2) 0.0006    R2(1,15) 0.509    gamma  0.798   
max |deriv| 1e-07                        Brier    0.137    tau-a  0.416   
          Coef    S.E.  Wald Z Pr(>|Z|)
Intercept -4.0777 1.7610 -2.32  0.0206 
x          1.5046 0.6287  2.39  0.0167 
[1] The input values and predicted values...
      x y      ypred
1  0.50 0 0.03471034
2  0.75 0 0.04977295
3  1.00 0 0.07089196
4  1.25 0 0.10002862
5  1.50 0 0.13934447
6  1.75 0 0.19083651
7  1.75 1 0.19083651
8  2.00 0 0.25570318
9  2.25 1 0.33353024
10 2.50 0 0.42162653
11 2.75 1 0.51501086
12 3.00 0 0.60735864
13 3.25 1 0.69261733
14 3.50 0 0.76648083
15 4.00 1 0.87444750
16 4.25 1 0.91027764
17 4.50 1 0.93662366
18 4.75 1 0.95561071
19 5.00 1 0.96909707
20 5.50 1 0.98519444
[1] Done calculation!
</pre>
</pre>


48,466

edits

Navigation menu