Difference between revisions of "Logistic regression"

From Libre Pathology
Jump to navigation Jump to search
(create)
 
 
(13 intermediate revisions by the same user not shown)
Line 1: Line 1:
'''Logistic regression''' is a type of curve fitting. It used for discrete outcome variables, e.g. dead or alive.
'''Logistic regression''', abbreviated '''LR''', is a type of curve fitting. It used for categorical dependent (outcome) variables. Examples of categorical dependent variables are: (1) pass/fail, (2) dead/alive.
 
==Types of logistic regression==
===Multinomial LR===
If the dependent variable is categorical and the categories are mutually exclusive (e.g. ''benign'', ''suspicious'', ''malignant'', ''insufficient''), ''multinomial logistic regression'' is used.
 
===Ordered LR===
If the dependent variable is categorical and can be ordered in a meaningful way (e.g. ''Grade 1'', ''Grade 2'', ''Grade 3''), ''ordered logistic regression'' is used.<ref>Ordinal Logistic Regression | R Data Analysis Examples. UCLA. URL: [https://stats.idre.ucla.edu/r/dae/ordinal-logistic-regression/ https://stats.idre.ucla.edu/r/dae/ordinal-logistic-regression/]. Accessed on: March 24, 2018</ref>
 
==Convergence==
If an individual predictor variable is ''not'' predictive (e.g. one pathologist never calls a diagnosis) the data has ''complete separation'' and will not converge.


==GNU/Octave example==
==GNU/Octave example==
Line 28: Line 38:
%  [[ alpha se_alpha zvalue_alpha zvalue_alpha^2 P_Wald_alpha ]
%  [[ alpha se_alpha zvalue_alpha zvalue_alpha^2 P_Wald_alpha ]
%    [ beta_1 se_beta_1 zvalue_beta_1 zvalue_beta_1^2 P_Wald_beta_1 ]
%    [ beta_1 se_beta_1 zvalue_beta_1 zvalue_beta_1^2 P_Wald_beta_1 ]
%    [ beta_2 se_beta_2 zvalue_beta_2 zvalue_beta_1^2 P_Wald_beta_1 ]
%    [ beta_2 se_beta_2 zvalue_beta_2 zvalue_beta_2^2 P_Wald_beta_2 ]
%    [ ... ... ... ... ... ]
%    [ ... ... ... ... ... ]
%    [ beta_n se_beta_n zvalue_beta_n zvalue_beta_1^2 P_Wald_beta_n  ]]
%    [ beta_n se_beta_n zvalue_beta_n zvalue_beta_n^2 P_Wald_beta_n  ]]
logit_arr=zeros(size(beta,1)+1,5);
logit_arr=zeros(size(beta,1)+1,5);
logit_arr(1,1)=theta;
logit_arr(1,1)=theta;
Line 37: Line 47:
logit_arr(1:end,3)=logit_arr(1:end,1)./logit_arr(1:end,2); % zvalue_i = coefficient_i/se_coefficient_i
logit_arr(1:end,3)=logit_arr(1:end,1)./logit_arr(1:end,2); % zvalue_i = coefficient_i/se_coefficient_i
logit_arr(1:end,4)=logit_arr(1:end,3).*logit_arr(1:end,3); % zvalue_i^2
logit_arr(1:end,4)=logit_arr(1:end,3).*logit_arr(1:end,3); % zvalue_i^2
logit_arr(1:end,5)=1-chi2cdf(logit_arr(1:end,4),1) % Wald statistic is calcated as per formula in https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1065119/
logit_arr(1:end,5)=1-chi2cdf(logit_arr(1:end,4),1) % Wald statistic is calculated as per formula in https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1065119/


% calculate the probability for a range of values
% calculate the probability for a range of values
Line 67: Line 77:
xlabel('Study time (hours)');
xlabel('Study time (hours)');
ylabel('Probability of passing the exam (-)');
ylabel('Probability of passing the exam (-)');
</pre>
==R example==
<pre>
noquote ("---------------------------------------------------------------------------------------------------")
noquote ("A script in R to learn about logistic regression")
noquote ("---------------------------------------------------------------------------------------------------")
# Load library
library(rms)
# Data from example on 'Logistic regression' page of Wikipedia - https://en.wikipedia.org/wiki/Logistic_regression
y=c( 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1 )
x=c( 0.50, 0.75, 1.00, 1.25, 1.50, 1.75, 1.75, 2.00, 2.25, 2.50, 2.75, 3.00, 3.25, 3.50, 4.00, 4.25, 4.50, 4.75, 5.00, 5.50 )
# Make dataframe
mydata=data.frame(x,y)
# Create logistic regression model
mylrm=lrm(y~x, mydata)
z=predict(mylrm)
# Definite the logistic function
logit <- function(t) {
        sigma=1/(1+exp(-t))
        return(sigma)
}
# Get p values from the z values
logit(z)
ypred=logit(z)
# Get the coefficients from the LR model
options(digits=20)
coef(mylrm)
# Calculating the p values using the equation
beta0=-4.0777133660750397581
beta1=1.5046454013690906404
ycalc=1/(1+exp(-(beta0+beta1*x)))
# Check the calculation
noquote("The check...")
ycalc-ypred
# Add ycalc to dataframe
mydata=data.frame(x,y,ypred)
# Plot result
plot(x,ypred, main="Probability of Passing versus Time Studied", ylab="Probability of Passing", xlab="Time Studied in Hours")
# Show results
options(digits=7)
noquote("The model...")
mylrm
noquote("The input values and predicted values...")
mydata
noquote("Done calculation!")
</pre>
===Output===
<pre>
[1] ---------------------------------------------------------------------------------------------------
[1] A script in R to learn about logistic regression
[1] ---------------------------------------------------------------------------------------------------
[1] The check...
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[1] The model...
Logistic Regression Model
lrm(formula = y ~ x, data = mydata)
                      Model Likelihood    Discrimination    Rank Discrim.   
                            Ratio Test          Indexes          Indexes   
Obs            20    LR chi2    11.67    R2      0.589    C      0.895   
0            10    d.f.            1    R2(1,20) 0.413    Dxy    0.790   
1            10    Pr(> chi2) 0.0006    R2(1,15) 0.509    gamma  0.798   
max |deriv| 1e-07                        Brier    0.137    tau-a  0.416   
          Coef    S.E.  Wald Z Pr(>|Z|)
Intercept -4.0777 1.7610 -2.32  0.0206 
x          1.5046 0.6287  2.39  0.0167 
[1] The input values and predicted values...
      x y      ypred
1  0.50 0 0.03471034
2  0.75 0 0.04977295
3  1.00 0 0.07089196
4  1.25 0 0.10002862
5  1.50 0 0.13934447
6  1.75 0 0.19083651
7  1.75 1 0.19083651
8  2.00 0 0.25570318
9  2.25 1 0.33353024
10 2.50 0 0.42162653
11 2.75 1 0.51501086
12 3.00 0 0.60735864
13 3.25 1 0.69261733
14 3.50 0 0.76648083
15 4.00 1 0.87444750
16 4.25 1 0.91027764
17 4.50 1 0.93662366
18 4.75 1 0.95561071
19 5.00 1 0.96909707
20 5.50 1 0.98519444
[1] Done calculation!
</pre>
</pre>


Line 72: Line 192:
*[[Mathematics]].
*[[Mathematics]].
*[[Statistics]].
*[[Statistics]].
==References==
{{Reflist|1}}


==External links==
==External links==
*[https://www.gnu.org/software/octave/doc/v4.0.3/Correlation-and-Regression-Analysis.html Correlation and Regression (gnu.org)].
*[https://www.gnu.org/software/octave/doc/v4.0.3/Correlation-and-Regression-Analysis.html Correlation and Regression (gnu.org)].
*[https://en.wikipedia.org/wiki/Multinomial_logistic_regression Multinomial logistic regression (Wikipedia)].


[[Category:Stuff]]
[[Category:Stuff]]

Latest revision as of 05:59, 3 March 2024

Logistic regression, abbreviated LR, is a type of curve fitting. It used for categorical dependent (outcome) variables. Examples of categorical dependent variables are: (1) pass/fail, (2) dead/alive.

Types of logistic regression

Multinomial LR

If the dependent variable is categorical and the categories are mutually exclusive (e.g. benign, suspicious, malignant, insufficient), multinomial logistic regression is used.

Ordered LR

If the dependent variable is categorical and can be ordered in a meaningful way (e.g. Grade 1, Grade 2, Grade 3), ordered logistic regression is used.[1]

Convergence

If an individual predictor variable is not predictive (e.g. one pathologist never calls a diagnosis) the data has complete separation and will not converge.

GNU/Octave example

% -------------------------------------------------------------------------------------------------
% TO RUN THIS FROM WITHIN OCTAVE
% 	run logistic_regression_example.m
%
% TO RUN FROM THE COMMAND LINE
% 	octave logistic_regression_example.m > tmp.txt
% -------------------------------------------------------------------------------------------------
clear all;
close all;

% Data from example on 'Logistic regression' page of Wikipedia - https://en.wikipedia.org/wiki/Logistic_regression
y = [ 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1 ];
x = [ 0.50, 0.75, 1.00, 1.25, 1.50, 1.75, 1.75, 2.00, 2.25, 2.50, 2.75, 3.00, 3.25, 3.50, 4.00, 4.25, 4.50, 4.75, 5.00, 5.50 ];

y=transpose(y);
x=transpose(x);

[theta, beta, dev, dl, d2l, gamma] = logistic_regression(y,x,1)

% calculate standard error
se = sqrt (diag (inv (-d2l)))

% create array  
%   [[ alpha 	se_alpha	zvalue_alpha 	zvalue_alpha^2		P_Wald_alpha	]
%    [ beta_1	se_beta_1	zvalue_beta_1	zvalue_beta_1^2		P_Wald_beta_1	]
%    [ beta_2	se_beta_2	zvalue_beta_2	zvalue_beta_2^2		P_Wald_beta_2	]
%    [ ...	...		...		...			...		]
%    [ beta_n	se_beta_n	zvalue_beta_n	zvalue_beta_n^2		P_Wald_beta_n  	]]
logit_arr=zeros(size(beta,1)+1,5);
logit_arr(1,1)=theta;
logit_arr(2:end,1)=beta;
logit_arr(1:end,2)=se;
logit_arr(1:end,3)=logit_arr(1:end,1)./logit_arr(1:end,2);	% zvalue_i = coefficient_i/se_coefficient_i
logit_arr(1:end,4)=logit_arr(1:end,3).*logit_arr(1:end,3);	% zvalue_i^2
logit_arr(1:end,5)=1-chi2cdf(logit_arr(1:end,4),1)		% Wald statistic is calculated as per formula in https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1065119/

% calculate the probability for a range of values
studytime_arr= [ 1, 2, 3, 4, 5]
len_studytime_arr=size(studytime_arr,2);

% As per the manual ( https://www.gnu.org/software/octave/doc/v4.0.3/Correlation-and-Regression-Analysis.html ) GNU/Octave function fits:
%	logit (gamma_i (x)) = theta_i - beta' * x,   i = 1 ... k-1

for ctr=1:len_studytime_arr
  P_logit(ctr)=1/(1+exp(theta-studytime_arr(ctr)*beta));
end

% print to screen output
P_logit
logit_arr

% create plot
len_x = size(x,1)

for ctr=1:len_x
  P_fitted(ctr)=1/(1+exp(theta-x(ctr)*beta));
end

plot(x,y,'o')
hold on;
grid on;
plot(x,P_fitted,'-')
xlabel('Study time (hours)');
ylabel('Probability of passing the exam (-)');

R example

noquote ("---------------------------------------------------------------------------------------------------")
noquote ("A script in R to learn about logistic regression")
noquote ("---------------------------------------------------------------------------------------------------")

# Load library
library(rms)

# Data from example on 'Logistic regression' page of Wikipedia - https://en.wikipedia.org/wiki/Logistic_regression
y=c( 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1 )
x=c( 0.50, 0.75, 1.00, 1.25, 1.50, 1.75, 1.75, 2.00, 2.25, 2.50, 2.75, 3.00, 3.25, 3.50, 4.00, 4.25, 4.50, 4.75, 5.00, 5.50 )

# Make dataframe
mydata=data.frame(x,y)

# Create logistic regression model
mylrm=lrm(y~x, mydata)

z=predict(mylrm)

# Definite the logistic function
logit <- function(t) {
        sigma=1/(1+exp(-t))
        return(sigma)
}

# Get p values from the z values
logit(z)

ypred=logit(z)

# Get the coefficients from the LR model
options(digits=20)
coef(mylrm)

# Calculating the p values using the equation
beta0=-4.0777133660750397581
beta1=1.5046454013690906404
ycalc=1/(1+exp(-(beta0+beta1*x)))

# Check the calculation
noquote("The check...")
ycalc-ypred

# Add ycalc to dataframe
mydata=data.frame(x,y,ypred)

# Plot result
plot(x,ypred, main="Probability of Passing versus Time Studied", ylab="Probability of Passing", xlab="Time Studied in Hours")

# Show results
options(digits=7)
noquote("The model...")
mylrm

noquote("The input values and predicted values...")
mydata

noquote("Done calculation!")

Output

[1] ---------------------------------------------------------------------------------------------------
[1] A script in R to learn about logistic regression
[1] ---------------------------------------------------------------------------------------------------

[1] The check...
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[1] The model...
Logistic Regression Model

lrm(formula = y ~ x, data = mydata)

                      Model Likelihood    Discrimination    Rank Discrim.    
                            Ratio Test           Indexes          Indexes    
Obs            20    LR chi2     11.67    R2       0.589    C       0.895    
 0             10    d.f.            1    R2(1,20) 0.413    Dxy     0.790    
 1             10    Pr(> chi2) 0.0006    R2(1,15) 0.509    gamma   0.798    
max |deriv| 1e-07                         Brier    0.137    tau-a   0.416    

          Coef    S.E.   Wald Z Pr(>|Z|)
Intercept -4.0777 1.7610 -2.32  0.0206  
x          1.5046 0.6287  2.39  0.0167  

[1] The input values and predicted values...
      x y      ypred
1  0.50 0 0.03471034
2  0.75 0 0.04977295
3  1.00 0 0.07089196
4  1.25 0 0.10002862
5  1.50 0 0.13934447
6  1.75 0 0.19083651
7  1.75 1 0.19083651
8  2.00 0 0.25570318
9  2.25 1 0.33353024
10 2.50 0 0.42162653
11 2.75 1 0.51501086
12 3.00 0 0.60735864
13 3.25 1 0.69261733
14 3.50 0 0.76648083
15 4.00 1 0.87444750
16 4.25 1 0.91027764
17 4.50 1 0.93662366
18 4.75 1 0.95561071
19 5.00 1 0.96909707
20 5.50 1 0.98519444
[1] Done calculation!

See also

References

  1. Ordinal Logistic Regression | R Data Analysis Examples. UCLA. URL: https://stats.idre.ucla.edu/r/dae/ordinal-logistic-regression/. Accessed on: March 24, 2018

External links